I know, that is pretty confusing. What we have, however, is a definition of “data scientist” that is distorted, all over the place, inadequate, confusing, and too broad. I do not know where we went wrong, but we have the equivalent of calling someone a “bridge scientist” if they happen to work on a bridge, whether they design it, build it, resurface it, or, who knows, maybe just drive across it.
Call it what it is
I think life would be better in all things data if we just called things what they are, like calling mathematicians by their function: number theorist, math teacher, mathematical modeler, algebraist, etc. How about datatician? Will that cause confusion with statistician?
Mimic Mathematical Sciences?
Maybe “data science” is okay and the problem is “data scientist”. We have “mathematical sciences” (note the plural), but few mathematicians would call themselves a “mathematical scientist”, even though that sounds pretty cool. The problem is that it doesn’t describe what one does in the field of mathematics. “Algebraist”, however, says that one works in the mathematical area of algebraic structures. The mathematical sciences might look like this (I borrowed these from the National Science Foundation):
- Number Theory
- Applied Mathematics
- Computational Mathematics
- Geometric Analysis
- Mathematical Biology
This is not an all-inclusive list. Note that statistics falls under mathematical sciences at the NSF, and “real” statisticians know from their mathematical statistics courses why this might be the case.
A Data Science Taxonomy?
Perhaps under “data science”, we eliminate the term “data scientist” and call out titles by what we do:
- Data architect
- Data processor
- Data miner
- Data modeler
- Data analyst
- Data explorer
- Database administrator
- Database developer
- Algorithm developer
Of course, the guru of data analysis and data modeling is the “statistician”. But perhaps statisticians can be divided in the same manner as mathematicians: regression analyst, exploratory data analyst, etc.
Fun with Machine Learning
What about machine learning?
- Machine learning architect
- Neural networker
- Classification tree farmer
- Random forest ranger
- (now I am just being silly)
Somehow, “data scientist” does not cut it for me, and there are many in the industry and the field who have similar issues. I have read and been told that many organizations will no longer hire data scientists, partly because there is no standard for what a data scientist is or does. I do not think we will reach a standard, so let’s label a person who works with data by what they actually do. Then hire them accordingly.
Jeffrey Strickland, Ph.D.
Jeffrey Strickland, Ph.D., is the author of Predictive Analytics Using R and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation, and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional (ASEP). He has published nearly 200 blogs on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books, including:
- Operations Research using Open-Source Tools
- Discrete Event simulation using ExtendSim
- Crime Analysis and Mapping
- Missile Flight Simulation
- Mathematical Modeling of Warfare and Combat Phenomenon
- Predictive Modeling and Analytics
- Using Math to Defeat the Enemy
- Verification and Validation for Modeling and Simulation
- Simulation Conceptual Modeling
- System Engineering Process and Practices