Here is an actual LinkedIn sponsored ad:
“In just 12 weeks, you’ll learn the tools, techniques, and fundamental concepts you need to know to make an impact as a data scientist. During the course of the program, you’ll work through messy, real-world data sets to gain experience across the data science stack: data munging, exploration, modeling, validation, visualization, and communication.”
My Blue Heaven
I love the Steve Martin movie, “My Blue Heaven”. Steve plays a New York City mafia lieutenant, Vincent ‘Vinnie’ Antonelli, in the Federal Witness Protection Program. The agent in charge is Barney Coopersmith, played by Rick Moranis. At any rate, Vinnie always has some scheme going. A line he uses frequently is “I know a guy…” In a scene with District Attorney Hannah Stubbs, played by Joan Cusack, Vinnie says, “I know a guy that could make you a priest for ten dollars.”
Quick Steps to becoming a Data Scientist
As I sit here writing this piece, I hear Vinnie saying, “I know a guy who can make you a data scientist for ten dollars.” You can skip college, avoid those statistics and mathematical statistics courses (the latter being the theory behind statistics), and just give Vinnie ten dollars (or 9.01 Euro or 636.15 Rupee or 157.15 Peso).
No Need for Statistics
There are some who say you do not need to know statistics in order to be a data scientist. Some say that all you need is a working knowledge of the tools (Python, R, SAS, etc.). Others say that the data speaks for itself—imagine that, talking data. Some say it is all about machine learning.
Machine learning algorithms are good and useful when traditional methods do not work. Translate that as, “Use statistical methods if they work, otherwise use machine learning algorithms.” You should never start solving a problem with machine learning algorithms. They are a next step.
You should never program, at least in data science, something that you do not understand. I can write code all day long without understanding the impact of the Central Limit Theorem, but at the end of the day, who knows what is or is not in my code? Oh yeah, the data will speak to me!
These days, everyone wants to go through the drive-through. We have drive-through fast food, fast pharmacies, fast coffee, and so on. Soon McDonald's will be offering a Certificate of Data Science. Even better, and this is free, by the power vested in me, I declare thee a data scientist. Too bad I have no power. Just send me 5337.00 Colón, or go to McDonald's.
Data munging, or data wrangling, is loosely the process of manually converting or mapping data from one “raw” form into another format that allows for more convenient consumption of the data, with the help of semi-automated tools.
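For readers who have never seen munging up close, here is a minimal sketch in Python of what that definition can look like in practice. The raw text, the column names, and the `munge` helper are all made up for illustration; real data sets are messier, and a real workflow would likely lean on a library rather than hand-rolled cleanup.

```python
import csv
import io

# Hypothetical "raw" export: inconsistent casing, stray whitespace,
# a currency symbol inside a numeric field, and a missing value.
RAW = """name, city ,income
 alice ,NEW YORK,"$52,000"
Bob,new york,
CAROL, Boston ,"$61,500"
"""


def munge(raw_text):
    """Map raw CSV text to a list of cleaned dict records."""
    reader = csv.reader(io.StringIO(raw_text))
    # Normalize the header names first so later lookups are predictable.
    header = [h.strip().lower() for h in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        rec = dict(zip(header, (cell.strip() for cell in row)))
        rec["name"] = rec["name"].title()
        rec["city"] = rec["city"].title()
        # Strip the currency formatting; keep missing income as None.
        income = rec["income"].replace("$", "").replace(",", "")
        rec["income"] = float(income) if income else None
        records.append(rec)
    return records


clean = munge(RAW)
for rec in clean:
    print(rec)
```

Note what the sketch does not do: it does not decide what a missing income means or whether it is safe to drop. That judgment is where the statistics comes in.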
Jeffrey Strickland, Ph.D.
Jeffrey Strickland, Ph.D., is the author of Predictive Analytics Using R and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation, and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional (ASEP). He has published nearly 200 blogs on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books, including:
- Operations Research using Open-Source Tools
- Discrete Event Simulation using ExtendSim
- Crime Analysis and Mapping
- Missile Flight Simulation
- Mathematical Modeling of Warfare and Combat Phenomenon
- Predictive Modeling and Analytics
- Using Math to Defeat the Enemy
- Verification and Validation for Modeling and Simulation
- Simulation Conceptual Modeling
- System Engineering Process and Practices