I write a good bit of content about using open-source tools for analytics and operations research. However, my workhorse happens to be SAS. Specifically, I use SAS Enterprise Guide (EG) and SAS Enterprise Miner (EM).
When to use open-source
I have argued both for and against the use of open-source tools, and they certainly have their place. If you have a limited budget, open-source is a good path to take. It is also a logical option if you teach or study at a university.
How I use SAS
I perform predictive modeling in the Financial and Insurance (FSI) industry. My method starts in SAS EG, where I retrieve variables from multiple data sets (sometimes between 10 and 20), which yields about 2000 variables. Once these variables are merged into one data set (for up to 11 million customers), I run an information value algorithm to determine which variables have the most predictive power for the response variable. After eliminating variables that cannot be used for marketing bank products (because of fair lending acts and so on), I import between 150 and 300 variables into SAS EM. In EM, I perform data partitioning, imputation, and transformation before fitting a model such as logistic regression. When I get an adequate model, I take the scoring code from EM and port it to EG, wrapped in a macro. I run the macro in EG to measure model performance. When I get a model that performs well enough against challenger models, I develop model production code in EG, which includes the EM scoring code.
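To make the information value step concrete, here is a minimal sketch in Python. It is not the SAS code used in the workflow above. It assumes each predictor has already been binned into categories, that the response is coded 0/1, and that a small smoothing constant is an acceptable way to avoid dividing by zero in empty cells. The function names `information_value` and `rank_by_iv` and the `smoothing` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def information_value(bins, responses, smoothing=0.5):
    """Information value of one binned predictor against a 0/1 response.

    IV = sum over bins of (p_event - p_non_event) * ln(p_event / p_non_event),
    where p_event is the share of all events that fall in the bin.
    `smoothing` is an illustrative assumption that keeps empty cells finite.
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for label, y in zip(bins, responses):
        if y == 1:
            events[label] += 1
        else:
            non_events[label] += 1

    labels = set(events) | set(non_events)
    total_events = sum(events[b] + smoothing for b in labels)
    total_non_events = sum(non_events[b] + smoothing for b in labels)

    iv = 0.0
    for b in labels:
        p_event = (events[b] + smoothing) / total_events
        p_non_event = (non_events[b] + smoothing) / total_non_events
        iv += (p_event - p_non_event) * math.log(p_event / p_non_event)
    return iv


def rank_by_iv(columns, responses):
    """Rank predictors (dict of name -> list of bin labels) by descending IV."""
    ivs = {name: information_value(vals, responses) for name, vals in columns.items()}
    return sorted(ivs.items(), key=lambda kv: kv[1], reverse=True)
```

In use, something like `rank_by_iv({"age_band": [...], "region": [...]}, response)` returns the predictors ordered from most to least predictive, and the top of that list is the kind of shortlist that would go on to the modeling step. Cutoffs for what counts as a useful IV vary by practitioner and are not specified here.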
Why I use SAS
SAS EG and EM are trusted by the FSI organizations I serve. I began using SAS in 1990 on a mainframe computer, and it has since become the statistical software of choice in many industries. Through external model governance and validation, the models I have developed in SAS EG and EM have been stress tested and proven valid for use by the FSI. This does not imply that models created with open-source tools are any less valid, as I will discuss next.
Open-source and SAS
Once I build a predictive model in SAS, I attempt to build the same model in R, for example. The R models are similar enough to corroborate that the SAS models are indeed valid. For instance, I constructed an uplift model in SAS using logistic regression and an uplift model for the same acquisition situation using random forest in R. The overall net lift was identical, although the distribution among deciles was slightly different.
I will not argue that either SAS or open-source tooling is better than the other. Instead, I will state that each has its use in various situations. Yes, SAS is tried and true, but I will continue to use both.
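A comparison of net lift by decile can be sketched as follows in Python. This is an illustration under stated assumptions, not the SAS or R code behind the models above: it takes net lift in a decile to be the response rate of treated customers minus the response rate of control customers, and it ranks customers by model score so that decile 1 holds the highest scores. The function name `net_lift_by_decile` and its parameters are hypothetical.

```python
def net_lift_by_decile(scores, treated, responded, n_bins=10):
    """Return [(decile, net_lift)], with decile 1 holding the highest scores.

    net_lift = treated response rate - control response rate within the bin.
    A bin with no treated or no control customers gets None.
    """
    # Rank customer indices from highest to lowest model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    result = []
    for d in range(n_bins):
        idx = order[d * n // n_bins:(d + 1) * n // n_bins]
        treated_n = sum(1 for i in idx if treated[i])
        control_n = len(idx) - treated_n
        treated_resp = sum(responded[i] for i in idx if treated[i])
        control_resp = sum(responded[i] for i in idx if not treated[i])
        if treated_n == 0 or control_n == 0:
            result.append((d + 1, None))
        else:
            result.append((d + 1, treated_resp / treated_n - control_resp / control_n))
    return result
```

Running this once with the scores from each model, on the same customers with the same treatment and response flags, gives two decile tables that can be laid side by side to see where the two models distribute lift differently.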
Jeffrey Strickland, Ph.D.
Jeffrey Strickland, Ph.D., is the author of Predictive Analytics Using R and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation, and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional (ASEP). He has published nearly 200 blog posts on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books, including:
- Operations Research using Open-Source Tools
- Discrete Event simulation using ExtendSim
- Crime Analysis and Mapping
- Missile Flight Simulation
- Mathematical Modeling of Warfare and Combat Phenomenon
- Predictive Modeling and Analytics
- Using Math to Defeat the Enemy
- Verification and Validation for Modeling and Simulation
- Simulation Conceptual Modeling
- System Engineering Process and Practices