Thinking Like a Data Scientist Part I: Understanding Where To Start

One question I frequently get is: “How do I become a data scientist?”  Wow, tough question.  There are several new books that outline the different skills, capabilities and technologies that a data scientist needs to learn and eventually master.  I’ve read several of these books and am impressed with the depth of the content.

Unfortunately, these books spend the vast majority of their time reviewing and/or teaching the data science process (for example CRISP-DM, the Cross Industry Standard Process for Data Mining), basic and advanced statistics, and data mining and data visualization techniques and tools.

Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective.  The data science teams still need help from the business users – or subject matter experts (SME) – to understand the decisions the business is trying to make, the hypotheses that they want to test and the predictions that they need to produce in support of those decisions and hypotheses.  In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist.

So the objective of this blog (which, if successful, will make its way into my Big Data MBA curriculum for the University of San Francisco School of Management fall semester) is to define a process that helps business users “think like a data scientist.”

The “Thinking Like a Data Scientist” Process

The goal of the “thinking like a data scientist” process is to identify, brainstorm and/or uncover new variables that are better predictors of business performance.  But “business performance” of what?  Our key business initiative, of course.

Step 1:  Identify Key Business Initiative.  Would you expect anything different from me than starting with what’s important to the business?  So, how can you spot a key business initiative?

A key business initiative is characterized as:

  • Critical to the immediate-term performance of the organization
  • Documented (communicated either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned/championed by a senior business executive
  • Has a measurable financial goal
  • Has a well-defined delivery timeframe (9 to 12 months)
  • Undertaken to deliver significant, compelling and/or distinguishable financial or competitive advantage
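The characteristics above can be treated as a simple checklist. What follows is only a minimal sketch of that idea, not a tool from this blog: the class name, field names and the example answers are all hypothetical, and it assumes that the 12-month upper bound is the part of the timeframe test that matters.

```python
from dataclasses import dataclass
from typing import List

# Yes/no characteristics taken from the list above (field names are illustrative).
YES_NO_CHECKS = (
    "critical_near_term",
    "documented",
    "cross_functional",
    "executive_owner",
    "measurable_financial_goal",
    "delivers_advantage",
)

# Assumption: anything planned past 12 months fails the timeframe test.
MAX_DELIVERY_MONTHS = 12


@dataclass
class CandidateInitiative:
    """One candidate initiative, scored against the characteristics above."""
    name: str
    critical_near_term: bool
    documented: bool
    cross_functional: bool
    executive_owner: bool
    measurable_financial_goal: bool
    delivers_advantage: bool
    delivery_months: int  # planned delivery timeframe, in months

    def unmet(self) -> List[str]:
        """Return the names of the characteristics this candidate fails."""
        missing = [check for check in YES_NO_CHECKS if not getattr(self, check)]
        if self.delivery_months > MAX_DELIVERY_MONTHS:
            missing.append("delivery_months")
        return missing

    def is_key_initiative(self) -> bool:
        """A candidate qualifies only when every characteristic is met."""
        return not self.unmet()


# Made-up answers, for illustration only.
candidate = CandidateInitiative(
    name="Improve Merchandising Effectiveness",
    critical_near_term=True,
    documented=True,
    cross_functional=True,
    executive_owner=True,
    measurable_financial_goal=True,
    delivers_advantage=True,
    delivery_months=12,
)
print(candidate.name, "->", candidate.is_key_initiative())
```

Returning the list of unmet characteristics, rather than a bare yes/no, gives the business users something concrete to discuss when a candidate falls short.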

I am a big stickler for targeting business initiatives that are focused on the next 9 to 12 months.  Anything longer than 12 months can quickly devolve into a “Battlestar Galactica” or “cure world hunger” project that may have incredible business value, but little chance of success.

For a refresher on how to identify an organization’s key business initiatives, read my blog “Big Data MBA: Reading the Annual Report for Big Data Opportunities.”  That blog outlines how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs) to uncover an organization’s key business initiatives.

For purposes of this exercise, I’m going to pretend that our client is Foot Locker, and that our target business initiative is “Improve Merchandising Effectiveness” as highlighted in their annual report (see Figure 1).

Figure 1: Identifying and Understanding Organization’s Key Business Initiatives

Step 2:  Identify Strategic Nouns.  Strategic nouns are the key business entities that either impact or are impacted by the organization’s key business initiative.  These strategic nouns are critical to our data scientist thinking process because these are the entities for which we want to uncover or gain new, actionable insights, and around which we will ultimately build our analytic profiles.  Examples of strategic nouns include customers, patients, students, employees, stores, products, medication, trucks, wind turbines, etc.

For the Foot Locker “Improve Merchandising Effectiveness” business initiative, the strategic nouns upon which we will focus are:

  • Customers
  • Products
  • Campaigns
  • Stores

Step 3:  Brainstorm Strategic Noun Questions.  Probably the hardest part of this exercise, and maybe of the whole “thinking like a data scientist” process, is to brainstorm the different questions that you want to ask in support of the targeted business initiative.  For this part of the exercise, we want the business users to brainstorm the business questions for each strategic noun from the perspectives of:

  • Descriptive Analytics:  Understanding what happened
  • Predictive Analytics:  Predicting what is likely to happen
  • Prescriptive Analytics:  Recommending what to do next

See Figure 2 for an example of the evolution from Descriptive to Predictive to Prescriptive.

Figure 2: Evolution of The Analytic Questions

In our Foot Locker “Improve Merchandising Effectiveness” example, we want to brainstorm the “Customer” strategic noun questions as follows:

Descriptive Analytics (Understanding what happened)

  • Which customers are most receptive to which types of merchandising campaigns?
  • What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
  • Are there certain times of year when certain customers are more responsive?

Predictive Analytics (Predicting what is likely to happen)

  • Which customers are most likely to respond to a Back to School event?
  • Which customers are most likely to respond to a BOGOF (buy one, get one free) offer?
  • Which customers are most likely to respond to a 50% off in-store markdown?

Prescriptive Analytics (Recommending what to do next)

  • What personalized offers (recommendations) should I deliver to Anne Smith to get her to come into the store?
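One way to keep a brainstorming session organized is to record each question against its strategic noun and analytic type, then look for the cells that are still empty. The sketch below is only an illustration of that bookkeeping; the worksheet layout and function names are hypothetical, while the nouns, analytic types and sample questions come from the lists above.

```python
# Strategic nouns and analytic types from the Foot Locker example above.
STRATEGIC_NOUNS = ("Customers", "Products", "Campaigns", "Stores")
ANALYTIC_TYPES = ("descriptive", "predictive", "prescriptive")


def new_worksheet():
    """Build an empty noun x analytic-type grid of question lists."""
    return {
        noun: {kind: [] for kind in ANALYTIC_TYPES}
        for noun in STRATEGIC_NOUNS
    }


def add_question(worksheet, noun, kind, question):
    """File a brainstormed question under its noun and analytic type."""
    if noun not in worksheet:
        raise KeyError(f"unknown strategic noun: {noun}")
    if kind not in ANALYTIC_TYPES:
        raise KeyError(f"unknown analytic type: {kind}")
    worksheet[noun][kind].append(question)


def gaps(worksheet):
    """Return the (noun, analytic type) cells that have no questions yet."""
    return [
        (noun, kind)
        for noun in worksheet
        for kind in ANALYTIC_TYPES
        if not worksheet[noun][kind]
    ]


worksheet = new_worksheet()
add_question(worksheet, "Customers", "descriptive",
             "Which customers are most receptive to which types of merchandising campaigns?")
add_question(worksheet, "Customers", "predictive",
             "Which customers are most likely to respond to a Back to School event?")
add_question(worksheet, "Customers", "prescriptive",
             "What personalized offers should I deliver to Anne Smith to get her to come into the store?")

# The remaining empty cells show where the brainstorming still has to go.
for noun, kind in gaps(worksheet):
    print(f"No {kind} questions yet for {noun}")
```

With only the “Customer” questions filled in, the gaps point at Products, Campaigns and Stores, which is a useful prompt for the next round of brainstorming.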

Part II of the “Thinking Like a Data Scientist” blog series will continue this process and hopefully help us uncover new data sources and metrics that may be better predictors of business performance.

To learn more about EMC’s unique approach to leveraging Big Data to drive business value, please check out EMC’s Big Data Vision Workshop offering.

Authored by:
Bill Schmarzo

The moniker “Dean of Big Data” may have been applied in a light-hearted spirit, but Bill’s expertise around data analytics is no joke. After being deeply immersed in the world of big data for over 20 years, he shows no signs of coming up for air. Bill speaks frequently on the use of big data, with an engaging style that has gained him many accolades. He’s presented most recently at STRATA, The Data Science Summit and TDWI, and has written several white papers and articles about the application of big data and advanced analytics to drive an organization’s key business initiatives. Prior to joining Consulting as part of EMC Global Services, Bill co-authored with Ralph Kimball a series of articles on analytic applications, and was on the faculty of TDWI teaching a course on designing analytic applications.

Bill created the EMC Big Data Vision Workshop methodology that links an organization’s strategic business initiatives with supporting data and analytic requirements, and thus helps organizations wrap their heads around this complex subject.

Bill sets the strategy and defines offerings and capabilities for the Enterprise Information Management and Analytics within EMC Consulting, Global Services. Prior to this, he was the Vice President of Advertiser Analytics at Yahoo at the dawn of the online Big Data revolution.

Bill is the author of “Big Data: Understanding How Data Powers Big Business” published by Wiley.

©Bill Schmarzo, 2015. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Bill Schmarzo and with appropriate and specific direction to the original content.

4 replies »

  1. Bill, Thanks for bringing some clarity and sanity to a confused and insane subject. Most of what I see are things like, “Become a data scientist in 3 days”, “Learn this tool and you will be a data scientist”, …, “Pay me $100 and I will make you a data scientist.” How dare you introduce logic and good common sense into the equation–and thanks for doing it!
