The Seven Sins of Data Prep

Data preparation is often considered a necessary precursor to the “real” work found in visualizing or analyzing data, but this framing sells data prep short. The ways in which we cleanse and shape data for downstream use have significant bearing on our final analytic output, and cutting corners on data prep can run up a huge cost for companies.

According to a report from the Harvard Business Review, bad data costs the U.S. roughly $3 trillion per year, primarily due to the time involved in correcting data and the consequences of errors leaking through to customers. Below, we’ve outlined what we consider “data prep sins”: practices that will surely affect the end result for the worse.

Sin #1: Removing data

Removing records that contain incomplete, erroneous, outlying, or extraneous values is one of the most common transformations in data preparation. However, removing data can introduce bias or affect downstream results in meaningful ways.
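To make the bias concrete, here is a minimal sketch in plain Python. The records, field names, and values are invented for illustration and do not come from any real dataset. It assumes missing income values are concentrated in one group, which is the situation where dropping incomplete records is most likely to skew what remains.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing income value.
# Missing values are deliberately concentrated in the "rural" group.
records = [
    {"region": "urban", "income": 60},
    {"region": "urban", "income": 70},
    {"region": "urban", "income": 80},
    {"region": "rural", "income": 30},
    {"region": "rural", "income": None},
    {"region": "rural", "income": None},
]


def share(rows, region):
    """Return the fraction of rows that belong to the given region."""
    return sum(r["region"] == region for r in rows) / len(rows)


# The common transformation: drop every record with a missing value.
complete = [r for r in records if r["income"] is not None]

rural_share_before = share(records, "rural")   # 3 of 6 records
rural_share_after = share(complete, "rural")   # 1 of 4 records
mean_income_after = mean(r["income"] for r in complete)

print(f"rural share before removal: {rural_share_before:.2f}")
print(f"rural share after removal:  {rural_share_after:.2f}")
print(f"mean income of remaining records: {mean_income_after}")
```

In this toy case the rural group falls from half of the records to a quarter once incomplete rows are dropped, so any average computed afterwards leans toward the urban values. Comparing group shares before and after a removal step is one inexpensive way to notice this kind of shift.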

Author: Sean Kandel

