The Seven Sins of Data Prep

Data preparation is often considered a necessary precursor to the “real” work found in visualizing or analyzing data, but this framing sells data prep short. The ways in which we cleanse and shape data for downstream use have significant bearing on our final analytic output, and cutting corners on data prep can run up a huge cost for companies.

According to a report from the Harvard Business Review, bad data costs the U.S. roughly $3 trillion per year, primarily due to the time involved in correcting data and the consequences of errors leaking through to customers. Below, we’ve outlined what we consider the “data prep sins”: practices that will surely affect the end result for the worse.

Sin #1: Removing data

Removing records that contain incomplete, erroneous, outlying, or extraneous values is one of the most common transformations in data preparation. However, removing data can introduce bias or affect downstream results in meaningful ways.
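To make the risk concrete, here is a minimal sketch in Python of how dropping incomplete records can shift a downstream statistic. The records, field names, and the helper functions `drop_incomplete` and `removal_report` are hypothetical, invented for illustration only; they do not come from the original article or from any particular tool.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing age.
# In this made-up data, customers who skipped the age field also spend less.
records = [
    {"age": 34, "spend": 120.0},
    {"age": 41, "spend": 100.0},
    {"age": None, "spend": 20.0},
    {"age": 29, "spend": 110.0},
    {"age": None, "spend": 30.0},
    {"age": 52, "spend": 130.0},
]


def drop_incomplete(rows):
    """Keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def removal_report(before, after, field):
    """Summarize how many rows were removed and how the field's mean shifted."""
    removed = len(before) - len(after)
    return {
        "rows_removed": removed,
        "fraction_removed": removed / len(before),
        "mean_before": mean(r[field] for r in before),
        "mean_after": mean(r[field] for r in after),
    }


cleaned = drop_incomplete(records)
report = removal_report(records, cleaned, "spend")
print(report)
```

In this toy data the missing ages are not random, so removing those rows raises the average spend from 85.0 to 115.0. Comparing a summary before and after a removal step is one inexpensive way to notice that kind of shift before it reaches an analysis.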

Source: datanami.com
Author: Sean Kandel
