How Do We Define ‘Big Data’ And Just What Counts As A ‘Big Data’ Analysis?

In an era when almost everything is touted as “big data,” how do we define what we actually mean by the term, and what precisely counts as a “big data” analysis? Does merely running a keyword search over a multi-petabyte dataset count? Does using a date filter to extract a few million tweets from the full trillion-tweet archive count as “big data”?

Does running a hundred-petabyte file server, or merely storing a hundred-petabyte backup, count? What exactly should count as “big data” today? Back in 2013, I used to open my data science talks by saying that the previous day I had run several hundred analyses over a 100-petabyte database totaling more than 30 trillion rows, with more than 200 indicators incorporated into the analysis.

Author: Kalev Leetaru
