How Do We Define ‘Big Data’ And Just What Counts As A ‘Big Data’ Analysis?

In an era where almost everything is touted as being “big data,” how do we define just what we mean by “big data,” and what precisely counts as a “big data” analysis? Does merely running a keyword search over a multi-petabyte dataset count? Does using a date filter to extract a few million tweets from the full trillion-tweet archive count as “big data”?

Does running a hundred-petabyte file server, or merely storing a hundred-petabyte backup, count? What exactly should count as “big data” today? Back in 2013, I used to open my data science talks by saying I had just run several hundred analyses the previous day over a 100-petabyte database totaling more than 30 trillion rows, with more than 200 indicators incorporated into the analysis.

Author: Kalev Leetaru

