In an era where almost everything is touted as being “big data” how do we define just what we mean by “big data” and what precisely counts as a “big data” analysis? Does merely keyword searching a multi-petabyte dataset count? Does using a date filter to extract a few million tweets from the full trillion-tweet archive count as “big data?”
Does running a hundred petabyte file server or merely storing a hundred petabyte backup count? What exactly should count as “big data” today? I used to open my data science talks back in 2013 by saying I had just run several hundred analyses the previous day over a 100-petabyte database totaling more than 30 trillion rows, with more than 200 indicators incorporated into the analysis.
Author: Kalev Leetaru