One of the great ironies of the “big data” revolution is that so much of the insight we draw from these massive datasets actually comes from small samples, not much larger than those we have always used.
A social media analysis might begin with a trillion tweets, use a keyword search to reduce that to a hundred million, and then rely on a random sample of just 1,000 tweets to generate the final result presented to the user. As our datasets grow ever larger, the algorithms and computing environments we use to analyze them have not kept pace, so our results become less and less representative even as we have more and more data at our fingertips.
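The statistical intuition behind that workflow can be sketched with the standard margin-of-error formula for a proportion: a 1,000-tweet sample carries roughly the same uncertainty whether it was drawn from a million tweets or a trillion, so the sample's precision never improves as the underlying dataset grows. This is a minimal illustration, not anything from the original analysis; the numbers and function name are assumptions for demonstration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion
    estimated from a simple random sample of size n
    (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-item sample: about +/- 3.1 percentage points,
# regardless of whether the pool held 100 million or a trillion tweets.
print(round(margin_of_error(1_000), 3))    # 0.031

# Sampling 100x more items only tightens the estimate 10x.
print(round(margin_of_error(100_000), 3))  # 0.003
```

The key point is that the error shrinks with the square root of the sample size, not the dataset size, so conclusions drawn from a fixed 1,000-item sample represent an ever-smaller sliver of an ever-larger whole.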
Author: Kalev Leetaru