The era of “big data” has been marked by a cataclysmic break from statistics. With the loss of the denominator across much of modern data science, and a growing departure from the idea that the quality of our data influences the accuracy and representativeness of our results, we seem to have entered a “post-statistics” era.
One of the key forces driving this transition has been the shift from open-source tools and open data to opaque datasets processed through black-box algorithms, which make it impossible to reproduce results or assess their accuracy. In an era in which we no longer seem to care about the accuracy of our results, what does the future of data science hold in an increasingly proprietary world?
Author: Kalev Leetaru