As anyone who’s ever used a spreadsheet can attest, it’s often convenient to organize data into tables. But in the age of big data, those tables can be enormous, with millions or even hundreds of millions of rows.
One way to make big-data analysis computationally practical is to reduce the size of data tables — or matrices, to use the mathematical term — by leaving out a bunch of rows. The trick is that the remaining rows have to be in some sense representative of the ones that were omitted, in order for computations performed on them to yield approximately the right results.
Author: Larry Hardesty