Fishing in a lake and fishing in a data lake are much the same. Data scientists must not only go where the fish are to find big data insights, but also quickly build the data pipeline that turns raw data into business results.
I learned at an early age, fishing with my buddies, that it doesn’t matter how good a fisherman you are: you’re not going to catch anything if you’re not fishing where the fish are. The same advice applies to data lakes. Not even the best data scientists in the world can find insights in data lakes that are nothing but data swamps. Yet that is what most data analysts are using today: swamps filled with databases, file systems, and Hadoop clusters that contain vast amounts of siloed data but offer no efficient way to find, prepare, and analyze it.
Author: John Haddad