Big Fish Swim in the Data Lake

I’ve been known as something of a data lake detractor, deeply suspicious of its early “definition” by James Dixon, CTO of Pentaho, in a 2010 blog post as a place where “the contents… stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

I have elaborated elsewhere on the many problems with this description and its direct descendants, but there is also an underlying truth in Dixon’s statement of the problems of traditional data warehousing, and in his vision that the Hadoop ecosystem has a significant role to play in solving them.

Author: Barry Devlin


