So you’re filling your Hadoop cluster with reams of raw data, and your data analysts and scientists are champing at the bit to get started. Then the question hits you: How are you going to store all this data so they can actually use it?
The good news is that Hadoop is one of the most cost-effective ways to store huge amounts of data. You can store all types of structured, semi-structured, and unstructured data within the Hadoop Distributed File System (HDFS), and process it in a variety of ways using Hive, HBase, Spark, and many other engines. You have many choices when it comes to storing and processing data on Hadoop, which can be both a blessing and a curse.
Author: Alex Woodie