What is Hadoop? Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project. It is licensed under the Apache License 2.0.
Why Hadoop? Since there are a lot of low cost storage available, the key challenge for handling Big Data is how to read large amount of data fast. Hadoop is good for reading a big file in one shot. The default data block size (the smallest unit of data that a file system can store) of Hadoop is 64MB. The block size in disk is generally 4KB.
Author: Jenny Zhang