6 Things You Need To Know About Hadoop

What is Hadoop? Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project. It is licensed under the Apache License 2.0.

Why Hadoop? Since there are a lot of low cost storage available, the key challenge for handling Big Data is how to read large amount of data fast. Hadoop is good for reading a big file in one shot. The default data block size (the smallest unit of data that a file system can store) of Hadoop is 64MB. The block size in disk is generally 4KB.

Author: Jenny Zhang


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s