Big Data From B to A: The Hadoop Distributed Filesystem — HDFS

As data volumes keep growing, storing large amounts of information across a network of machines becomes a necessity. Hence the need for reliable systems, called distributed filesystems, that control how data is stored and retrieved.

However, many challenges emerge when implementing such infrastructure, for instance, handling hardware failure without losing data. In this article, we’ll focus on Hadoop’s distributed filesystem, HDFS: its design, its architecture, and its data flow.

The design of HDFS

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.

Source: towardsdatascience.com
Author: Hajar Khizou
