As data volumes keep growing, storing large amounts of information across a network of machines becomes a necessity. This creates the need for a reliable class of systems, called distributed filesystems, to manage how data is stored and retrieved.
However, implementing such an infrastructure raises many challenges, for instance handling hardware failure without losing data. In this article, we'll focus on Hadoop's distributed filesystem, HDFS: its design, its architecture, and its data flow.

The design of HDFS

The Hadoop Distributed File System (HDFS) is a distributed filesystem designed to:

- Run on commodity hardware.
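To make the fault-tolerance challenge mentioned above concrete, here is a toy Python sketch (not HDFS code; the `Cluster` class and its methods are hypothetical simplifications) of the core idea HDFS relies on: replicating each block on several machines so that the data survives the loss of any one of them.

```python
import random

class Cluster:
    """Toy model of block replication (illustrative only, not HDFS internals)."""

    def __init__(self, num_nodes, replication=3):
        # node id -> {block id: data}
        self.nodes = {i: {} for i in range(num_nodes)}
        self.replication = replication

    def put(self, block_id, data):
        # Store each block on `replication` distinct nodes.
        for node_id in random.sample(list(self.nodes), self.replication):
            self.nodes[node_id][block_id] = data

    def fail_node(self, node_id):
        # Simulate a machine crash: all of its local copies are lost.
        del self.nodes[node_id]

    def get(self, block_id):
        # Read from any surviving node that holds a replica.
        for blocks in self.nodes.values():
            if block_id in blocks:
                return blocks[block_id]
        raise IOError(f"block {block_id} lost")

cluster = Cluster(num_nodes=5, replication=3)
cluster.put("blk_0", b"hello")
cluster.fail_node(0)  # one machine dies
print(cluster.get("blk_0"))  # replicas on other nodes still serve the block
```

With a replication factor of 3, up to two machine failures can occur before a block is at risk; real HDFS additionally re-replicates under-replicated blocks in the background, which this sketch omits.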
Author: Hajar Khizou