Friday, August 17, 2018

HDFS vs NFS

We often wonder why we need HDFS when we already have native filesystems. What made it necessary to create a new filesystem? Does the native filesystem (called NFS in this post, not to be confused with the Network File System protocol) have any drawbacks?

We will look at these questions one by one. Let us first take a quick look at the native filesystem.
A filesystem acts as a "guardian" for all your data and metadata. It holds all the necessary information about the files and folders kept within it.

1.) The filesystem plays a major role whenever we read or write files on our hard disk; all such requests are handled by the filesystem.
2.) The filesystem controls permissions and security.
3.) The filesystem stores metadata about files and folders, such as size, owner, timestamps and file type.
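
The metadata mentioned in point 3 can be read from any local filesystem with Python's standard library. The snippet below is only a small sketch: the `describe` helper and the `sample.txt` file are made up for illustration, and the owner field is only meaningful on Unix-like systems (on Windows it is reported as 0).

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few metadata fields the local filesystem keeps for a path."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "owner_uid": info.st_uid,  # numeric owner id (0 on Windows)
        "modified": time.ctime(info.st_mtime),
        "is_directory": stat.S_ISDIR(info.st_mode),
        "permissions": stat.filemode(info.st_mode),
    }


# Create a throwaway file so the example does not depend on existing paths.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("hello")
    sample_info = describe(sample)
    dir_info = describe(tmp)

print(sample_info)
print(dir_info)
```

Note that we never stored the size or the timestamp ourselves; the filesystem tracked them for us.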


Examples :- FAT32, NTFS, ext3, ext4, XFS, HFS, HFS+

Since I have a Windows machine, I can check which filesystem it uses.



I have NTFS (New Technology File System), a filesystem developed by Microsoft. It replaced FAT32 as the default filesystem on Windows; FAT32 limits a single file to about 4 GB.

When we look at filesystems like ext4, we find that they are not distributed in nature. Precisely speaking, node 1 does not know what is stored on node 2 and vice versa. They are local filesystems.

The second thing to remember is that, since node 1 has no idea what is stored on node 2, replicating data across nodes is a cumbersome job. In practice, we are exposed to data loss in such a scenario.

When we upload a file to HDFS, it is automatically split into blocks, 128 MB each by default in recent versions of Hadoop (the last block of a file can be smaller). HDFS takes care of placing the blocks on different nodes and also of replicating each block on more than one node. By default, HDFS keeps 3 replicas of each block.
HDFS is by no means a replacement for your local filesystem. The operating system still relies on the local filesystem; in fact, it does not care about the presence of HDFS. One more interesting thing: HDFS still goes through the local filesystem (ext4, for example) to save its blocks on disk.
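
To make the block splitting and replication arithmetic concrete, here is a small sketch in plain Python. It does not talk to a real cluster. The node names and the round-robin placement are assumptions made for illustration only; real HDFS chooses nodes with its own placement policy, which this toy version does not reproduce.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default mentioned above
REPLICATION = 3                 # default number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size of each block; only the last one may be smaller."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


file_size = 300 * 1024 * 1024  # a hypothetical 300 MB file
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

blocks = split_into_blocks(file_size)
placement = place_replicas(len(blocks), nodes)

for block_id, size in enumerate(blocks):
    print(block_id, size // (1024 * 1024), "MB ->", placement[block_id])
```

A 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and with a replication factor of 3 the cluster stores 900 MB in total for it.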

The true power of HDFS is that it is spread across all the nodes in your cluster and has a distributed view of it, so it knows how to construct the bigger data set from its blocks. ext4, on the other hand, has only a local view and knows only about the blocks on the storage it manages.
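
The difference between the local view and the distributed view can be sketched with two dictionaries. Everything here is hypothetical (the block ids, the node names, the `/data/greeting.txt` path and the `read_file` helper); it only shows the idea that a cluster-wide block map is what lets the pieces be stitched back together, not how HDFS implements it.

```python
# Local view: each node's filesystem only knows the blocks stored on that node.
local_storage = {
    "node1": {"blk_0": b"Hello, "},
    "node2": {"blk_1": b"distributed "},
    "node3": {"blk_2": b"world"},
}

# Distributed view: for each file, the ordered blocks and a node holding each one.
block_map = {
    "/data/greeting.txt": [("blk_0", "node1"), ("blk_1", "node2"), ("blk_2", "node3")],
}


def read_file(path):
    """Rebuild a file by fetching its blocks, in order, from the nodes that hold them."""
    parts = []
    for block_id, node in block_map[path]:
        parts.append(local_storage[node][block_id])
    return b"".join(parts)


content = read_file("/data/greeting.txt")
print(content.decode())
```

No single entry in `local_storage` can rebuild the file on its own; only the block map knows the order and location of all the pieces.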




