Tuesday, June 26, 2018

HDFS - Replication Factor

Replication Factor is a property in HDFS that signifies the number of replicas of each block that are kept in the cluster. We already know that the data in the Hadoop distributed file system is stored on commodity hardware, which can fail at any time. To maintain high data availability, we need to keep replicas of the data. The default replication factor in HDFS is 3.
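As a quick check, the replication factor of an existing file can be printed with the stat command of the HDFS shell (the path below is just a placeholder):

hadoop fs -stat %r /user/data/sample.txt

The %r format option makes stat print the replication factor set on that file.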

When the number of replicas of a block is below the replication factor, the block is called an under-replicated block.
                             Replication < Replication Factor

Example :- Let us assume that we have 10 TB of storage and the data size is 4 TB. With the default replication factor of 3, we would need 4 TB x 3 = 12 TB to store every replica, which is more than the 10 TB available. In such a case, not all blocks can have all their replicas. This is a scenario where we have under-replicated data.

When the number of replicas of a block is more than the replication factor, the block is called an over-replicated block.
                            Replication  > Replication Factor

Over-replicated blocks normally occur when a crashed DataNode comes back online after the NameNode has already re-replicated its blocks elsewhere; the NameNode then removes the excess replicas.
We can check file details such as the replication factor, corrupt files, under-replicated blocks and many more with the help of the hdfs fsck / command. The / here denotes the root directory, so the whole file system is checked.
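Running hdfs fsck / prints a summary of the file system health. The output below is only an illustrative sample with made-up numbers, but the fields shown are the ones fsck reports:

hdfs fsck /
 Total size:                    4398046511104 B
 Total blocks (validated):      32768 (avg. block size 134217728 B)
 Minimally replicated blocks:   32768 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       512 (1.56 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.97
 Corrupt blocks:                0
 Missing replicas:              1024 (1.04 %)
The filesystem under path '/' is HEALTHY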
The default replication factor is configured in the file hdfs-site.xml, which is available inside the Hadoop installation directory (under etc/hadoop in Hadoop 2.x). Note that changing this value only affects files written after the change; existing files keep the replication factor they were created with.

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Block Replication</description>
</property>
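Instead of editing the configuration file cluster-wide, the default can also be overridden for a single write using the generic -D option of the hadoop fs command (the file names below are just placeholders):

hadoop fs -D dfs.replication=2 -put localfile.txt /user/data/

This stores the new file with 2 replicas while leaving the cluster-wide default untouched.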


We can also set the replication factor on a per file or per directory basis via the below command :-
hadoop fs -setrep -w 2 /dir/

When the path is a directory, the command recursively changes the replication factor of every file under it. The -w flag makes the command wait until the replication actually reaches the target, which can take a while on a large directory.
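We can then verify the change with hadoop fs -ls, where the second column of the listing is the replication factor of each file (the file name below is just an example):

hadoop fs -ls /dir/
-rw-r--r--   2 hdfs supergroup  134217728 2018-06-26 10:15 /dir/part-00000

The 2 after the permission bits confirms the new replication factor.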