Data is Future: HDFS - Rack Awareness

Sunday, July 1, 2018

HDFS - Rack Awareness

The placement of DataNodes in a Hadoop cluster plays a pivotal role. It has a large impact on the performance of the Hadoop Distributed File System (HDFS).

What is Rack Awareness?

Rack awareness is knowledge of the cluster topology: it describes how the nodes are distributed across the racks of the cluster. Normally, the latency between DataNodes on the same rack is lower than the latency between DataNodes on different racks.
In a Hadoop cluster, HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack. This keeps the data available in the event of a network switch failure or a partition within the cluster.
In large Hadoop clusters, to reduce network traffic while reading and writing HDFS files, the NameNode chooses DataNodes that are on the same rack as, or on a rack near, the client node that issued the read/write request.
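The idea of "nearness" can be made concrete with a small sketch. It assumes a two-level topology in which every node is identified by a path such as /rack1/host1, and it uses the usual tree-distance convention (0 for the same node, 2 for the same rack, 4 for different racks). The function names and node paths are hypothetical and are not Hadoop APIs.

```python
def rack_of(path):
    """Return the rack part of a '/rack/host' path, e.g. '/rack1'."""
    return path.rsplit("/", 1)[0]


def distance(a, b):
    """Hops between two nodes in a two-level (rack/host) tree."""
    if a == b:
        return 0          # same node
    if rack_of(a) == rack_of(b):
        return 2          # same rack: up to the rack switch and back down
    return 4              # different racks: up to the core switch and back


def sort_by_proximity(client, replicas):
    """Order replica locations from nearest to farthest from the client."""
    return sorted(replicas, key=lambda node: distance(client, node))


client = "/rack1/host1"
replicas = ["/rack2/host5", "/rack1/host2", "/rack1/host1"]
print(sort_by_proximity(client, replicas))
# ['/rack1/host1', '/rack1/host2', '/rack2/host5']
```

A reader on /rack1/host1 would therefore be served from the local replica first, then from the same-rack replica, and only then from the replica on the other rack.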

Rack Configuration :-

Hadoop manages the DataNodes inside the racks using rack IDs. The Hadoop master daemons obtain the rack ID of each DataNode in the cluster by invoking either an external script or a Java class, as specified in the configuration. The topology information is returned in the form '/myrack/myhost'.
Suppose we have an address of '/192.168.1.23/192.1.1.52'. Here 192.168.1.23 is the rack ID and 192.1.1.52 is the individual host identifier.
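As an illustration, an external topology script could look like the minimal sketch below. Hadoop passes one or more IP addresses or host names as command-line arguments and reads one rack path per argument from standard output. The script location is normally set through the net.topology.script.file.name property in core-site.xml. The subnet-to-rack table and the rack names here are made-up assumptions; a real cluster would use its own mapping.

```python
import sys

# Hypothetical mapping from the first three octets of an IP to a rack path.
RACK_BY_SUBNET = {
    "192.168.1": "/rack1",
    "192.168.2": "/rack2",
}

# Rack used when an address is not in the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(address):
    """Map an IPv4 address to a rack path using its /24 subnet."""
    subnet = address.rsplit(".", 1)[0]
    return RACK_BY_SUBNET.get(subnet, DEFAULT_RACK)


def main(argv):
    # One rack path per input address, in the same order.
    print(" ".join(resolve_rack(address) for address in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running the script with the arguments 192.168.1.23 and 10.0.0.7 would print "/rack1 /default-rack".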

Replica Placement via Rack Awareness :-

The position of DataNodes within the racks plays an important role in the performance and reliability of HDFS. No more than one replica is placed on the same node, and no more than two replicas are placed on the same rack. These limits exist mainly to avoid data loss when a whole rack fails. At the same time, the aggregate bandwidth between nodes on the same rack is much greater than that between nodes on different racks, so keeping two of the replicas on one rack limits the cross-rack traffic generated by each write.
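The placement rules can be sketched as follows for a replication factor of 3. This is a simplified illustration, not Hadoop's actual block placement code: it assumes the first replica goes on the writer's node (or a random node if the writer is not a DataNode), the second on a node in a different rack, and the third on another node in the same rack as the second. It also assumes the cluster has at least two racks and that the chosen remote rack has at least two nodes. All names are hypothetical.

```python
import random
from collections import Counter


def rack_of(path):
    """Return the rack part of a '/rack/host' path."""
    return path.rsplit("/", 1)[0]


def place_replicas(writer, nodes, rng=random):
    """Pick three target nodes for one block (simplified sketch)."""
    # 1st replica: the writer's own node if it is a DataNode.
    first = writer if writer in nodes else rng.choice(nodes)
    # 2nd replica: any node on a different rack than the first.
    remote = [n for n in nodes if rack_of(n) != rack_of(first)]
    second = rng.choice(remote)
    # 3rd replica: a different node on the same rack as the second.
    peers = [n for n in nodes if rack_of(n) == rack_of(second) and n != second]
    third = rng.choice(peers)
    return [first, second, third]


def satisfies_constraints(targets):
    """At most one replica per node and at most two per rack."""
    one_per_node = len(set(targets)) == len(targets)
    per_rack = Counter(rack_of(n) for n in targets)
    return one_per_node and max(per_rack.values()) <= 2


nodes = ["/rack1/h1", "/rack1/h2", "/rack2/h3", "/rack2/h4"]
targets = place_replicas("/rack1/h1", nodes)
print(targets, satisfies_constraints(targets))
```

With this layout the block survives the loss of either rack, while only one copy of the data has to cross the rack boundary during the write.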
