Hadoop employs a master-slave architecture for both distributed storage and distributed computation.
In layman's terms, a master-slave architecture means there is one authority that divides the work, performs checks at regular intervals, and keeps a log of the work assigned to each worker. In the same way, Hadoop keeps track of data division, processing, and job tracking with the help of some of its components.
On a fully running cluster, a set of daemons runs across the nodes, and together they form Hadoop's master-slave architecture.
For example, in a cluster of 100 nodes, 97 might run in slave mode as DataNodes, while the remaining 3 host the NameNode, Secondary NameNode, and JobTracker. The exact split depends on the configuration of the individual cluster.
The five daemons present in Hadoop are:
1.) Name Node
2.) Data Node
3.) Secondary Name Node
4.) Job Tracker
5.) Task Tracker
Name Node :- The distributed storage system in Hadoop is called the Hadoop Distributed File System, or HDFS. The NameNode is the master of HDFS; it directs the slave DataNode daemons to perform the low-level I/O tasks. The NameNode is the bookkeeper of HDFS: it keeps track of how your files are broken down into file blocks, which nodes store those blocks, and the overall health of the distributed filesystem. It is a single point of failure; if the NameNode goes down, the filesystem becomes unavailable. Normally, the server hosting the NameNode does not store any data or run any computation.
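The NameNode's bookkeeping can be pictured as two lookup tables: one from file paths to block IDs, and one from block IDs to the DataNodes holding replicas. The sketch below is purely illustrative (the file, block, and node names are made up, and this is not Hadoop's internal code):

```python
# Conceptual sketch of NameNode bookkeeping: which blocks make up a file,
# and which DataNodes hold each block. Names are hypothetical.
file_to_blocks = {
    "/logs/app.log": ["blk_001", "blk_002"],
}
block_to_datanodes = {
    "blk_001": ["datanode-07", "datanode-21", "datanode-42"],  # 3 replicas
    "blk_002": ["datanode-03", "datanode-21", "datanode-55"],
}

def locate(path):
    """For each block of a file, return the DataNodes a client can read from."""
    return {blk: block_to_datanodes[blk] for blk in file_to_blocks[path]}

print(locate("/logs/app.log"))
```

This is the lookup a client performs before reading: the NameNode answers with block locations, and the client then talks to the DataNodes directly.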
Data Node :- When you want to read or write an HDFS file, the file is broken into blocks, and the NameNode tells your client which DataNode each block resides on. Your client then communicates directly with the DataNode daemons to process the local files corresponding to those blocks. Furthermore, a DataNode may communicate with other DataNodes to replicate its data blocks for redundancy. DataNodes constantly report to the NameNode: upon initialization, each DataNode informs the NameNode of the blocks it is currently storing.
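From these block reports, the NameNode can tell when a block has fewer replicas than the configured replication factor and needs to be re-replicated. A minimal sketch of that idea (the node and block names, and the factor of 3, are assumptions for illustration, not Hadoop internals):

```python
# Hypothetical sketch: use per-DataNode block reports to find blocks
# that have fewer replicas than the target replication factor.
REPLICATION_FACTOR = 3

def under_replicated(block_reports, factor=REPLICATION_FACTOR):
    """block_reports maps each DataNode to the set of blocks it reported."""
    counts = {}
    for node, blocks in block_reports.items():
        for blk in blocks:
            counts[blk] = counts.get(blk, 0) + 1
    return {blk for blk, n in counts.items() if n < factor}

reports = {
    "datanode-1": {"blk_a", "blk_b"},
    "datanode-2": {"blk_a"},
    "datanode-3": {"blk_a", "blk_b"},
}
print(under_replicated(reports))  # blk_b has only 2 replicas
```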
Secondary Name Node :- The Secondary NameNode (SNN) is an assistant daemon for monitoring the state of HDFS. Like the NameNode, each cluster has one SNN, and it typically resides on its own machine as well; no DataNode or TaskTracker daemons run on the same server. The SNN differs from the NameNode in that it does not receive or record real-time changes to HDFS. Instead, it communicates with the NameNode to take snapshots of the HDFS metadata at intervals defined by the cluster configuration.
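The periodic-snapshot idea can be sketched as follows: once the configured interval has elapsed, the queued edits are merged into a fresh copy of the metadata. This is a simplified model of checkpointing, not the SNN's actual implementation; all names here are assumptions:

```python
# Simplified model of periodic checkpointing: merge queued edits into the
# metadata snapshot once `period` seconds have passed since the last merge.
def maybe_checkpoint(last_checkpoint, now, period, snapshot, edits):
    """Return (snapshot, remaining_edits, checkpoint_time)."""
    if now - last_checkpoint >= period:
        for edit in edits:            # apply each queued metadata edit
            snapshot.update(edit)
        return snapshot, [], now      # edits are cleared after the merge
    return snapshot, edits, last_checkpoint

snap, pending, last = maybe_checkpoint(
    last_checkpoint=0, now=3600, period=3600,
    snapshot={"/logs/app.log": 2}, edits=[{"/logs/app.log": 3}])
print(snap)  # merged metadata after the checkpoint
```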
Job Tracker :- The JobTracker daemon is the liaison between your application and Hadoop. Once you submit your code to the cluster, the JobTracker determines the execution plan by deciding which files to process, assigns nodes to different tasks, and monitors all tasks as they run. Should a task fail, the JobTracker will automatically relaunch it, possibly on a different node, up to a predefined limit of retries. There is only one JobTracker daemon per Hadoop cluster. It typically runs on a server acting as a master node of the cluster.
Task Tracker :- Each TaskTracker is responsible for executing the individual tasks that the JobTracker assigns. There is a single TaskTracker per slave node. One responsibility of the TaskTracker is to constantly communicate with the JobTracker. If the JobTracker fails to receive a heartbeat from a TaskTracker within a specified amount of time, it will assume the TaskTracker has crashed and will resubmit the corresponding tasks to other nodes in the cluster.
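The heartbeat-timeout check described above can be sketched in a few lines. This is an illustrative model only (the tracker names, timestamps, and timeout value are made up, and Hadoop's real scheduler is far more involved):

```python
# Illustrative sketch of the JobTracker's liveness check: a TaskTracker
# that has not sent a heartbeat within `timeout` seconds is presumed
# crashed, and its tasks become candidates for reassignment.
def find_dead_trackers(last_heartbeat, now, timeout):
    """last_heartbeat maps tracker name -> time of its last heartbeat."""
    return {t for t, ts in last_heartbeat.items() if now - ts > timeout}

heartbeats = {"tracker-1": 100.0, "tracker-2": 40.0}
print(find_dead_trackers(heartbeats, now=110.0, timeout=30.0))
# tracker-2 last reported 70 s ago, beyond the 30 s timeout
```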
How to see the daemons in a Hadoop environment?
Start Hadoop by typing: start-all.sh
All the daemons will then be up and running. We can verify this with the help of the jps command.