Saturday, May 25, 2019

Hadoop - High Availability

In a Hadoop 1.x cluster, the name node was a single point of failure. A name
node failure severely impacted the availability of the entire cluster, and taking
the name node down for maintenance or upgrades meant that the whole cluster was unavailable during that time. The HDFS High Availability (HA) feature
introduced with Hadoop 2.0 addresses this problem.

Now you can have two name nodes in a cluster in an active-passive
configuration: One node is active at a time, and the other node is in standby
mode. The active and standby name nodes remain synchronized. If the active
name node fails, the standby name node takes over and promotes itself to the
active state.

In other words, the active name node is responsible for serving all client
requests, whereas the standby name node simply acts as a passive name node:
it maintains enough state to provide a fast failover and act as the active name
node if the current active name node fails.

This allows a fast failover to the standby name node if the active name node
crashes, or a graceful failover initiated by the Hadoop administrator for any planned maintenance.
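As an illustration, an active-standby pair is typically defined in hdfs-site.xml under a logical nameservice. The nameservice name (mycluster), the name node IDs (nn1, nn2), and the host names below are placeholders, not values from this article:

```xml
<!-- Sketch of an HA name node pair in hdfs-site.xml; "mycluster",
     "nn1"/"nn2", and the host names are illustrative placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
```

Clients then address the nameservice (mycluster) rather than a specific host, so a failover is transparent to them.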



In this implementation, the file system namespace and edit log are
maintained on a shared storage device (for example, a Network File System
[NFS] mount from Network Attached Storage [NAS]). Both the active name
node and the passive or standby name node have access to this shared
storage, but only the active name node can write to it; the standby name node
can only read from it, to keep its own copy of the file system namespace synchronized.
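With the shared-storage approach, both name nodes point at the same edits directory through the dfs.namenode.shared.edits.dir property in hdfs-site.xml. The NFS mount path below is a placeholder:

```xml
<!-- Both name nodes read this directory; only the active one writes to it.
     The mount path is an illustrative placeholder. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>
```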

When the active name node makes any change to the file system
namespace, it persists the change to the edit log on the shared
storage device; the standby name node continuously applies the changes the
active name node has logged there to its own copy of the file system
namespace. When a failover happens, the standby name node ensures that it
has fully synchronized its file system namespace from the changes logged in
the edit log before it promotes itself to the role of active name node.
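The replay-then-promote sequence can be sketched with a toy model. This is Python for illustration only; real name nodes persist binary edit-log segments, and the class and field names below are invented:

```python
# Toy model of edit-log replay during failover (illustrative only; real HDFS
# edit logs are binary segments on shared storage, not Python lists).

class NameNode:
    def __init__(self, shared_edit_log):
        self.shared_edit_log = shared_edit_log  # stands in for the NFS mount
        self.namespace = {}   # in-memory copy of the file system namespace
        self.applied = 0      # index of the next edit to apply
        self.active = False

    def log_edit(self, path, meta):
        """Active name node: persist the change, then apply it locally."""
        assert self.active, "only the active name node may write the edit log"
        self.shared_edit_log.append((path, meta))
        self.catch_up()

    def catch_up(self):
        """Apply any logged edits not yet reflected in the local namespace."""
        while self.applied < len(self.shared_edit_log):
            path, meta = self.shared_edit_log[self.applied]
            self.namespace[path] = meta
            self.applied += 1

    def promote(self):
        """Standby -> active: fully catch up before taking over."""
        self.catch_up()
        self.active = True

shared_log = []
active = NameNode(shared_log)
active.active = True
standby = NameNode(shared_log)

active.log_edit("/data/a", {"blocks": 3})
active.log_edit("/data/b", {"blocks": 1})

# Failover: the old active stops writing; the standby catches up, then promotes.
active.active = False
standby.promote()
assert standby.namespace == active.namespace
```

The key point the sketch mirrors is the ordering in promote(): the standby replays every outstanding edit before it starts acting as the active name node.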

The possibility of a “split-brain” scenario exists, in which both name nodes
try to write to the edit log on the same shared storage device at the
same time. This leads to data loss, corruption, or other unexpected
results. To avoid this scenario, you configure a fencing method for the shared
storage device when setting up high availability with shared storage. Only
one name node is then able to write to the edit log on the shared storage
device at any one time. During failover, the shared storage device grants write
access to the new active name node (previously the standby name node) and
revokes write access from the old active name node (now the standby
name node), allowing it only to read the edit log to synchronize its copy of the
file system namespace.
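Fencing is configured through the dfs.ha.fencing.methods property in hdfs-site.xml; Hadoop ships with an sshfence method (which logs in to the old active node and kills its process) and a shell(...) escape hatch for a custom script. The private-key path below is a placeholder:

```xml
<!-- Fencing configuration sketch; the key path is an illustrative
     placeholder. A custom script could be used instead, e.g.
     shell(/path/to/fence-script.sh). -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```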

Heartbeat signals from the data nodes: Whichever method you choose for
implementing name node high availability (shared storage, or
quorum-based storage using the Quorum Journal Manager), you must configure
all the data nodes in the cluster to send heartbeat signals and block reports to
both the active name node and the standby name node. This ensures that the
standby name node also has up-to-date information on the location of blocks in
the cluster, which helps with faster failover.
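A minimal sketch of dual block reports, with invented class names for illustration (real data nodes report over Hadoop's DatanodeProtocol RPCs):

```python
# Toy sketch: every data node reports its blocks to *both* name nodes, so the
# standby's block map is already current when a failover happens.

class NameNodeBlockMap:
    def __init__(self):
        self.block_locations = {}  # block id -> set of data node ids

    def receive_block_report(self, datanode_id, block_ids):
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)

class DataNode:
    def __init__(self, node_id, blocks):
        self.node_id = node_id
        self.blocks = blocks

    def send_block_report(self, name_nodes):
        # Report to every configured name node, active and standby alike.
        for nn in name_nodes:
            nn.receive_block_report(self.node_id, self.blocks)

active_nn, standby_nn = NameNodeBlockMap(), NameNodeBlockMap()
for dn in (DataNode("dn1", ["blk_1", "blk_2"]),
           DataNode("dn2", ["blk_2", "blk_3"])):
    dn.send_block_report([active_nn, standby_nn])

# Both name nodes now hold identical block-location maps.
assert active_nn.block_locations == standby_nn.block_locations
```

Because block locations are never written to the edit log, this double reporting is the only way the standby learns where replicas live; without it, a newly promoted name node would have to wait for fresh block reports before serving reads.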
