Tuesday, May 28, 2019

Hadoop - Federation

In the prior HDFS architecture, a cluster can have only one namespace. This
configuration allows a single NameNode, and if that NameNode fails, the entire
cluster goes out of service until the NameNode is restarted or another NameNode
is brought up.




Limitations of the current architecture :-
Namespace Scalability :- Unlike storage, the namespace cannot scale out. An HDFS
cluster scales horizontally by adding DataNodes, but we cannot add more
namespaces to an existing cluster. The namespace can only be scaled vertically,
by giving the single NameNode more resources (chiefly memory, since the
namespace is held in the NameNode's memory).
Performance :- The performance of the whole file system depends on the
throughput of the single NameNode, because every file system operation goes
through it. At present the NameNode supports 60,000 concurrent tasks. Upcoming
versions of MapReduce will support more than 100,000 concurrent tasks, which
will require more NameNodes.

Tightly Coupled Namespace and Data Storage :- The namespace layer and the
storage layer are tightly coupled. This makes alternate implementations of the
NameNode difficult, and it prevents other services from using the block storage
directly.

Isolation :- HDFS deployments are commonly multi-tenant, with a single cluster
shared by multiple organizations. In this setup it is not possible to give one
application or one organization a separate namespace.

Hadoop federation allows the name service to scale horizontally. It uses
multiple independent NameNodes that are federated, i.e. they do not need to
coordinate with each other. The DataNodes are used as common storage by all the
NameNodes. Each DataNode registers with every NameNode in the cluster, sends
each of them periodic heartbeats and block reports, and responds to their
commands. A block pool is the set of blocks that belong to a single namespace.
Each DataNode stores blocks for all the block pools in the cluster, but each
block pool is managed independently. This lets a namespace generate block IDs
for new blocks without informing the other namespaces. If one NameNode fails
for any reason, only its namespace becomes unavailable. The DataNodes keep
serving the other NameNodes.
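To make the block pool idea concrete, here is a toy in-memory model in Python. It is not Hadoop code and uses no Hadoop API. The `DataNode` and `NameNode` classes, the made-up pool IDs `BP-A`/`BP-B`, and the "place a replica on every DataNode" rule are simplifying assumptions for illustration only. The model shows three properties:

- The shared DataNodes store blocks for every pool.
- Each pool hands out block IDs on its own.
- Losing one NameNode leaves the other namespace usable.

```python
from itertools import count


class DataNode:
    """Shared storage: keeps blocks for every block pool, keyed by pool ID."""

    def __init__(self, name):
        self.name = name
        self.pools = {}  # block pool ID -> {block ID: data}

    def store(self, pool_id, block_id, data):
        self.pools.setdefault(pool_id, {})[block_id] = data

    def block_report(self, pool_id):
        # Report only the blocks of one pool, i.e. of one namespace.
        return sorted(self.pools.get(pool_id, {}))


class NameNode:
    """Manages one namespace and its block pool; never talks to other NameNodes."""

    def __init__(self, pool_id, datanodes):
        self.pool_id = pool_id
        self.datanodes = datanodes  # the same shared DataNodes for every NameNode
        self.alive = True
        self.namespace = {}  # file path -> list of block IDs
        self._block_ids = count(1)  # ID sequence private to this pool

    def write(self, path, data):
        if not self.alive:
            raise RuntimeError(f"NameNode for {self.pool_id} is down")
        block_id = next(self._block_ids)  # no coordination with other pools
        for dn in self.datanodes:  # naive: place a replica on every DataNode
            dn.store(self.pool_id, block_id, data)
        self.namespace.setdefault(path, []).append(block_id)
        return block_id


# Two federated NameNodes sharing the same two DataNodes.
dns = [DataNode("dn1"), DataNode("dn2")]
nn_a = NameNode("BP-A", dns)
nn_b = NameNode("BP-B", dns)

nn_a.write("/sales/q1.csv", b"sales data")
nn_b.write("/logs/app.log", b"log data")  # also gets block ID 1, in its own pool

nn_a.alive = False  # NameNode A fails...
nn_b.write("/logs/app2.log", b"more logs")  # ...but namespace B keeps working

print(dns[0].block_report("BP-A"), dns[0].block_report("BP-B"))  # [1] [1, 2]
```

Both pools issue block ID 1 without asking each other. The IDs never clash because each DataNode files blocks under their pool ID.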



Benefits of Hadoop federation :-

Scalability and Isolation – Multiple NameNodes scale the file system namespace
horizontally. They also separate namespace volumes for different users and
categories of applications, which isolates them from each other at the
namespace level.
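With several namespaces, a client needs some way to know which NameNode owns which part of the directory tree. Hadoop provides a client-side mount table for this called ViewFs. The Python sketch below does not use ViewFs and only imitates the idea. It assumes a hypothetical `MOUNT_TABLE` that maps path prefixes to NameNodes with made-up host names. Each path is routed to the NameNode with the longest matching prefix, so each team's files live in their own namespace.

```python
# Hypothetical mount table: path prefix -> NameNode URI (host names are made up).
MOUNT_TABLE = {
    "/sales": "hdfs://nn-sales:8020",
    "/marketing": "hdfs://nn-marketing:8020",
    "/tmp": "hdfs://nn-shared:8020",
}


def resolve(path, table):
    """Route `path` to the NameNode that owns its longest matching prefix."""
    best = None
    for prefix in table:
        # Match whole path components, so "/sales" does not capture "/salesforce".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace is mounted for {path}")
    return table[best] + path


print(resolve("/sales/2019/q1.csv", MOUNT_TABLE))
# hdfs://nn-sales:8020/sales/2019/q1.csv
```

A path such as `/salesforce/x` raises `KeyError` here because prefixes are matched on whole path components rather than raw characters.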

Generic Storage Service – The block pool abstraction allows new file systems to
be built on top of the block storage. New applications can use the block
storage layer directly, without going through the file system interface.
Customized categories of block pool that differ from the default block pool can
also be built.

Simple Design – NameNodes and namespaces are independent of each other, so
federation requires hardly any changes to the existing NameNode.

Each NameNode remains as robust as before. Federation is also backward
compatible: existing single-NameNode deployments continue to work without any
configuration changes.
