Hadoop is an open source technology stack provided by the Apache Software Foundation. So why are there major vendors out here? If it is open source, it should be completely free, so why do we have all these companies like Cloudera, Hortonworks, MapR, IBM and Microsoft?
The answer is that even though Hadoop itself and its major components are completely open source and free, it is not easy for large companies to rely on the open source framework unless they deploy a large number of administrators and engineers to fit everything together, make sure the Hadoop cluster is running smoothly, and troubleshoot whenever there is downtime.
So what do these companies do? Cloudera, Hortonworks, MapR and the other big players take the same open source Hadoop from Apache, bundle everything together, and add a wrapper on top of it: their own management tools, including a lot of GUI-based troubleshooting and management tools. To a certain extent they also automate cluster setup and troubleshooting. It is not completely automated, but to a large extent cluster setup and troubleshooting become very easy, even with only a few administrators in place.
Not all companies, especially the mid-sized ones, can afford to have an army of engineers working 24x7. Doing that is going to be a herculean task for a lot of mid-sized companies: they have a budget crunch, and the number of people they can hire is strictly limited. In such cases these vendors come to the rescue. They put tools at the disposal of the administrator, so that managing, fine-tuning, optimizing and troubleshooting the cluster becomes really easy, thanks to the many tools these companies bundle along with the open source Hadoop provided by Apache. They might also add some of their own touches to the open source Apache Hadoop, but it will be mostly the open source one.
Cloudera, Hortonworks and MapR are the three major players, and Cloudera and Hortonworks have now merged. The footprint of Cloudera is pretty high over here.
Hadoop is not a full-fledged distributed operating system. It is a piece of software that sits on top of an existing standalone operating system, and the operating systems Hadoop commonly supports are mostly flavors of Linux, such as CentOS, Oracle Linux and Ubuntu.