Sunday, May 10, 2020

Hadoop - Why Does Hadoop Need Vendors like Cloudera, Hortonworks and MapR?


Hadoop is an open source technology stack provided by the Apache Software Foundation. So why are there major vendors out here? If it is open source, it should be completely free, so why do we have all these companies like Cloudera, Hortonworks, MapR, IBM and Microsoft?

The answer is that even though Hadoop itself and its major components are completely open source and free, it is not easy for large companies to rely on an open source framework unless they deploy a large number of administrators and engineers to fit everything together, keep the Hadoop cluster running smoothly, and troubleshoot whenever there is downtime.

So what do these companies do? Cloudera, Hortonworks, MapR and the other big players basically take the same open source Hadoop from Apache, bundle everything together, and add a wrapper on top of it: their own management tools, including many GUI-based management and troubleshooting tools. To a certain extent they also automate cluster setup and troubleshooting. It is not completely automated, but cluster setup and troubleshooting become much easier, and a small number of administrators can handle them.

Not every company can afford an army of engineers working 24x7, especially mid-sized ones. For many mid-sized companies that would be a herculean task: they face budget crunches, and the number of people they can hire is strictly limited. In such cases these vendors come to the rescue. The tools they bundle with the open source Apache Hadoop make it much easier for administrators to manage, fine-tune, optimize and troubleshoot the cluster. Vendors may also add some of their own touches to Apache Hadoop, but the distribution remains mostly the open source one.

Cloudera, Hortonworks and MapR are the three major players, although Cloudera and Hortonworks have now merged. Cloudera's footprint here is pretty large.

Hadoop is not a full-fledged distributed operating system. It is a piece of software that sits on top of an existing standalone operating system, and the operating systems Hadoop commonly supports are mostly Linux flavors such as CentOS, Oracle Linux and Ubuntu.
