Thursday, April 30, 2020

Big Data - Challenges of parallel computing

The parallel approach to solving a problem, through parallel processing or grid processing, was traditionally known by the term supercomputing. This idea of supercomputing is quite old and dates back to the 1950s and 1960s.

The major vendors of supercomputers include IBM, Fujitsu, Cray, and Intel, with a dozen or so other companies adding to the list of supercomputer vendors across the globe.

Basically, a supercomputer is nothing but a cluster of computers interconnected by networking hardware. You can see in the picture here how the various computing resources are stacked in racks; network engineers and system administrators have to physically wire them up and enable them to communicate over a network.

Now, having spoken about supercomputers: what are their use cases? What was the real need for supercomputers, and which companies or organizations actually used them?

Supercomputers were mostly restricted to university research labs and research labs owned by individual organizations. They were used mainly in areas such as computational fluid dynamics research, bioinformatics, and many more.

A general-purpose operating system or framework did not exist for parallel computing needs. If a company was selling a supercomputer, it did not have a ready, off-the-shelf operating system that could simply be installed when the supercomputer went live; it was not that simple. Companies procuring supercomputers were also locked to specific vendors for hardware support: if you bought your supercomputer from IBM, you had to go back to IBM for any kind of hardware support. On top of that came the high initial cost of hardware (supercomputers literally cost millions of dollars), and custom software had to be developed for each individual use case.

For example, if your organization procured a supercomputer, you had to write a full-fledged operating system for it. The basic framework support might have been available as open source, but for the most part you had to customize it for your own use cases.

So you had to depend heavily on an internal software engineering team to tailor the software to each kind of problem you wanted to solve on the supercomputer. This led to a high cost of software ownership: maintenance, upgrades, and bug fixes all had to be taken care of in house.

It was also not simple to scale the cluster horizontally. If you wanted to increase the computing capacity or the storage capacity of your supercomputer, you could not do it easily; you needed support from the supercomputer vendor itself. These, then, were the challenges of supercomputing.

Hadoop - What is a Job in Hadoop?

In the field of computer science, a job simply means a piece of program, and the same idea applies to the Hadoop ecosystem as well.
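
To make that concrete, here is a minimal sketch of a Hadoop MapReduce job, using the classic word-count example. This sketch is illustrative rather than part of the original post; it assumes Hadoop's standard org.apache.hadoop.mapreduce API. The driver in main() is what Hadoop actually treats as the "job": it bundles the mapper and reducer classes together with the input and output paths and submits them to the cluster as a single unit of work.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // The driver: this is the "job" itself. It packages the map and reduce
  // logic with the input/output locations and submits one unit of work.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package this class into a jar and submit it with something like hadoop jar wordcount.jar WordCount /input /output, where the input and output paths are placeholders for HDFS locations of your choosing.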