Saturday, June 6, 2020

Hadoop - What is a Job in Hadoop?

In the field of computer science, a job simply means a piece of program, and the same idea applies to the Hadoop ecosystem. Here, a job essentially consists of a program or a set of programs that needs to manipulate a certain piece of data. In the Hadoop ecosystem, this is typically a MapReduce program written in Java. However, non-Java jobs can also be submitted to the Hadoop cluster, which we shall discuss in detail later.

The important parameters that identify or make up a job are the program itself, the input file path pointing to the piece of data the program has to work on, and the output file or output directory where the result of the program's execution needs to be stored. So, a job in the Hadoop ecosystem is a combination of all three.
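
To make this concrete, here is a minimal sketch of a driver program that wires these three pieces together, assuming a hypothetical WordCount job. The class names are illustrative assumptions (the mapper and reducer are sketched later in this post), and the input path and output directory come in as command-line arguments:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");        // the program (set of programs)
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);             // assumed mapper class
            job.setReducerClass(WordCountReducer.class);           // assumed reducer class
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input file path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }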

When we speak specifically about a Java MapReduce job, the program is submitted to the cluster in the form of a jar file, which packages all the classes that constitute the program to be executed on a particular set of data.
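
For example, assuming the classes above were packaged into a jar named wordcount.jar (a hypothetical name), the job could be submitted to the cluster with the standard hadoop jar command, passing the input path and output directory as arguments:

    hadoop jar wordcount.jar WordCountDriver /user/demo/input /user/demo/output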

Let us also look at the two major features that Hadoop provides to its users.

The first important feature is a distributed file system called the Hadoop Distributed File System, or HDFS. Once a file is submitted into the Hadoop cluster, it is divided into smaller pieces (blocks) which reside on multiple data nodes. This service is called the Hadoop Distributed File System.
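
As a small illustration of this behaviour, the sketch below uses the HDFS Java API to copy a local file into the cluster and then list the blocks it was split into along with the data nodes holding them. The file paths are hypothetical and only serve as an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlocksDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into HDFS (paths are hypothetical).
            Path src = new Path("/tmp/sales.csv");
            Path dst = new Path("/data/sales.csv");
            fs.copyFromLocalFile(src, dst);

            // The file is stored as blocks spread across data nodes;
            // print each block's offset, length, and the hosts it resides on.
            FileStatus status = fs.getFileStatus(dst);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
        }
    }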

The second most important feature of Hadoop is a robust parallel processing framework called the MapReduce framework. Once you submit a job into the Hadoop cluster, the program that is supposed to execute on a piece of data runs on multiple machines, wherever the blocks of the input file it has to manipulate reside. So, the two core services where Hadoop really comes in handy are the Hadoop Distributed File System and the parallel processing framework called MapReduce.
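
To make the MapReduce side concrete, here is a minimal sketch of the mapper and reducer that the hypothetical WordCount job referenced in the driver above might use. Each map task runs in parallel on the machines holding a block of the input file, and the reduce tasks then aggregate the intermediate results:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Each map task processes one block of the input file, typically on the
    // node where that block is stored, and emits (word, 1) pairs.
    class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The reduce tasks receive all counts emitted for a given word and sum them.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }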
