Friday, July 20, 2018

Map Reduce - Introduction

Map Reduce is a software framework for writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable and fault-tolerant manner.
The dataset is first split into small chunks, which are processed by the map tasks in a completely independent manner. The output of the map tasks is fed as the input to the reduce tasks. On a very high level, we can see that a Map Reduce job is driven by only two processes :-

1.) Mapper
2.) Reducer

 

The Map Reduce framework works mainly on <key,value> pairs. The input to the mapper is in the form of <key,value> pairs, and the input to the reducer is also in the form of <key,value> pairs. The complete set of these tasks is handled by the Hadoop framework, mainly the Job Tracker and Task Tracker (in Hadoop 1.x; from Hadoop 2 onward, YARN takes over scheduling and resource management).
On a high level, the following steps take place in a Map Reduce job :-
⦁    The job starts with the Mapper phase and finishes with the Reducer phase. In the Mapper phase, the input data is read and processed.
⦁    The Mapper phase emits <key,value> pairs as intermediate output.
⦁    The output of the mappers acts as the input to the Reducer.
⦁    The final output of the reducer is the aggregated value, per key, of the output from all the mappers.
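
The steps above can be sketched as a tiny in-memory word count. This is only an illustration in plain Python, not Hadoop code: the function names (mapper, shuffle, reducer, run_job) are hypothetical, and the shuffle step stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict


def mapper(_key, line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort by key so each reducer call sees keys in a predictable order.
    return dict(sorted(grouped.items()))


def reducer(key, values):
    """Reduce phase: aggregate all values that share one key."""
    return key, sum(values)


def run_job(lines):
    """Run map -> shuffle -> reduce over a list of text lines."""
    intermediate = []
    for offset, line in enumerate(lines):
        # The input key is the line offset; word count ignores it.
        intermediate.extend(mapper(offset, line))
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["big data big clusters", "data moves to code"]
    print(run_job(data))
    # {'big': 2, 'clusters': 1, 'code': 1, 'data': 2, 'moves': 1, 'to': 1}
```

Both the mapper input and the reducer input here are <key,value> pairs: (line offset, line) for the mapper and (word, list of counts) for the reducer.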

Advantages of Map Reduce:-
1.) Parallel Programming :- One of the biggest advantages of a Map Reduce program is parallel programming. It breaks the program into multiple tasks. Parallel processing allows multiple processors to take on these divided tasks, so that the entire program runs in less time.
2.) Memory Requirements :- The next biggest advantage of using MapReduce is that it does not require very high memory like some other components of the Hadoop ecosystem. MapReduce can work with a minimal amount of memory and still process large datasets, since intermediate results are written to disk rather than held in memory.
3.) Data Locality :- Instead of moving data to the processing unit, the MapReduce framework moves the processing unit to the data. In the traditional system, we used to bring data to the processing unit and process it, which becomes costly over the network as the data grows. MapReduce overcomes this issue by bringing the processing unit to the data.
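
As a rough sketch of the parallel-processing idea, the example below splits a small dataset into chunks and lets each chunk be handled by its own task before the partial results are merged. It is an assumption-laden illustration, not Hadoop's implementation: a local thread pool stands in for the cluster nodes, the sample records are made up, and the names (split_into_chunks, map_chunk, reduce_partials, run_parallel) are hypothetical. Each task also pre-aggregates its own chunk, which is roughly the job a combiner does in Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the input into fixed-size chunks, one per map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """One map task: find the highest reading per city within its own chunk only."""
    partial = {}
    for city, temp in chunk:
        partial[city] = max(temp, partial.get(city, temp))
    return partial


def reduce_partials(partials):
    """Reduce step: merge the per-chunk results into one maximum per city."""
    final = {}
    for partial in partials:
        for city, temp in partial.items():
            final[city] = max(temp, final.get(city, temp))
    return final


def run_parallel(records, chunk_size=2, workers=3):
    """Process chunks independently in a thread pool, then merge the results."""
    chunks = split_into_chunks(records, chunk_size)
    # Threads are only a stand-in here; in a cluster each chunk would be
    # processed by a task scheduled on a node that holds that chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)


if __name__ == "__main__":
    readings = [("pune", 31), ("delhi", 40), ("pune", 35), ("delhi", 38), ("goa", 29)]
    print(run_parallel(readings))
    # {'pune': 35, 'delhi': 40, 'goa': 29}
```

Because no map task needs data from another chunk, the chunks can be processed in any order and on any worker, which is what makes the divide-and-merge approach safe to run in parallel.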

Disadvantages of Map Reduce :-

1.) OLTP processing :- Map Reduce programming is not suitable for OLTP processing, since it is a batch model and OLTP needs fast reads and writes of individual records.
2.) Latency :- Map Reduce programs are relatively slow, as the framework is designed to process large volumes of data in different formats and structures.
