Saturday, July 28, 2018

Map Reduce - Combiner

The Combiner takes the output from the Map phase, creates key-value pairs, and passes them on to the Reducer. It is an optional class: it performs a task similar to the Reducer, but it runs on the Map machine itself. For this reason it is sometimes referred to as a "mini-reducer".

On larger data sets, the mapper produces an enormous amount of data for the reducer to process further. Transferring this large set of data from mapper to reducer can lead to network congestion. The Hadoop framework handles this particular problem with the help of the class called Combiner.

Let us suppose we have a file named 'file.txt' containing the following lines:-
"life is life is.new is new is"
Map Reduce without combiner:--

 

The number of keys generated during the mapper phase is 4 + 4 = 8 (four from each input split). These 8 keys need to be transferred to the Reducer for further processing. The number of distinct keys after the reducer phase is 3 ('life', 'is' and 'new').
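This key count can be sketched with a small Python simulation. It assumes the file is split into two input splits, one per sentence of file.txt ("life is life is" and "new is new is"); the split names and functions here are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

# Two input splits, one per line of file.txt (assumed split boundaries).
splits = ["life is life is", "new is new is"]

def map_phase(split):
    # The mapper emits a (word, 1) pair for every word in its split.
    return [(word, 1) for word in split.split()]

mapped = [map_phase(s) for s in splits]
keys_sent = sum(len(pairs) for pairs in mapped)
print(keys_sent)  # 4 + 4 = 8 pairs must travel from mapper to reducer

# The reducer groups by key and sums the counts.
totals = defaultdict(int)
for pairs in mapped:
    for word, count in pairs:
        totals[word] += count
print(dict(totals))  # 3 distinct keys: {'life': 2, 'is': 4, 'new': 2}
```

All 8 intermediate pairs cross the network here, even though only 3 distinct words exist.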

When we apply the combiner, we can observe the difference in the number of keys that are moved from mapper to reducer.

Map Reduce with combiner:--

In this case, the number of keys that will be processed by the Reducer is 2 + 2 = 4. We need to transfer only 4 keys, as compared to 8 in the previous case. This greatly reduces network congestion while moving the data from mapper to reducer.
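Extending the same simulation, a combiner step can be added on each mapper's local output before the shuffle. Again this is an illustrative sketch with the same assumed two splits; in real Hadoop the combiner is enabled by calling Job.setCombinerClass() with a Reducer class.

```python
from collections import defaultdict

splits = ["life is life is", "new is new is"]

def map_phase(split):
    return [(word, 1) for word in split.split()]

def combine(pairs):
    # The combiner runs on the mapper's machine and pre-aggregates
    # that mapper's local output, exactly like a reducer but per split.
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

combined = [combine(map_phase(s)) for s in splits]
keys_sent = sum(len(pairs) for pairs in combined)
print(keys_sent)  # 2 + 2 = 4 pairs instead of 8
```

Each split now ships only its locally summed pairs, e.g. ('life', 2) and ('is', 2) from the first split, so half as many keys cross the network.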

Hence, the combiner plays a vital role:-
⦁    It reduces network congestion while transferring the data.
⦁    It decreases the amount of data that needs to be processed by the reducer.
⦁    It boosts the overall performance of Map Reduce jobs.



Hadoop - What is a Job in Hadoop ?

In the field of computer science, a job simply means a piece of program, and the same rule applies to the Hadoop ecosystem as well...