Saturday, July 21, 2018

MapReduce - Mapper

In the last blog post we went through a basic introduction to MapReduce. In this post we will try to understand the Mapper. What inputs are given to the mapper? How are the values generated inside the mapper? What should be the number of mappers?

We will first see the basic flow of the Mapper:


The flow above describes the steps inside the mapper. The input we receive is converted into the form of <key, value> pairs. These <key, value> pairs are partitioned and later sorted on the basis of the key. We should also note that the number of mappers depends on the number of input splits we have.
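The flow above can be sketched in plain Python (this is a conceptual illustration, not the actual Hadoop Java API). A word-count-style mapper receives each line as a (byte offset, line) pair and emits intermediate (word, 1) pairs, which are then partitioned and sorted by key:

```python
def mapper(key, value):
    """key: byte offset of the line; value: the line's text.
    Emits one (word, 1) pair per word in the line."""
    for word in value.split():
        yield (word, 1)

def partition(key, num_reducers):
    """Pick a reducer for each intermediate key (hash partitioning)."""
    return hash(key) % num_reducers

# Emit intermediate pairs for one input line, then sort them by key,
# mirroring the map -> partition -> sort flow described above.
pairs = list(mapper(0, "the quick brown fox the fox"))
pairs.sort(key=lambda kv: kv[0])
print(pairs)
```

In real Hadoop the framework, not our code, performs the partitioning and sorting between the map and reduce phases; the sketch only shows where each step fits.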

We will go through input splits in detail in the next post. For the time being, we should know that an input split is a logical partition of the input data. Each input split is processed by a corresponding mapper task. We should note that the number of input splits does not have to match the number of blocks.


RecordReader's responsibility is to keep reading and converting data into key-value pairs until the end of the file. The byte offset (a unique number) of each line in the file is assigned as its key by the RecordReader. This key-value pair is then sent to the mapper. The output of the mapper program is called intermediate data (key-value pairs in a form the reducer can understand).
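The byte-offset behaviour described above can be sketched as follows (a simplified stand-in for Hadoop's TextInputFormat RecordReader, assuming `\n`-delimited text): each line's key is the number of bytes from the start of the file to that line.

```python
def record_reader(data: bytes):
    """Yield (byte offset, line) pairs, like a text-file RecordReader."""
    offset = 0
    for line in data.split(b"\n"):
        yield (offset, line.decode())
        offset += len(line) + 1  # +1 accounts for the newline delimiter

records = list(record_reader(b"hello world\nfoo bar\nbaz"))
print(records)  # [(0, 'hello world'), (12, 'foo bar'), (20, 'baz')]
```

Note how the second line's key is 12, not 1: keys are byte offsets, not line numbers, which is why they are unique even across splits of the same file.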

The last thing is the number of mappers, which can be determined by the formula below:

        No. of Mappers = (total data size) / (input split size)

Let us suppose we have a data set of 1 TB and an input split size of 100 MB.
Then we have (1000 * 1000) MB / 100 MB = 10,000 mappers.
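The worked example can be checked with a few lines of Python (the ceiling division is an assumption here: a trailing partial split would still need its own mapper):

```python
def num_mappers(total_size_mb: int, split_size_mb: int) -> int:
    """Number of mappers = total data size / input split size,
    rounded up so a final partial split still gets a mapper."""
    return -(-total_size_mb // split_size_mb)  # ceiling division

# 1 TB = 1000 * 1000 MB, with a 100 MB input split size
print(num_mappers(1000 * 1000, 100))  # 10000
```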

