Saturday, March 10, 2018

Map vs FlatMap in PySpark

map() :- Returns a new distributed dataset (RDD) formed by passing each element of the source through a function.

map() can be thought of as a 1:1 relationship in Spark: each input element produces exactly one output element.

Both the input and the output of this function are RDDs.
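As a rough illustration of the 1:1 behaviour, the sketch below models map() on a plain Python list, so it runs without a Spark installation. The sample `lines` data and the `split_words` helper are hypothetical, and the PySpark call shown in the comment assumes an existing SparkContext named `sc`.

```python
# Hypothetical sample data standing in for the lines of a text file.
lines = ["hello spark", "map and flatMap", "rdd"]

def split_words(line):
    """Split a line into a list of words (hypothetical helper)."""
    return line.split(" ")

# Local model of map(): exactly one output element per input element.
# Assumed PySpark equivalent:
#   sc.parallelize(lines).map(split_words).collect()
mapped = [split_words(line) for line in lines]

print(mapped)
print(len(lines), len(mapped))  # the two counts are equal
```

Because the function returns a list, each output element is itself a list of words, and the number of elements stays the same as the number of input lines.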



flatMap() :- Similar to map(), but each input item can be mapped to 0 or more output items, so the function should return a sequence (in Python, any iterable such as a list) rather than a single item.

The input and output of this function are also RDDs.

It flattens the lists returned for the individual elements into a single list, one level deep.
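The flattening step can be sketched locally on plain Python lists rather than a real RDD. The sample `lines` (including an empty line) and the `split_words` helper are hypothetical, and the PySpark call shown in the comment assumes an existing SparkContext named `sc`.

```python
from itertools import chain

# Hypothetical sample data; the empty line maps to zero output items.
lines = ["hello spark", "", "map and flatMap"]

def split_words(line):
    """Return the words in a line; an empty line yields an empty list."""
    return line.split()

# Local model of flatMap(): apply the function, then flatten one level.
# Assumed PySpark equivalent:
#   sc.parallelize(lines).flatMap(split_words).collect()
flat = list(chain.from_iterable(split_words(line) for line in lines))

print(flat)
print(len(lines), len(flat))  # the counts can differ
```

Here the empty line contributes nothing to the output, so three input lines yield five words in a single flat list.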

Python Code :-

OUTPUT OF MAP:-



OUTPUT OF FLATMAP:-


Difference between MAP and FLATMAP :-

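map() returns an RDD with exactly one output element per input element, so if the function returns a list, the result is an RDD of lists. flatMap() returns an RDD containing the elements of all the returned iterables, flattened into a single sequence.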

You can find the data file and related code on my GitHub :-
 https://github.com/sangam92/Spark_tutorials

