Data is Future: Map vs FlatMap in PySpark

Saturday, March 10, 2018


map() :- Returns a new distributed dataset (RDD) formed by passing each element of the source through a function.

map() can be thought of as a 1:1 relationship in Spark: each input element produces exactly one output element.

Both the input and the output of this function are RDDs.

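flatMap() :- Similar to map(), but each input item can be mapped to 0 or more output items, so the function should return an iterable (a Seq in Scala; a list or generator in PySpark) rather than a single item.

Both the input and the output of this function are RDDs as well.

It flattens the iterables returned for the individual elements, one level deep, into a single RDD of their items.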
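The two definitions above can be modelled without Spark at all. The following is a minimal plain-Python sketch of the semantics only; `local_map`, `local_flat_map` and `repeat` are hypothetical helper names used for illustration and are not part of the PySpark API.

```python
from itertools import chain


def local_map(func, items):
    """Model of map(): exactly one output element per input element."""
    return [func(item) for item in items]


def local_flat_map(func, items):
    """Model of flatMap(): func returns an iterable, and the
    returned iterables are flattened one level into a single list."""
    return list(chain.from_iterable(func(item) for item in items))


def repeat(n):
    """Return n copies of n, so each input yields 0 or more outputs."""
    return [n] * n


numbers = [0, 1, 2, 3]

mapped = local_map(repeat, numbers)
flattened = local_flat_map(repeat, numbers)

print(mapped)     # [[], [1], [2, 2], [3, 3, 3]]
print(flattened)  # [1, 2, 2, 3, 3, 3]
```

The mapped result always has as many elements as the input (four here), while the flattened result has six, and the input 0 contributes nothing to it.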

Python Code :-
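What follows is a minimal sketch of how such a comparison might look in PySpark; it is not the original listing from the post. It assumes an already created SparkContext is passed in as `sc`, and `split_line` and `compare_map_flatmap` are hypothetical names chosen for illustration. Only `parallelize`, `map`, `flatMap` and `collect` are actual RDD API calls.

```python
def split_line(line):
    """Split one line of text into a list of words."""
    return line.split(" ")


def compare_map_flatmap(sc, lines):
    """Apply map() and flatMap() to the same RDD and collect both results.

    Assumption: `sc` is an existing pyspark.SparkContext.
    """
    rdd = sc.parallelize(lines)
    # map(): one list of words per input line.
    mapped = rdd.map(split_line).collect()
    # flatMap(): the per-line lists are flattened into one list of words.
    flattened = rdd.flatMap(split_line).collect()
    return mapped, flattened


# Usage, assuming PySpark is installed locally:
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "map-vs-flatmap")
#   mapped, flattened = compare_map_flatmap(sc, ["hello world", "data is future"])
#   sc.stop()
```

Taking the SparkContext as a parameter keeps the function usable from a script, a notebook or the `pyspark` shell, where `sc` already exists.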

OUTPUT OF MAP:-

OUTPUT OF FLATMAP:-

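Difference between map() and flatMap() :-

Both return an RDD. map() returns an RDD with exactly one output element per input element, so if the function returns a list, each list stays a single element. flatMap() returns an RDD containing the elements of all the iterables returned by the function, so the output can hold fewer or more elements than the input.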

You can find the data file and related code in my GitHub repository :-
https://github.com/sangam92/Spark_tutorials
