Map() :- Returns a new distributed dataset formed by passing each element of the source through a function.
Map can be thought of as a 1:1 relationship in Spark: each input element produces exactly one output element.
Both the input and the output of this function are RDDs.
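As a rough illustration of this one-to-one behaviour, here is a plain-Python sketch that models an RDD as an ordinary list, an assumption made so the snippet runs without a Spark installation. The helper name `rdd_map` and the sample numbers are hypothetical; in PySpark the corresponding call would be `rdd.map(func)`.

```python
def rdd_map(elements, func):
    """Hypothetical model of RDD.map: one output element per input element."""
    return [func(x) for x in elements]


numbers = [1, 2, 3, 4]  # stand-in for an RDD of four numbers (illustrative data)
squared = rdd_map(numbers, lambda x: x * x)

print(squared)                       # [1, 4, 9, 16]
print(len(squared) == len(numbers))  # True: same number of elements in and out
```

Whatever the function does, the number of elements coming out of map matches the number going in.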
FlatMap() :- Similar to map, but each input item can be mapped to 0 or more output items (so func should return a sequence rather than a single item).
Both the input and the output of this function are RDDs as well.
It flattens the sequences returned for each element into a single collection.
Python Code :-
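A minimal sketch of the comparison, assuming a small list of text lines as input and a split on spaces as the function (both are illustrative choices, not taken from the linked repository). To keep it runnable without Spark, the RDD is modelled as a Python list; the roughly equivalent PySpark calls are shown in comments, where `sc` is assumed to be an existing SparkContext.

```python
from itertools import chain

lines = ["hello spark", "map and flatMap"]  # hypothetical input data

# PySpark equivalent: sc.parallelize(lines).map(lambda l: l.split(" ")).collect()
map_result = [line.split(" ") for line in lines]

# PySpark equivalent: sc.parallelize(lines).flatMap(lambda l: l.split(" ")).collect()
flatmap_result = list(chain.from_iterable(line.split(" ") for line in lines))

print(map_result)      # [['hello', 'spark'], ['map', 'and', 'flatMap']]
print(flatmap_result)  # ['hello', 'spark', 'map', 'and', 'flatMap']
```

With map, each line becomes one list of words, so the result is a list of lists. With flatMap, the same lists are flattened into a single list of words.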
OUTPUT OF MAP:-
[Image: output of map()]
OUTPUT OF FLATMAP:-
[Image: output of flatMap()]
Difference between MAP and FLATMAP :-
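map() returns an RDD with exactly one output element per input element, whereas flatMap() returns an RDD containing the elements of all the iterables produced by the function, flattened into one collection.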
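One way to see the difference is to compare element counts, including the case where the function returns nothing for an input. The sketch below again models an RDD as a plain list; `rdd_flat_map`, `words`, and the sample lines are hypothetical names and data chosen for illustration.

```python
from itertools import chain


def rdd_flat_map(elements, func):
    """Hypothetical model of RDD.flatMap: concatenate the iterables func returns."""
    return list(chain.from_iterable(func(x) for x in elements))


def words(line):
    # str.split() with no arguments returns [] for an empty string
    return line.split()


lines = ["a b c", "", "d"]  # three input elements, one of them empty

mapped = [words(line) for line in lines]  # models rdd.map(words)
flattened = rdd_flat_map(lines, words)    # models rdd.flatMap(words)

print(mapped)                       # [['a', 'b', 'c'], [], ['d']]
print(flattened)                    # ['a', 'b', 'c', 'd']
print(len(mapped), len(flattened))  # 3 4
```

The mapped result always has three elements because there are three inputs, while the flattened result has four: the empty line contributes zero items and the first line contributes three.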
You can find the data file and related code on my GitHub :-
https://github.com/sangam92/Spark_tutorials