Data is Future: Spark

Saturday, April 28, 2018

Spark - Normal Join

RDDs support several kinds of join. We will look at each of them one by one.

Normal Join :- It outputs only the records whose key is present in both RDDs, pairing each common key with its values from both sides. This is an inner join.

Example :- Suppose the RDD 'GOALS' holds a list of football players with the number of goals each has scored, and the RDD 'MATCH' holds the number of matches each has played.

RDD GOALS

Player Name    Goals Scored
Messi          71
Ronaldo        77
Pele           59
Zidane         42
Drogba         27

RDD MATCH

Player Name    Matches Played
Messi          163
Ronaldo        171
Pele           142
Zidane         91
Roonie         183

The join happens on the key: if a key is present in both RDDs, the output contains that key along with its corresponding values from each RDD. Keys present in only one RDD (Drogba in GOALS, Roonie in MATCH) are dropped.

Output :-

Messi          71     163
Ronaldo        77     171
Pele           59     142
Zidane         42     91
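
The matching rule behind this output can be sketched in plain Python, without Spark. This is only an illustration of the semantics, not how Spark implements it: the helper `inner_join` is a hypothetical name, and the sketch assumes each player appears at most once per dataset (a real RDD may hold several records with the same key).

```python
# Plain-Python illustration of inner-join semantics (no Spark involved).
# Assumption: keys are unique within each dataset, so a dict can stand in
# for a key-value RDD.
goals = {"Messi": 71, "Ronaldo": 77, "Pele": 59, "Zidane": 42, "Drogba": 27}
matches = {"Messi": 163, "Ronaldo": 171, "Pele": 142, "Zidane": 91, "Roonie": 183}


def inner_join(left, right):
    """Keep only keys found on both sides, pairing their values."""
    return {key: (left[key], right[key]) for key in left if key in right}


joined = inner_join(goals, matches)

for player, (scored, played) in joined.items():
    print(player, scored, played)
```

Drogba and Roonie do not appear in `joined`, because each of them exists on only one side.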

Python Code Snippet :-
