Saturday, April 28, 2018

Spark - Normal Join


RDDs support several kinds of join. We will look at each of them one by one.

Normal Join :-  It returns the data from both RDDs for every key that is present in both of them. Keys that appear in only one of the RDDs are dropped.

Example :-  Suppose the RDD 'GOALS' holds a list of football players with the number of goals each has scored, and the RDD 'MATCH' holds the number of matches each has played.

RDD GOALS

Player Name    Goals Scored
Messi          71
Ronaldo        77
Pele           59
Zidane         42
Drogba         27

RDD MATCH

Player Name    Matches Played
Messi          163
Ronaldo        171
Pele           142
Zidane         91
Roonie         183

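The join is performed on the key (here, the player name). When a key is present in both RDDs, the output contains that key together with its corresponding values from each RDD. Drogba and Roonie each appear in only one RDD, so they are absent from the output.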
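To make the matching rule concrete, here is a plain-Python sketch of what a normal join computes. It does not use Spark at all; the helper name `inner_join` is hypothetical and only mimics the shape of the result, where each matching key is paired with a tuple of its two values.

```python
# Plain-Python illustration of the matching rule (no Spark involved).
# `inner_join` is a hypothetical helper written only for this example.

goals = [("Messi", 71), ("Ronaldo", 77), ("Pele", 59), ("Zidane", 42), ("Drogba", 27)]
match = [("Messi", 163), ("Ronaldo", 171), ("Pele", 142), ("Zidane", 91), ("Roonie", 183)]


def inner_join(left, right):
    """Pair up values from `left` and `right` that share the same key."""
    # Index the right-hand side by key so each lookup is cheap.
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)

    # Emit (key, (left_value, right_value)) only when the key is on both sides.
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


joined = inner_join(goals, match)
for player, (scored, played) in joined:
    print(player, scored, played)
```

Drogba has no entry in `match` and Roonie has no entry in `goals`, so neither appears in `joined`.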

Output :-

Messi      71    163
Ronaldo    77    171
Pele       59    142
Zidane     42    91

Python Code Snippet  :-
