Data is Future: Spark - countByValue

Tuesday, February 19, 2019

Spark - countByValue

Spark provides a convenient way to count how many times each unique value occurs in an RDD: the countByValue() operation.

countByValue() is an action, so calling it runs a Spark job and returns the result to the driver program.

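It returns each unique value in the RDD together with the number of times it occurs, as a local map of (value, count) pairs (a Map in Scala, a dictionary in PySpark).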
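To make the return value concrete, here is a small Spark-free sketch of what `rdd.countByValue()` computes. It assumes the RDD's partitions can be modeled as plain Python lists, and the function names are hypothetical. Each partition is counted on its own (the work Spark's executors do), and the partial counts are then combined into a single map, which is what the driver receives.

```python
from collections import defaultdict

# Local, Spark-free emulation of what rdd.countByValue() computes.
# Each inner list stands in for one RDD partition (illustrative assumption).

def count_partition(partition):
    """Count occurrences within a single partition (executor-side work in Spark)."""
    counts = defaultdict(int)
    for value in partition:
        counts[value] += 1
    return counts

def merge_maps(m1, m2):
    """Add the counts from m2 into m1 and return m1."""
    for key, n in m2.items():
        m1[key] += n
    return m1

def count_by_value(partitions):
    """Return a dict mapping each distinct value to its total count."""
    result = defaultdict(int)
    for partial in map(count_partition, partitions):
        merge_maps(result, partial)
    return dict(result)

print(count_by_value([["a", "b", "a"], ["b", "c"], ["a"]]))  # {'a': 3, 'b': 2, 'c': 1}
```

Note that the size of the final map depends on the number of distinct values, not on the number of records: three partitions with six records still produce a map with only three entries.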

We should call this function only when the number of distinct values is small: the complete result map is loaded into the driver's memory, so an RDD with many distinct values can cause performance problems or even out-of-memory errors on the driver.
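When the number of distinct values is large, a common alternative is to keep the counts distributed, for example with `rdd.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)`, which returns an RDD of (value, count) pairs instead of a driver-side map. The Spark-free sketch below (hypothetical names, with lists standing in for partitions) illustrates the idea: each input partition pre-sums its own counts, and every key is then routed by hash to exactly one output partition, so no single place has to hold every distinct value.

```python
from collections import defaultdict

# Spark-free sketch of the map -> (value, 1) -> reduceByKey pattern.
# Inner lists stand in for RDD partitions; all names are hypothetical.

def reduce_by_key_counts(partitions, num_output_partitions=2):
    """Return one dict per output partition, each mapping value -> count."""
    outputs = [defaultdict(int) for _ in range(num_output_partitions)]
    for partition in partitions:
        # Map side: treat each value as a (value, 1) pair and pre-sum within the partition.
        local = defaultdict(int)
        for value in partition:
            local[value] += 1
        # Shuffle + reduce: hash partitioning sends every key to exactly one owner.
        for value, n in local.items():
            outputs[hash(value) % num_output_partitions][value] += n
    return [dict(out) for out in outputs]

for i, part in enumerate(reduce_by_key_counts([["a", "b", "a"], ["b", "c"], ["a"]])):
    print(f"output partition {i}: {part}")
```

Because Python randomizes string hashes per process, which output partition owns a given key can change between runs; the totals per key do not.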

Example: We have a sample file, country.txt, containing the names of different countries. We need to count the occurrences of each country.
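For this example the Spark call itself is a one-liner, `sc.textFile("country.txt").countByValue()`, assuming an existing SparkContext named `sc` and one country name per line. The plain-Python function below is a hypothetical helper, not part of Spark. It computes the same kind of result locally, which can be handy for checking expected counts on a small sample. Lines are matched exactly, so "India" and "india" would be counted separately.

```python
from collections import Counter

def count_countries(lines):
    """Count occurrences of each line (assumed: one country name per line).

    Hypothetical local helper, roughly what sc.textFile(path).countByValue()
    returns, computed without Spark. Only the line terminator is stripped,
    so matching is exact (case and other whitespace still matter).
    """
    return dict(Counter(line.rstrip("\r\n") for line in lines))

# Illustrative input only, not the post's actual data.
print(count_countries(["India\n", "USA\n", "India\n", "UK"]))  # {'India': 2, 'USA': 1, 'UK': 1}

# Example usage with a real file (assumes country.txt exists):
# with open("country.txt", encoding="utf-8") as f:
#     print(count_countries(f))
```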

