Tuesday, August 6, 2019

Spark - Basic Statistics


We have already gone through the tutorial on Measures of Central Tendency. Now we will implement them in PySpark. To do so, we need to import the Statistics class from the pyspark.mllib.stat module.


Once the Spark job is submitted, it prints the computed statistics as the result.

The code is available in my GitHub repository: https://github.com/sangam92/Spark_tutorials
