Sunday, December 23, 2018

Spark - DataFrames 2



In the previous post, we went through the basics of dataframes and created a dataframe from the sample test.csv file. In this post, we go a step further and look at the different operations that can be performed on dataframes.

We will build up a complete program, one operation at a time.

We start the program by reading the CSV file and displaying the number of rows in the dataframe.


After submitting the Spark job, the row count of the dataframe is printed in the output.






Displaying the number of columns and their names.


The describe operation is used to calculate summary statistics (count, mean, standard deviation, minimum and maximum) for the columns of a DataFrame. If we don't specify any column names, it calculates the statistics for all numerical and string columns present in the DataFrame.

 

 

     









Selecting specific columns in the dataframe.

Specific columns of a dataframe can be selected by calling select on the dataframe and passing the names of the required columns.

 

 


         







Displaying the statistics of a specific column.
   


