In my last blog post, we went through dataframes and some of their operations. Often we also need to save a dataframe so that it can be used in further operations.
In this blog post, we will look at how to save dataframes, both with the generic (default) save and by manually specifying the format.
Generic Load/Save function
The default data source format used for loading and saving data is Parquet. We can save a dataframe in Parquet format by calling write.save() on it and giving a name for the output. Let us check the code for the same.
# Save the dataframe in the default location
read_file.select("name","age").write.save("dataframe_save.parquet",format="parquet")
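The same generic interface also works in the other direction. As a minimal sketch, read.load() can read the saved output back into a dataframe. The helper name load_saved is hypothetical, and sqlcontext is assumed to be the SQLContext created in the code snippet further down in this post.

```python
def load_saved(sqlcontext, path, file_format="parquet"):
    """Read a previously saved dataframe back from `path`.

    `load_saved` is a hypothetical helper name used for illustration;
    `sqlcontext` is assumed to be a pyspark.sql.SQLContext.
    """
    # read.load() is the generic counterpart of write.save();
    # the `format` argument names the data source explicitly.
    return sqlcontext.read.load(path, format=file_format)


# Example usage (needs a running Spark application):
# saved_df = load_saved(sqlcontext, "dataframe_save.parquet")
# saved_df.show()
```

Passing the format explicitly keeps the call readable even though Parquet is already the default.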
We can also override the default and save the dataframe in another format, such as CSV.
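One thing to keep in mind when saving: by default Spark raises an error if the output path already exists, so running the same script twice fails. The sketch below wraps write.save() so that the format and the save mode are set in one place. The helper name save_dataframe and its parameters are hypothetical, chosen only for illustration.

```python
def save_dataframe(df, path, file_format="csv", overwrite=True, **options):
    """Save `df` to `path` in the given format.

    `save_dataframe` is a hypothetical helper, not part of PySpark.
    Extra keyword arguments (for example header=True for CSV) are passed
    through to the writer as data source options.
    """
    # "overwrite" replaces existing output; "error" is Spark's default
    # behaviour, which fails when the path already exists.
    mode = "overwrite" if overwrite else "error"
    df.write.save(path, format=file_format, mode=mode, **options)


# Example usage (needs a running Spark application):
# save_dataframe(read_file.select("name", "age"), "dataframe_save.csv", header=True)
```

With header=True the column names are written as the first line of each CSV part file, which makes the output easier to read back later.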
Let us start with the code we wrote in the previous post on Spark dataframes. The code is available in my GitHub repository.
Code Snippet:
from pyspark import SparkConf,SparkContext
from pyspark.sql import SQLContext
conf=SparkConf().setAppName("dataframe")
sc=SparkContext(conf=conf)
sqlcontext=SQLContext(sc)
# Read the CSV file, treating the first line as the header
read_file=sqlcontext.read.csv('/home/hduser/sangam/test.csv',header='true')
read_file.show()
print("The number of rows in the file are ",read_file.count())
# First two rows of the dataframe
read_file.head(2)
# Number of columns in the dataframe and their names
print("no of columns and name of the columns",len(read_file.columns),read_file.columns)
# Summary statistics of the numerical columns available in the dataframe
read_file.describe().show()
# Summary statistics of a particular column
read_file.describe('salary').show()
# Select specific columns from the dataframe
read_file.select('salary','age').show()
# Save the selected columns as CSV in the default location
read_file.select("name","age").write.save("dataframe_save.csv",format="csv")
After submitting the code, we get the output in our default location, in a directory called dataframe_save.csv. Once we enter the directory, we will find a file such as "part-00000-e90b751f-b7b9-4093-93b0-b014ef2012a8.csv".
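Because Spark writes one part file per partition and generates the file names itself, the exact name cannot be known in advance. A small sketch for locating the part files, assuming the output directory sits on the local filesystem rather than HDFS (the function name list_part_files is hypothetical):

```python
import glob
import os


def list_part_files(output_dir, extension="csv"):
    """Return the sorted part files that Spark wrote into `output_dir`.

    `list_part_files` is a hypothetical helper for illustration. It assumes
    the directory is on the local filesystem.
    """
    # Data files are named part-<number>-<generated id>.<extension>;
    # marker files such as _SUCCESS do not match this pattern.
    pattern = os.path.join(output_dir, "part-*." + extension)
    return sorted(glob.glob(pattern))


# Example usage:
# for path in list_part_files("dataframe_save.csv"):
#     print(path)
```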
All the related code is available in my GitHub repository: https://github.com/sangam92/Spark_tutorials