Wednesday, January 9, 2019

Spark - Reading JSON File

We explored Spark's ability to read CSV files in our previous post. In this blog post, we will learn how Spark reads JSON files and converts them into data frames.

Step 1:- We will import the necessary packages and do the required configuration, which is common to most Spark programs.

# Standard Spark setup: configuration, context, and SQL context
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("read_json")
sc = SparkContext(conf=conf)
sqlcontext = SQLContext(sc)


Step 2:- We will read the JSON file with the read.json method.

read_json = sqlcontext.read.json('/home/hduser/sangam/employee.json')

Step 3:- We can display the records and print the schema.

read_json.show()
read_json.printSchema()


Step 4:- We can even select only the columns that we would like to display.

read_json.select("id","age").show()

Once the JSON file is converted into a data frame, all the operations will be the same as those we performed on data frames earlier.

Complete Code Snippet :-

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("read_json")
sc = SparkContext(conf=conf)
sqlcontext = SQLContext(sc)

read_json = sqlcontext.read.json('/home/hduser/sangam/employee.json')
read_json.show()
read_json.printSchema()
read_json.select("id", "age").show()
Output :-
The complete code file and data file are available in my GitHub repository.




