Monday, October 29, 2018

Spark - Reading data from Hive via PySpark

Hive and PySpark both sit on top of the Hadoop framework, and together they let us connect to Hive and fetch its data from Spark. We will start by creating a simple test table in Hive.

1.) Creating a table in Hive.

We can list the available databases using the command below.

       show databases;



This command displays all the databases available in Hive. We can switch to any of them using the 'use' command.

2.) use databasename;

In our case, the database name is spark_hive, so the command will be

       use spark_hive;



3.) Now that we are in the required database, we can create our table, as shown below.
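
The original post's table definition is not shown, so the table name, columns, and sample rows below are assumptions for illustration; any simple test table will do:

       create table test_table (id int, name string);
       insert into test_table values (1, 'alpha'), (2, 'beta');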



4.) Once the table has been created in Hive, we can fetch its data from Spark using code along the lines of the sketch below.
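
The original code listing is not shown here, so this is a minimal sketch; the database and table names (spark_hive.test_table) are the assumed ones from the steps above:

       from pyspark.sql import SparkSession

       # Build a SparkSession with Hive support enabled so Spark
       # can talk to the Hive metastore.
       spark = SparkSession.builder \
           .appName("ReadFromHive") \
           .enableHiveSupport() \
           .getOrCreate()

       # Run a query against the table created earlier.
       df = spark.sql("select * from spark_hive.test_table")

       # Print the fetched rows on the terminal.
       df.show()

       spark.stop()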




5.) We can run this code with the spark-submit command.

              spark-submit filename.py
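
For Spark to locate the Hive metastore, the hive-site.xml file generally needs to be visible to Spark (for example, copied into $SPARK_HOME/conf). The exact invocation depends on your cluster; a local run might look like:

              spark-submit --master local[*] filename.py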

6.) Once the process has executed, we will see the query results on our terminal. So, with this simple set of commands, we can connect to Hive and fetch its data via Spark.
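
With the two assumed sample rows inserted in step 1, df.show() would print something along these lines:

       +---+-----+
       | id| name|
       +---+-----+
       |  1|alpha|
       |  2| beta|
       +---+-----+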



