Hive and PySpark are two components that sit on top of the Hadoop framework and allow us to connect to and fetch data. We will start by creating a simple test table in Hive.
1.) Creating a table in Hive.
We will first check the available databases using the command below.
show databases;
This command displays all the databases available inside Hive. We can switch to any of them using the ‘use’ command.
2.) use databasename;
In our case, the database name is spark_hive, so the command will be
use spark_hive;
3.) Now that we are in the required database, we can create our table, as sketched below.
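The original table definition is not shown here, so as a minimal sketch we can create a simple table; the employee table name and its columns are hypothetical, chosen only for illustration.
CREATE TABLE employee (
    id INT,
    name STRING,
    salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';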
4.) Once the table has been created in Hive, we can fetch the data from Spark using the code below.
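A minimal sketch of such a script follows, assuming the hypothetical employee table from the previous step and that Spark was built with Hive support; save it as filename.py so it matches the spark-submit command in the next step.
from pyspark.sql import SparkSession

# Build a SparkSession with Hive support so Spark can read Hive tables
spark = SparkSession.builder \
    .appName("HiveSparkConnect") \
    .enableHiveSupport() \
    .getOrCreate()

# Query the table created in the spark_hive database (table name is an assumption)
df = spark.sql("SELECT * FROM spark_hive.employee")

# Print the fetched rows to the terminal
df.show()

spark.stop()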
5.) We can run this script with the spark-submit command.
spark-submit filename.py
6.) Once the job has executed, the query results are printed to the terminal. So, with this simple set of commands, we can connect to Hive and fetch data via Spark.