Sqoop moves data directly from an RDBMS into HDFS, and also does the reverse. The client submits a request, which Sqoop internally divides into several sub-tasks.
Each sub-task is a Map task that imports a part of the data into the Hadoop ecosystem; collectively, the Map tasks import the entire dataset. The data is divided into chunks so the transfer gains high performance through parallelism.
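For example, an import can be launched programmatically through the Sqoop 1.x Java client. The sketch below is a minimal illustration, assuming the Sqoop 1.x jar is on the classpath; the JDBC URL, table name, paths, and credentials are placeholders, not values from this post.

import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        String[] importArgs = new String[] {
            "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",   // source RDBMS (placeholder)
            "--username", "etl_user",
            "--password", "secret",
            "--table", "orders",                             // table to import
            "--target-dir", "/data/sales/orders",            // HDFS destination directory
            "--split-by", "order_id",                        // column used to chunk the data
            "--num-mappers", "4"                             // four parallel map tasks
        };
        int exitCode = Sqoop.runTool(importArgs);            // launches the map-only job
        System.exit(exitCode);
    }
}

With --num-mappers 4 and --split-by order_id, the table is divided into four chunks and each chunk is imported by its own Map task.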
Sqoop export works in a similar fashion to import, but in the opposite direction: data is moved from HDFS into the RDBMS. When we submit an export job, it is mapped into Map tasks, each of which reads a chunk of data from HDFS and writes it to the structured destination. Combining all the exported chunks gives us the complete dataset at the destination, which in most cases is an RDBMS such as MySQL, Oracle, or SQL Server.
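A sketch of the reverse direction, again with placeholder connection details and table names: an export reads files from an HDFS directory and writes the rows into an existing RDBMS table.

import org.apache.sqoop.Sqoop;

public class SqoopExportExample {
    public static void main(String[] args) {
        String[] exportArgs = new String[] {
            "export",
            "--connect", "jdbc:mysql://dbhost:3306/sales",   // destination RDBMS (placeholder)
            "--username", "etl_user",
            "--password", "secret",
            "--table", "orders_export",                      // destination table must already exist
            "--export-dir", "/data/sales/orders",            // HDFS directory holding the chunks
            "--num-mappers", "4"                             // four mappers write in parallel
        };
        System.exit(Sqoop.runTool(exportArgs));
    }
}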
Normally a Reduce phase is used for aggregation, but Sqoop does not perform any aggregation, so no reducer is needed and the job runs as a map-only job. The job launches multiple mappers, with the number of mappers defined by the user.
Note that every mapper opens its own JDBC connection to the RDBMS; this is what lets the chunks be transferred in parallel and improves performance significantly.
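Conceptually, the chunking works like the sketch below. This is not Sqoop's actual source code, only an illustration under the same assumptions as the earlier examples (placeholder connection details and an orders table with a numeric order_id column): the split column's MIN and MAX are read once, the range is divided among the mappers, and each mapper then runs its own bounded SELECT over a separate JDBC connection.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SplitSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://dbhost:3306/sales";       // placeholder connection details
        int numMappers = 4;

        // One boundary query finds the range of the split column.
        long min, max;
        try (Connection c = DriverManager.getConnection(url, "etl_user", "secret");
             Statement s = c.createStatement();
             ResultSet r = s.executeQuery("SELECT MIN(order_id), MAX(order_id) FROM orders")) {
            r.next();
            min = r.getLong(1);
            max = r.getLong(2);
        }

        // Each mapper would receive one sub-range and open its own JDBC connection for it.
        long chunk = (max - min + 1) / numMappers;
        for (int i = 0; i < numMappers; i++) {
            long lo = min + i * chunk;
            long hi = (i == numMappers - 1) ? max : lo + chunk - 1;
            System.out.printf("Mapper %d: SELECT * FROM orders WHERE order_id BETWEEN %d AND %d%n",
                              i, lo, hi);
        }
    }
}

In the real job, each of those bounded queries runs inside a separate mapper JVM over its own connection, which is where the parallelism comes from.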