Apache Sqoop is an open source tool for transferring data from structured data stores into Hadoop for further processing. It is one of the tools that provides connectivity between Hadoop and traditional databases. The name Sqoop comes from "SQL-to-Hadoop".
Sqoop can import data from relational databases such as Oracle, MySQL, and Netezza into the Hadoop Distributed File System (HDFS). We can then run complex calculations in Hadoop and export the results back into the RDBMS.
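As a rough illustration of the import direction, here is a minimal Python sketch that only assembles a `sqoop import` command line; it does not run Sqoop. The flags are standard Sqoop import options, but the helper name `build_sqoop_import`, the JDBC URL, the user, the table, and the paths are all made-up placeholders, so adjust them to your own setup.

```python
import shlex  # shlex.join needs Python 3.8+


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, num_mappers=4):
    """Return the argument list for importing one RDBMS table into HDFS.

    Hypothetical helper for illustration; it only builds the command.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the DB password
        "--table", table,                   # source table in the RDBMS
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are placeholders, not taken from a real system.
import_cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://dbhost/sales",
    username="etl_user",
    password_file="/user/etl_user/.db_password",
    table="products",
    target_dir="/data/reference/products",
)
print(shlex.join(import_cmd))
```

On a machine where Sqoop is installed, the resulting list could be passed to `subprocess.run(import_cmd, check=True)` to actually start the import.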
Why do we need integration between Hadoop and an RDBMS?
- Hadoop is mainly used to process unstructured or semi-structured data such as web server logs, typically with MapReduce. However, our reference or master data (products, customers, locations, server information, and so on) is often stored in a relational database. So we need to bring that reference or master data into Hadoop to perform more meaningful analysis.
- Suppose we have to decide whether loan applications should be approved, and this processing takes a lot of time. Instead of running such CPU-intensive processing on our RDBMS, we can offload it to Hadoop and then load the results back into the RDBMS. In this way, we could replace ETL on the RDBMS with ETL on Hadoop.
- We can use Hadoop as cheap storage, most likely for archived or historical data. In such cases, data is moved out of the RDBMS into HDFS.
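For the second scenario above, where results computed on Hadoop go back into the database, the return trip might look like the following minimal Python sketch. Again it only assembles a `sqoop export` command line and does not run anything. The helper name `build_sqoop_export`, the table `loan_decisions`, and all connection details and paths are hypothetical, and the sketch assumes the target table already exists in the database.

```python
import shlex  # shlex.join needs Python 3.8+


def build_sqoop_export(jdbc_url, username, password_file, table, export_dir):
    """Return the argument list for exporting HDFS files into an RDBMS table.

    Hypothetical helper for illustration; it only builds the command.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,             # JDBC URL of the target database
        "--username", username,
        "--password-file", password_file,  # file holding the DB password
        "--table", table,                  # existing target table in the RDBMS
        "--export-dir", export_dir,        # HDFS directory with the results
    ]


# All values below are placeholders, not taken from a real system.
export_cmd = build_sqoop_export(
    jdbc_url="jdbc:mysql://dbhost/loans",
    username="etl_user",
    password_file="/user/etl_user/.db_password",
    table="loan_decisions",
    export_dir="/data/output/loan_decisions",
)
print(shlex.join(export_cmd))
```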