Saturday, May 9, 2020

Big Data - Hadoop vs RDBMS

Traditional RDBMS is schema-on-write, whereas Hadoop is schema-on-read: there is no checking while writing the data, so you can basically dump in all kinds of data, and you impose the schema only when you actually want to perform the analysis. That is why it is called schema-on-read.
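The difference can be sketched in a few lines of Python. This is only an illustration, not how an RDBMS or HDFS is actually implemented: the `SCHEMA` dict, `write_rdbms`, `write_hadoop`, and `read_hadoop` are hypothetical names, and plain Python lists stand in for the storage layer.

```python
import json

# Schema-on-write (RDBMS style): validate the record against a fixed
# schema BEFORE storing it; malformed data is rejected at write time.
SCHEMA = {"id": int, "name": str}  # hypothetical example schema

def write_rdbms(store, record):
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"record violates schema on field {field!r}")
    store.append(record)

# Schema-on-read (Hadoop style): dump the raw line as-is, no checking.
def write_hadoop(store, raw_line):
    store.append(raw_line)

# The schema is imposed only here, when the data is read for analysis;
# records that do not fit are simply skipped at read time.
def read_hadoop(store):
    for line in store:
        try:
            record = json.loads(line)
            yield record["id"], record["name"]
        except (ValueError, KeyError):
            continue

rdbms, hdfs = [], []
write_rdbms(rdbms, {"id": 1, "name": "alice"})
write_hadoop(hdfs, '{"id": 2, "name": "bob"}')
write_hadoop(hdfs, 'not even json')  # accepted anyway at write time
print(list(read_hadoop(hdfs)))       # → [(2, 'bob')]
```

Note that the bad line costs nothing at write time; it only surfaces (and is filtered out) when the analysis runs.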
Coming back to processing: a traditional RDBMS supports both interactive and batch processing, whereas in Hadoop it is strictly batch processing. However, there are third-party tools that can sit on top of Hadoop and query the HDFS data to give something close to real-time interactive processing, but not as close as what a traditional RDBMS provides.

An RDBMS is strictly used for storing structured data, whereas Hadoop can store all kinds of data: structured, semi-structured, and unstructured.

Typically, existing traditional RDBMS systems can start choking once the data exceeds a few terabytes. In Hadoop there is practically no limit: it can handle petabytes of data and go beyond that as well.

The scaling model for data storage also differs. A traditional RDBMS scales nonlinearly, meaning it is not easy to scale it horizontally when your data loads are increasing exponentially, whereas Hadoop can expand its data storage capacity very smoothly without affecting query or analysis performance. That is the key feature here: Hadoop gives you smooth, near-linear scalability in data storage.

As for schema: in a traditional database management system the data has to strictly comply with the schema specification, and you cannot change it at run time. Hadoop is a big relief here because it is very accommodating to changing schema needs.
Traditional RDBMS systems are computationally intensive on a single server. Hadoop, by contrast, is built from simple pieces that work together in large numbers, so it is more data-intensive. It is still CPU-intensive and computationally intensive, but unlike a traditional RDBMS, the computation does not happen on a single server; it happens on multiple machines distributed across a cluster. That is the major difference here.

We have shared file storage in our traditional RDBMS-based systems, but in Hadoop we actually move the code to the data.
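A toy sketch of moving the code to the data: instead of pulling every node's data across the network to one central server, a small function is shipped to each node and only the small per-node results travel back. The `nodes` dict and `run_on_node` helper below are hypothetical stand-ins for a cluster; this is a simulation of the idea, not a real distributed runtime.

```python
# Hypothetical cluster: node name -> the data block stored locally on it.
nodes = {
    "node1": [3, 1, 4],
    "node2": [1, 5, 9],
    "node3": [2, 6, 5],
}

def run_on_node(local_data, func):
    # In a real cluster this would execute ON the node that holds the
    # block (data locality); here it just simulates that behaviour.
    return func(local_data)

# "Ship" the code (here, sum) to each node; only the tiny per-node
# partial results cross the network, then they are combined centrally.
partials = [run_on_node(block, sum) for block in nodes.values()]
total = sum(partials)
print(total)  # → 36
```

The point is in the sizes: three small integers travel over the wire instead of three whole data blocks, which is exactly the economics MapReduce relies on.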

We will discuss this concept of moving the code to the data in more detail when we cover the MapReduce framework later on.

