A traditional RDBMS
is schema-on-write, whereas Hadoop is schema-on-read: no checking
happens while the data is being written, so you can basically dump all
kinds of data into it, and only when you want to perform an analysis
do you actually impose a schema. That is why it is called schema-on-read.
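To make the contrast concrete, here is a minimal schema-on-read sketch in plain Python (the field names `name` and `age` are hypothetical, not from any Hadoop API): raw records are accepted as-is at write time, and a schema is imposed only when the data is read back for analysis.

```python
# Schema-on-read sketch: nothing is validated at write time;
# a schema is applied only when the data is read for analysis.
# Field names (name, age) are hypothetical.

raw_store = []  # stands in for a raw file on HDFS: anything can be dumped here

def write(record: str) -> None:
    raw_store.append(record)  # no schema check here

def read_with_schema(line: str) -> dict:
    # The schema lives in the reader, not the store; odd rows are tolerated.
    parts = line.split(",")
    return {
        "name": parts[0].strip() if len(parts) > 0 else None,
        "age": int(parts[1]) if len(parts) > 1 and parts[1].strip().isdigit() else None,
    }

write("alice, 34")
write("bob")               # missing field: still accepted at write time
write("not,really,a,row")  # extra fields: also accepted

rows = [read_with_schema(line) for line in raw_store]
```

A schema-on-write system would have rejected the second and third records at insert time; here they land in storage untouched and the reader decides what to make of them.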
Coming back to
processing: a traditional RDBMS supports both interactive and batch
processing, whereas Hadoop is strictly batch-oriented. However, there
are third-party tools that can sit on top of Hadoop and query the
HDFS data to give close to real-time interactive processing, though not
as close as what a traditional RDBMS provides.
An RDBMS is strictly
used for storing structured data, whereas Hadoop can store all kinds
of data: structured, semi-structured, and unstructured.
Typically, your
existing traditional RDBMS systems can start choking once the data
exceeds a few terabytes. In Hadoop there is practically no limit: it
can handle petabytes of data and scale beyond that as well.
And the scaling
model for data storage in an RDBMS is nonlinear, meaning it is not
easy to horizontally scale a traditional RDBMS system when your data
loads are increasing exponentially. Hadoop, on the other hand, can
expand its data storage capacity very smoothly without affecting
query performance or data analysis performance. That is
the key feature here: smooth, linear
scalability of data storage can be achieved with Hadoop.
On schema: in a
traditional database management system the data has to strictly
comply with the schema specification, and you cannot change this at
run time. The big relief with Hadoop is that it is very
accommodating to changing schema needs.
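One way to picture this accommodation is that the stored raw data never has to be rewritten when requirements change; only the reader's schema evolves. A minimal sketch (hypothetical field names, plain Python rather than any Hadoop tool):

```python
# Schema-evolution sketch: the raw data on disk stays the same;
# old and new reader schemas both work against it.
# Field names (name, age, city) are hypothetical.

raw_lines = ["alice,34", "bob,29,london"]  # second record carries a new field

def read_v1(line: str) -> dict:
    # Original schema: just name and age.
    name, age = line.split(",")[:2]
    return {"name": name, "age": int(age)}

def read_v2(line: str) -> dict:
    # Evolved schema adds an optional "city" field; old rows still parse.
    parts = line.split(",")
    return {
        "name": parts[0],
        "age": int(parts[1]),
        "city": parts[2] if len(parts) > 2 else None,
    }

v1 = [read_v1(line) for line in raw_lines]
v2 = [read_v2(line) for line in raw_lines]
```

In a schema-on-write system, adding that `city` column would mean an ALTER TABLE and possibly reloading data; here the change lives entirely in the reading code.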
Traditional
RDBMS systems are computationally intensive on a single server. Hadoop,
by contrast, is built from simple pieces that work together in large
numbers, so it is more data-intensive. In one sense it is data-intensive,
and it is also CPU-intensive and computationally
intensive, but unlike a traditional RDBMS system, all this
computation does not happen on a single server. It happens on
multiple machines distributed across a cluster. That is the
major difference here.
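The idea of spreading one computation across many machines can be sketched with a simulated word count. This is a toy illustration, not real Hadoop code: each list entry below stands in for a data block sitting on a different machine, each "node" counts words in its own block independently, and the partial counts are then merged.

```python
from collections import Counter
from functools import reduce

# Distributed-computation sketch: each "node" processes only its own
# partition of the data, and partial results are merged at the end.
# The three strings stand in for data blocks on three cluster machines.

partitions = [
    "hadoop stores data in blocks",
    "each node processes its own blocks",
    "results from every node are merged",
]

def count_on_node(block: str) -> Counter:
    # Runs independently on each machine, close to its local data.
    return Counter(block.split())

partials = [count_on_node(p) for p in partitions]        # parallel step
total = reduce(lambda a, b: a + b, partials, Counter())  # merge step
```

No single machine ever sees the whole dataset; each does a small, simple job on local data, and only the small partial results travel over the network.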
We have shared file
storage in our traditional RDBMS-based systems, but in Hadoop we
actually move the code to the data.
We will discuss
this concept of moving the code to the data
when we cover the MapReduce framework going forward.