Apache
Kafka is a distributed, partitioned, and replicated platform for
real-time streaming data. Kafka was developed and open
sourced by LinkedIn in 2011, and it has been widely adopted by
companies for efficient real-time streaming of data.
Kafka
is being used by tens of thousands of organizations, including
over a third of the Fortune 500 companies. It’s among the fastest
growing open source projects and has spawned an immense ecosystem
around it. It’s at the heart of a movement towards managing and
processing streams of data.
Kafka
is often compared to a couple of existing technology
categories: enterprise messaging systems, big data systems like
Hadoop, and data integration or ETL tools. Each of these comparisons
has some validity but also falls a little short.
Kafka
is like a messaging system in that it lets you publish and subscribe
to streams of messages. In this way, it is similar to products like
ActiveMQ, RabbitMQ, IBM's MQSeries, and other
products. But even with these similarities, Kafka has a number of
core differences from traditional messaging systems that make it
another kind of animal entirely.
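To make the publish/subscribe model concrete, the console tools that ship in a Kafka distribution's bin/ directory let you create a topic, publish records to it, and subscribe to the stream from the command line. This is only a sketch: it assumes a single broker reachable at localhost:9092, Kafka's bin/ scripts on the PATH, and a hypothetical topic name demo-events.

```shell
# Create a topic (replication-factor 1 because we assume a single broker;
# a production cluster would use a higher replication factor)
kafka-topics.sh --create --topic demo-events \
  --partitions 3 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Publish: each line piped in is sent as one record to the topic
echo "hello, kafka" | kafka-console-producer.sh \
  --topic demo-events --bootstrap-server localhost:9092

# Subscribe: read the stream of records from the beginning
kafka-console-consumer.sh --topic demo-events \
  --from-beginning --bootstrap-server localhost:9092
```

Any number of independent consumers can subscribe to the same topic without affecting each other, which is the core of the publish/subscribe pattern.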
Here
are the big three differences: first, Kafka works as a modern
distributed system that runs as a cluster and can scale to handle all
the applications in even the most massive of companies. Rather than
running dozens of individual messaging brokers hand-wired to
different apps, this lets you have a central platform that can scale
elastically to handle all the streams of data in a company.
Second,
Kafka is a true storage system, built to store data for as long as you
might like. This has huge advantages in using it as a connecting
layer, as it provides real delivery guarantees: its data is
replicated, persistent, and can be kept around as long as you like.
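Retention is configurable per broker and per topic, so a topic can keep data for a time window, within a size budget, or indefinitely. A sketch of the relevant settings, with illustrative values:

```properties
# server.properties — broker-wide defaults for topic retention
log.retention.hours=168    # keep data for 7 days
log.retention.bytes=-1     # no size-based limit

# Per-topic override (set via kafka-configs.sh or at topic creation):
# retention.ms=-1 keeps this topic's data forever
retention.ms=-1
```

This is what lets Kafka act as durable storage rather than a transient queue: consumers can rewind and re-read old data for as long as the retention policy keeps it.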