Saturday, December 8, 2018

Kafka - Introduction

Apache Kafka is a tool for ingesting and moving real-time streaming data. It is a distributed, partitioned, and replicated system. Kafka was developed at LinkedIn and open sourced in 2011, and companies are adopting it widely for efficient real-time data streaming.

Kafka is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It’s among the fastest growing open source projects and has spawned an immense ecosystem around it. It’s at the heart of a movement towards managing and processing streams of data.

Kafka is often compared to a couple of existing technology categories: enterprise messaging systems, big data systems like Hadoop, and data integration or ETL tools. Each of these comparisons has some validity but also falls a little short.

Kafka is like a messaging system in that it lets you publish and subscribe to streams of messages. In this way, it is similar to products like ActiveMQ, RabbitMQ, and IBM’s MQSeries. But even with these similarities, Kafka has a number of core differences from traditional messaging systems that make it another kind of animal entirely.
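To make the publish and subscribe idea concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the `ToyTopic` class, its method names, and the subscriber names are all made up for illustration. It only shows the shape of the model, where publishers append to a shared stream and each subscriber tracks its own position in it.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: an append-only list of messages.

    Hypothetical illustration only; this is not the Kafka client API.
    """

    def __init__(self):
        self.log = []                    # messages in publish order
        self.offsets = defaultdict(int)  # next position to read, per subscriber

    def publish(self, message):
        """Append a message and return its position in the log."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, subscriber):
        """Return the messages this subscriber has not seen yet."""
        start = self.offsets[subscriber]
        batch = self.log[start:]
        self.offsets[subscriber] = len(self.log)
        return batch


topic = ToyTopic()
topic.publish("page_view:/home")
topic.publish("page_view:/pricing")

# Two independent subscribers each receive the full stream.
billing_batch = topic.poll("billing")
analytics_batch = topic.poll("analytics")

# Polling again returns nothing new for a subscriber that is caught up.
billing_again = topic.poll("billing")
```

Note that one subscriber reading a message does not take it away from another. That is one way a log-based system differs from a queue that hands each message to a single consumer.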

Here are the big three differences. First, it works as a modern distributed system that runs as a cluster and can scale to handle all the applications in even the most massive of companies. Rather than running dozens of individual messaging brokers, hand-wired to different apps, this lets you have a central platform that can scale elastically to handle all the streams of data in a company.
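One way to picture how a cluster shares the load is to split a stream into partitions and spread those partitions over brokers. The sketch below is an assumption-laden illustration, not Kafka's actual partitioner or placement logic: the function names and broker names are hypothetical, and it uses CRC32 as a simple stand-in hash.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition with a stable hash.

    CRC32 is a stand-in chosen for this sketch; it is not the hash
    that Kafka's own partitioner uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, brokers):
    """Spread partitions over brokers round-robin (hypothetical placement)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


brokers = ["broker-1", "broker-2", "broker-3"]  # made-up broker names
layout = assign_partitions(6, brokers)

# The same key always lands on the same partition, so messages for one
# key keep their relative order.
first = choose_partition("user-42", 6)
second = choose_partition("user-42", 6)
```

Under these assumptions, adding brokers means each one holds fewer partitions. That is the rough intuition behind scaling one central platform instead of many separate brokers.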

Second, Kafka is a true storage system built to store data for as long as you might like. This is a huge advantage when using it as a connecting layer, because it provides real delivery guarantees: its data is replicated, persistent, and can be kept around as long as you like.
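The storage point can be sketched with a toy log in which reading never deletes anything and records only disappear when they fall outside a retention window. Everything here (`RetainedLog`, its methods, the timestamps) is hypothetical and kept in memory. In real Kafka, retention is a topic or broker setting such as `retention.ms`, and data lives on disk and is replicated across brokers.

```python
class RetainedLog:
    """Toy log that keeps records until they exceed a retention window.

    Hypothetical illustration only; this is not Kafka's API.
    """

    def __init__(self, retention_seconds=None):
        self.retention_seconds = retention_seconds  # None means keep forever
        self.records = []                           # (offset, timestamp, value)
        self.next_offset = 0

    def append(self, value, timestamp):
        """Store a record and return the offset assigned to it."""
        offset = self.next_offset
        self.records.append((offset, timestamp, value))
        self.next_offset += 1
        return offset

    def expire(self, now):
        """Drop records older than the retention window."""
        if self.retention_seconds is None:
            return
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

    def read_from(self, offset):
        """Return (offset, value) pairs at or after the given offset."""
        return [(o, v) for (o, _, v) in self.records if o >= offset]


log = RetainedLog(retention_seconds=60)
log.append("a", timestamp=0)
log.append("b", timestamp=30)
log.append("c", timestamp=90)

first_read = log.read_from(0)
replay = log.read_from(0)           # reading again replays the same data

log.expire(now=100)                 # records older than 60 seconds are dropped
after_expiry = log.read_from(0)
```

Because consumption does not remove data, a new or recovering application can start from an old offset and reprocess the stream, as long as the records are still within the retention window.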







