A path through the world of Event Streaming — Kafka part 1

Apache Kafka is a distributed data store optimized for ingesting and processing streaming data in real time. Streaming data is data that is continuously generated by thousands of data sources, which typically send their records simultaneously. A streaming platform needs to handle this constant influx and process the data sequentially and incrementally.

In short, Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and lets you pass messages from one endpoint to another. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss.
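
To make the publish-subscribe idea concrete, here is a toy in-memory sketch in Python. It is not Kafka's API: the `ToyLog` class and its `publish` and `read_from` methods are hypothetical names invented for illustration, and a plain list stands in for disk persistence and replication. The point it tries to show is that reading a message does not remove it, so several subscribers can receive the same data.

```python
class ToyLog:
    """In-memory stand-in for a persisted message log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        # Append only: nothing is deleted when a subscriber reads.
        self._messages.append(message)

    def read_from(self, position):
        # Return every message at or after the given position.
        return self._messages[position:]


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.publish(event)

# Two independent subscribers read the same messages from the start.
billing_view = log.read_from(0)
analytics_view = log.read_from(0)

# A subscriber that starts later only sees what comes after its position.
late_view = log.read_from(2)
```

Because the log keeps its messages after they are read, `billing_view` and `analytics_view` hold the same three events, which is roughly what separates a publish-subscribe log from a queue that hands each message to a single reader.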

  1. A topic is uniquely identified within a Kafka cluster by its name.
  2. We can create as many topics as we want.
  3. Each topic is split into partitions.

  1. A topic can have many partitions.
  2. Within each partition, each message gets a number, assigned in increasing order, known as its offset.
  3. Messages inside a partition are ordered by insertion.
  4. Offsets always increase, and a number that has already been used is never reused.
  5. Offsets have meaning only within a specific partition. Offset 1 of partition 2 is entirely different from offset 1 of partition 0.
  6. Order is guaranteed within each partition, but not across partitions.
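
The rules above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not how Kafka is implemented or called: `ToyTopic`, `append`, and `read` are hypothetical names, each partition is a plain Python list, and the caller picks the partition explicitly.

```python
class ToyTopic:
    """Toy model of a topic split into partitions (hypothetical, not Kafka's API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        # The offset is the message's position within its own partition.
        # It only grows, and a used number is never handed out again.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset):
        # An offset is only meaningful together with its partition.
        return self.partitions[partition][offset]


topic = ToyTopic("orders", num_partitions=3)

first = topic.append(0, "order-a")   # offset 0 in partition 0
second = topic.append(0, "order-b")  # offset 1 in partition 0
other = topic.append(2, "order-c")   # offset 0 in partition 2

# Same offset, different partitions, different messages.
in_partition_0 = topic.read(0, 0)
in_partition_2 = topic.read(2, 0)
```

In this sketch offsets start at 0 in every partition, so offset 0 of partition 0 and offset 0 of partition 2 point at unrelated messages. Nothing in the model records whether `order-b` was written before or after `order-c`, which mirrors the lack of an ordering guarantee across partitions.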


