Getting Started With Kafka

Before going into the details of how to configure Kafka on Windows, let's first see what Kafka is all about.

What is Kafka?

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.

Like other messaging systems, it has producers writing data/messages to so-called topics and consumers reading data from these topics. But since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes. Unlike most other messaging systems, it does not block producers.


Topic and Commit Logs

A topic is a means of sharing messages between producers and consumers. For each topic, the Kafka cluster maintains a distributed, partitioned log (an ordered set of messages). Each message in a partition is assigned a unique offset. Kafka does not keep track of which messages have been read, and it does not retain only unread messages. Rather, it keeps all messages for a specified period of time. For example, if the log retention is set to two days, a message will be available for consumption for two days, after which it will be discarded to free up space.
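The partition-log idea above can be sketched with a toy in-memory model. This is purely an illustration of the concepts (sequential offsets, reading by offset, retention discarding old messages regardless of whether they were read), not Kafka's actual implementation; all class and method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single topic partition: an append-only log where each
// message gets a sequential offset and old messages are discarded by
// retention, independent of whether any consumer has read them.
public class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    private long baseOffset = 0; // offset of the first retained message

    // Producer side: append returns the offset assigned to the message.
    public long append(String message) {
        messages.add(message);
        return baseOffset + messages.size() - 1;
    }

    // Consumer side: read the message stored at a given offset.
    public String read(long offset) {
        int index = (int) (offset - baseOffset);
        if (index < 0 || index >= messages.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return messages.get(index);
    }

    // Retention: discard everything before the given offset to free space.
    public void truncateBefore(long offset) {
        while (baseOffset < offset && !messages.isEmpty()) {
            messages.remove(0);
            baseOffset++;
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("trade-1");              // offset 0
        log.append("trade-2");              // offset 1
        long last = log.append("trade-3");  // offset 2
        System.out.println(log.read(last)); // prints "trade-3"
        log.truncateBefore(1);              // retention removed offset 0
        System.out.println(log.read(1));    // prints "trade-2"
    }
}
```

Note how `read` takes an offset: because the log keeps messages for a fixed period rather than per-consumer, any consumer can re-read any retained offset at any time.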


Because Kafka does not track acknowledgements and messages per consumer it can handle many thousands of consumers with very little performance impact. Kafka even handles batch consumers—processes that wake up once an hour to consume all new messages from a queue—without affecting system throughput or latency.

Normally a consumer reads the data sequentially, i.e. in the same order in which it arrives, but a consumer can control its position and read the data in any order by supplying the appropriate offset.

Quick start on Windows

Start ZooKeeper

Edit config\ to change the dataDir property. For demo purposes it looks like below.
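In the standard Kafka distribution the ZooKeeper configuration file is config\zookeeper.properties; a minimal demo configuration might look like the sketch below (the Windows data directory path is an assumption for the demo).

```
dataDir=C:/kafka/zookeeper-data
clientPort=2181
```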



To start ZooKeeper, run the command below:

.\bin\windows\zookeeper-server-start.bat .\config\

Start Kafka Broker

Make sure that in config\ the zookeeper.connect property is set properly to connect to ZooKeeper.

Each Kafka broker coordinates with the other Kafka brokers using ZooKeeper.

Producers and consumers are notified by ZooKeeper about the presence of a new Kafka broker or the failure of any Kafka broker. Producers also interact with ZooKeeper to identify the lead broker for a topic when there are multiple brokers (usually the case in any production environment). For HA, Kafka tries to replicate messages to multiple brokers; ZooKeeper holds the responsibility of electing the lead broker. Brokers persist topic state into ZooKeeper so that all the brokers stay in sync.

Consumers interact with ZooKeeper to get topic state, such as which Kafka broker holds a given message.


You can change the log.dirs property as shown below.
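For reference, in the standard distribution the broker configuration lives in config\server.properties; the entries relevant to this setup might look like the sketch below (the Windows log directory path is an assumption for the demo, and localhost:2181 assumes ZooKeeper is running locally on its default port).

```
log.dirs=C:/kafka/kafka-logs
zookeeper.connect=localhost:2181
```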


To start the Kafka broker, run the command below:

.\bin\windows\kafka-server-start.bat .\config\

Create Kafka Topic

Run the command below:

.\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tradesTopic

You can verify the topic created above using the command below:

.\bin\windows\kafka-topics.bat --list --zookeeper localhost:2181

Java producer to connect and publish messages to Kafka Topic

In your Java code you need to define the following properties –

  • — this should be the same as the value of port defined in config\
  • serializer.class — defines the encoder used to serialize a message into a Kafka message. You can define your own serializer class as well.
  • group.id — uniquely identifies a set of consumers within the same consumer group.
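Putting these properties together, a producer configuration for the classic (0.8-era) Java API might look like the sketch below. The broker address property name (metadata.broker.list), the host/port value, and the topic name in the comments are assumptions for illustration; the Kafka-specific classes (ProducerConfig, Producer, KeyedMessage) come from the Kafka jar, so they are shown only in comments here.

```java
import java.util.Properties;

public class ProducerProps {
    public static Properties build() {
        Properties props = new Properties();
        // Broker address; must match the port the broker is configured with
        // in config\server.properties (localhost:9092 is assumed here).
        props.put("metadata.broker.list", "localhost:9092");
        // Encoder used to serialize a message into a Kafka message;
        // StringEncoder ships with Kafka, or plug in your own serializer.
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With the Kafka jar on the classpath you would continue with:
        //   ProducerConfig config = new ProducerConfig(props);
        //   Producer<String, String> producer = new Producer<>(config);
        //   producer.send(new KeyedMessage<>("tradesTopic", "my message"));
        System.out.println(props.getProperty("metadata.broker.list"));
    }
}
```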

For more details on different properties, please refer to below link –

Follow the link below to get sample Kafka producer code from my GitHub repository –

Sample Kafka Producer Code

Java client to consume messages from a Kafka topic

In your Java client application, to connect to the Kafka cluster and consume messages from a Kafka topic, you need to define the properties below –

  • zookeeper.connect — this should be the same as the value of zookeeper.connect defined in config\
  • group.id — uniquely identifies a set of consumers within the same consumer group.
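These consumer properties can be sketched as below. The ZooKeeper address assumes the default localhost:2181 and must match zookeeper.connect in the broker config; the group name is an assumption for illustration. As with the producer, the Kafka-specific classes appear only in comments since they require the Kafka jar.

```java
import java.util.Properties;

public class ConsumerProps {
    public static Properties build() {
        Properties props = new Properties();
        // Must match zookeeper.connect in the broker config
        // (localhost:2181 is ZooKeeper's default port, assumed here).
        props.put("zookeeper.connect", "localhost:2181");
        // Consumers sharing a group.id divide a topic's partitions among
        // themselves; each message goes to one consumer per group.
        props.put("group.id", "tradesGroup");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With the Kafka jar on the classpath you would continue with:
        //   ConsumerConnector connector =
        //       Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        System.out.println(props.getProperty("group.id"));
    }
}
```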

To get the sample code, please refer to the link below –

Sample Kafka Consumer Code

You can download the sample project from my GitHub repository below –

I have also created a sample project using Kafka, Spark Streaming and Cassandra, which can be downloaded from the repository below –

Different ways to set up a Kafka Cluster

I will cover the details of these different cluster setups, along with a Spark Streaming + Kafka use case, in another post. Till then, keep exploring the WORLD OF BIGDATA !!!