Getting Started With Kafka


Before going into the details of how to configure Kafka on Windows, let's first see what Kafka is all about.

What is Kafka?

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.

Like other messaging systems, it has producers that write data (messages) to so-called topics and consumers that read data from those topics. But because Kafka is a distributed system, topics are partitioned and replicated across multiple nodes. Unlike most other messaging systems, it does not block producers.


Topic and Commit Logs

A topic is a means of sharing messages between producers and consumers. For each topic, the Kafka cluster maintains a distributed, partitioned log (an ordered sequence of messages). Each message in a partition is assigned a unique offset. Kafka does not track which messages have been read in order to retain only the unread ones. Instead, it keeps all messages for a specified period of time. For example, if the log retention is set to two days, a message is available for consumption for two days, after which it is discarded to free up space.
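To make the retention rule concrete, here is a minimal in-memory sketch of one partition's log. It is only a toy model under my own assumptions, not how Kafka is implemented: the class name RetentionLogSketch and its methods are hypothetical, and time is passed in by hand so the behaviour is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a single partition log (hypothetical names, not Kafka's real classes).
class RetentionLogSketch {
    // One log entry: the offset assigned on append, the append time, and the payload.
    static final class Entry {
        final long offset;
        final long timestampMs;
        final String payload;

        Entry(long offset, long timestampMs, String payload) {
            this.offset = offset;
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long retentionMs;
    private long nextOffset = 0;

    RetentionLogSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Appends a message and returns the offset assigned to it.
    long append(String payload, long nowMs) {
        entries.addLast(new Entry(nextOffset, nowMs, payload));
        return nextOffset++;
    }

    // Drops every entry older than the retention window, read or not.
    void expire(long nowMs) {
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > retentionMs) {
            entries.removeFirst();
        }
    }

    int size() {
        return entries.size();
    }

    // Offset of the oldest retained message, or -1 if the log is empty.
    long firstOffset() {
        return entries.isEmpty() ? -1 : entries.peekFirst().offset;
    }

    public static void main(String[] args) {
        long twoDaysMs = 2L * 24 * 60 * 60 * 1000;
        RetentionLogSketch log = new RetentionLogSketch(twoDaysMs);
        log.append("trade-1", 0L);
        log.append("trade-2", twoDaysMs); // appended two days after the first message
        log.expire(twoDaysMs + 1);        // the first message is now older than two days
        System.out.println("retained=" + log.size() + ", firstOffset=" + log.firstOffset());
        // prints: retained=1, firstOffset=1
    }
}
```

Note that expire never asks whether a message was consumed; age is the only criterion, which is the point the paragraph above makes.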


Because Kafka does not track acknowledgements and messages per consumer, it can handle many thousands of consumers with very little performance impact. Kafka even handles batch consumers (processes that wake up once an hour to consume all new messages from a topic) without affecting system throughput or latency.

Normally a consumer reads data sequentially, that is, in the order in which it arrived, but the consumer controls its own position and can read data in any order by supplying the appropriate offset.
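The idea that the consumer, not the broker, owns the read position can be sketched in a few lines. This is an illustration with made-up names (OffsetReadSketch, read), not the Kafka consumer API: a list stands in for one partition and the list index plays the role of the offset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for one partition; the list index acts as the message offset.
class OffsetReadSketch {
    private final List<String> partition = new ArrayList<>();

    // Appends a message and returns its offset.
    long append(String message) {
        partition.add(message);
        return partition.size() - 1;
    }

    // Returns up to maxMessages starting at whatever offset the consumer asks for.
    List<String> read(long fromOffset, int maxMessages) {
        int start = (int) Math.max(0, Math.min(fromOffset, partition.size()));
        int end = Math.min(partition.size(), start + maxMessages);
        return new ArrayList<>(partition.subList(start, end));
    }

    public static void main(String[] args) {
        OffsetReadSketch log = new OffsetReadSketch();
        for (int i = 0; i < 5; i++) {
            log.append("m" + i);
        }

        // Sequential read: the consumer advances its own position after each batch.
        long position = 0;
        List<String> batch = log.read(position, 2);
        position += batch.size();
        System.out.println(batch + " next=" + position); // prints: [m0, m1] next=2

        // Jump ahead, then rewind, simply by supplying a different offset.
        System.out.println(log.read(3, 2)); // prints: [m3, m4]
        System.out.println(log.read(0, 1)); // prints: [m0]
    }
}
```

Because the log keeps no per-consumer state, rewinding to offset 0 costs the log nothing; the position lives entirely in the consumer.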

Quick start on Windows

Start ZooKeeper

Edit config\zookeeper.properties to change the dataDir property. For demo purposes, zookeeper.properties looks like this:

dataDir=F:\\kafka\\tmp\\zookeeper

clientPort=2181

To start ZooKeeper, run the command below:

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

Start Kafka Broker

Make sure that zookeeper.connect in config\server.properties is set correctly so that the broker can connect to ZooKeeper.

Each Kafka broker coordinates with the other Kafka brokers through ZooKeeper.

Producers and consumers are notified by ZooKeeper when a new Kafka broker appears or an existing broker fails. Producers also interact with ZooKeeper to identify the lead broker for a topic when there are multiple brokers (usually the case in any production environment). For high availability, Kafka replicates messages to multiple brokers. ZooKeeper is responsible for electing the lead broker. Brokers persist the topic state in ZooKeeper so that all the brokers stay in sync.

Consumers interact with ZooKeeper to get the topic state, such as which Kafka broker holds the messages.

zookeeper.connect=localhost:2181

You can change the log.dirs property as shown below:

log.dirs=F:\\kafka\\tmp\\kafka-logs

To start the Kafka broker, run the command below:

.\bin\windows\kafka-server-start.bat .\config\server.properties

Create Kafka Topic

Run the command below:

.\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tradesTopic

You can check that the topic was created using the command below:

.\bin\windows\kafka-topics.bat --list --zookeeper localhost:2181

Java producer to connect and publish messages to Kafka Topic

In your Java code you need to define the following properties:

  • metadata.broker.list: the list of brokers in host:port form; the port should match the one defined in config\server.properties.
  • serializer.class: the encoder class used to serialize a message into a Kafka message. You can define your own serializer class as well.
  • group.id (groupid in Kafka 0.7): uniquely identifies a set of consumers within the same consumer group; this is a consumer-side setting.
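As a rough sketch of how the first two properties might be assembled in Java, the snippet below builds a plain java.util.Properties object. The class and method names are hypothetical, the broker address localhost:9092 is an assumption (9092 is the usual default port in server.properties), and kafka.serializer.StringEncoder is the string encoder that ships with the 0.8 client. The snippet deliberately stops before creating a producer so that it runs without the Kafka jars.

```java
import java.util.Properties;

// Hypothetical helper that only assembles producer settings; it does not talk to Kafka.
class ProducerPropsSketch {
    static Properties producerProps(String brokerList) {
        Properties props = new Properties();
        // host:port of one or more brokers, comma separated.
        props.setProperty("metadata.broker.list", brokerList);
        // Encoder used to turn String messages into bytes.
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        return props;
    }

    public static void main(String[] args) {
        // Assumed address: a single local broker listening on port 9092.
        Properties props = producerProps("localhost:9092");
        System.out.println(props.getProperty("metadata.broker.list"));
        System.out.println(props.getProperty("serializer.class"));
        // With the Kafka 0.8 client on the classpath, these properties would
        // typically be wrapped in kafka.producer.ProducerConfig before use.
    }
}
```

Keeping the property assembly in its own method makes it easy to point the same code at a different broker list when you move beyond a single local broker.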

For more details on the different properties, refer to the link below:

http://kafka.apache.org/07/configuration.html

Sample Kafka producer code is available in my GitHub repository:

Sample Kafka Producer Code

Java client to consume messages from Kafka topic

For a Java client application to connect to the Kafka cluster and consume messages from a Kafka topic, you need to define the properties below:

  • zookeeper.connect: this should be the same as the value of zookeeper.connect defined in config\server.properties.
  • group.id (groupid in Kafka 0.7): uniquely identifies a set of consumers within the same consumer group.
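The consumer settings can be put together the same way. Again this is only a sketch with hypothetical names: the group name tradesGroup is made up for illustration, and the group key is written as group.id on the assumption that the 0.8 client is in use (it is the key that goes with zookeeper.connect). Nothing here connects to ZooKeeper or Kafka.

```java
import java.util.Properties;

// Hypothetical helper that only assembles consumer settings; it does not talk to Kafka.
class ConsumerPropsSketch {
    static Properties consumerProps(String zookeeper, String groupId) {
        Properties props = new Properties();
        // Must match zookeeper.connect in config\server.properties.
        props.setProperty("zookeeper.connect", zookeeper);
        // Consumers sharing this id belong to the same consumer group.
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // "tradesGroup" is an illustrative group name, not taken from the sample project.
        Properties props = consumerProps("localhost:2181", "tradesGroup");
        System.out.println(props.getProperty("zookeeper.connect"));
        System.out.println(props.getProperty("group.id"));
        // With the Kafka 0.8 client on the classpath, these properties would
        // typically be wrapped in kafka.consumer.ConsumerConfig before use.
    }
}
```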

For sample code, refer to the link below:

Sample Kafka Consumer Code

You can download the sample project from my GitHub repository:

https://github.com/veejayendraa/kafka-basic-examples

I have also created a sample project using Kafka, Spark Streaming and Cassandra, which can be downloaded from the repository below:

https://github.com/veejayendraa/tradeAnalytics

Different ways to set up a Kafka cluster

I will cover the details of these different cluster setups, along with a Spark Streaming + Kafka use case, in another post. Till then, keep exploring the WORLD OF BIGDATA!!!
