
Saturday, February 3, 2018

Important Kafka Commands

MindTelligent has developed an open source Kafka Administration Framework. This framework monitors the health of Kafka and ZooKeeper nodes. It also ships a set of commonly used Kafka commands.


These commands can be executed from the $KAFKA_HOME/bin directory.


Command to set the Kafka topic retention period to 10 days:


./kafka-topics.sh --zookeeper localhost:2181 --alter --topic mindtelligent_topic  --config retention.ms=864000000
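
The value 864000000 is 10 days expressed in milliseconds. As a small sketch of how to derive the value for another retention period, the arithmetic can be done in the shell before running the command. The variable names retention_days and retention_ms are made up for this illustration and are not part of Kafka.

```shell
# Convert a retention period in days to the milliseconds expected by retention.ms.
# retention_days and retention_ms are illustrative names, not Kafka settings.
retention_days=10
retention_ms=$((retention_days * 24 * 60 * 60 * 1000))

# Print the value in the form accepted by the --config option.
echo "retention.ms=${retention_ms}"
```

Changing retention_days to another number gives the matching retention.ms value to paste into the command above.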

Command to increase the number of partitions of a Kafka topic to 6 (the partition count can only be increased, not decreased):


./kafka-topics.sh --zookeeper localhost:2181 --alter --topic mindtelligent_topic --partitions 6

Command to view the offsets and member instances of a Kafka consumer group:


./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mindtelligent_topic_group


Command to list all Kafka consumer groups across all topics:

./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list


Command to reset a consumer group's offsets to the earliest available offset (the group must have no active consumers):


./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group mindtelligent_topic_group --reset-offsets --to-earliest --topic mindtelligent_topic --execute


Command to reset a consumer group's offsets to the latest offset (the group must have no active consumers):


./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group mindtelligent_topic_group --reset-offsets --to-latest --topic mindtelligent_topic --execute

List all Kafka topics:


./kafka-topics.sh --zookeeper localhost:2181 --list



Describe a Kafka Topic:


./kafka-topics.sh --zookeeper localhost:2181 --describe --topic mindtelligent_topic

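Purge a Kafka topic by temporarily lowering its retention to 1 second (restore the original retention.ms once the old messages have been removed):


./kafka-topics.sh --zookeeper localhost:2181 --alter --topic mindtelligent_topic --config retention.ms=1000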

Delete a Kafka topic:

./kafka-topics.sh --zookeeper localhost:2181 --delete --topic mindtelligent_topic

Get the number of messages in a Kafka topic (the sum of the latest offsets across all partitions):


./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mindtelligent_topic --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'


Get the earliest offset still in a topic:

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mindtelligent_topic  --time -2

Get the latest offset in a topic:

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic mindtelligent_topic --time -1
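
Summing only the latest offsets counts every message ever written, including messages that retention has already removed. A rough estimate of the messages still retained is the latest offsets minus the earliest offsets. The sketch below assumes the topic:partition:offset output format that the awk command earlier relies on. The sample listings are invented for illustration and stand in for the real output of the two GetOffsetShell commands above.

```shell
# Hypothetical sample output of GetOffsetShell with --time -1 (latest offsets).
latest='mindtelligent_topic:0:120
mindtelligent_topic:1:95
mindtelligent_topic:2:110'

# Hypothetical sample output of GetOffsetShell with --time -2 (earliest offsets).
earliest='mindtelligent_topic:0:20
mindtelligent_topic:1:0
mindtelligent_topic:2:10'

# Sum the offset column (field 3) of each listing, then subtract.
latest_sum=$(printf '%s\n' "$latest" | awk -F ":" '{sum += $3} END {print sum}')
earliest_sum=$(printf '%s\n' "$earliest" | awk -F ":" '{sum += $3} END {print sum}')
retained=$((latest_sum - earliest_sum))

echo "retained messages: ${retained}"
```

On a live cluster the two variables would be filled from the actual command output instead of the sample text. For compacted topics the difference is only an upper bound, since compaction removes records without changing the offsets.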




Monday, October 16, 2017

Apache Kafka and Apache Spark: A "Data Science Match" made in heaven.

Kafka is a publish-subscribe messaging system that provides a reliable source for Spark Streaming. The Kafka project introduced a new consumer API between versions 0.8 and 0.10, so two separate Spark Streaming integration packages are available, one for each API. The integration provides a one-to-one mapping between Kafka partitions and the partitions of the RDDs generated by the DStream, along with access to metadata and offsets.


The following diagram shows end-to-end integration with Kafka, consuming messages from it, doing simple to complex windowing ETL, and pushing the desired output to various sinks such as memory, console, file, databases, and back to Kafka itself.


An overview of what our end-to-end integration will look like.


The following properties need to be passed to the Spark Streaming API to use Kafka as a source:

bootstrap.servers: The host and port of the Kafka server(s), separated by commas.

key.deserializer: The name of the class that deserializes the key of each message from Kafka.

value.deserializer: The name of the class that deserializes the value of each message.

group.id: The identifier of the consumer group that this consumer belongs to.

auto.offset.reset: This applies when messages are consumed from a Kafka topic but the consumer group has no initial offset in Kafka, or when the current offset no longer exists on the server. In either case, the value of this property selects what the consumer does next.
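
As an illustration of the five properties above, the sketch below writes them to a properties file from the shell. All values are placeholders chosen for this example: the broker address and group name reuse the ones from the commands post, StringDeserializer is one of the deserializers that ships with the Kafka client, and earliest is one of the values the new consumer accepts for auto.offset.reset (the others being latest and none). In a Spark Streaming job the same keys and values would typically be passed as a map of Kafka parameters rather than read from a file.

```shell
# Write the five consumer properties to a temporary file.
# Every value here is a placeholder for illustration, not a recommended setting.
props_file=$(mktemp)
cat > "$props_file" <<'EOF'
bootstrap.servers=localhost:9092
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
group.id=mindtelligent_topic_group
auto.offset.reset=earliest
EOF

# Count the key=value entries as a quick sanity check.
prop_count=$(grep -c '^[a-z.]*=' "$props_file")
echo "wrote ${prop_count} properties to ${props_file}"
```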







