Apache Kafka


Apache Kafka is Publish-Subscribe messaging rethought as a distributed commit log.

http://kafka.apache.org/

Clients in Perl, Python, Node.js, C, C++, Scala, ...: https://cwiki.apache.org/confluence/display/KAFKA/Clients

First steps with Kafka

see Quickstart

cd kafka

Launch Zookeeper

./bin/zookeeper-server-start.sh ./config/zookeeper.properties

Launch Kafka server

./bin/kafka-server-start.sh ./config/server.properties

Create a topic

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Launch Kafka console producer

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Launch Kafka console consumer

./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test

Info on topic

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test

Replicated topics

First copy config/server.properties to config/server-1.properties and config/server-2.properties, then edit:

# config/server-1.properties:
    broker.id=1
    port=9093
    log.dir=/tmp/kafka-logs-1
# config/server-2.properties:
    broker.id=2
    port=9094
    log.dir=/tmp/kafka-logs-2

Launch extra servers

./bin/kafka-server-start.sh config/server-1.properties &
./bin/kafka-server-start.sh config/server-2.properties &

Create a replicated topic

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
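With the three brokers running, the describe command prints something like the following (the exact layout varies by Kafka version, and the leader/replica ids shown here are illustrative, depending on which broker was elected):

```
Topic:my-replicated-topic  PartitionCount:1  ReplicationFactor:3  Configs:
  Topic: my-replicated-topic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
```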


Launch Kafka console producer

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic

Launch Kafka console consumer

./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my-replicated-topic

Produce and Consume messages with Node.js


Download master zipfile from https://github.com/SOHU-Co/kafka-node/ (or git clone https://github.com/SOHU-Co/kafka-node.git)

cd kafka-node-master
npm install

cd example
node topics.js
node producer.js
node consumer.js

See InfluxDB#Apache_Kafka_to_InfluxDB to archive received messages in an InfluxDB database.

Produce and Consume messages with Python

TODO
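Until this section is written, here is a minimal sketch using the kafka-python client (an assumption — no library is named here; `pip install kafka-python`, with a broker assumed on localhost:9092):

```python
import json

def encode_message(msg):
    """Serialize a dict to UTF-8 JSON bytes, the wire format used below."""
    return json.dumps(msg).encode("utf-8")

def produce(topic="test", bootstrap="localhost:9092"):
    # Assumes kafka-python is installed and a broker is running on `bootstrap`.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    producer.send(topic, encode_message({"hello": "kafka"}))
    producer.flush()  # block until the message is actually sent

def consume(topic="test", bootstrap="localhost:9092"):
    from kafka import KafkaConsumer
    # auto_offset_reset="earliest" replays the topic from the beginning
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap,
                             auto_offset_reset="earliest")
    for record in consumer:
        print(json.loads(record.value.decode("utf-8")))
```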

Stop all


./bin/kafka-server-stop.sh
./bin/zookeeper-server-stop.sh

Extra

Launch Zookeeper shell

./bin/zookeeper-shell.sh localhost:2181


Going further

Book

O'Reilly book “Kafka: The Definitive Guide”: https://www.confluent.io/resources/kafka-definitive-guide-preview-edition/

Kafka UI

http://docs.datamountaineer.com/en/latest/ui.html#install


Kafka REST Proxy

The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. The API is not documented with Swagger (i.e. OpenAPI).
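As an illustration (assuming the proxy runs on localhost:8082, the Confluent default), producing a JSON message to the test topic with the v2 API looks roughly like:

```shell
# The Content-Type encodes both the API version (v2) and the serialization (json).
curl -X POST http://localhost:8082/topics/test \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records":[{"value":{"hello":"kafka"}}]}'
```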

Kafka Connect

https://drive.google.com/file/d/0B_0n2CoDWpWQbkVsSUZ2SC1aQkk/view?usp=sharing

Connects Kafka to data sources and sinks: File, MySQL, ELK, HDFS, …

It provides out-of-the-box features like configuration management, offset storage, parallelization, error handling, support for different data types, and standard management REST APIs (chapter 7 of the O'Reilly book “Kafka: The Definitive Guide”).

Confluent curates a list of open-source and commercial connectors: https://www.confluent.io/product/connectors/

Stream Reactor: a large open-source collection of connectors (MQTT, InfluxDB, Azure DocumentDB, MongoDB, Blockchain, …) written in Scala. It uses a query DSL, KCQL: http://docs.datamountaineer.com/en/latest/kcql.html#kcql
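For example, the FileStreamSource connector that ships with Kafka can tail a file into a topic via a standalone worker; a sketch of its properties file (file path and names are illustrative):

```
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test
```

Run it with: ./bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties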

Cluster Replication with Kafka

Kafka Streams

An Event Stream Processing framework built on Kafka and Kafka Connect.

Schema Registry

http://docs.confluent.io/current/schema-registry/docs/intro.html

Apache Avro is the serializer of choice in the Kafka ecosystem.

Confluent provides a registry for the Avro schemas in use.

The Schema Registry is maintained outside the Apache Kafka project.
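For reference, a minimal Avro schema (.avsc) of the kind one would register (record and field names are illustrative):

```
{
  "type": "record",
  "name": "Message",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "body", "type": "string"}
  ]
}
```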