Apache Kafka Installation
Overview
This section describes how to deploy a Kafka cluster that uses the Kafka Raft (KRaft) protocol with ACL and SSL/TLS mechanisms on Ubuntu Server 20.04.
Apache Kafka is a distributed streaming platform designed for real-time data processing and transmission. Kafka doesn't establish direct connections between producers and consumers, nor does it validate transmitted data.
A Kafka cluster supports both horizontal and vertical scaling, providing flexibility and resilience under increasing loads.
- Raft - a consensus algorithm ensuring data consistency in distributed systems
- Kafka Raft (KRaft) - Kafka's implementation of Raft, enabling self-management of cluster metadata (topics, partitions, brokers)
Core Kafka Concepts:
- Topic - the fundamental unit of data organization in Kafka. Each topic has a name that is unique within the cluster
- Partition - each topic is divided into partitions: ordered, immutable message logs (stored on disk as size-limited segment files) that enable scaling and parallel processing. Consumers read a partition's data in the same order in which producers wrote it
- Offset - each message appended to a partition receives a 64-bit, monotonically increasing identifier assigned by the broker
- Replica - Kafka replicates partitions between brokers for data durability, so that multiple copies of the same record exist on different cluster nodes. Each partition has one leader replica and zero or more follower replicas, depending on the replication factor. Replica and partition counts are defined at topic creation or via the Kafka configuration
- Producer - an application that publishes records to topics
- Consumer - an application that reads records from topics
- Consumer Group - a group of consumers that collaboratively read records from topics
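The relationship between a partition and its offsets can be illustrated with a small sketch. This is a toy in-memory model, not Kafka's storage engine: the `Partition` class and its `append` and `read` methods are hypothetical names introduced only for this example, and real offsets are assigned by the broker that leads the partition.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: an append-only list of records."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return (offset, value) pairs from the given offset, in write order."""
        chunk = self.records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


partition = Partition()
offsets = [partition.append(v) for v in ("a", "b", "c")]
print(offsets)            # [0, 1, 2]
print(partition.read(1))  # [(1, 'b'), (2, 'c')]
```

Because records are only ever appended, an offset never changes once assigned, which is why a consumer can resume reading from a stored offset.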
A Kafka cluster divides nodes into two roles: controllers (Controller Nodes) and brokers (Broker Nodes).
Controllers participate in the Raft consensus. The number of controllers should be odd, because an even count raises the quorum size without tolerating any additional failures.
For a test environment, one controller is sufficient; for production, use at least 3 (tolerating the failure of one node). For high-load clusters, 5 controllers are sufficient (tolerating the failure of two nodes).
Only one controller acts as the leader at a time. The leader is elected by a vote of the controller nodes, which requires a quorum (a majority of votes). For a cluster of 3 controllers, quorum = 2; for 5 controllers, quorum = 3. The leader manages metadata (topics, partitions, etc.). If the leader fails, the remaining nodes hold a vote to elect a new leader.
Brokers store data and handle client requests.
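The quorum figures above follow from simple majority arithmetic. The sketch below is a minimal illustration; `quorum` and `tolerated_failures` are hypothetical helper names, not part of any Kafka tooling.

```python
def quorum(controllers: int) -> int:
    """Smallest number of votes that forms a majority."""
    return controllers // 2 + 1


def tolerated_failures(controllers: int) -> int:
    """Controllers that can fail while a majority is still reachable."""
    return controllers - quorum(controllers)


for n in (1, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that 4 controllers tolerate only one failure, the same as 3, which is the usual argument for choosing an odd controller count.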
The node role is determined by the process.roles=controller or process.roles=broker entry in the configuration file.
It's possible to combine roles with process.roles=controller,broker, in which case the Kafka node performs both roles. This is strongly discouraged for production environments.
- ACL (Access Control Lists) - in Kafka, ACLs define which clients have the right to perform particular operations: creating/deleting topics, writing to/reading from topics, etc.
- SSL/TLS - encryption and authentication. SSL provides a secure connection and client authentication, while ACLs define which actions the authenticated client is permitted to perform
These mechanisms are also used in inter-broker communication.
To enable ACL in Kafka, the following parameters must be defined in the Kafka configuration files controller.properties and broker.properties:
Parameter | Description |
---|---|
ssl.keystore.location= | Path to the keystore containing the server's certificate and private key. |
ssl.keystore.password= | Password for accessing the keystore. |
ssl.truststore.location= | Path to the truststore containing trusted root and intermediate CA certificates. |
ssl.truststore.password= | Password for accessing the truststore. |
ssl.client.auth= | Configures mutual authentication (mTLS). Accepts the following values: required, requested, none. |
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer | Enables ACL functionality. |
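super.users= | Specifies the principals that bypass ACL checks (super users), as a semicolon-separated list such as User:admin. |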
allow.everyone.if.no.acl.found=False | Defines ACL behavior when no explicit rules exist for a resource such as a topic. With False, access is denied to everyone except super users. |
ssl.principal.mapping.rules= | Maps the SSL certificate subject (for example, its CN) to a Kafka principal name. |
A configuration example is provided in the article Kafka Cluster Deployment.
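To make the interplay of ssl.principal.mapping.rules, super.users and allow.everyone.if.no.acl.found more concrete, here is a rough Python model of the flow from certificate subject to access decision. It is a simplified sketch, not Kafka's authorizer: `map_principal` and `is_allowed` are hypothetical helpers, the certificate subject, user names and topic names are invented, and the model ignores DENY rules, wildcards, prefixed resources and host restrictions. The regular expression mirrors the kind of pattern a mapping rule such as `RULE:^CN=(.*?),.*$/$1/` would use.

```python
import re


def map_principal(subject, pattern=r"^CN=(.*?),.*$", replacement=r"\1"):
    """Extract a short name from a certificate subject; fall back to the full subject."""
    match = re.match(pattern, subject)
    if match is None:
        return subject
    return match.expand(replacement)


def is_allowed(principal, operation, resource, acls,
               super_users=frozenset(), allow_if_no_acl=False):
    """Toy ALLOW-only check over (principal, operation, resource) tuples."""
    if principal in super_users:
        return True  # super users bypass ACL checks
    resource_acls = [acl for acl in acls if acl[2] == resource]
    if not resource_acls:
        # Models allow.everyone.if.no.acl.found
        return allow_if_no_acl
    return (principal, operation, resource) in resource_acls


# Invented example data
user = "User:" + map_principal("CN=kafka-client,OU=Dev,O=Example,C=US")
acls = {("User:kafka-client", "READ", "topic:orders")}
admins = {"User:admin"}

print(user)                                                        # User:kafka-client
print(is_allowed(user, "READ", "topic:orders", acls))              # True
print(is_allowed(user, "WRITE", "topic:orders", acls))             # False
print(is_allowed("User:admin", "WRITE", "topic:orders", acls, admins))  # True
print(is_allowed(user, "READ", "topic:payments", acls))            # False
```

The last call shows the deny-by-default behavior: with no ACL on the resource and `allow_if_no_acl=False`, only super users get access.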