Apache Kafka Installation
Overview
This section describes the deployment process of a Kafka cluster using the Kafka Raft protocol with ACL and SSL/TLS mechanisms on Ubuntu Server 20.04.
Apache Kafka is a distributed streaming platform designed for real-time data processing and transmission. Kafka doesn't establish direct connections between producers and consumers, nor does it validate transmitted data.
The Kafka cluster supports both horizontal and vertical scaling, providing flexibility and resilience under increasing loads.
- Raft - a consensus algorithm ensuring data consistency in distributed systems
- Kafka Raft (KRaft) - Kafka's implementation of Raft, enabling self-management of cluster metadata (topics, partitions, brokers)
Core Kafka Concepts:
- Topic - the fundamental data organization unit in Kafka. Each topic has a unique cluster-wide name
- Partition - topics are divided into partitions: ordered, immutable message logs (fixed-size files) that enable scaling and parallel processing. Consumers read partition data in the same order it was written by producers
- Offset - each message added to a partition receives a 64-bit monotonically increasing identifier assigned by the broker
- Replica - Kafka replicates partitions between brokers for data durability, ensuring multiple copies of the same record exist across different cluster nodes. Each partition has one leader replica and one or more follower replicas. Replica and partition counts are defined during topic creation or via Kafka configuration
- Producer - an application publishing records to topics
- Consumer - an application reading records from topics
- Consumer Group - a group of consumers that collaboratively read records from topics
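The relationship between a partition and its offsets can be illustrated with a toy model. The sketch below is not Kafka code: the `Partition` class and its methods are hypothetical names, and it only assumes that a partition behaves as an append-only log whose offsets start at 0 and grow by one per record.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy in-memory model of a partition (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        # The offset is the record's position in the log; it only ever grows.
        offset = len(self.records)
        self.records.append(value)
        return offset

    def read(self, from_offset, max_records=10):
        # Records are returned in the same order they were appended.
        return self.records[from_offset:from_offset + max_records]


partition = Partition()
offsets = [partition.append(v) for v in ("a", "b", "c")]
print(offsets)            # [0, 1, 2]
print(partition.read(1))  # ['b', 'c']
```

A consumer that remembers the last offset it processed can resume reading from the next one, which is why ordering within a single partition is preserved.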
A Kafka cluster divides nodes into two roles - controllers (Controller Nodes) and brokers (Broker Nodes).
Controllers participate in the Raft consensus. The number of controllers should be odd: adding one more controller to reach an even count raises the quorum without allowing any additional failures.
For a test environment, one controller is sufficient; for production, use at least 3 (allowing failure of one node). For high-load clusters, 5 controllers are sufficient (allowing failure of two nodes).
Among controllers, there is only one leader at a time. The leader is selected through "voting" by controller nodes, which requires a quorum (majority vote). For a cluster of 3 controllers, quorum = 2; for 5 controllers, quorum = 3. The leader manages metadata (topics, partitions, etc.). If the leader fails, the remaining nodes initiate a voting process to select a new leader.
Brokers store data and handle requests.
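The quorum figures above follow from simple majority arithmetic. The helper names in this sketch are hypothetical; it only assumes that a quorum is a strict majority of the controllers.

```python
def quorum_size(controllers: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return controllers // 2 + 1


def tolerated_failures(controllers: int) -> int:
    """Controllers that can fail while a majority can still be formed."""
    return controllers - quorum_size(controllers)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that 4 controllers tolerate no more failures than 3, which is the usual argument for choosing an odd count.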
The node role is determined by the entry process.roles=controller or process.roles=broker in the cluster configuration file.
It's possible to combine roles with process.roles=controller,broker, in which case the Kafka node performs both roles. This is strongly discouraged for production environments.
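As an illustration, a deployment script could read process.roles from a node's configuration and flag combined-role nodes before a production rollout. This is a minimal sketch: the parser handles only simple key=value lines (not the full Java properties syntax), and the function names and sample values are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def node_roles(props: dict) -> set:
    """Return the set of roles declared in process.roles."""
    raw = props.get("process.roles", "")
    roles = {r.strip() for r in raw.split(",") if r.strip()}
    unknown = roles - {"controller", "broker"}
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return roles


def is_combined(props: dict) -> bool:
    """True when the node acts as both controller and broker."""
    return node_roles(props) == {"controller", "broker"}


# Sample configuration text (values are placeholders for illustration).
sample = """
# test node
node.id=1
process.roles=controller,broker
"""
props = parse_properties(sample)
print(sorted(node_roles(props)))  # ['broker', 'controller']
print(is_combined(props))         # True
```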
- ACL (Access Control Lists) - ACLs in Kafka define which clients have rights to perform certain operations: creating/deleting topics, writing to/reading from topics, etc.
- SSL/TLS - encryption and authentication. SSL provides a secure connection and client authentication, while ACL defines which actions are permitted for the client
These mechanisms are also used in inter-broker communication.
To enable ACL and SSL/TLS in Kafka, define the following parameters in the Kafka configuration files controller.properties and broker.properties:
| Parameter | Description |
|---|---|
| ssl.keystore.location= | Path to the keystore containing the server's certificate and private key. |
| ssl.keystore.password= | Password for accessing the keystore. |
| ssl.truststore.location= | Path to the truststore containing trusted root and intermediate CA certificates. |
| ssl.truststore.password= | Password for accessing the truststore. |
| ssl.client.auth= | Configures mutual authentication (mTLS). Accepts the following values: |
| authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer | Enables ACL functionality. |
| super.users= | Specifies |
| allow.everyone.if.no.acl.found=False | Defines ACL behavior when no explicit rules exist for a topic. |
| ssl.principal.mapping.rules= | Maps SSL certificate CN to |
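How these parameters interact can be sketched with a simplified model of an authorization decision. This is not the StandardAuthorizer implementation: the `Acl` class and both functions are hypothetical, and the model ignores hosts, wildcard and prefixed resource patterns, and implied operations. It assumes that super users are always allowed, that a matching deny rule overrides an allow rule, and that a resource with no ACLs falls back to the allow.everyone.if.no.acl.found setting. The principal helper mimics a mapping rule of the form `RULE:^CN=(.*?),.*$/$1/` followed by `DEFAULT`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    """Hypothetical, simplified ACL entry (not a Kafka class)."""
    principal: str  # e.g. "User:alice"
    operation: str  # e.g. "READ"
    resource: str   # e.g. "Topic:orders"
    allow: bool     # True for an allow rule, False for a deny rule


def principal_from_dn(dn: str) -> str:
    """Extract the CN from a certificate subject, else keep the full DN."""
    match = re.match(r"^CN=(.*?),.*$", dn)
    return "User:" + (match.group(1) if match else dn)


def is_authorized(principal, operation, resource, acls,
                  super_users=frozenset(), allow_if_no_acl=False):
    # Super users bypass ACL checks entirely.
    if principal in super_users:
        return True
    on_resource = [a for a in acls if a.resource == resource]
    # No ACLs on the resource: fall back to the configured default.
    if not on_resource:
        return allow_if_no_acl
    matching = [a for a in on_resource
                if a.principal == principal and a.operation == operation]
    # A deny rule takes precedence over any allow rule.
    if any(not a.allow for a in matching):
        return False
    return any(a.allow for a in matching)


acls = [Acl("User:alice", "READ", "Topic:orders", allow=True)]
alice = principal_from_dn("CN=alice,OU=Apps,O=Example")
print(alice)                                                   # User:alice
print(is_authorized(alice, "READ", "Topic:orders", acls))      # True
print(is_authorized("User:bob", "READ", "Topic:orders", acls)) # False
```

With the default of False for the fallback setting, a topic without any ACLs is accessible only to super users in this model, which is the safer choice for a secured cluster.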
Configuration example in the article Kafka Cluster Deployment.