Version: 5.1

Apache Kafka Installation

Overview

This section describes how to deploy a Kafka cluster using the Kafka Raft (KRaft) protocol with ACL and SSL/TLS mechanisms on Ubuntu Server 20.04.

Apache Kafka is a distributed streaming platform designed for real-time data processing and delivery. Kafka does not establish direct connections between producers and consumers, nor does it validate the data being transmitted. A Kafka cluster supports both horizontal and vertical scaling, providing flexibility and resilience under increasing load.

  • Raft - a consensus algorithm that ensures data consistency in distributed systems
  • KRaft (Kafka Raft) - Kafka's implementation of Raft, which lets the cluster manage its own metadata (topics, partitions, brokers)

Core Kafka Concepts:

  • Topic - the fundamental unit of data organization in Kafka. Each topic has a name that is unique across the cluster
  • Partition - topics are divided into partitions: ordered, immutable message logs (stored on disk as segment files) that enable scaling and parallel processing. Consumers read a partition's data in the same order in which producers wrote it
  • Offset - each message appended to a partition receives a 64-bit, monotonically increasing identifier assigned by the broker
  • Replica - Kafka replicates partitions across brokers for data durability, ensuring that multiple copies of the same record exist on different cluster nodes. Each partition has one leader replica and one or more follower replicas. Replica and partition counts are set when a topic is created (see the example after this list) or via the Kafka configuration
  • Producer - an application publishing records to topics
  • Consumer - an application reading records from topics
  • Consumer Group - a group of consumers that cooperatively read records from topics; each partition is consumed by at most one member of the group at a time
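
For example, a topic with 3 partitions and a replication factor of 3 could be created with the kafka-topics.sh tool shipped with Kafka; the bootstrap server, client configuration file, and topic name below are placeholders:

    bin/kafka-topics.sh --bootstrap-server kafka-broker-1:9092 \
      --command-config /etc/kafka/admin-client.properties \
      --create --topic events --partitions 3 --replication-factor 3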

A Kafka cluster divides nodes into two roles - controllers (Controller Nodes) and brokers (Broker Nodes). Controllers participate in the Raft consensus. The number of controllers must always be odd.

For a test environment, one controller is sufficient; for production, at least 3 (tolerating the failure of one node). For highly loaded clusters, 5 controllers are sufficient (tolerating the failure of two nodes). Among the controllers there is always exactly one leader, elected by a vote of the controller nodes; a quorum (majority of votes) is required. For a cluster of 3 controllers the quorum is 2; for 5 controllers it is 3. The leader manages the cluster metadata (topics, partitions, etc.). If the leader fails, the remaining nodes hold a new election to choose a leader. Brokers store data and serve requests.
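
The controller quorum is declared identically on every node via controller.quorum.voters, where each entry has the form node.id@host:port. A minimal sketch for a quorum of 3 controllers, assuming hypothetical host names and the conventional controller port 9093:

    # controller.properties - quorum declaration (host names and node IDs are placeholders)
    controller.quorum.voters=1@kafka-ctrl-1:9093,2@kafka-ctrl-2:9093,3@kafka-ctrl-3:9093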

A node's role is determined by the entry process.roles=controller or process.roles=broker in its configuration file. Roles can be combined with process.roles=controller,broker, in which case the Kafka node performs both roles; this is strongly discouraged in production environments.
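
A minimal sketch of the role-related part of a node's configuration, assuming a dedicated broker with the hypothetical host name kafka-broker-1 and the controller quorum from the previous example:

    # broker.properties - node role and listeners (illustrative values)
    process.roles=broker
    node.id=4
    controller.quorum.voters=1@kafka-ctrl-1:9093,2@kafka-ctrl-2:9093,3@kafka-ctrl-3:9093
    listeners=SSL://kafka-broker-1:9092
    advertised.listeners=SSL://kafka-broker-1:9092
    controller.listener.names=CONTROLLER
    listener.security.protocol.map=CONTROLLER:SSL,SSL:SSL
    inter.broker.listener.name=SSL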

  • ACL (Access Control Lists) - access control lists. In Kafka, ACLs define which clients are allowed to perform which operations: creating/deleting topics, writing to/reading from topics, and so on
  • SSL/TLS - encryption and authentication. SSL/TLS provides an encrypted connection and client authentication; ACLs then define which actions are permitted for the authenticated client

These mechanisms are also used in inter-broker communication.
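
As an illustration, an ACL rule could be added with the kafka-acls.sh tool shipped with Kafka; the bootstrap server, admin client configuration, principal, topic, and consumer group names below are placeholders:

    # Allow the principal "analytics" to read the topic "events"
    # as a member of the consumer group "analytics-group"
    bin/kafka-acls.sh --bootstrap-server kafka-broker-1:9092 \
      --command-config /etc/kafka/admin-client.properties \
      --add --allow-principal User:analytics \
      --operation Read --topic events --group analytics-group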

To enable SSL/TLS and ACL in Kafka, the following parameters must be defined in the Kafka configuration files (controller.properties, broker.properties):

ssl.keystore.location=

    Path to the keystore containing the server's certificate and private key.

ssl.keystore.password=

    Password for accessing the keystore.

ssl.truststore.location=

    Path to the truststore containing trusted root and intermediate CA certificates.

ssl.truststore.password=

    Password for accessing the truststore.

ssl.client.auth=

    Configures mutual authentication (mTLS). Accepts the following values:

    • none - clients are not required to provide certificates
    • requested - server requests client certificate but maintains connection if not provided
    • required - the client must provide a certificate or the connection is terminated

authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer

    Enables ACL functionality.

super.users=

    Specifies Kafka administrator users who can perform any cluster operations regardless of ACL settings.

allow.everyone.if.no.acl.found=false

    Defines ACL behavior when no explicit rules exist for a resource (for example, a topic). true: allows all users access if no ACL rules exist. false: denies access unless explicit ACL rules exist. Does not affect users listed in super.users.

ssl.principal.mapping.rules=

    Rules for mapping the distinguished name (DN) of the client's SSL certificate to a Kafka principal (user name) used for authorization and ACL enforcement.
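
A minimal sketch of how these parameters might look in broker.properties; the paths, passwords, principal names, and mapping rule are illustrative values only:

    # broker.properties - TLS and ACL settings (illustrative values)
    ssl.keystore.location=/etc/kafka/ssl/kafka-broker-1.keystore.jks
    ssl.keystore.password=changeit
    ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
    ssl.truststore.password=changeit
    ssl.client.auth=required
    authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
    super.users=User:admin
    allow.everyone.if.no.acl.found=false
    # Map the CN from the certificate DN to the Kafka principal; fall back to the full DN
    ssl.principal.mapping.rules=RULE:^CN=(.*?),.*$/$1/,DEFAULT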

A configuration example is provided in the article Kafka Cluster Deployment.