Configuring ClickHouse Keeper
ClickHouse Keeper is used for coordinating replication and managing metadata between nodes in a ClickHouse cluster.
Configuration is only required if your cluster consists of multiple nodes. If ClickHouse is installed on a single server, using ClickHouse Keeper is not necessary.
Basic Configuration of a ClickHouse Cluster
The ClickHouse Keeper configuration file is located at /etc/clickhouse-server/keeper_config.xml. The following configuration can be used for a basic setup of the main ClickHouse Keeper parameters:
<clickhouse>
    <keeper_server>
        <tcp_port>2181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/keeper/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/keeper/snapshots</snapshot_storage_path>
        <coordination_settings>
            <session_timeout_ms>30000</session_timeout_ms>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <dead_session_check_period_ms>5000</dead_session_check_period_ms>
        </coordination_settings>
    </keeper_server>
</clickhouse>
After making changes to the configuration file, start the service and make sure it started successfully using the following commands:
sudo systemctl enable clickhouse-keeper
sudo systemctl start clickhouse-keeper
sudo systemctl status clickhouse-keeper
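As a quick health check, you can also send Keeper one of its four-letter commands on the client port configured above (2181 here). This sketch assumes netcat (nc) is installed and that ruok is allowed by Keeper's four_letter_word_white_list setting, which it is by default:

```shell
# Ask the local Keeper node whether it is running without errors.
# A healthy node replies "imok".
echo ruok | nc localhost 2181
```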
Configuration of a Distributed ClickHouse Cluster
The ClickHouse Keeper configuration file is usually located at /etc/clickhouse-server/config.d/keeper_config.xml.
Example configuration file:
<keeper_server>
    <log_storage_path>/var/lib/clickhouse/keeper/</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/keeper/</snapshot_storage_path>
    <tcp_port>9181</tcp_port>
    <tcp_port_secure>9443</tcp_port_secure>
    <server_id>1</server_id>
    <coordination_settings>
        <operation_timeout_ms>10000</operation_timeout_ms>
    </coordination_settings>
    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>clickhousen1.domain.com</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>2</id>
            <hostname>clickhousen2.domain.com</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>3</id>
            <hostname>clickhousen3.domain.com</hostname>
            <port>9444</port>
        </server>
    </raft_configuration>
</keeper_server>
Parameter description:
- <log_storage_path> and <snapshot_storage_path> - paths for storing ClickHouse Keeper logs and snapshots.
- <tcp_port> and <tcp_port_secure> - ports on which ClickHouse Keeper accepts client connections (plain and TLS, respectively).
- <server_id> - unique identifier of the server in the ClickHouse Keeper cluster. Note: each node in the cluster must be assigned a unique server_id to avoid conflicts; it should correspond to the ordinal number of the node in the cluster and match that node's <id> in <raft_configuration>.
- <coordination_settings> - parameters that define the coordination behavior between nodes.
- <operation_timeout_ms> - timeout for coordination operations, specified in milliseconds.
- <raft_configuration> - the list of all Keeper nodes; each <server> entry specifies the node's <id>, <hostname>, and the <port> used for internal (Raft) communication between Keeper nodes.
After making changes to the configuration file, start the service and make sure it started successfully using the following commands:
sudo systemctl enable clickhouse-keeper
sudo systemctl start clickhouse-keeper
sudo systemctl status clickhouse-keeper
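Once all three nodes are up, you can check which one is the Raft leader with the mntr four-letter command. This is a sketch assuming netcat (nc) is installed and that 9181 is the client port, as in the example above:

```shell
# Print monitoring details for the local Keeper node; the
# zk_server_state line shows whether it is a leader or a follower.
echo mntr | nc localhost 9181 | grep zk_server_state
```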
Integrating ClickHouse Keeper with ClickHouse
For ClickHouse nodes to start using ClickHouse Keeper, you need to configure the appropriate parameters in the ClickHouse configuration file (/etc/clickhouse-server/config.xml) on each ClickHouse node. Add the following lines to the zookeeper section:
<zookeeper>
    <node>
        <host>127.0.0.1</host> <!-- IP address or hostname of the ClickHouse Keeper server -->
        <port>2181</port> <!-- Port of the ClickHouse Keeper server -->
    </node>
</zookeeper>
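If you run several Keeper nodes, list all of them in the zookeeper section so that ClickHouse can fail over when one node is unavailable. A sketch assuming the three-node setup from the previous section (the hostnames are placeholders for your own):

```xml
<zookeeper>
    <node>
        <host>clickhousen1.domain.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>clickhousen2.domain.com</host>
        <port>9181</port>
    </node>
    <node>
        <host>clickhousen3.domain.com</host>
        <port>9181</port>
    </node>
</zookeeper>
```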
After making changes to the configuration files, restart the ClickHouse services using the following command:
sudo systemctl restart clickhouse-server
Now ClickHouse will use ClickHouse Keeper for cluster coordination and management. You can use the following command to check that ClickHouse Keeper is working:
clickhouse-client --query "SELECT * FROM system.zookeeper WHERE path = '/'"
The command output should contain information about the ClickHouse Keeper status.
Using ClickHouse Keeper for Replication Coordination
ClickHouse Keeper is used to coordinate replication between nodes. You need to specify the parameters for connecting to ClickHouse Keeper in the configuration file, which is located at /etc/clickhouse-keeper/keeper_config.xml.
Example configuration:
<clickhouse>
    <clickhouse_keeper>
        <node index="1">
            <host>keeper1</host>
            <port>9181</port>
        </node>
        <node index="2">
            <host>keeper2</host>
            <port>9181</port>
        </node>
        <node index="3">
            <host>keeper3</host>
            <port>9181</port>
        </node>
    </clickhouse_keeper>
</clickhouse>
ClickHouse Keeper configuration parameters:
- <clickhouse_keeper> - this block in the config.xml file indicates that ClickHouse should use ClickHouse Keeper for replication coordination and cluster state management.
- <node> - each node within the <clickhouse_keeper> block represents a separate server or virtual machine running ClickHouse Keeper:
  - index - a unique identifier for each node in the configuration; the index helps the system distinguish between different nodes.
  - <host> - the hostname or IP address of the server running ClickHouse Keeper.
  - <port> - the port on which ClickHouse Keeper accepts connections.
Configuring Shards and Replicas
In ClickHouse, it is recommended to store the shard and replica configuration in a separate file located in the /etc/clickhouse-server/conf.d directory. An example of the configuration is shown below:
<clickhouse>
    <remote_servers>
        <cluster_1S_2R>
            <shard>
                <replica>
                    <host> ... </host> <!-- Cluster host address -->
                    <port>9000</port>
                </replica>
                <replica>
                    <host> ... </host> <!-- Cluster host address -->
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1S_2R>
    </remote_servers>
</clickhouse>
This file contains information about the cluster, including shard and replica settings. It specifies the nodes that are part of the cluster and how the data is distributed between them.
The contents of the file are based on the topology of your cluster and should include the IP addresses or hostnames of all servers in the cluster.
Each replica stores a complete copy of its shard's data. Replicas of the same shard must be located on different hosts to ensure fault tolerance. Replicas of different shards can be located on the same host, as they store different parts of the data.
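After restarting ClickHouse, you can confirm that the cluster definition was picked up by querying the system.clusters table (cluster_1S_2R is the cluster name from the example above):

```shell
# List the shard number, replica number, and host of every node
# that ClickHouse knows about in the cluster_1S_2R cluster.
clickhouse-client --query "SELECT shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'cluster_1S_2R'"
```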
Configuring Macros
To configure replicated tables on the cluster, you need to define the <macros> section in the configuration file of each ClickHouse server. This section specifies unique values for the shard and replica of the server. These macros are used for automatic substitution in table creation commands and simplify replication setup.
- <macros> - the configuration section that defines unique identifiers for the current server. These are used when creating and configuring replicated tables in the cluster.
- <replica> - the replica identifier, unique for each replica within a shard. For example, if you have two servers with replicas of the same shard, one server will have replica=1 and the other replica=2.
- <shard> - the identifier of the shard to which the replica belongs. One shard can contain multiple replicas, but all replicas of the same shard must have the same shard identifier, while the replica identifiers must differ.
For example, the macros for host-1 in the cluster_1S_2R cluster would look like this:
<clickhouse>
    <macros>
        <replica>1</replica>
        <shard>1</shard>
    </macros>
</clickhouse>
Restart ClickHouse for the configuration changes to take effect:
sudo systemctl restart clickhouse-server
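With the macros in place, the same CREATE TABLE statement can be run unchanged against every node: ClickHouse substitutes {shard} and {replica} from each server's own <macros> section. A sketch using a hypothetical table named example:

```shell
# Create a replicated table on all nodes of the cluster at once.
# {shard} and {replica} are expanded from each server's macros, so
# every replica registers itself under the correct Keeper path.
clickhouse-client --query "
CREATE TABLE example ON CLUSTER cluster_1S_2R
(
    id UInt64,
    value String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/example', '{replica}')
ORDER BY id"
```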