Basic ClickHouse Configuration
Main Configuration Settings
The main ClickHouse configuration files are located in the /etc/clickhouse-server/ directory:
- /etc/clickhouse-server/config.xml - the main server configuration file.
- /etc/clickhouse-server/users.xml - the user configuration and access rights file.
Additionally, ClickHouse reads two drop-in directories that make configuration easier to manage:
- /etc/clickhouse-server/conf.d/ - holds additional server configuration files. Files placed here extend or override the base server configuration, so you do not have to modify the main config.xml.
- /etc/clickhouse-server/users.d/ - holds user configuration files. Keeping user definitions in this directory makes access rights and user settings easier to manage.
All additional configuration files must be in .xml format, just like the main configuration file.
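As a minimal sketch of the drop-in approach, the snippet below writes a small override file instead of editing config.xml. The file name logger_level.xml and the STAGING_DIR variable are assumptions made for illustration; the file is written to a scratch directory so it can be reviewed before being copied to /etc/clickhouse-server/conf.d/.

```shell
# Assumption: STAGING_DIR is a scratch directory chosen for this example.
STAGING_DIR="${STAGING_DIR:-$(mktemp -d)}"
mkdir -p "$STAGING_DIR/conf.d"

# Hypothetical file name; the file lists only the setting being changed.
cat > "$STAGING_DIR/conf.d/logger_level.xml" <<'EOF'
<clickhouse>
    <logger>
        <level>warning</level>
    </logger>
</clickhouse>
EOF

echo "Review, then copy to /etc/clickhouse-server/conf.d/:"
echo "$STAGING_DIR/conf.d/logger_level.xml"
```

Keeping each change in its own small file makes it easier to see what differs from the packaged configuration and to remove a change later.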
Configuring Server Parameters
For ClickHouse to operate stably and efficiently, it's important to configure the server parameters correctly.
Logging
Logging helps monitor ClickHouse's operation and identify potential problems.
<clickhouse>
    <logger>
        <level>information</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>100M</size>
        <count>10</count>
    </logger>
</clickhouse>
The location of the main logs is determined by the <log> (logs with all messages) and <errorlog> (logs containing errors) parameters. The <size> and <count> parameters define log rotation settings, and <level> sets the logging level.
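When checking these logs for problems, a small helper can save some typing. The sketch below assumes that each log line carries its level in angle brackets (for example <Error>); the function name recent_errors is hypothetical, and the default path is the <errorlog> value from the example above.

```shell
# Print the most recent error-level lines from a ClickHouse log file.
# Assumption: log lines carry the level in angle brackets, e.g. "<Error>".
recent_errors() {
    local log_file="${1:-/var/log/clickhouse-server/clickhouse-server.err.log}"
    local max_lines="${2:-20}"
    grep -E '<(Error|Fatal|Critical)>' "$log_file" | tail -n "$max_lines"
}

# Usage (reading the log may require sudo, depending on file permissions):
#   recent_errors
#   recent_errors /var/log/clickhouse-server/clickhouse-server.log 50
```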
Connection Ports
To provide network access to ClickHouse, open the necessary ports and configure the firewall to restrict access. The following configuration sets the main ports:
<clickhouse>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
</clickhouse>
- Port for HTTP (<http_port>). Port 8123 is used by default for HTTP requests.
- Port for TCP (<tcp_port>). Port 9000 is used for TCP connections, which ClickHouse uses to work with clients and other servers.
You can also configure SSL for TCP to encrypt data transmitted between clients and the server. In this case, it is recommended to use a different port, such as 9440, for encrypted connections.
By default, it is recommended to allow access to the following ports:
- 9000/tcp - for the ClickHouse native TCP protocol
- 8123/tcp - for the ClickHouse HTTP interface
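For the encrypted native connection on port 9440 mentioned above, a drop-in file along the following lines is one possible starting point. This is a sketch under assumptions: the file name secure_port.xml, the STAGING_DIR variable, and the certificate and key paths are placeholders, and you need to supply your own certificate files.

```shell
# Assumptions: STAGING_DIR, the file name, and the certificate paths are placeholders.
STAGING_DIR="${STAGING_DIR:-$(mktemp -d)}"
mkdir -p "$STAGING_DIR/conf.d"

cat > "$STAGING_DIR/conf.d/secure_port.xml" <<'EOF'
<clickhouse>
    <tcp_port_secure>9440</tcp_port_secure>
    <openSSL>
        <server>
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
        </server>
    </openSSL>
</clickhouse>
EOF

echo "Wrote $STAGING_DIR/conf.d/secure_port.xml"
```

If you expose the secure port to clients, it also needs to be opened in the firewall (9440/tcp), and clients connect with clickhouse-client --secure --port 9440.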
To make ports accessible to external clients, use the following commands.
For Debian-based systems using ufw:
sudo ufw allow 9000/tcp # Open port for ClickHouse TCP connections
sudo ufw allow 8123/tcp # Open port for ClickHouse HTTP interface
sudo ufw reload # Apply changes
For RHEL-based systems using firewalld:
sudo firewall-cmd --zone=public --add-port=9000/tcp --permanent # Open port for ClickHouse TCP connections
sudo firewall-cmd --zone=public --add-port=8123/tcp --permanent # Open port for ClickHouse HTTP interface
sudo firewall-cmd --reload # Apply changes
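The commands above open the ports to every source address. To restrict access to a trusted network instead, source-based rules can be used. The sketch below only prints the commands for review rather than running them; the function name print_restricted_rules is hypothetical, and 192.168.0.0/24 stands in for your own network.

```shell
# Print (not run) firewall rules that open a port only to one source network.
# Assumption: 192.168.0.0/24 stands in for your trusted network.
print_restricted_rules() {
    local subnet="$1" port="$2"
    # ufw (Debian-based systems)
    echo "sudo ufw allow from ${subnet} to any port ${port} proto tcp"
    # firewalld (RHEL-based systems), using a rich rule
    echo "sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family=\"ipv4\" source address=\"${subnet}\" port port=\"${port}\" protocol=\"tcp\" accept'"
}

print_restricted_rules 192.168.0.0/24 9000
print_restricted_rules 192.168.0.0/24 8123
```

After applying rules of this kind, reload the firewall as shown above (sudo ufw reload or sudo firewall-cmd --reload).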
Full list of ports used by ClickHouse:
Port | Description |
---|---|
8123 | Standard HTTP port. |
8443 | Standard HTTP SSL/TLS port. |
9000 | Native protocol port (also known as ClickHouse TCP protocol). Used by ClickHouse applications and processes such as clickhouse-server, clickhouse-client, and native ClickHouse tools. Used for inter-server communication for distributed queries. |
9004 | MySQL emulation port. |
9005 | PostgreSQL emulation port (also used for secure connection if SSL is enabled for ClickHouse). |
9009 | Inter-server communication port for low-level data access. Used for data exchange, replication, and inter-server communication. |
9010 | SSL/TLS for inter-server communication. |
9011 | PROXYv1 native protocol port. |
9019 | JDBC bridge port. |
9100 | gRPC port. |
9181 | Recommended ClickHouse Keeper port. |
9234 | Recommended ClickHouse Keeper Raft port (also used for secure communication if <secure>1</secure> is enabled). |
9363 | Standard port for Prometheus metrics. |
9281 | Recommended Secure SSL port for ClickHouse Keeper. |
9440 | SSL/TLS native protocol port. |
42000 | Standard port for Graphite. |
Data File Paths
Correctly configuring data paths helps organize the storage of data, temporary files, and user data.
An example configuration is shown below:
<clickhouse>
    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
</clickhouse>
- Main data storage path (<path>). It is recommended to use a dedicated partition or disk for data storage, especially if you are working with large volumes.
- Temporary file path (<tmp_path>). Temporary files are often created when executing large queries or sorting data. It is recommended to specify a path to a disk with high write speed and sufficient free space.
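Before settling on these paths, it can help to check how much space is available on the filesystems behind them. The following is a minimal sketch; the helper name available_kib is hypothetical, and the paths are the ones from the example configuration.

```shell
# Report available space (in KiB) on the filesystem that holds a directory.
available_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Paths from the example configuration; adjust them to your own <path> and <tmp_path>.
for dir in /var/lib/clickhouse/ /var/lib/clickhouse/tmp/; do
    if [ -d "$dir" ]; then
        echo "$dir: $(available_kib "$dir") KiB available"
    else
        echo "$dir: not found"
    fi
done
```

If both paths report the same figure, they most likely share one filesystem, so large temporary files compete with table data for space.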
Configuring Users and Access Rights
Proper configuration of users and access rights in ClickHouse is essential for security and managing data access.
User Configuration
Users in ClickHouse are configured using additional configuration files in the /etc/clickhouse-server/users.d directory. These files allow you to add new users, set passwords, and define from which networks a user can access the ClickHouse server.
Example user configuration:
<clickhouse>
    <users>
        <default>
            <password>your_password</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
        </default>
    </users>
</clickhouse>
Parameter description:
- <default> - user name. By default, ClickHouse has a default user with basic settings. You can create new users by adding additional blocks with user names instead of default.
- <password> - user password. The password can be set:
    - In plain text - the password is written in its original form, for example, <password>MyStrongP@ssWoRd</password>.
    - As a hash - instead of the password itself, its hash is specified (e.g., SHA256). For this, instead of the <password> tag, you can use tags such as <password_sha256_hex> or <password_double_sha1_hex>, for example: <password_sha256_hex>c2a1f9160a14a9d2e0eced4b5cf5998e5eebc54cf45b8e3742d2a9c2b1f23368</password_sha256_hex>
- <networks> - this block defines from which IP addresses or networks the user can connect to ClickHouse. <ip>::/0</ip> allows access from all IP addresses. This is suitable for a test environment, but in production, it is recommended to restrict access to specific networks or IP addresses for increased security. For example, to allow access only from the local network, you can use: <ip>192.168.0.0/24</ip>
- <profile> - defines limits and rights for the user, such as resource usage limits, access to databases, tables, etc.
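To use a hashed password, the SHA-256 digest has to be computed first. The sketch below does this with sha256sum and writes a user definition to a scratch directory. The user name analyst, the sample password change_me, the file name, and the STAGING_DIR variable are all placeholders chosen for illustration.

```shell
# Compute the SHA-256 hex digest of a password for <password_sha256_hex>.
sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Assumptions: user name, sample password, network, file name, and STAGING_DIR
# are placeholders for this example.
STAGING_DIR="${STAGING_DIR:-$(mktemp -d)}"
mkdir -p "$STAGING_DIR/users.d"
password_hash="$(sha256_hex 'change_me')"

cat > "$STAGING_DIR/users.d/analyst.xml" <<EOF
<clickhouse>
    <users>
        <analyst>
            <password_sha256_hex>${password_hash}</password_sha256_hex>
            <networks>
                <ip>192.168.0.0/24</ip>
            </networks>
            <profile>default</profile>
        </analyst>
    </users>
</clickhouse>
EOF

echo "Wrote $STAGING_DIR/users.d/analyst.xml"
```

printf '%s' is used instead of echo so that no trailing newline ends up in the hashed value. Note that a password typed directly on the command line may be saved in the shell history.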
Access Rights Configuration
Access rights in ClickHouse are managed through user profiles and the rights assigned to these profiles. You can restrict access to databases, tables, and also limit the list of operations a user can perform (e.g., SELECT, INSERT, DROP).
For example, the configuration below defines a profile named restricted_user that limits maximum memory usage to 1000000000 bytes (about 1 GB) and allows only read-only operations:
<clickhouse>
    <profiles>
        <restricted_user>
            <max_memory_usage>1000000000</max_memory_usage>
            <readonly>1</readonly>
        </restricted_user>
    </profiles>
</clickhouse>
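A profile only takes effect for users that reference it in their <profile> element. As a sketch, the snippet below writes a user definition that points at the restricted_user profile. The user name report_reader, the file name, the network, and the STAGING_DIR variable are hypothetical, and the hash value is a placeholder that must be replaced with a real SHA-256 digest.

```shell
# Assumptions: user name, file name, network, and STAGING_DIR are placeholders.
STAGING_DIR="${STAGING_DIR:-$(mktemp -d)}"
mkdir -p "$STAGING_DIR/users.d"

cat > "$STAGING_DIR/users.d/report_reader.xml" <<'EOF'
<clickhouse>
    <users>
        <report_reader>
            <!-- Placeholder: replace with the SHA-256 hex digest of the real password. -->
            <password_sha256_hex>REPLACE_WITH_SHA256_HEX</password_sha256_hex>
            <networks>
                <ip>192.168.0.0/24</ip>
            </networks>
            <profile>restricted_user</profile>
        </report_reader>
    </users>
</clickhouse>
EOF

echo "Wrote $STAGING_DIR/users.d/report_reader.xml"
```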
You can read more about profiles in the official ClickHouse documentation.
After making changes, restart the ClickHouse server to apply the settings:
sudo systemctl restart clickhouse-server
Data Storage Optimization
Data Compression
ClickHouse can actively use compression, which allows for a several-fold reduction in the volume of stored data. An example configuration is shown below:
<clickhouse>
    <compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>zstd</method>
        </case>
    </compression>
</clickhouse>
Parameter description:
- <min_part_size> - defines the minimum size of the data part (in bytes) to which the specified compression method will be applied. It is usually worth leaving this parameter at 10000000000 (10 GB) if you have large amounts of data. For smaller amounts of data, this parameter can be reduced. However, in most cases, it is better to leave the default value, as it provides a good balance between compression ratio and performance.
- <min_part_size_ratio> - defines the minimum ratio of the data part size to the size of the entire table at which the compression method will be applied. A value of 0.01 means that the method will be applied if the data part size is at least 1% of the total table size. This value can be left at its default, as it is well balanced for most cases.
- <method> - defines the compression method to be used for the data. In this example, the zstd (Zstandard) method is used, which provides a high compression ratio with acceptable performance.
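To see how the two thresholds interact, the sketch below checks whether a part of a given size would fall under a <case>. It assumes that both conditions must hold at the same time and that the ratio is taken relative to the total table size; the function name case_applies and the sample sizes are made up for illustration.

```shell
# Return success if a part meets both thresholds of a <case>.
# Assumption: both min_part_size and min_part_size_ratio must be satisfied.
case_applies() {
    local part_bytes="$1" table_bytes="$2" min_size="$3" min_ratio="$4"
    awk -v p="$part_bytes" -v t="$table_bytes" -v s="$min_size" -v r="$min_ratio" \
        'BEGIN { exit !(p >= s && t > 0 && p / t >= r) }'
}

# A 12 GB part in a 2 TB table: above 10 GB, but only 0.6% of the table.
if case_applies 12000000000 2000000000000 10000000000 0.01; then
    echo "zstd case applies"
else
    echo "zstd case does not apply"
fi
```

With the example thresholds, the same 12 GB part in a 100 GB table would qualify, because it is both larger than 10 GB and more than 1% of the table.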
Recommendations for using compression methods:
- zstd - a versatile and efficient method suitable for most scenarios.
- lz4 - used if maximum data processing speed is important with a lower compression ratio.
If you have specific compression requirements (e.g., a higher compression ratio or a trade-off in favor of speed), you can consider other methods supported by your ClickHouse version, such as lz4hc. But in most cases, zstd is the best choice for ClickHouse.