Basic ClickHouse Configuration

Main Configuration Settings

The main ClickHouse configuration files are located in the /etc/clickhouse-server/ directory:

  • /etc/clickhouse-server/config.xml - the main server configuration file.
  • /etc/clickhouse-server/users.xml - the user configuration and access rights file.

Additionally, ClickHouse uses directories for convenient configuration management:

  • /etc/clickhouse-server/conf.d/ - designed for additional server configuration files. Files placed in this directory extend or override the base server configuration, so you do not need to modify the main config.xml.
  • /etc/clickhouse-server/users.d/ - used for user configuration management. You can add files with user configurations to this directory, making it easier to manage access rights and user settings.

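Additional configuration files must follow the same structure as the main configuration file. The examples in this guide use the .xml format; ClickHouse also accepts configuration files in YAML.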
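As an illustration of how an override file works, here is a minimal sketch. The file name logging.xml is hypothetical; any .xml file in conf.d is merged into the base configuration. It changes only the logging level and leaves every other setting in config.xml untouched.

```xml
<!-- /etc/clickhouse-server/conf.d/logging.xml (hypothetical file name) -->
<clickhouse>
    <logger>
        <!-- Only this value is overridden; the remaining logger
             settings still come from the main config.xml -->
        <level>warning</level>
    </logger>
</clickhouse>
```

Keeping each override in its own small file makes it easier to see what differs from the packaged defaults and to carry changes across upgrades.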

Configuring Server Parameters

For ClickHouse to operate stably and efficiently, it's important to configure the server parameters correctly.

Logging

Logging helps monitor ClickHouse's operation and identify potential problems.

<clickhouse>
    <logger>
        <level>information</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>100M</size>
        <count>10</count>
    </logger>
</clickhouse>

The <log> parameter sets the path of the main log (all messages), and <errorlog> sets the path of the log that contains only errors. The <size> and <count> parameters control log rotation: with the values above, a log file is rotated when it reaches 100 MB, and up to 10 rotated files are kept. The <level> parameter sets the logging level.

Connection Ports

To provide network access to ClickHouse, you need to open the necessary ports and set up the firewall to restrict access. The main ports are set with the following configuration:

<clickhouse>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
</clickhouse>

  • Port for HTTP (<http_port>). Port 8123 is used by default for HTTP requests.
  • Port for TCP (<tcp_port>). Port 9000 is used for TCP connections, which ClickHouse uses to work with clients and other servers.

You can also configure SSL/TLS for TCP to encrypt data transmitted between clients and the server. In this case, it is recommended to use a separate port, such as 9440, for encrypted connections.

By default, it is recommended to allow access to the following ports:

  • 9000/tcp - for the ClickHouse native TCP protocol
  • 8123/tcp - for the ClickHouse HTTP interface
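For the encrypted connections mentioned above, a configuration could look roughly like the sketch below. It assumes the tcp_port_secure, https_port, and openSSL server settings; the certificate and key paths are placeholders, not values taken from this guide.

```xml
<clickhouse>
    <!-- Native protocol over SSL/TLS -->
    <tcp_port_secure>9440</tcp_port_secure>
    <!-- HTTP interface over SSL/TLS -->
    <https_port>8443</https_port>
    <openSSL>
        <server>
            <!-- Placeholder paths: point these to your own certificate and key -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
        </server>
    </openSSL>
</clickhouse>
```

If you enable these ports, remember to open them in the firewall in the same way as the unencrypted ones.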

To make ports accessible to external clients, use the following commands.

For Debian-based systems using ufw:

sudo ufw allow 9000/tcp  # Open port for ClickHouse TCP connections
sudo ufw allow 8123/tcp  # Open port for ClickHouse HTTP interface
sudo ufw reload          # Apply changes

For RHEL-based systems using firewalld:

sudo firewall-cmd --zone=public --add-port=9000/tcp --permanent  # Open port for ClickHouse TCP connections
sudo firewall-cmd --zone=public --add-port=8123/tcp --permanent  # Open port for ClickHouse HTTP interface
sudo firewall-cmd --reload                                       # Apply changes
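Opening the firewall may not be enough on its own. In a typical default installation, the server listens only on the loopback interface, so external clients also need a listen address. A minimal sketch, assuming the listen_host setting and a hypothetical override file in conf.d:

```xml
<!-- /etc/clickhouse-server/conf.d/network.xml (hypothetical file name) -->
<clickhouse>
    <!-- Listen on all IPv4 interfaces; use a specific address
         instead to expose the server on one interface only -->
    <listen_host>0.0.0.0</listen_host>
</clickhouse>
```

Listening on all interfaces makes the firewall rules and the per-user <networks> restrictions the main line of defense, so it is worth reviewing both before applying this.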

Full list of ports used by ClickHouse:

Port    Description
8123    Standard HTTP port.
8443    Standard HTTP SSL/TLS port.
9000    Native protocol port (also known as ClickHouse TCP protocol). Used by ClickHouse applications and processes such as clickhouse-server, clickhouse-client, and native ClickHouse tools. Used for inter-server communication for distributed queries.
9004    MySQL emulation port.
9005    PostgreSQL emulation port (also used for secure connection if SSL is enabled for ClickHouse).
9009    Inter-server communication port for low-level data access. Used for data exchange, replication, and inter-server communication.
9010    SSL/TLS for inter-server communication.
9011    PROXYv1 native protocol port.
9019    JDBC bridge port.
9100    gRPC port.
9181    Recommended ClickHouse Keeper port.
9234    Recommended ClickHouse Keeper Raft port (also used for secure communication if <secure>1</secure> is enabled).
9281    Recommended secure SSL port for ClickHouse Keeper.
9363    Standard port for Prometheus metrics.
9440    SSL/TLS native protocol port.
42000   Standard port for Graphite.

Data File Paths

Correctly configuring data paths helps organize the storage of data, temporary files, and user data.

An example configuration is shown below:

<clickhouse>
    <path>/var/lib/clickhouse/</path>
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
</clickhouse>

  • Main data storage path (<path>). It is recommended to use a dedicated partition or disk for data storage, especially if you are working with large volumes.
  • Temporary file path (<tmp_path>). Temporary files are often created when executing large queries or sorting data. It is recommended to specify a path to a disk with high write speed and sufficient free space.
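The recommendations above could translate into a configuration like the following sketch. The /mnt/... mount points are hypothetical, and the user_files_path setting is included as an assumption about where user files should live. The directories need to exist and be writable by the account that runs the server, and the paths should keep their trailing slash.

```xml
<clickhouse>
    <!-- Hypothetical mount point of a dedicated data disk -->
    <path>/mnt/clickhouse-data/</path>
    <!-- Hypothetical mount point of a fast disk for temporary files -->
    <tmp_path>/mnt/clickhouse-tmp/</tmp_path>
    <!-- Directory for user-provided files -->
    <user_files_path>/mnt/clickhouse-data/user_files/</user_files_path>
</clickhouse>
```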

Configuring Users and Access Rights

Proper configuration of users and access rights in ClickHouse is essential for security and managing data access.

User Configuration

Users in ClickHouse are configured using additional configuration files in the /etc/clickhouse-server/users.d directory. Additional configuration files allow you to add new users, set passwords, and define from which networks a user can access the ClickHouse server.

Example user configuration:

<clickhouse>
    <users>
        <default>
            <password>your_password</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>default</profile>
        </default>
    </users>
</clickhouse>

Parameter description:

  • <default> - user name. By default, ClickHouse has a default user with basic settings. You can create new users by adding additional blocks with user names instead of default.

  • <password> - user password. The password can be set:

    • In plain text - the password is written in its original form, for example, <password>MyStrongP@ssWoRd</password>.

    • As a hash - instead of the password itself, its hash is specified. To do this, replace the <password> tag with a tag such as <password_sha256_hex> or <password_double_sha1_hex>, for example:

      <password_sha256_hex>c2a1f9160a14a9d2e0eced4b5cf5998e5eebc54cf45b8e3742d2a9c2b1f23368</password_sha256_hex>

  • <networks> - defines the IP addresses or networks from which the user can connect to ClickHouse. <ip>::/0</ip> allows access from all IP addresses. This is suitable for a test environment, but in production it is recommended to restrict access to specific networks or IP addresses.

    For example, to allow access only from the local network, you can use:

    <ip>192.168.0.0/24</ip>

  • <profile> - the name of the settings profile applied to the user. A profile defines limits and restrictions for the user, such as resource usage limits and read-only mode.
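Putting the parameters above together, an additional user could be defined in its own file as sketched below. The file name and the user name analyst are hypothetical, and the hash value is a placeholder that must be replaced with the SHA-256 hex digest of the real password.

```xml
<!-- /etc/clickhouse-server/users.d/analyst.xml (hypothetical file name) -->
<clickhouse>
    <users>
        <!-- "analyst" is a hypothetical user name -->
        <analyst>
            <!-- Placeholder: SHA-256 hex digest of the user's password -->
            <password_sha256_hex>REPLACE_WITH_SHA256_HEX_DIGEST</password_sha256_hex>
            <networks>
                <!-- Allow connections only from the local network -->
                <ip>192.168.0.0/24</ip>
            </networks>
            <profile>default</profile>
        </analyst>
    </users>
</clickhouse>
```

Storing the hash instead of the plain-text password means the configuration file alone does not reveal the password.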

Access Rights Configuration

Access rights in ClickHouse are managed through user profiles and the restrictions assigned to these profiles. You can restrict access to databases and tables, and also limit the operations a user can perform (e.g., SELECT, INSERT, DROP).

For example, the configuration below defines a profile named restricted_user that limits memory usage to 1000000000 bytes (about 1 GB) per query and allows only read operations:

<clickhouse>
    <profiles>
        <restricted_user>
            <max_memory_usage>1000000000</max_memory_usage>
            <readonly>1</readonly>
        </restricted_user>
    </profiles>
</clickhouse>
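A profile has no effect until a user references it. The sketch below attaches the restricted_user profile to a hypothetical user named report_reader and, assuming the allow_databases setting, limits that user to a hypothetical database named reports. The hash value is a placeholder.

```xml
<clickhouse>
    <users>
        <!-- "report_reader" is a hypothetical user name -->
        <report_reader>
            <!-- Placeholder: SHA-256 hex digest of the user's password -->
            <password_sha256_hex>REPLACE_WITH_SHA256_HEX_DIGEST</password_sha256_hex>
            <networks>
                <ip>192.168.0.0/24</ip>
            </networks>
            <!-- Apply the memory limit and read-only mode defined in the profile -->
            <profile>restricted_user</profile>
            <!-- Restrict the user to the listed databases ("reports" is hypothetical) -->
            <allow_databases>
                <database>reports</database>
            </allow_databases>
        </report_reader>
    </users>
</clickhouse>
```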

You can read more about profiles in the official ClickHouse documentation.

After making changes, restart the ClickHouse server to apply the settings:

sudo systemctl restart clickhouse-server

Data Storage Optimization

Data Compression

ClickHouse compresses stored data, which can reduce its volume several-fold. An example configuration is shown below:

<clickhouse>
    <compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>zstd</method>
        </case>
    </compression>
</clickhouse>

Parameter description:

  • <min_part_size> - the minimum size of a data part (in bytes) to which the specified compression scheme is applied. With large amounts of data, it is usually worth leaving this parameter at 10000000000 (10 GB); with smaller amounts, it can be reduced. In most cases, this value provides a good balance between compression ratio and performance.
  • <min_part_size_ratio> - the minimum ratio of the data part size to the total table size at which the compression scheme is applied. A value of 0.01 means that the scheme is applied if the data part is at least 1% of the total table size. This value is well balanced for most cases and can usually be left unchanged.
  • <method> - the compression method used for the data. In this example, the zstd (Zstandard) method is used, which provides a high compression ratio with acceptable performance.

Recommendations for using compression methods:

  • zstd - a versatile and efficient method suitable for most scenarios.
  • lz4 - used if maximum data processing speed is important with a lower compression ratio.

If you have specific compression requirements (e.g., a higher compression ratio at the cost of speed), you can consider other methods supported by your ClickHouse version, such as lz4hc. But in most cases, zstd is the best choice for ClickHouse.
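If you want to tune the trade-off between ratio and speed within zstd itself, a compression level can be added to the case. The sketch below assumes the optional <level> setting; the value 3 is only an illustrative choice, not a recommendation from this guide. Data parts that do not match the conditions of any case fall back to the server's default compression.

```xml
<clickhouse>
    <compression>
        <case>
            <!-- Apply this case only to large data parts -->
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>zstd</method>
            <!-- Illustrative level: higher values compress more but cost more CPU -->
            <level>3</level>
        </case>
    </compression>
</clickhouse>
```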