Operating System Update

Abbreviations and variables used in the article

  1. OS - operating system, e.g., Astra Linux
  2. DB - database
  3. USER_ADMIN - username with sufficient privileges, often admin
  4. SERVER_HOST - IP address or domain name of one of the OpenSearch cluster servers (not necessarily the one being worked on)
  5. PATH_SSL - directory where certificates are located, usually /app/opensearch/config

Updating the OS on the server

In most cases, SAF components do not depend on system packages, so an OS update should not affect the product's functionality. The update can be broken down into the following steps, which are best performed on one server at a time, on each server where SAF components are installed:

  1. Ensure that the target OS version is on the list of operating systems supported by SAF.
  2. Review the list of changes that the update introduces.
  3. When working with OpenSearch nodes, disable allocation before starting work.
  4. Stop SAF components.
  5. Update the OS and reboot the server.
  6. Start SAF components, verify functionality, and enable allocation.

It is recommended to try the update on a test server first. Alternatively, you can start with the self-monitoring server (selfmon), which usually includes almost all SAF components used on the main servers.

The recommended order for updating nodes is as follows:

  1. OpenSearch cluster nodes with the Data role and cold data distribution type.
  2. OpenSearch cluster nodes with the Data role and warm/hot data distribution type.
  3. OpenSearch cluster nodes with the Master role and other roles.
  4. OpenSearch Dashboards servers.
  5. Logstash servers.
  6. Other SAF servers.

More details on the update steps can be found below.

Supported OS List

In most cases, SAF works on popular operating systems, and an OS update that does not change the major version rarely affects functionality.

When the update changes the major OS version, check some SAF components more carefully, in particular Elastic Beats.

For example, Filebeat 7.10.12 does not support Ubuntu Server 22.04, but it works stably on version 20.04. More details can be found on the support matrix page of the official documentation.

List of Changes During Update

The main components of the SAF system, in most cases, do not use system packages. Below is a list of things to pay attention to:

  1. The orchestration (management, Ansible) server, and in particular how Ansible itself is installed. If Ansible is installed from system packages, some playbooks may stop working after its version changes.
  2. If Ansible Semaphore is used on the Ansible server, check the Docker container (if used) and the database it relies on, for example, the PostgreSQL version.
  3. Data collection servers may run scripts that rely on system utilities, such as Python. If the update replaces Python 2.7 with Python 3.6, scripts written for Python 2 will most likely stop working.
  4. Changes in security policies and security utilities. For example, it is worth carefully reviewing changes in SELinux, ACL, and the firewall.
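
For item 3, one way to find scripts at risk before the update is to look at their shebang lines. The following is a minimal sketch, not part of SAF: the function name find_python2_scripts and the directory passed to it are hypothetical, and it only inspects the first line of each file, so scripts started as "python script.py" from cron or other wrappers are not detected.

```shell
#!/bin/sh
# List files under a directory whose shebang invokes "python" or "python2"
# (optionally with a minor version such as python2.7), but not "python3".
# A bare "python" is reported too, because the interpreter it resolves to
# may change after the OS update.
find_python2_scripts() {
    find "$1" -type f | while IFS= read -r file; do
        if head -n 1 "$file" | grep -Eq '^#!.*python(2(\.[0-9]+)?)?([[:space:]]|$)'; then
            echo "$file"
        fi
    done
}
```

Calling the function with the directory that holds the collection scripts prints one path per line; an empty result means no shebang of this kind was found.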

Actions on the OpenSearch Cluster

To disable allocation on the cluster, run the following command in the developer console (Navigation Menu -> Settings -> Dev Console) of the SAF web interface:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}

Alternatively, you can execute the request from the command line:

curl -XPUT -k -u $USER_ADMIN "https://$SERVER_HOST:9200/_cluster/settings?pretty" \
-H "Content-Type: application/json" \
-d '{"persistent":{"cluster.routing.allocation.enable": "none"}}'

You can omit the -u option and the username, but then you must specify the super admin certificate and key. The command then looks like this:

curl -XPUT -k "https://$SERVER_HOST:9200/_cluster/settings?pretty" \
-H "Content-Type: application/json" \
-d '{"persistent":{"cluster.routing.allocation.enable": "none"}}' \
--cacert $PATH_SSL/ca-cert.pem \
--cert $PATH_SSL/admin-cert.pem \
--key $PATH_SSL/admin-key.pem

To enable allocation again (do this after the node has rejoined the cluster), run the same request with none replaced by all. For example, from the developer console (Navigation Menu -> Settings -> Dev Console) of the web interface:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

Alternatively, you can execute the request from the command line:

curl -XPUT -k -u $USER_ADMIN "https://$SERVER_HOST:9200/_cluster/settings?pretty" \
-H "Content-Type: application/json" \
-d '{"persistent":{"cluster.routing.allocation.enable": "all"}}'
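
Because the disable and enable requests differ only in one value, they can be wrapped in a small helper. This is a sketch under assumptions: the function names allocation_payload and set_allocation are hypothetical, and the helper accepts only the two values used in this article. USER_ADMIN and SERVER_HOST are the variables defined at the top of the article.

```shell
#!/bin/sh
# Build the JSON body for cluster.routing.allocation.enable.
# Only "none" and "all" are accepted, the two values used in this article.
allocation_payload() {
    case "$1" in
        none|all)
            printf '{"persistent":{"cluster.routing.allocation.enable":"%s"}}\n' "$1"
            ;;
        *)
            echo "usage: allocation_payload none|all" >&2
            return 1
            ;;
    esac
}

# Send the setting to the cluster; curl prompts for the user's password.
set_allocation() {
    body=$(allocation_payload "$1") || return 1
    curl -XPUT -k -u "$USER_ADMIN" \
        "https://$SERVER_HOST:9200/_cluster/settings?pretty" \
        -H "Content-Type: application/json" \
        -d "$body"
}
```

With these definitions loaded, "set_allocation none" is run before stopping a node and "set_allocation all" after the node has rejoined the cluster. Rejecting other values guards against a typo being sent to the cluster.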

To check the cluster health, you can use the following command in the developer console:

GET _cluster/health

Alternatively, from the command line:

curl -k -u $USER_ADMIN "https://$SERVER_HOST:9200/_cluster/health?pretty"
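
After a node is started and allocation is enabled again, it can be convenient to wait until the cluster reports the green status before moving on to the next server. The sketch below polls the health endpoint shown above. The function names and the MAX_TRIES and SLEEP_SECONDS variables are hypothetical, and the status is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
#!/bin/sh
# Extract the "status" field from a _cluster/health response read on stdin.
# Works for both compact and ?pretty output.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll until the cluster is green, or give up after MAX_TRIES attempts.
# To avoid a password prompt on every attempt, USER_ADMIN can be given as
# user:password, or -u can be replaced with the certificate options.
wait_for_green() {
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-30}" ]; do
        status=$(curl -s -k -u "$USER_ADMIN" \
            "https://$SERVER_HOST:9200/_cluster/health" | health_status)
        [ "$status" = "green" ] && return 0
        echo "cluster status: ${status:-unknown}" >&2
        tries=$((tries + 1))
        sleep "${SLEEP_SECONDS:-10}"
    done
    return 1
}
```

The function returns a non-zero code if the cluster does not become green within the allotted attempts, so it can be used as a gate in a larger script.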

To check the list of nodes and their roles, you can use the following command:

GET _cat/nodes

From the command line:

curl -k -u $USER_ADMIN https://$SERVER_HOST:9200/_cat/nodes

To view OpenSearch node parameters, in particular the data distribution type (routing_mode), use the following command:

GET _cat/nodeattrs

From the command line:

curl -k -u $USER_ADMIN https://$SERVER_HOST:9200/_cat/nodeattrs
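
The nodeattrs output lists every attribute of every node, so it helps to filter it down to the one attribute of interest when deciding the update order. The sketch below assumes the output is requested with the columns node, attr and value (the h parameter of the _cat API); the function name filter_attr is hypothetical.

```shell
#!/bin/sh
# Print "node value" pairs for a single attribute.
# Reads _cat/nodeattrs output with the columns node, attr, value on stdin.
filter_attr() {
    awk -v attr="$1" '$2 == attr { print $1, $3 }'
}

# Example call against the cluster (commented out so that loading this
# file does not contact the cluster):
# curl -s -k -u "$USER_ADMIN" \
#     "https://$SERVER_HOST:9200/_cat/nodeattrs?h=node,attr,value" \
#     | filter_attr routing_mode
```

Each output line contains a node name followed by its routing_mode value, which can then be matched against the recommended update order.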

Disabling Services Before Updating the OS

The main SAF services are managed using systemctl. To stop them, run the commands for the services that are installed on the server:

systemctl stop opensearch
systemctl stop opensearch-dashboards
systemctl stop logstash
systemctl stop safBeatManager
systemctl stop safBeat
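
Since not every server runs every service, the stop commands can be limited to the units that are actually active. The following is a minimal sketch: the unit names are taken from the commands above, while the SAF_UNITS variable and the function names are hypothetical.

```shell
#!/bin/sh
# Units named in this article, in the same order as the commands above.
SAF_UNITS="opensearch opensearch-dashboards logstash safBeatManager safBeat"

# Print the units from SAF_UNITS that are currently active on this server.
active_units() {
    for unit in $SAF_UNITS; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit"
        fi
    done
}

# Stop every active unit; abort on the first failure.
stop_saf_units() {
    for unit in $(active_units); do
        echo "stopping $unit"
        systemctl stop "$unit" || return 1
    done
}
```

Saving the output of active_units to a file before calling stop_saf_units gives a record of which services to start again after the reboot.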