Excluding a node with the data role

To exclude a node from the cluster, follow these steps:

Determining the node ID

First, obtain the ID of the node you plan to exclude by running:

GET _cat/nodes?v&full_id&h=name,id,ip
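If you script the procedure, the table returned by this command can be parsed to look up a node's full ID by name. The sketch below is a minimal illustration, assuming the default whitespace-aligned text output with the header row that `v` adds, and node names without spaces. `find_node_id` is a hypothetical helper, not part of any OpenSearch client, and the sample output is made up:

```python
def find_node_id(cat_nodes_output: str, node_name: str) -> str:
    """Look up a node's full ID in the text output of
    `GET _cat/nodes?v&full_id&h=name,id,ip` (hypothetical helper)."""
    rows = [line.split() for line in cat_nodes_output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]  # first row holds the column names (from `v`)
    name_col, id_col = header.index("name"), header.index("id")
    for fields in data:
        if fields[name_col] == node_name:
            return fields[id_col]
    raise LookupError(f"node {node_name!r} not found in _cat/nodes output")


# Illustrative output only: node names, IDs and IPs are invented.
sample = """\
name   id                     ip
data-1 aBcD1234efGH5678ijKLmn 10.0.0.11
data-2 ZyXw9876vuTS5432rqPOnm 10.0.0.12
"""
print(find_node_id(sample, "data-2"))  # ZyXw9876vuTS5432rqPOnm
```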

Excluding the node from allocation

Using this ID, exclude the node from shard allocation:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": ["<node_id>"]
  }
}
Please note:

In OpenSearch 2.13, a bug was found where the command above is accepted but has no effect: no shards are moved off the node. Instead, pass the node IDs as a single comma-separated string. Use the same form when excluding several nodes:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": "<node_id_1>,<node_id_2>,<node_id_3>"
  }
}

Here, <node_id> (or <node_id_1>, <node_id_2>, <node_id_3>) is the ID of a node to be excluded, as shown in the id column of the _cat/nodes output.
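When the request body is generated by a script, it can be built so that it always uses the comma-separated string form described in the note above. A minimal sketch; `exclusion_settings` is a hypothetical helper name:

```python
import json


def exclusion_settings(node_ids: list) -> dict:
    """Build the `PUT _cluster/settings` body that excludes the given node IDs.

    The IDs are joined into one comma-separated string rather than a JSON
    array, matching the form recommended for OpenSearch 2.13.
    """
    ids = [node_id.strip() for node_id in node_ids if node_id.strip()]
    if not ids:
        raise ValueError("at least one node ID is required")
    return {"persistent": {"cluster.routing.allocation.exclude._id": ",".join(ids)}}


print(json.dumps(exclusion_settings(["<node_id_1>", "<node_id_2>"]), indent=2))
```

Stripping whitespace from each ID guards against values copied with stray spaces from the _cat/nodes table.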

Waiting for shard migration

Once the node is excluded from allocation, its shards start migrating to the other nodes. To check whether migration has finished, run the command below and look at the relocating_shards value, which should be 0:

GET _cluster/health
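The check can be automated by reading the same response. The sketch below is a minimal example, assuming a plain-HTTP endpoint at http://localhost:9200 without authentication (a secured cluster needs TLS and credentials added). The function names are hypothetical. As an alternative, `GET _cluster/health?wait_for_no_relocating_shards=true&timeout=60s` asks the server itself to wait, up to the timeout, until no shards are relocating.

```python
import json
import urllib.request


def shard_movement(health: dict) -> dict:
    """Pick the shard counters relevant to migration from `GET _cluster/health`."""
    keys = ("relocating_shards", "initializing_shards", "unassigned_shards")
    return {key: health[key] for key in keys}


def migration_finished(health: dict) -> bool:
    """Migration off the excluded node is done once no shards are relocating."""
    return health["relocating_shards"] == 0


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    # Assumption: unauthenticated HTTP endpoint.
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)
```

Usage: `health = fetch_health(); print(shard_movement(health), migration_finished(health))`.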

Checking for allocation errors

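After the shards have migrated off the excluded node, check for allocation errors. In other words, make sure that all shards have been moved and initialized correctly. Use the following command. If there are no unassigned shards, it responds with an error saying that it found no unassigned shards to explain, which means there is nothing to fix: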

GET _cluster/allocation/explain

If allocation errors are present

All allocation errors must be fixed before the node can be excluded safely. Causes range from incorrect index settings (most often the number of replicas) to shard corruption or all available shard copies being stale.

The command above returns a JSON object describing the error. The most useful fields are allocate_explanation and, for each node, node_allocation_decisions.deciders.explanation.
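On a large cluster the response lists a decision for every node, so it can help to extract just the fields named above. The sketch below is a minimal example, assuming the response has already been parsed into a dict (for example, saved with curl and loaded with `json.load`). `summarize_explain` is a hypothetical helper, and the sample response is abbreviated and illustrative rather than real output:

```python
def summarize_explain(explain: dict) -> list:
    """Collect readable reasons from a `_cluster/allocation/explain` response."""
    lines = []
    shard = f'{explain.get("index")}[{explain.get("shard")}]'
    if "allocate_explanation" in explain:
        lines.append(f"{shard}: {explain['allocate_explanation']}")
    # node_allocation_decisions and deciders are both arrays in the response.
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            lines.append(
                f'  {node.get("node_name")}: [{decider.get("decider")}] {decider.get("explanation")}'
            )
    return lines


# Abbreviated, illustrative response (not copied from a real cluster).
sample = {
    "index": "logs-example",
    "shard": 0,
    "primary": False,
    "current_state": "unassigned",
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [
        {
            "node_name": "data-1",
            "deciders": [
                {"decider": "same_shard", "decision": "NO",
                 "explanation": "a copy of this shard is already allocated to this node"}
            ],
        }
    ],
}
for line in summarize_explain(sample):
    print(line)
```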

Checking node settings

Check the list of plugins installed on the node being removed and make sure that disconnecting it from the cluster will not affect the functionality of Search Anywhere Framework.

If opensearch.yml on the node being excluded contains the setting node_with_sme: true, check whether the remaining data nodes (hot and warm; this parameter is not configured on cold nodes) also have it. If they do not, configure it on all of these data nodes.
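To run the same check on each hot and warm node, a simple line match over opensearch.yml is usually enough. The sketch below is minimal and deliberately not a full YAML parser: it only recognizes a top-level, uncommented `node_with_sme: true` line. The default path is an assumption based on the /app/opensearch/config directory used later in this guide. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Top-level (column 0) `node_with_sme: true`, optionally quoted, optional trailing comment.
_SME_RE = re.compile(r"""^node_with_sme\s*:\s*["']?true["']?\s*(#.*)?$""")


def has_node_with_sme(config_text: str) -> bool:
    """True if the config text enables node_with_sme on an uncommented line."""
    return any(_SME_RE.match(line.rstrip()) for line in config_text.splitlines())


def check_node_config(path: str = "/app/opensearch/config/opensearch.yml") -> bool:
    # Assumed location of opensearch.yml; adjust to your installation.
    return has_node_with_sme(Path(path).read_text())
```

Run `check_node_config()` on every hot and warm node; a `False` result marks a node that still needs the setting.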

You can check which plugins are installed on each node with the command:

GET _cat/plugins
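A quick way to spot risk is to list plugins that exist only on the node being removed. This sketch assumes the output of `GET _cat/plugins?format=json`, a list of objects with `name` (node name), `component` and `version`. `plugins_only_on` is a hypothetical helper, and the plugin names in the example are placeholders. An empty result is not a complete guarantee: a plugin may also matter only on nodes of a particular role.

```python
from collections import defaultdict


def plugins_only_on(node_name: str, cat_plugins: list) -> set:
    """Return plugin components installed on node_name but on no other node."""
    by_node = defaultdict(set)
    for row in cat_plugins:
        by_node[row["name"]].add(row["component"])
    others = set().union(*(comps for name, comps in by_node.items() if name != node_name))
    return by_node[node_name] - others


# Placeholder data in the shape returned by `_cat/plugins?format=json`.
rows = [
    {"name": "data-1", "component": "plugin-a", "version": "2.13.0"},
    {"name": "data-1", "component": "plugin-b", "version": "2.13.0"},
    {"name": "data-2", "component": "plugin-a", "version": "2.13.0"},
]
print(plugins_only_on("data-1", rows))  # {'plugin-b'}
```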

Disabling allocation

Once the settings of the node being removed have been checked, disable shard allocation in the cluster to avoid damaging or losing data:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
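Disabling and later re-enabling allocation use the same setting with different values, so a script can share one body builder and one sender. The sketch below is minimal and assumes an unauthenticated HTTP endpoint at http://localhost:9200. The function names are hypothetical. The accepted values of `cluster.routing.allocation.enable` are `all`, `primaries`, `new_primaries` and `none`.

```python
import json
import urllib.request

VALID_ALLOCATION_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_enable_settings(mode: str) -> dict:
    """Build a `PUT _cluster/settings` body for cluster.routing.allocation.enable."""
    if mode not in VALID_ALLOCATION_MODES:
        raise ValueError(f"unsupported allocation mode: {mode!r}")
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


def put_cluster_settings(body: dict, base_url: str = "http://localhost:9200") -> dict:
    # Assumption: plain HTTP without authentication.
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `put_cluster_settings(allocation_enable_settings("none"))` before stopping the node, and `"all"` afterwards.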

Stopping the service and backing up the configuration of the excluded node

With allocation disabled, you can safely stop the OpenSearch service on the excluded node:

sudo systemctl stop opensearch

As a precaution, back up the directory containing the OpenSearch configuration:

sudo cp -r /app/opensearch/config /app/os_conf_backup

Enabling cluster allocation

Next, re-enable shard allocation:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

Then wait for the cluster health status to return to green:

GET _cluster/health
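Waiting can be scripted as a polling loop with a timeout. The sketch below is minimal and assumes an unauthenticated endpoint at http://localhost:9200. The fetch function is passed in so the loop can be tried without a cluster. The function names and the default timeout and interval are hypothetical choices. The server can also do the waiting itself via `GET _cluster/health?wait_for_status=green&timeout=50s`.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    # Assumption: plain HTTP without authentication.
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def wait_for_green(fetch: Callable[[], dict] = fetch_health,
                   timeout_s: float = 3600, interval_s: float = 15) -> dict:
    """Poll cluster health until status is green, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        health = fetch()
        if health["status"] == "green":
            return health
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {health['status']} after {timeout_s}s")
        time.sleep(interval_s)
```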

Clearing exclusions

Once the cluster is green, clear the list of excluded node IDs:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": null
  }
}
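To confirm that the exclusion is gone, you can read the settings back. This sketch assumes the response of `GET _cluster/settings?flat_settings=true`, where keys are dotted strings rather than nested objects. It checks both persistent and transient settings, because a transient value would still apply. It also assumes an unauthenticated endpoint at http://localhost:9200. The function names are hypothetical.

```python
import json
import urllib.request

EXCLUDE_KEY = "cluster.routing.allocation.exclude._id"


def exclusion_cleared(settings: dict) -> bool:
    """True if neither persistent nor transient settings still set exclude._id."""
    return all(EXCLUDE_KEY not in settings.get(scope, {})
               for scope in ("persistent", "transient"))


def get_flat_settings(base_url: str = "http://localhost:9200") -> dict:
    # Assumption: plain HTTP without authentication.
    url = f"{base_url}/_cluster/settings?flat_settings=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Usage: `assert exclusion_cleared(get_flat_settings())` after the request above.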