Excluding a node with the data role
To exclude a node from the cluster, follow these steps:
Determining the node ID
First, obtain the ID of the node you plan to exclude. To do this, run the command below:
GET _cat/nodes?v&full_id&h=name,id,ip
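If the output of the command above is captured as text, the node ID can be looked up by node name. The following is a minimal sketch, assuming the response contains the header row (because of `v`) and that node names contain no spaces; the function name `parse_cat_nodes` and the sample names, IDs, and addresses are hypothetical.

```python
def parse_cat_nodes(text):
    """Parse the table returned by `_cat/nodes?v&full_id&h=name,id,ip`
    into a {node_name: node_id} mapping."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Locate columns by header name so the column order does not matter.
    name_col = header.index("name")
    id_col = header.index("id")
    return {row[name_col]: row[id_col] for row in body}


# Hypothetical sample output; real names, IDs, and addresses will differ.
sample = """\
name        id                     ip
data-hot-1  Jk3xQ0aVSuWxyzABCDEF12 10.0.0.11
data-hot-2  p9LmN4bTRcGhijKLMNOP34 10.0.0.12
"""

node_ids = parse_cat_nodes(sample)
print(node_ids["data-hot-2"])
```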
Excluding the node from allocation
Using the node ID, exclude the node from allocation with the following command:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": ["<node_id>"]
  }
}
In OpenSearch 2.13, a bug was found in which the command above is accepted but has no effect. When excluding multiple nodes, list their IDs in a single comma-separated string, like this:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": "<node_id_1>,<node_id_2>,<node_id_3>"
  }
}
where <node_id> is the identifier of the node to be excluded.
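When the list of node IDs is produced by a script, the request body can be generated rather than typed by hand. This is a small sketch that builds the comma-separated form shown above; the helper name `build_exclude_body` is hypothetical, and sending the body to `PUT _cluster/settings` is left to whichever HTTP client is in use.

```python
import json


def build_exclude_body(node_ids):
    """Build the PUT _cluster/settings body that excludes the given node IDs,
    joining them into a single comma-separated string."""
    ids = [node_id.strip() for node_id in node_ids if node_id.strip()]
    if not ids:
        raise ValueError("at least one node ID is required")
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._id": ",".join(ids)
        }
    }


# Placeholders as used in the text; substitute real node IDs.
print(json.dumps(build_exclude_body(["<node_id_1>", "<node_id_2>"]), indent=2))
```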
Waiting for shard migration
After the node is excluded from allocation, shard migration begins. To check whether it has completed, use the command below and look at the relocating_shards parameter, which should be 0:
GET _cluster/health
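The check above can be repeated in a loop until relocation finishes. The sketch below assumes a caller-supplied function `get_health` (hypothetical) that returns the parsed JSON of `GET _cluster/health`; the polling interval and timeout are arbitrary illustrative values, not recommendations.

```python
import time


def wait_for_relocation(get_health, poll_seconds=10.0, timeout_seconds=3600.0,
                        sleep=time.sleep):
    """Poll cluster health until relocating_shards is 0.

    Returns the last health document, or raises TimeoutError once
    timeout_seconds of waiting has accumulated."""
    waited = 0.0
    while True:
        health = get_health()
        if health["relocating_shards"] == 0:
            return health
        if waited >= timeout_seconds:
            raise TimeoutError("shards are still relocating")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with canned responses instead of a live cluster.
fake_responses = iter([
    {"status": "green", "relocating_shards": 4},
    {"status": "green", "relocating_shards": 1},
    {"status": "green", "relocating_shards": 0},
])
final = wait_for_relocation(lambda: next(fake_responses), sleep=lambda s: None)
print(final["relocating_shards"])
```

Passing `sleep` as a parameter keeps the loop testable without real delays.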
Checking for allocation errors
After the shards have migrated off the excluded node, check for allocation errors, that is, whether all shards have migrated and initialized correctly. You can check this using the command:
GET _cluster/allocation/explain
To exclude the node correctly, all allocation errors must be fixed. Causes range from incorrect index settings (often related to the number of replicas) to shard corruption or all found shard copies being outdated.
The command above returns a JSON object with error information (look at allocate_explanation and node_allocation_decisions.deciders.explanation).
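To read a long response more easily, the two fields mentioned above can be pulled out of the JSON. This is a rough sketch that assumes the response has already been parsed into a dictionary; the function name `summarize_explain` is hypothetical, and the sample response is abbreviated and illustrative rather than real cluster output.

```python
def summarize_explain(explain):
    """Collect allocate_explanation and every per-node decider explanation
    from a parsed _cluster/allocation/explain response."""
    summary = []
    if "allocate_explanation" in explain:
        summary.append(explain["allocate_explanation"])
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            summary.append("{}: [{}] {}".format(
                node.get("node_name", "?"),
                decider.get("decider", "?"),
                decider.get("explanation", ""),
            ))
    return summary


# Abbreviated, illustrative response; real output has more fields.
sample_explain = {
    "index": "example-index",
    "shard": 0,
    "primary": False,
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [
        {
            "node_name": "data-hot-1",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node",
                }
            ],
        }
    ],
}

for line in summarize_explain(sample_explain):
    print(line)
```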
Checking node settings
Check the list of plugins on the node being removed and make sure that disconnecting the node from the cluster will not affect the functionality of Search Anywhere Framework.
If the opensearch.yml configuration file on the node to be excluded contains the setting node_with_sme: true, check whether the other nodes with the data role also have this setting (this applies to hot and warm nodes; the parameter is not configured on cold nodes). If it is missing, configure it on all data nodes.
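Checking several nodes by hand is error-prone, so the presence of the setting can be tested with a few lines of code. The sketch below assumes the setting appears in opensearch.yml as a flat top-level key exactly as written above, and it scans the text line by line instead of using a YAML parser; the function name `node_with_sme_enabled` is hypothetical.

```python
def node_with_sme_enabled(yml_text):
    """Return True if the text of opensearch.yml sets node_with_sme to true.

    Assumes a flat `node_with_sme: true` line; nested YAML is not handled."""
    for raw_line in yml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip() == "node_with_sme":
            return value.strip().lower() == "true"
    return False


# Illustrative file contents, not a complete configuration.
example = """\
cluster.name: example
# node_with_sme: false
node_with_sme: true
"""
print(node_with_sme_enabled(example))
```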
You can check where each plugin is installed using the command:
GET _cat/plugins
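To see which plugins would disappear from the cluster together with the node, the output of the command above can be grouped by plugin. This sketch assumes the default column order (node name, component, version) with no header row, since the command is run without `v`; the function name `plugins_unique_to` and the sample node and plugin names are hypothetical.

```python
from collections import defaultdict


def plugins_unique_to(cat_plugins_text, node_name):
    """Return the plugins that are installed on node_name and on no other node."""
    nodes_by_plugin = defaultdict(set)
    for line in cat_plugins_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, component = parts[0], parts[1]
        nodes_by_plugin[component].add(name)
    return sorted(
        plugin for plugin, nodes in nodes_by_plugin.items()
        if nodes == {node_name}
    )


# Hypothetical sample output; real plugin names and versions will differ.
sample_plugins = """\
data-hot-1 plugin-a 1.0.0
data-hot-1 plugin-b 1.0.0
data-hot-2 plugin-a 1.0.0
"""
print(plugins_unique_to(sample_plugins, "data-hot-1"))
```

A non-empty result means that stopping the node removes the only installation of those plugins.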
Disabling allocation
After checking the settings of the node being removed, disable cluster allocation to avoid damaging or losing data:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}
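Because the same setting is switched off here and back on later, a small helper can generate the body and reject typos in the value. This is a sketch only: the helper name `build_allocation_body` is hypothetical, and the set of accepted values reflects the values this setting is generally documented to accept, which should be confirmed against the documentation for the OpenSearch version in use.

```python
import json

# Values assumed valid for cluster.routing.allocation.enable.
ALLOWED_VALUES = {"all", "primaries", "new_primaries", "none"}


def build_allocation_body(value):
    """Build the PUT _cluster/settings body for cluster.routing.allocation.enable."""
    if value not in ALLOWED_VALUES:
        raise ValueError("unsupported allocation value: {!r}".format(value))
    return {"persistent": {"cluster.routing.allocation.enable": value}}


print(json.dumps(build_allocation_body("none"), indent=2))
```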
Stopping the service and backing up the configuration of the excluded node
After disabling cluster allocation, it is safe to stop the OpenSearch process on the excluded node:
sudo systemctl stop opensearch
For safety, back up the directory containing the OpenSearch configuration:
sudo cp -r /app/opensearch/config /app/os_conf_backup
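If the backup is taken from a script, a timestamped copy avoids overwriting an earlier backup. The sketch below is a variation on the `cp -r` command above, not a replacement for it: the function name `backup_config` and the `os_conf_backup-<timestamp>` naming are assumptions, and the demonstration runs against a temporary directory rather than /app/opensearch/config.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_config(config_dir, backup_root, stamp=None):
    """Copy config_dir to backup_root/os_conf_backup-<stamp> and return the new path.

    shutil.copytree raises FileExistsError if the target already exists,
    so an existing backup is never overwritten."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / "os_conf_backup-{}".format(stamp)
    shutil.copytree(config_dir, target)
    return target


# Demonstration in a throwaway directory instead of the real config path.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "config"
    source.mkdir()
    (source / "opensearch.yml").write_text("cluster.name: example\n")
    copied = backup_config(source, tmp)
    print(sorted(p.name for p in copied.iterdir()))
```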
Enabling cluster allocation
Re-enable cluster allocation:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}
Then wait for the cluster to reach the green status:
GET _cluster/health
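The health response can be evaluated in code before moving on to the next step. This sketch assumes the response has been parsed into a dictionary; the function name `is_cluster_settled` is hypothetical, and checking the shard counters in addition to the status is an extra precaution rather than something this procedure requires.

```python
def is_cluster_settled(health):
    """Return True when the cluster is green and no shards are relocating,
    initializing, or unassigned."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards") == 0
        and health.get("initializing_shards") == 0
        and health.get("unassigned_shards") == 0
    )


# Illustrative, abbreviated health documents.
recovering = {"status": "yellow", "relocating_shards": 0,
              "initializing_shards": 2, "unassigned_shards": 5}
healthy = {"status": "green", "relocating_shards": 0,
           "initializing_shards": 0, "unassigned_shards": 0}

print(is_cluster_settled(recovering), is_cluster_settled(healthy))
```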
Clearing exclusions
After the cluster becomes green, clear the list of excluded node IDs:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": null
  }
}
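To confirm that the exclusion list is really empty, the current settings can be inspected afterwards. The sketch below assumes the settings were fetched with `GET _cluster/settings?flat_settings=true`, so that keys appear in dotted form, and parsed into a dictionary; the function name `remaining_exclusions` is hypothetical.

```python
EXCLUDE_KEY = "cluster.routing.allocation.exclude._id"


def remaining_exclusions(settings):
    """Return {scope: value} for every scope that still excludes node IDs.

    Expects the parsed response of GET _cluster/settings?flat_settings=true."""
    found = {}
    for scope in ("persistent", "transient"):
        value = settings.get(scope, {}).get(EXCLUDE_KEY)
        if value:
            found[scope] = value
    return found


# Illustrative responses before and after clearing the setting.
before = {"persistent": {EXCLUDE_KEY: "<node_id>"}, "transient": {}}
after = {"persistent": {}, "transient": {}}

print(remaining_exclusions(before))
print(remaining_exclusions(after))
```

An empty result indicates that no node IDs remain excluded in either scope.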