Reindex data
Reindexing is the process of copying documents from one index to another. It is used when you need to change the data structure (field mapping), create a backup, update immutable index settings, or rename an index.
When creating temporary indexes (e.g., for reindexing, mapping changes, or backups), set the number of replicas (number_of_replicas) to 0 for the duration of the reindexing operation. After the operation completes, restore the original number_of_replicas value.
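The replica change described above can be sketched as two settings updates, one sent before the reindex call and one after it. This is a minimal Python sketch that only builds the requests and does not contact a cluster; the index name tmp-index and the original replica count of 1 are placeholder assumptions.

```python
import json

def replica_settings_request(index_name, replicas):
    """Build (method, path, body) for updating number_of_replicas on an index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index_name}/_settings", body

# Hypothetical values: "tmp-index" and an original replica count of 1.
before = replica_settings_request("tmp-index", 0)  # send before reindexing
after = replica_settings_request("tmp-index", 1)   # send after reindexing completes

for method, path, body in (before, after):
    print(method, path, json.dumps(body))
```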
Reindexing All Documents
To copy all documents from one index to another, follow these steps:
1. Create the Target Index
First, you need to create a target index with the required field structure (mapping) and settings. You can define them manually or copy them from the source index.
PUT <index-name>
{
  "mappings": {
    ... // Specify the desired mapping
  },
  "settings": {
    ... // Specify the desired settings
  }
}
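If you copy the mapping and settings from the source index, one possible approach is to read them with GET <source> and drop the settings that the cluster generates itself. The Python sketch below works on a hand-written response fragment; the list of generated settings is an assumption and should be checked against your OpenSearch version.

```python
# Settings generated per index and not accepted when creating a new index
# (assumed list; verify against your version).
GENERATED_SETTINGS = {"creation_date", "uuid", "version", "provided_name"}

def create_body_from_source(get_index_response, source_name):
    """Derive a PUT <index-name> body from a GET <source> response body."""
    info = get_index_response[source_name]
    index_settings = {
        key: value
        for key, value in info["settings"]["index"].items()
        if key not in GENERATED_SETTINGS
    }
    return {"mappings": info["mappings"], "settings": {"index": index_settings}}

# Illustrative, hand-written response fragment (not real cluster output).
sample_response = {
    "source": {
        "aliases": {},
        "mappings": {"properties": {"field_name": {"type": "text"}}},
        "settings": {
            "index": {
                "number_of_shards": "1",
                "number_of_replicas": "1",
                "creation_date": "0",
                "uuid": "placeholder",
                "provided_name": "source",
                "version": {"created": "placeholder"},
            }
        },
    }
}

create_body = create_body_from_source(sample_response, "source")
print(create_body)
```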
2. Execute the Reindex Operation
Then, execute the _reindex request to copy all documents from the source index to the target index.
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>"
  }
}
If the target index was not created beforehand, the _reindex operation will automatically create it with the default configuration, which may not meet your requirements.
Selective Reindexing of Documents
The _reindex operation can copy only the subset of documents that match a search query instead of the entire index.
Example: Copying Documents Based on a Condition
The following request will copy into the target index only those documents where the field field_name contains the value text.
POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "match": {
        "field_name": "text"
      }
    }
  },
  "dest": {
    "index": "<index-name>"
  }
}
A complete list of available operations is provided in the OpenSearch official documentation.
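Before running a selective reindex, it can help to check how many documents the query matches. The Python sketch below builds a _count request and a _reindex request that share the same query; it only constructs the request bodies, and the destination name filtered-copy is a hypothetical placeholder.

```python
def count_and_reindex_requests(source_index, dest_index, query):
    """Build a _count preview and a _reindex request that use one shared query."""
    count_request = ("GET", f"/{source_index}/_count", {"query": query})
    reindex_request = (
        "POST",
        "/_reindex",
        {
            "source": {"index": source_index, "query": query},
            "dest": {"index": dest_index},
        },
    )
    return count_request, reindex_request

# Same match query as in the example above; "filtered-copy" is a placeholder name.
match_query = {"match": {"field_name": "text"}}
count_request, reindex_request = count_and_reindex_requests("source", "filtered-copy", match_query)
print(count_request)
print(reindex_request)
```

Comparing the count returned by the first request with the number of documents created by the second is a simple sanity check.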
Merging Multiple Indexes
The _reindex operation allows you to merge documents from multiple source indexes into a single target index. To do this, you need to specify the source indexes as a list.
Example: Merging Two Indexes
The following request will copy all documents from the source_1 and source_2 indexes into the destination index.
POST _reindex
{
  "source": {
    "index": [
      "source_1",
      "source_2"
    ]
  },
  "dest": {
    "index": "destination"
  }
}
Ensure that the number of shards in the source and target indexes matches. Otherwise, the operation may fail.
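The shard check can be done by comparing index.number_of_shards in the output of GET <index>/_settings for each index. The Python sketch below runs that comparison on a hand-written response fragment; the shard counts shown are made up for illustration.

```python
def number_of_shards(settings_response, index_name):
    """Read number_of_shards from a GET <index>/_settings response body."""
    return int(settings_response[index_name]["settings"]["index"]["number_of_shards"])

def shards_match(settings_response, index_names):
    """Return (all_equal, per_index_counts) for the given indexes."""
    counts = {name: number_of_shards(settings_response, name) for name in index_names}
    return len(set(counts.values())) == 1, counts

# Illustrative, hand-written response fragment (not real cluster output).
settings_response = {
    "source_1": {"settings": {"index": {"number_of_shards": "3"}}},
    "source_2": {"settings": {"index": {"number_of_shards": "3"}}},
    "destination": {"settings": {"index": {"number_of_shards": "1"}}},
}

all_equal, counts = shards_match(settings_response, ["source_1", "source_2", "destination"])
print(all_equal, counts)
```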
Transforming Documents During Reindexing
The _reindex operation allows not only copying but also transforming data during the transfer. Two approaches can be used for this: inline scripts or ingest pipelines.
Method 1: Using Scripts
For simple transformations directly within the reindex request, use the script section. The recommended scripting language is Painless.
Example: Incrementing the value of a numeric field account.number by 1 for each document.
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.account.number++"
  }
}
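A variation of this request passes the increment as a script parameter instead of hard-coding it. The Python sketch below builds such a request body and emulates the effect locally on one sample document; it assumes the document body is reached through ctx._source, and the helper names and the destination name dest-copy are hypothetical.

```python
import copy

def reindex_with_increment(source_index, dest_index, step=1):
    """Build a _reindex body whose script adds params.step to account.number."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": "ctx._source.account.number += params.step",
            "params": {"step": step},
        },
    }

def emulate_increment(document, step=1):
    """Local stand-in for the script above, applied to a single document."""
    updated = copy.deepcopy(document)
    updated["account"]["number"] += step
    return updated

body = reindex_with_increment("source", "dest-copy", step=1)  # placeholder names
sample = {"account": {"number": 41}}
result = emulate_increment(sample)
print(body["script"])
print(result)
```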
Method 2: Using an Ingest Pipeline
You can also use an ingest pipeline to transform documents during reindexing.
- Create a pipeline with specific processors. A wide variety of processors is available for use in pipelines.
Example of a pipeline where:
  - split: splits the text field on whitespace and saves the result into a new field word
  - script: a Painless script that calculates the length of word and saves the result in a new field word_count
  - remove: deletes the test field
PUT _ingest/pipeline/pipeline-test
{
  "description": "Transforms a text field into a list. Calculates the length of the 'word' field and saves it in the new 'word_count' field. Removes the 'test' field.",
  "processors": [
    {
      "split": {
        "field": "text",
        "separator": "\\s+",
        "target_field": "word"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.word_count = ctx.word.length"
      }
    },
    {
      "remove": {
        "field": "test"
      }
    }
  ]
}
- Execute reindexing with the pipeline.
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>",
    "pipeline": "pipeline-test"
  }
}
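To see what the pipeline-test processors do to a document, the three steps can be approximated locally. The Python sketch below is an emulation, not the ingest engine itself: the sample document is made up, and Python's re.split may treat leading or trailing whitespace differently from the split processor.

```python
import re

def apply_pipeline(document):
    """Approximate the split, script, and remove processors on one document."""
    result = dict(document)
    # split processor: break "text" on runs of whitespace into "word"
    result["word"] = re.split(r"\s+", result["text"])
    # script processor: store the number of elements of "word" in "word_count"
    result["word_count"] = len(result["word"])
    # remove processor: drop "test" (this emulation silently ignores a missing
    # field, whereas the real processor reports an error by default)
    result.pop("test", None)
    return result

# Made-up sample document for illustration.
sample_doc = {"text": "reindex with pipeline", "test": "temporary"}
transformed = apply_pipeline(sample_doc)
print(transformed)
```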
Updating Documents in the Current Index
To update data directly in the current index without creating a new one, use the _update_by_query operation.
Operation specifics:
- the operation is executed using the POST method
- it can only work with one index at a time
POST <index_name>/_update_by_query
If you run this command without parameters, it will increment the version number for all documents in the specified index.
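In practice, the request usually carries a body with a query that selects documents and a script that modifies them. The Python sketch below builds such a request without sending it; the index name accounts, the status field, and the script are hypothetical examples.

```python
def update_by_query_request(index_name, query, script_source):
    """Build (method, path, body) for an _update_by_query call on one index."""
    body = {
        "query": query,
        "script": {"lang": "painless", "source": script_source},
    }
    return "POST", f"/{index_name}/_update_by_query", body

# Hypothetical index, field, and script used only for illustration.
request = update_by_query_request(
    "accounts",
    {"term": {"status": "active"}},
    "ctx._source.account.number++",
)
print(request)
```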
Source Index Parameters
| Parameter | Acceptable Values | Description | Required |
|---|---|---|---|
| index | String | The name of the source index. Multiple indexes can be specified as a list. | YES |
| max_docs | Integer | The maximum number of documents to reindex. | NO |
| query | Object | The search query for selecting documents during the reindexing operation. | NO |
| size | Integer | The number of documents to reindex. | NO |
| slice | String | Specifies manual or automatic parallelization (slicing) to speed up the reindexing process. | NO |
Target Index Parameters
| Parameter | Acceptable Values | Description | Required |
|---|---|---|---|
| index | String | The name of the target index. | YES |
| version_type | Enum | Version control type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | NO |
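The target index parameters above can be assembled and checked before sending a request. The Python sketch below builds the dest section and rejects a version_type that is not in the table; the helper name dest_section is hypothetical.

```python
# Valid values taken from the table above.
VERSION_TYPES = {"internal", "external", "external_gt", "external_gte"}

def dest_section(index_name, version_type=None):
    """Build the "dest" section of a _reindex body, validating version_type."""
    dest = {"index": index_name}
    if version_type is not None:
        if version_type not in VERSION_TYPES:
            raise ValueError(f"unsupported version_type: {version_type!r}")
        dest["version_type"] = version_type
    return dest

dest = dest_section("destination", "external")
print({"source": {"index": "source"}, "dest": dest})
```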