Version: 5.2

Reindex data

Reindexing is the process of copying documents from one index to another. It is used when you need to change the field mapping, create a backup, change static index settings that cannot be updated on an existing index (such as the number of primary shards), or rename an index.

Performance Recommendation!

When creating temporary indexes (for example, for reindexing, mapping changes, or backups), set the number of replicas (number_of_replicas) to 0 for the duration of the reindexing operation. After the operation completes, restore the original number_of_replicas value.
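A minimal Python sketch of the two settings requests, using only the standard library. The index name my-target and the restored value 1 are hypothetical; check the actual original value with GET <index-name>/_settings before changing it. The function only builds the request, so it can be sent with any HTTP client.

```python
import json
from typing import Tuple


def replica_settings_request(index: str, replicas: int) -> Tuple[str, str, dict]:
    """Build the PUT <index>/_settings request that sets number_of_replicas."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", body


# Hypothetical target index name; replace it with your own.
TARGET = "my-target"

# Before reindexing: disable replicas on the target index.
before = replica_settings_request(TARGET, 0)
# After reindexing: restore the original value (1 is only an example).
after = replica_settings_request(TARGET, 1)

for method, path, body in (before, after):
    print(method, path, json.dumps(body))
```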


Reindexing All Documents

To copy all documents from one index to another, follow these steps:

1. Create the Target Index

First, you need to create a target index with the required field structure (mapping) and settings. You can define them manually or copy them from the source index.

PUT <index-name>
{
  "mappings": {
    ... // Specify the desired mapping
  },
  "settings": {
    ... // Specify the desired settings
  }
}
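If you copy the mapping and settings from the source index, the output of GET <source-index> cannot be passed to PUT unchanged, because it contains settings that OpenSearch generates for the existing index. The Python sketch below assumes that index.uuid, index.creation_date, index.provided_name, and index.version are the only such keys; your version may return others, so review the result before sending it. It also sets number_of_replicas to 0, following the performance recommendation at the top of this page.

```python
import copy

# Index settings that OpenSearch generates for an existing index. They are
# assumed here to be rejected by a create-index request, so they are dropped.
GENERATED_SETTINGS = ("uuid", "creation_date", "provided_name", "version")


def create_body_from_source(get_index_response: dict, source: str,
                            replicas: int = 0) -> dict:
    """Turn the JSON returned by GET <source-index> into a PUT <index-name> body."""
    index_info = get_index_response[source]
    # Deep copies keep the original response untouched.
    index_settings = copy.deepcopy(index_info["settings"]["index"])
    for key in GENERATED_SETTINGS:
        index_settings.pop(key, None)
    # Follow the performance recommendation: no replicas while copying.
    index_settings["number_of_replicas"] = replicas
    return {
        "settings": {"index": index_settings},
        "mappings": copy.deepcopy(index_info["mappings"]),
    }
```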

2. Execute the Reindex Operation

Then, execute the _reindex request to copy all documents from the source index to the target index.

POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>"
  }
}
Important!

If the target index does not exist, the _reindex operation creates it automatically with default settings and dynamically generated mappings, which may not meet your requirements.
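To run the request from a script instead of the console, a minimal Python sketch with the standard library might look like this. It assumes a hypothetical cluster at http://localhost:9200 without TLS or authentication. The helper checks the failures and timed_out fields of the _reindex response before reporting how many documents were created or updated.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; adjust the URL and add authentication for a real cluster.
BASE_URL = "http://localhost:9200"


def send(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def check_reindex_response(response: dict) -> int:
    """Raise if _reindex reports failures or a timeout; return documents written."""
    if response.get("timed_out") or response.get("failures"):
        raise RuntimeError(f"reindex incomplete: {response.get('failures')}")
    return response.get("created", 0) + response.get("updated", 0)


# Usage (requires a running cluster; "my-target" is a hypothetical index name):
# result = send("POST", "/_reindex",
#               {"source": {"index": "source"}, "dest": {"index": "my-target"}})
# print(check_reindex_response(result), "documents written")
```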


Selective Reindexing of Documents

Instead of copying an entire index, the _reindex operation can copy only the subset of documents that matches a search query.

Example: Copying Documents Based on a Condition

The following request copies into the target index only the documents whose field_name field matches the value text.

POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "match": {
        "field_name": "text"
      }
    }
  },
  "dest": {
    "index": "<index-name>"
  }
}
Important!

For the complete list of available queries and reindex options, see the official OpenSearch documentation.
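The same request can be assembled programmatically. This Python sketch builds the _reindex body with an optional query; the match query repeats the example above, and my-target is a hypothetical stand-in for <index-name>.

```python
from typing import Optional


def selective_reindex_body(source: str, dest: str,
                           query: Optional[dict] = None) -> dict:
    """Build a _reindex body; when `query` is given, only matching documents are copied."""
    source_part: dict = {"index": source}
    if query is not None:
        source_part["query"] = query
    return {"source": source_part, "dest": {"index": dest}}


body = selective_reindex_body(
    "source", "my-target", query={"match": {"field_name": "text"}}
)
```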


Merging Multiple Indexes

The _reindex operation allows you to merge documents from multiple source indexes into a single target index. To do this, you need to specify the source indexes as a list.

Example: Merging Two Indexes

The following request will copy all documents from the source_1 and source_2 indexes into the destination index.

POST _reindex
{
  "source": {
    "index": [
      "source_1",
      "source_2"
    ]
  },
  "dest": {
    "index": "destination"
  }
}
Important!

Ensure that the number of shards in the source and target indexes matches. Otherwise, the operation may fail.
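A Python sketch that builds the merge request from any number of source indexes and rejects an empty or duplicated list before anything is sent. Keep in mind that if documents in different source indexes share the same _id, only one of them remains in the destination index, because each copy is written under that same _id.

```python
from typing import List


def merge_reindex_body(sources: List[str], dest: str) -> dict:
    """Build a _reindex body that copies several source indexes into `dest`."""
    if not sources:
        raise ValueError("at least one source index is required")
    if len(set(sources)) != len(sources):
        raise ValueError("source indexes must be unique")
    return {"source": {"index": list(sources)}, "dest": {"index": dest}}


body = merge_reindex_body(["source_1", "source_2"], "destination")
```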


Transforming Documents During Reindexing

In addition to copying, the _reindex operation can transform documents during the transfer. Two approaches are available: inline scripts or ingest pipelines.

Method 1: Using Scripts

For simple transformations directly within the reindex request, use the script section. The recommended scripting language is Painless.

Example: Incrementing the value of a numeric field account.number by 1 for each document.

POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.account.number++"
  }
}
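Before running the script against a full index, it can help to check the transformation on a sample document. The Python function below is a hypothetical local preview of what ctx._source.account.number++ does to one document's _source; OpenSearch does not run it. A document without account.number raises KeyError here, and the Painless script is likely to fail on such documents as well.

```python
import copy


def preview_increment(source_doc: dict) -> dict:
    """Local preview of the Painless script ctx._source.account.number++."""
    doc = copy.deepcopy(source_doc)  # leave the input document untouched
    doc["account"]["number"] += 1
    return doc


sample = {"account": {"number": 41}}  # hypothetical sample _source
print(preview_increment(sample))
```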

Method 2: Using an Ingest Pipeline

You can also use an ingest pipeline to transform documents during reindexing.

  1. Create a pipeline with the required processors. A wide variety of processors is available for use in pipelines.

Example of a pipeline where:

  • split splits the text field on whitespace (the regular expression \s+) and saves the resulting list into a new field word
  • script in Painless calculates the number of elements in word and saves it in a new field word_count
  • remove deletes the test field
PUT _ingest/pipeline/pipeline-test
{
  "description": "Transforms a text field into a list. Calculates the length of the 'word' field and saves it in the new 'word_count' field. Removes the 'test' field.",
  "processors": [
    {
      "split": {
        "field": "text",
        "separator": "\\s+",
        "target_field": "word"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.word_count = ctx.word.length"
      }
    },
    {
      "remove": {
        "field": "test"
      }
    }
  ]
}
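OpenSearch can test a pipeline on sample documents with POST _ingest/pipeline/pipeline-test/_simulate. For a quick offline check, the Python sketch below approximates the three processors on one document. It assumes that split behaves like Java's String.split, which drops trailing empty strings; it is an approximation, not the ingest node's implementation.

```python
import copy
import re


def preview_pipeline_test(source_doc: dict) -> dict:
    """Offline approximation of pipeline-test applied to one document's _source."""
    doc = copy.deepcopy(source_doc)
    # split: break "text" on runs of whitespace into the new "word" field.
    # Trailing empty strings are dropped, as Java's String.split does.
    words = re.split(r"\s+", doc["text"])
    while words and words[-1] == "":
        words.pop()
    doc["word"] = words
    # script: word_count holds the number of elements in "word".
    doc["word_count"] = len(doc["word"])
    # remove: delete "test"; like the real processor without ignore_missing,
    # this fails (KeyError) when the field is absent.
    del doc["test"]
    return doc
```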
  2. Execute reindexing with the pipeline.
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "<index-name>",
    "pipeline": "pipeline-test"
  }
}
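For large indexes, the reindex request can run in the background by adding wait_for_completion=false to the URL; the response then contains a task ID that can be polled with GET _tasks/<task-id>. The Python sketch below only builds these two requests; my-target is a hypothetical stand-in for <index-name>.

```python
from typing import Tuple


def async_reindex_request(source: str, dest: str,
                          pipeline: str) -> Tuple[str, str, dict]:
    """Build a background _reindex request that routes documents through a pipeline."""
    body = {
        "source": {"index": source},
        "dest": {"index": dest, "pipeline": pipeline},
    }
    return "POST", "/_reindex?wait_for_completion=false", body


def task_status_request(reindex_response: dict) -> Tuple[str, str]:
    """Build the GET _tasks/<task-id> request from the background _reindex response."""
    return "GET", f"/_tasks/{reindex_response['task']}"


method, path, body = async_reindex_request("source", "my-target", "pipeline-test")
```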

Updating Documents in the Current Index

To update data directly in the current index without creating a new one, use the update_by_query operation.

Operation specifics:

  • the operation is executed using the POST method
  • it can only work with one index at a time
Example:
POST <index-name>/_update_by_query
Important!

If you run this command without parameters, it will increment the version number for all documents in the specified index.
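In practice, _update_by_query is usually combined with a query and a script, which use the same body fields as in _reindex. The Python sketch below builds such a request; the index name my-index, the query, and the script are hypothetical and reuse the examples from this page. Without a query, the script runs on every document in the index.

```python
from typing import Optional, Tuple


def update_by_query_request(index: str, query: Optional[dict] = None,
                            script: Optional[dict] = None) -> Tuple[str, str, dict]:
    """Build POST <index>/_update_by_query; an empty body touches every document."""
    body: dict = {}
    if query is not None:
        body["query"] = query
    if script is not None:
        body["script"] = script
    return "POST", f"/{index}/_update_by_query", body


method, path, body = update_by_query_request(
    "my-index",
    query={"match": {"field_name": "text"}},
    script={"lang": "painless", "source": "ctx._source.account.number++"},
)
```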


Source Index Parameters

Parameter | Acceptable values | Description | Required
--- | --- | --- | ---
index | String or list | The name of the source index. Multiple indexes can be specified as a list. | Yes
max_docs | Integer | The maximum number of documents to reindex. | No
query | Object | The search query used to select the documents to reindex. | No
size | Integer | The number of documents to retrieve per batch. | No
slice | Object | Manual slicing (an object with id and max) to parallelize the reindexing process. Automatic slicing is set with the slices URL parameter. | No

Target Index Parameters

Parameter | Acceptable values | Description | Required
--- | --- | --- | ---
index | String | The name of the target index. | Yes
version_type | Enum | Version control type for the indexing operation. Valid values: internal, external, external_gt, external_gte. | No