[BUG] Data loss during primary relocation for remote-backed indexes  #6214

@ashking94

Describe the bug
During primary-to-primary relocation, data loss occurs when indexing is running at high TPS. The loss starts after initiateTracking is called for the new primary shard: a subset of docs is missing once relocation completes. Also, after the relocation handoff completes, indexing requests that land on the new primary shard use seq nos that had already been assigned after initiateTracking happened.

To Reproduce
Step 1 - Register repo for segments and translogs.

In environment variables -

path.repo=/usr/share/opensearch/repo

Run the curl commands below -

curl -X PUT "localhost:9200/_snapshot/rem?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/opensearch/repo/rem"
  }
}'

curl -X PUT "localhost:9200/_snapshot/seg?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/opensearch/repo/seg"
  }
}'

Step 2 - Create index

curl -X PUT "localhost:9200/test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "replication.type" : "SEGMENT",
    "index.remote_store.enabled": true,
    "index.remote_store.repository" : "seg",
    "index.remote_store.translog.enabled" : true,
    "index.remote_store.translog.repository" : "rem",
    "index.translog.durability" : "async",
    "refresh_interval": "1000s"
  }
}
'

Step 3 - Index docs and trigger relocation immediately after indexing starts

Index -

for i in {1..1000}
do
   curl --location --request POST "localhost:9202/test-index/_doc" \
    --header 'Content-Type: application/json' \
    --data-raw "{
      \"name\":\"abc${i}\"
    }"
    # Print the iteration number on its own line (plain echo does not expand \n in bash)
    echo "$i"
done
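
To know exactly how many docs were acknowledged during the run, the loop can also record the HTTP status of each request. The following is a minimal sketch, not part of the original repro: `index_docs` and `count_created` are hypothetical helper names, and it assumes each successful create with an auto-generated ID returns HTTP 201.

```shell
# Send N docs and print one HTTP status code per request.
# host, index and n are caller-supplied (e.g. localhost:9202 test-index 1000).
index_docs() {
  local host="$1" index="$2" n="$3" i
  for i in $(seq 1 "$n"); do
    curl -s -o /dev/null -w '%{http_code}\n' -X POST "$host/$index/_doc" \
      -H 'Content-Type: application/json' \
      -d "{\"name\":\"abc${i}\"}"
  done
}

# Count acknowledged creations (HTTP 201) from status codes on stdin.
# "|| true" keeps the exit status at 0 when no line matches.
count_created() {
  grep -c '^201$' || true
}
```

For example, `index_docs localhost:9202 test-index 1000 | count_created` would print the number of acknowledged docs, which is the number the index is expected to hold after relocation.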

Relocate -

curl -XPUT localhost:9201/test-index/_settings -H 'Content-Type: application/json' -d '    
{
  "index.routing.allocation.include._name": "opensearch-node1"
}'

Expected Behaviour
The doc count should exactly match the number of docs indexed.
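
One way to check this after relocation completes is to refresh the index and compare the `_count` result with the number of docs sent. This is a sketch under assumptions: `extract_count`, `check_count` and `verify_index` are hypothetical helper names, and the host, index and expected count come from the repro steps above. An explicit refresh is used because the index is created with a refresh_interval of 1000s.

```shell
# Pull the numeric "count" field out of a _count API JSON response on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print a verdict and return non-zero when the counts differ.
check_count() {
  local expected="$1" actual="$2"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual docs"
  else
    echo "MISMATCH: expected $expected, found $actual"
    return 1
  fi
}

# Refresh the index so all indexed docs are searchable, then compare counts.
verify_index() {
  local host="$1" index="$2" expected="$3"
  curl -s -X POST "$host/$index/_refresh" > /dev/null
  check_count "$expected" "$(curl -s "$host/$index/_count" | extract_count)"
}
```

For the repro above, `verify_index localhost:9200 test-index 1000` would report a mismatch whenever docs went missing during relocation.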

Labels

Storage:Durability, bug, v2.6.0
