Description
Describe the bug
Across nodes, the combination of primary term and translog generation must be unique, because it determines the translog metadata file name.
There is a bug where the old primary can still upload a translog metadata file with the same primary term and generation as the file the new primary generates during the relocation handoff. This happens when an internal or background flush is triggered around the time of the relocation handoff, just before primary mode becomes false on the old primary. In the cases where we found the issue, the internal flush was triggered because the shard had received no writes in the last 5 minutes, and the relocation happened at around the same time as that flush.
OpenSearch/server/src/main/java/org/opensearch/index/shard/IndexShard.java
Lines 2679 to 2701 in fe2d585
```java
public void flushOnIdle(long inactiveTimeNS) {
    Engine engineOrNull = getEngineOrNull();
    if (engineOrNull != null && System.nanoTime() - engineOrNull.getLastWriteNanos() >= inactiveTimeNS) {
        boolean wasActive = active.getAndSet(false);
        if (wasActive) {
            logger.debug("flushing shard on inactive");
            threadPool.executor(ThreadPool.Names.FLUSH).execute(new AbstractRunnable() {
                @Override
                public void onFailure(Exception e) {
                    if (state != IndexShardState.CLOSED) {
                        logger.warn("failed to flush shard on inactive", e);
                    }
                }

                @Override
                protected void doRun() {
                    flush(new FlushRequest().waitIfOngoing(false).force(false));
                    periodicFlushMetric.inc();
                }
            });
        }
    }
}
```
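To make the collision concrete, here is a minimal, self-contained sketch. It is not OpenSearch code: the class `MetadataNameCollisionSketch`, the `metadataFileName` format, and the `FakeRemoteStore` (an in-memory map where a later upload overwrites an earlier one) are all assumptions made for illustration. It only shows that when a name is derived from primary term and generation alone, two nodes that agree on both values produce the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from OpenSearch.
final class MetadataNameCollisionSketch {

    // Assumed naming scheme: the name depends only on primary term and generation.
    static String metadataFileName(long primaryTerm, long generation) {
        return "metadata__" + primaryTerm + "__" + generation;
    }

    // Stand-in for a remote store keyed by file name (assumption: last write wins).
    static final class FakeRemoteStore {
        private final Map<String, String> files = new HashMap<>();

        // Returns the node whose upload was overwritten, or null if the name was free.
        String upload(String fileName, String uploaderNode) {
            return files.put(fileName, uploaderNode);
        }

        String owner(String fileName) {
            return files.get(fileName);
        }
    }

    public static void main(String[] args) {
        FakeRemoteStore store = new FakeRemoteStore();

        // Both nodes see the same primary term and generation around the handoff.
        String fromNewPrimary = metadataFileName(3, 42);
        String fromOldPrimary = metadataFileName(3, 42);

        store.upload(fromNewPrimary, "new-primary");
        String overwritten = store.upload(fromOldPrimary, "old-primary");

        System.out.println("names equal: " + fromNewPrimary.equals(fromOldPrimary));
        System.out.println("overwrote upload from: " + overwritten);
    }
}
```

The order of the two uploads in `main` is arbitrary; with either order, the second upload silently replaces the first in this sketch.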
To Reproduce
This is very difficult to reproduce and shows up at very high scale. However, we can still attempt to reproduce it by creating multiple indexes and triggering a relocation around the 5th minute of no writes on a shard.
Expected behavior
The old primary must not upload translog metadata once control reaches the handoff stage.
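One way to picture this expectation is a gate that uploads must pass and that the handoff closes. The sketch below is hypothetical and is not the actual or proposed fix: `HandoffGuardSketch`, `tryUpload`, and `beginHandoff` are invented names, and the in-memory list stands in for the remote store. The check and the upload happen under the same lock, so an upload cannot slip in between the handoff starting and the flag flipping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names do not exist in OpenSearch.
final class HandoffGuardSketch {
    private final Object lock = new Object();
    private boolean primaryMode = true;                        // assumed flag, true until handoff starts
    private final List<String> uploaded = new ArrayList<>();   // stand-in for the remote store

    // Performs the upload only while this node is still allowed to act as primary.
    // Returns true if the file was uploaded, false if the upload was rejected.
    boolean tryUpload(String fileName) {
        synchronized (lock) {
            if (!primaryMode) {
                return false;
            }
            uploaded.add(fileName);
            return true;
        }
    }

    // Called when control reaches the handoff stage; waits for any in-flight
    // upload holding the lock, then rejects all later uploads.
    void beginHandoff() {
        synchronized (lock) {
            primaryMode = false;
        }
    }

    // Returns a snapshot of the files uploaded so far.
    List<String> uploadedFiles() {
        synchronized (lock) {
            return new ArrayList<>(uploaded);
        }
    }
}
```

With this shape, a flush that races with the relocation either completes its upload before `beginHandoff` returns or is rejected afterwards. Whether a lock, a permit, or a different mechanism fits the real code path is a design question the sketch does not answer.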