[CI] InternalEngineTests.testLookupSeqNoByIdInLucene fails after prune ID merge policy #42979

My PR build failed:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/193/testReport/org.elasticsearch.index.engine/InternalEngineTests/testLookupSeqNoByIdInLucene/

with:
```
java.lang.NullPointerException
	at __randomizedtesting.SeedInfo.seed([DDA237697902E2F7:858F7759E359C12]:0)
	at org.elasticsearch.index.engine.InternalEngineTests.lambda$testLookupSeqNoByIdInLucene$50(InternalEngineTests.java:4004)
	at org.elasticsearch.index.engine.InternalEngineTests.testLookupSeqNoByIdInLucene(InternalEngineTests.java:4031)
```

I can easily reproduce this with the seed `DDA237697902E2F7`. I cut the test case down to the following, which indexes a doc, refreshes, deletes the doc, and then force-merges, losing the ability to look up the doc/seqno by ID:
```
    public void testLookupSeqNoByIdInLucene2() throws Exception {
        Settings.Builder settings = Settings.builder()
            .put(defaultSettings.getSettings())
            .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true);
        final IndexMetaData indexMetaData = IndexMetaData.builder(defaultSettings.getIndexMetaData()).settings(settings).build();
        final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetaData);
        try (Store store = createStore();
             InternalEngine engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null))) {
            // Index doc "23" as a replica operation and make it visible with a refresh.
            final ParsedDocument doc = EngineTestCase.createParsedDoc("23", null);
            engine.index(new Engine.Index(EngineTestCase.newUid(doc), doc, 1, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), -1, true, UNASSIGNED_SEQ_NO, 0L));
            engine.refresh("test");
            // Delete the same doc, again as a replica operation.
            engine.delete(new Engine.Delete(doc.type(), doc.id(), EngineTestCase.newUid(doc), 3, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), UNASSIGNED_SEQ_NO, 0L));
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("before merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
            }
            engine.forceMerge(true);
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("after merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
                DocIdAndSeqNo docIdAndSeqNo = VersionsAndSeqNoResolver.loadDocIdAndSeqNo(searcher.reader(), newUid("23"));
                assertNotNull(docIdAndSeqNo);
            }
        }
    }
```

I tried disabling the new `PrunePostingsMergePolicy`, and this made the problem go away. As far as I can see, we use this lookup by ID in `InternalEngine.planIndexingAsNonPrimary` and `planDeletionAsNonPrimary` when the seqNo received is below the local checkpoint.

I am unsure whether this part can be removed now, so that all ops below the local checkpoint are simply always treated as stale. I am raising this issue to gather input.
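
To make the question concrete, here is a rough toy model of the two options. It is not the `InternalEngine` code: the class `StaleOpSketch`, the `Plan` enum, the map standing in for the Lucene lookup, and both method names are hypothetical, and the real planning logic has more cases than this. Variant A depends on finding the seqNo by ID, which breaks once the entry has been pruned; variant B decides from the local checkpoint alone.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these names are hypothetical and do not exist in Elasticsearch.
public class StaleOpSketch {
    public enum Plan { PROCESS_NORMALLY, PROCESS_AS_STALE }

    // Stand-in for the by-ID lookup in Lucene: id -> seq_no of the latest op.
    // An entry that was pruned by a merge is simply absent from the map.
    public static final Map<String, Long> seqNoById = new HashMap<>();

    // Variant A: ops at or below the local checkpoint are compared against
    // the seq_no found by ID, so a missing entry is an error.
    public static Plan planWithLookup(String id, long opSeqNo, long localCheckpoint) {
        if (opSeqNo > localCheckpoint) {
            return Plan.PROCESS_NORMALLY; // simplification: no lookup modelled here
        }
        Long stored = seqNoById.get(id);
        if (stored == null) {
            throw new IllegalStateException("no seq_no found for id [" + id + "]");
        }
        return opSeqNo <= stored ? Plan.PROCESS_AS_STALE : Plan.PROCESS_NORMALLY;
    }

    // Variant B: every op at or below the local checkpoint is treated as stale,
    // without consulting the by-ID lookup at all.
    public static Plan planWithoutLookup(long opSeqNo, long localCheckpoint) {
        return opSeqNo <= localCheckpoint ? Plan.PROCESS_AS_STALE : Plan.PROCESS_NORMALLY;
    }

    public static void main(String[] args) {
        long localCheckpoint = 3;
        seqNoById.put("23", 1L);
        System.out.println(planWithLookup("23", 1, localCheckpoint));

        seqNoById.remove("23"); // simulate the entry being pruned by a merge
        System.out.println(planWithoutLookup(1, localCheckpoint));
        try {
            planWithLookup("23", 1, localCheckpoint);
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

Whether variant B is actually safe is exactly what I am asking; the sketch only shows that it would not need the pruned ID postings.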
