[CI] InternalEngineTests.testLookupSeqNoByIdInLucene fails after prune ID merge policy #42979
Closed
Labels
:Distributed/Engine (Anything around managing Lucene and the Translog in an open shard.), >test-failure (Triaged test failures from CI), v7.3.0, v8.0.0-alpha1
Description
My PR build failed with:
java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([DDA237697902E2F7:858F7759E359C12]:0)
at org.elasticsearch.index.engine.InternalEngineTests.lambda$testLookupSeqNoByIdInLucene$50(InternalEngineTests.java:4004)
at org.elasticsearch.index.engine.InternalEngineTests.testLookupSeqNoByIdInLucene(InternalEngineTests.java:4031)
I can easily reproduce this with the seed DDA237697902E2F7. I cut the test case down to the following, which indexes a doc, does a refresh, deletes the doc and then merges, losing the ability to look up the doc/seqno by ID:
    public void testLookupSeqNoByIdInLucene2() throws Exception {
        Settings.Builder settings = Settings.builder()
            .put(defaultSettings.getSettings())
            .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true);
        final IndexMetaData indexMetaData = IndexMetaData.builder(defaultSettings.getIndexMetaData()).settings(settings).build();
        final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetaData);
        Map<String, Engine.Operation> latestOps = new HashMap<>(); // id -> latest seq_no
        try (Store store = createStore();
             InternalEngine engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null))) {
            final ParsedDocument doc = EngineTestCase.createParsedDoc("23", null);
            engine.index(new Engine.Index(EngineTestCase.newUid(doc), doc, 1, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), -1, true, UNASSIGNED_SEQ_NO, 0L));
            engine.refresh("test");
            engine.delete(new Engine.Delete(doc.type(), doc.id(), EngineTestCase.newUid(doc), 3, primaryTerm.get(),
                1, null, Engine.Operation.Origin.REPLICA, threadPool.relativeTimeInMillis(), UNASSIGNED_SEQ_NO, 0L));
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("before merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
            }
            engine.forceMerge(true);
            try (Searcher searcher = engine.acquireSearcher("test", Engine.SearcherScope.INTERNAL)) {
                logger.info("after merge: " + searcher.reader().numDocs() + ", " + searcher.reader().maxDoc());
                DocIdAndSeqNo docIdAndSeqNo = VersionsAndSeqNoResolver.loadDocIdAndSeqNo(searcher.reader(), newUid("23"));
                assertNotNull(docIdAndSeqNo);
            }
        }
    }
I tried disabling the new PrunePostingsMergePolicy, and this made the problem go away. As far as I can see, we do use this lookup by ID in InternalEngine.planIndexingAsNonPrimary and planDeletionAsNonPrimary in the case where the seqNo received is below the local checkpoint.
I am unsure whether this lookup can now be removed, so that we simply always treat all ops below the local checkpoint as stale. I am therefore raising this issue to gather input.
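To make the question concrete, here is a toy sketch of the two behaviours being compared. It is not the actual InternalEngine code: the class StaleOpSketch, the Status enum, both method names, and the plain Map standing in for the Lucene ID lookup are all hypothetical, and treating "at or below" the local checkpoint as the boundary is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these names come from Elasticsearch.
public class StaleOpSketch {
    enum Status { OP_NEWER, OP_STALE_OR_EQUAL, DOC_NOT_FOUND }

    // Lookup-based comparison: an absent map entry models an ID that was
    // pruned by the merge policy (or never indexed), so nothing can be compared.
    static Status compareWithLookup(Map<String, Long> seqNoById, String id, long opSeqNo) {
        Long indexedSeqNo = seqNoById.get(id);
        if (indexedSeqNo == null) {
            return Status.DOC_NOT_FOUND;
        }
        return opSeqNo > indexedSeqNo ? Status.OP_NEWER : Status.OP_STALE_OR_EQUAL;
    }

    // Proposed alternative: ops at or below the local checkpoint (assumed boundary)
    // are treated as stale without consulting the ID lookup at all.
    static Status compareSkippingLookup(Map<String, Long> seqNoById, String id,
                                        long opSeqNo, long localCheckpoint) {
        if (opSeqNo <= localCheckpoint) {
            return Status.OP_STALE_OR_EQUAL;
        }
        return compareWithLookup(seqNoById, id, opSeqNo);
    }

    public static void main(String[] args) {
        // Empty map: the entry for "23" has been pruned.
        Map<String, Long> seqNoById = new HashMap<>();
        System.out.println(compareWithLookup(seqNoById, "23", 1L));          // DOC_NOT_FOUND
        System.out.println(compareSkippingLookup(seqNoById, "23", 1L, 3L));  // OP_STALE_OR_EQUAL
    }
}
```

In this model, the second variant never depends on the pruned entry for ops at or below the checkpoint. Whether that is safe in the real engine is exactly the open question above.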