Fix CriteriaBasedCodec to work with delegate codec#20442
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR refactors the codec architecture by introducing CriteriaBasedPostingsFormat as a new SPI provider to handle postings format with bucket-level attribute tracking. CriteriaBasedCodec is simplified to delegate postings format handling while unregistered from the Codec SPI. Tests validate the new grouping functionality across multiple scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Writer as IndexWriter
participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
participant FieldsConsumer as CriteriaBasedFieldsConsumer
participant DelegateFormat as Delegate PostingsFormat
participant SegmentState as SegmentWriteState
Writer->>CriteriaBasedPostingsFormat: fieldsConsumer(state)
CriteriaBasedPostingsFormat->>DelegateFormat: fieldsConsumer(state)
DelegateFormat-->>FieldsConsumer: FieldsConsumer instance
CriteriaBasedPostingsFormat->>FieldsConsumer: wrap delegate
FieldsConsumer->>FieldsConsumer: setAttributes on segment<br/>(bucket, delegate codec)
FieldsConsumer->>FieldsConsumer: set bucket on _id field<br/>if bucket present
FieldsConsumer->>DelegateFormat: write(fields)
DelegateFormat-->>FieldsConsumer: success
FieldsConsumer->>SegmentState: recordAttribute(DELEGATE_CODEC_KEY)
sequenceDiagram
participant Reader as IndexReader
participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
participant SegmentState as SegmentReadState
participant CodecRegistry as Codec Registry
participant DelegateFormat as Delegate PostingsFormat
Reader->>CriteriaBasedPostingsFormat: fieldsProducer(state)
CriteriaBasedPostingsFormat->>SegmentState: readAttribute(DELEGATE_CODEC_KEY)
SegmentState-->>CriteriaBasedPostingsFormat: codec name
CriteriaBasedPostingsFormat->>CodecRegistry: lookup(codec name)
CodecRegistry-->>CriteriaBasedPostingsFormat: PostingsFormat
CriteriaBasedPostingsFormat->>DelegateFormat: fieldsProducer(state)
DelegateFormat-->>CriteriaBasedPostingsFormat: FieldsProducer instance
CriteriaBasedPostingsFormat-->>Reader: FieldsProducer
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java`:
- Around line 118-130: In merge(MergeState mergeState, NormsProducer norms) the
code assumes mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD)
is non-null which can cause an NPE; instead iterate over mergeState.fieldInfos
to find the first FieldInfos instance whose
fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) is non-null, read its
getAttribute(BUCKET_NAME) with a null-check, and only then call
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute) and
mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME,
attribute); if no non-null fieldInfo is found, skip setting the attribute.
Ensure you still call delegateFieldsConsumer.merge(mergeState, norms) as before.
In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`:
- Around line 41-69: The test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it; wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader; ensure the IndexReader is
opened/closed inside the Directory try block (or nest the reader/writer
try-with-resources inside the Directory try-with-resources) to guarantee the
reader is closed before the Directory is closed.
In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`:
- Around line 3816-3853: In testCriteriaBasedGrouping the test reuses the same
document id ("1") for every iteration which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky; update the loop in
testCriteriaBasedGrouping so each ParsedDocument created by testParsedDocument
has a unique _id (for example append the loop index i or the groupingCriteria to
the id) when calling engine.index(indexForDoc(doc)) so that live docs exist in
each bucket and the Set<String> attributes assertion reliably contains all three
grouping criteria.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
77-80: Fail fast whenDELEGATE_CODEC_KEYis missing (asserts are off in prod).Line 79 relies on
assert, which won’t run in production; a missing attribute will instead fail later with a less actionable error. Consider an explicit null check with a clear exception.♻️ Suggested refactor
- assert state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY) != null; - return PostingsFormat.forName(state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY)).fieldsProducer(state); + String delegateName = state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY); + if (delegateName == null) { + throw new IllegalStateException( + "Missing " + DELEGATE_CODEC_KEY + " in segmentInfo for " + state.segmentInfo.name + ); + } + return PostingsFormat.forName(delegateName).fieldsProducer(state);
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.mdserver/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.javaserver/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.javaserver/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codecserver/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormatserver/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.javaserver/src/test/java/org/opensearch/index/engine/InternalEngineTests.javatest/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
💤 Files with no reviewable changes (1)
- server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.javaserver/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.javaserver/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.javaserver/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
CriteriaBasedPostingsFormat(32-145)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (2)
server/src/main/java/org/opensearch/index/CriteriaBasedMergePolicy.java (1)
CriteriaBasedMergePolicy(30-82)server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
CriteriaBasedCodec(20-49)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
CriteriaBasedCodec(20-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: gradle-check
- GitHub Check: Mend Security Check
🔇 Additional comments (9)
CHANGELOG.md (1)
35-35: Changelog entry looks good.Captures the fix in the correct section.
server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat (1)
3-3: Provider registration is correct.This enables SPI discovery for the new postings format.
test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java (1)
369-372: Good update to use the provided grouping criteria.Keeps the helper aligned with the new tests.
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (5)
130-130: No review comment.
214-214: No review comment.
3855-3857: LGTM.
Helper keeps the merge policy setup tidy.
8741-8743: LGTM.
Context-specific documents align with context-aware settings in this test.
8786-8788: LGTM.
Context-specific documents align with context-aware settings in this test.server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
31-47: Verify behavior when the delegate postings format is not per-field.Line 34 only wraps
CriteriaBasedPostingsFormatwhen the delegate is aPerFieldPostingsFormat. If a plugin supplies a non‑per‑field postings format, bucket attributes won’t be written and criteria grouping is effectively disabled. Please confirm this invariant or consider wrapping the non‑per‑field format as well.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
❌ Gradle check result for 7e949cb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
cb58a50 to
10573dd
Compare
|
❌ Gradle check result for 10573dd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 9b622db: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 7d9d38d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…t#20442) Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>
…t#20442) Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>
…t#20442) Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>
Description
Fix CriteriaBasedCodec to work with delegate codec
Related Issues
Fix CriteriaBasedCodec to work with delegate codec