Skip to content

Fix CriteriaBasedCodec to work with delegate codec#20442

Merged
Bukhtawar merged 1 commit into
opensearch-project:mainfrom
RS146BIJAY:fix-codec
Feb 9, 2026
Merged

Fix CriteriaBasedCodec to work with delegate codec#20442
Bukhtawar merged 1 commit into
opensearch-project:mainfrom
RS146BIJAY:fix-codec

Conversation

@RS146BIJAY
Copy link
Copy Markdown
Contributor

Description

Fix CriteriaBasedCodec to work with delegate codec

Related Issues

Fix CriteriaBasedCodec to work with delegate codec

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 20, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR refactors the codec architecture by introducing CriteriaBasedPostingsFormat as a new SPI provider to handle postings format with bucket-level attribute tracking. CriteriaBasedCodec is simplified to delegate postings format handling while unregistered from the Codec SPI. Tests validate the new grouping functionality across multiple scenarios.

Changes

Cohort / File(s) Summary
Codec Architecture Refactor
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java, server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
CriteriaBasedCodec simplified to add public postingsFormat() override and ATTRIBUTE_BINDING_TARGET_FIELD constant; removed segmentInfoFormat and docValuesFormat customizations. New CriteriaBasedPostingsFormat class implements PostingsFormat with bucket-level attribute tracking, delegate codec wrapping, and specialized FieldsConsumer for merge-aware bucket propagation.
Service Provider Configuration
server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec, server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
Removed CriteriaBasedCodec from Codec SPI registry; added CriteriaBasedPostingsFormat to PostingsFormat SPI registry.
Test Coverage
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java, server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java, test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
New CriteriaBasedPostingsFormatTests class validates indexing, bucket attribute assignment, and null bucket placeholder behavior. InternalEngineTests adds testCriteriaBasedGrouping() test and helper method. EngineTestCase.testContextSpecificDocument() signature updated to accept groupingCriteria parameter.
Documentation
CHANGELOG.md
Added changelog entry documenting CriteriaBasedCodec delegate codec compatibility fix.

Sequence Diagram(s)

sequenceDiagram
    participant Writer as IndexWriter
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant FieldsConsumer as CriteriaBasedFieldsConsumer
    participant DelegateFormat as Delegate PostingsFormat
    participant SegmentState as SegmentWriteState

    Writer->>CriteriaBasedPostingsFormat: fieldsConsumer(state)
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsConsumer(state)
    DelegateFormat-->>FieldsConsumer: FieldsConsumer instance
    CriteriaBasedPostingsFormat->>FieldsConsumer: wrap delegate
    FieldsConsumer->>FieldsConsumer: setAttributes on segment<br/>(bucket, delegate codec)
    FieldsConsumer->>FieldsConsumer: set bucket on _id field<br/>if bucket present
    FieldsConsumer->>DelegateFormat: write(fields)
    DelegateFormat-->>FieldsConsumer: success
    FieldsConsumer->>SegmentState: recordAttribute(DELEGATE_CODEC_KEY)
Loading
sequenceDiagram
    participant Reader as IndexReader
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant SegmentState as SegmentReadState
    participant CodecRegistry as Codec Registry
    participant DelegateFormat as Delegate PostingsFormat

    Reader->>CriteriaBasedPostingsFormat: fieldsProducer(state)
    CriteriaBasedPostingsFormat->>SegmentState: readAttribute(DELEGATE_CODEC_KEY)
    SegmentState-->>CriteriaBasedPostingsFormat: codec name
    CriteriaBasedPostingsFormat->>CodecRegistry: lookup(codec name)
    CodecRegistry-->>CriteriaBasedPostingsFormat: PostingsFormat
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsProducer(state)
    DelegateFormat-->>CriteriaBasedPostingsFormat: FieldsProducer instance
    CriteriaBasedPostingsFormat-->>Reader: FieldsProducer
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

lucene

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description lacks substantive detail about what the fix achieves, missing explanation of architectural changes, and the Related Issues section duplicates the title without referencing a specific GitHub issue. Expand the description to explain the delegate codec pattern, why it's needed, and what problem it solves. Update Related Issues to reference the actual GitHub issue number rather than repeating the title.
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main fix implemented across the changeset: restructuring CriteriaBasedCodec to work with a delegate codec pattern.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java`:
- Around line 118-130: In merge(MergeState mergeState, NormsProducer norms) the
code assumes mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD)
is non-null which can cause an NPE; instead iterate over mergeState.fieldInfos
to find the first FieldInfos instance whose
fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) is non-null, read its
getAttribute(BUCKET_NAME) with a null-check, and only then call
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute) and
mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME,
attribute); if no non-null fieldInfo is found, skip setting the attribute.
Ensure you still call delegateFieldsConsumer.merge(mergeState, norms) as before.

In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`:
- Around line 41-69: The test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it; wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader; ensure the IndexReader is
opened/closed inside the Directory try block (or nest the reader/writer
try-with-resources inside the Directory try-with-resources) to guarantee the
reader is closed before the Directory is closed.

In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`:
- Around line 3816-3853: In testCriteriaBasedGrouping the test reuses the same
document id ("1") for every iteration which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky; update the loop in
testCriteriaBasedGrouping so each ParsedDocument created by testParsedDocument
has a unique _id (for example append the loop index i or the groupingCriteria to
the id) when calling engine.index(indexForDoc(doc)) so that live docs exist in
each bucket and the Set<String> attributes assertion reliably contains all three
grouping criteria.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)

77-80: Fail fast when DELEGATE_CODEC_KEY is missing (asserts are off in prod).

Line 79 relies on assert, which won’t run in production; a missing attribute will instead fail later with a less actionable error. Consider an explicit null check with a clear exception.

♻️ Suggested refactor
-        assert state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY) != null;
-        return PostingsFormat.forName(state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY)).fieldsProducer(state);
+        String delegateName = state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY);
+        if (delegateName == null) {
+            throw new IllegalStateException(
+                "Missing " + DELEGATE_CODEC_KEY + " in segmentInfo for " + state.segmentInfo.name
+            );
+        }
+        return PostingsFormat.forName(delegateName).fieldsProducer(state);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d145d6b and 7e949cb.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
  • test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
💤 Files with no reviewable changes (1)
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
  • CriteriaBasedPostingsFormat (32-145)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (2)
server/src/main/java/org/opensearch/index/CriteriaBasedMergePolicy.java (1)
  • CriteriaBasedMergePolicy (30-82)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
🔇 Additional comments (9)
CHANGELOG.md (1)

35-35: Changelog entry looks good.

Captures the fix in the correct section.

server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat (1)

3-3: Provider registration is correct.

This enables SPI discovery for the new postings format.

test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java (1)

369-372: Good update to use the provided grouping criteria.

Keeps the helper aligned with the new tests.

server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (5)

130-130: No review comment.


214-214: No review comment.


3855-3857: LGTM.
Helper keeps the merge policy setup tidy.


8741-8743: LGTM.
Context-specific documents align with context-aware settings in this test.


8786-8788: LGTM.
Context-specific documents align with context-aware settings in this test.

server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)

31-47: Verify behavior when the delegate postings format is not per-field.

Line 34 only wraps CriteriaBasedPostingsFormat when the delegate is a PerFieldPostingsFormat. If a plugin supplies a non‑per‑field postings format, bucket attributes won’t be written and criteria grouping is effectively disabled. Please confirm this invariant or consider wrapping the non‑per‑field format as well.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 7e949cb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@RS146BIJAY RS146BIJAY force-pushed the fix-codec branch 2 times, most recently from cb58a50 to 10573dd Compare February 1, 2026 05:15
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 1, 2026

❌ Gradle check result for 10573dd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 1, 2026

❌ Gradle check result for 9b622db: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 9, 2026

❌ Gradle check result for 7d9d38d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 9, 2026

✅ Gradle check result for 7d9d38d: SUCCESS

@Bukhtawar Bukhtawar merged commit ac911ed into opensearch-project:main Feb 9, 2026
37 of 39 checks passed
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
pradeep-L pushed a commit to pradeep-L/OpenSearch that referenced this pull request Apr 21, 2026
…t#20442)

Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants