[Derived Source] Support ignore_above/normalizer for keyword/wildcard field and optimise text field lookup by tanik98 · Pull Request #20113 · opensearch-project/OpenSearch

tanik98 · 2025-11-28T03:11:22Z

For keyword and wildcard field handle ignore_above by explicitly storing values exceeding that and for noarmalizer store value in all cases, and use stored field for using it as a fallback
For text field, use sub keyword fields for deriving source(using doc_values) instead of explicitly storing the value, this way we can avoid storing duplicates(as a doc_values under sub keyword and stored field under text field).
In case of multiple keyword fields, we will be going through each one to find first one, and in case of ingored field, we are choosing keyword with max value of ignore_above to save the dupplication of docValue and fallback stored field. Updation of sub keyword field is append only, we can't remove sub keyword once added, these would get appended in applicable keyword fields for derived source as and when mapping gets updated.

Summary by CodeRabbit

Release Notes

New Features
- Added support for ignore_above parameter on keyword and wildcard fields.
- Optimized text field retrieval in derived source scenarios.
Bug Fixes
- Improved handling of values exceeding the ignore_above threshold for keyword and wildcard fields with derived source enabled.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-28T03:11:38Z

📝 Walkthrough

Walkthrough

This PR extends support for the ignore_above configuration to keyword and wildcard fields within derived source mappings. It introduces a composite field value fetcher mechanism to retrieve values from multiple sources (stored fields, doc values, and a dedicated ignore field) and updates core mapper classes to store values exceeding ignore_above in a special derived-source field for later retrieval.

Changes

Cohort / File(s)	Summary
Changelog `CHANGELOG.md`	Adds entry documenting support for `ignore_above` on keyword/wildcard fields and text field optimization under derived source.
Core mapper infrastructure `server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java`, `FieldValueFetcher.java`	Introduces `CompositeFieldValueFetcher` class that chains multiple `FieldValueFetcher` instances, trying each in sequence until a non-empty result is found. Updates `FieldValueFetcher.write` to handle null value lists.
Field type additions `server/src/main/java/org/opensearch/index/mapper/StringFieldType.java`	Adds `derivedSourceIgnoreFieldName()` method and private constant `IGNORED_VALUE_FIELD_SUFFIX` to support derived source ignore field naming.
Keyword field mapper `server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java`	Relaxes derived-source eligibility checks for `ignore_above`; introduces `DerivedSourceHelper` to manage value fetching from primary field and fallback ignore field; stores values exceeding `ignore_above` in dedicated ignore field when derived source is enabled.
Wildcard field mapper `server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java`	Updates parse flow to store values exceeding `ignore_above` in derived source ignore field; relaxes guard conditions in `canDeriveSourceInternal`; replaces single `SortedSetDocValuesFetcher` with composite approach; introduces `DerivedSourceHelper`.
Text field mapper `server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java`	Extends field type construction to propagate `MultiFields`; adds public getters/setters for derived-source keyword support and ignore length tracking; injects logic to store ignore field when needed; reworks derived source field value fetching to use `CompositeFieldValueFetcher`; introduces `DerivedSourceHelper` to detect and collect keyword-derived fetchers.
Match-only text field mapper `server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java`	Updates `buildFieldType` method signature to accept `MultiFields` parameter; threads multi-field configuration through field type construction.
Integration tests `server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java`, `recovery/FullRollingRestartIT.java`, `search/simple/SimpleSearchIT.java`, `update/UpdateIT.java`	Updates derived source test mappings: extends keyword sub-field definitions with `ignore_above` configuration; removes `store: true` from text fields and `doc_values: true` from wildcard fields; updates document indexing and assertions to reflect new field configurations.
Unit tests `server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java`	New test class validating `CompositeFieldValueFetcher` behavior across multiple fetcher scenarios, empty lists, and exception propagation.
Keyword field mapper tests `server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java`	Adds factory method `getMapperServiceForDerivedSource` and tests for derived source behavior with `ignore_above` threshold scenarios; replaces exception expectations with no-exception assertions for canDeriveSource path.
Text field mapper tests `server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java`	Renames and refactors stored field test; adds multiple tests for derived source with keyword sub-fields, ignore_above thresholds, normalization, and long text handling; updates `IndexAnalyzers` construction with "whitespace" analyzer.
Wildcard field mapper tests `server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java`	Adds tests for derived value fetching and parse behavior around `ignore_above` threshold with derived source enabled; introduces `getMapperServiceForDerivedSource` helper and `assertDoesNotThrow` utility.
Object mapper tests `server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java`	Removes multiple derive-source validation test cases (ignore_above, nested object fields, copy_to, multi-field scenarios); simplifies remaining multi-type field mapping by removing stored and doc_values flags.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Parser as Field Parser
    participant IgnoreField as Ignore Field Storage
    participant Derived as Derived Source Generator
    participant Composite as CompositeFieldValueFetcher
    participant Primary as Primary Fetcher
    participant Fallback as Fallback Fetcher

    Client->>Parser: Index document with text exceeding ignore_above
    Parser->>Parser: Check if derived source enabled & value > ignore_above
    alt Value exceeds ignore_above threshold
        Parser->>IgnoreField: Store value in derivedSourceIgnore field
        Parser->>Derived: Skip normal indexing, return early
    else Value within threshold
        Parser->>Parser: Normal field indexing
    end
    
    Note over Client,Fallback: Later: Deriving source for document
    Client->>Derived: Request derived source
    Derived->>Composite: Create fetcher chain
    Composite->>Primary: Try primary fetcher (doc values or stored)
    alt Primary has value
        Primary-->>Composite: Return value
        Composite-->>Derived: Return converted value
    else Primary empty
        Composite->>Fallback: Try fallback fetcher (ignore field)
        Fallback->>IgnoreField: Read from derivedSourceIgnore field
        IgnoreField-->>Fallback: Return stored value
        Fallback-->>Composite: Return converted value
        Composite-->>Derived: Return value
    end
    Derived-->>Client: Return derived source with retrieved value

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

CompositeFieldValueFetcher logic and proper ordering of fetcher chains across multiple field mappers
Derived source storage and retrieval paths in KeywordFieldMapper, WildcardFieldMapper, and TextFieldMapper — verify null-safety and edge cases around ignore_above thresholds
Public API additions to StringFieldType and TextFieldType — confirm backward compatibility and semantic correctness of new getters/setters
DerivedSourceHelper implementations across three mapper classes — ensure consistency in keyword detection and fetcher collection logic
Multi-field wiring changes in TextFieldMapper and MatchOnlyTextFieldMapper — verify MultiFields parameter threading does not break existing functionality
Integration test updates — confirm that removed store and doc_values flags do not inadvertently change test expectations or mask regressions

Suggested labels

Search:Performance

Suggested reviewers

cwperks
gbbafna
saratvemulapalli
sachinpkale
kotwanikunal

Poem

🐰 Ignoring above, we now derive with care,
Composite fetchers fetch from everywhere,
Keyword and wildcard fields now play their part,
Derived source shines, a clever work of art!
Long text? No problem—we store what's spared! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.12% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately summarizes the main changes: supporting ignore_above/normalizer for keyword/wildcard fields and optimizing text field lookup under derived source, matching the changelog entry and core implementation across multiple field mappers.
Description check	✅ Passed	The PR description adequately explains the changes being made, including the derived-source handling modifications for keyword/wildcard fields, text field optimization, and multi-keyword field strategy.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-28T03:25:14Z

❌ Gradle check result for e4707a1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-11-28T03:53:50Z

❌ Gradle check result for 6ad1447: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-12-08T17:17:37Z

❌ Gradle check result for c152793: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-12-08T19:05:32Z

❌ Gradle check result for 9a2ff84: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-12-08T19:41:23Z

❌ Gradle check result for de290a0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-12-09T03:56:43Z

❌ Gradle check result for 007aa20: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2025-12-09T05:34:49Z

✅ Gradle check result for 88ac318: SUCCESS

codecov · 2025-12-09T05:35:43Z

Codecov Report

❌ Patch coverage is 82.06897% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.26%. Comparing base (d47931e) to head (88ac318).
⚠️ Report is 75 commits behind head on main.

Files with missing lines	Patch %	Lines
...g/opensearch/index/mapper/WildcardFieldMapper.java	71.42%	6 Missing and 4 partials ⚠️
...rg/opensearch/index/mapper/KeywordFieldMapper.java	76.47%	5 Missing and 3 partials ⚠️
...a/org/opensearch/index/mapper/TextFieldMapper.java	86.20%	3 Missing and 5 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #20113      +/-   ##
============================================
- Coverage     73.30%   73.26%   -0.04%     
- Complexity    71732    71761      +29     
============================================
  Files          5793     5794       +1     
  Lines        328056   328228     +172     
  Branches      47245    47285      +40     
============================================
+ Hits         240476   240490      +14     
- Misses        68264    68442     +178     
+ Partials      19316    19296      -20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)
232-238: Dead code in else branch.

The condition if (fieldType().hasDocValues() == false) at line 235 is always true within this else branch (since we reach it when fieldType().hasDocValues() returns false at line 232). This makes the inner condition redundant.
     if (fieldType().hasDocValues()) {
         context.doc().add(new SortedSetDocValuesField(fieldType().name(), binaryValue));
     } else {
-        if (fieldType().hasDocValues() == false) {
-            createFieldNamesField(context);
-        }
+        createFieldNamesField(context);
     }

🧹 Nitpick comments (11)

server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (2)

614-647: Consider extracting the helper to a shared test utility.

The getMapperServiceForDerivedSource() method is nearly identical to the one in WildcardFieldMapperTests.java (lines 438-471). This code duplication could be consolidated into a shared test utility class or a common base class method to improve maintainability.

689-695: Custom assertDoesNotThrow helper is duplicated.

This helper is also present in WildcardFieldMapperTests.java. Consider using JUnit's built-in capabilities or consolidating this into a shared test utility. Additionally, the current implementation catches Exception but not Throwable, which means AssertionError and other errors won't be caught.
server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java (1)
27-30: Consider defensive copy of the fetchers list.

The constructor stores a direct reference to the passed list. If the caller modifies the list after construction, it could lead to unexpected behavior.
 public CompositeFieldValueFetcher(String simpleName, List<FieldValueFetcher> fieldValueFetchers) {
     super(simpleName);
-    this.fieldValueFetchers = fieldValueFetchers;
+    this.fieldValueFetchers = List.copyOf(fieldValueFetchers);
 }
server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)

963-1015: DerivedSourceHelper is duplicated across mappers.

The DerivedSourceHelper inner class is nearly identical to the one in KeywordFieldMapper.java (lines 959-1004 in the relevant code snippet). Consider extracting this to a shared utility class to reduce code duplication and improve maintainability.
server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (2)
438-471: Duplicated helper method across test classes.

getMapperServiceForDerivedSource() is nearly identical to the one in KeywordFieldMapperTests.java. This should be consolidated into a shared test utility to avoid maintenance burden.

Additionally, line 456 has the same issue noted in KeywordFieldMapperTests.java - getIndexSettings() may return different settings than the locally constructed indexSettings with derived source enabled.
-        ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts);
+        ScriptService scriptService = new ScriptService(indexSettings.getSettings(), scriptModule.engines, scriptModule.contexts);
500-506: Duplicated assertDoesNotThrow helper.

Same as in KeywordFieldMapperTests.java - consider consolidating into a shared test utility.
server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java (1)
1197-1213: Consider adding assertion for sub-keyword doc_values presence.

The test verifies that no stored field is added when the value fits within ignore_above, but it doesn't verify that the keyword sub-field's doc_values would actually contain the value. Consider adding an assertion to confirm the keyword field is indexed properly.
         // Verify stored field is not added
         TextFieldMapper textFieldMapper = (TextFieldMapper) mapperService.documentMapper().mappers().getMapper("field");
         IndexableField[] storedFields = doc.rootDoc().getFields(textFieldMapper.fieldType().derivedSourceIgnoreFieldName());
         assertEquals(0, storedFields.length);
+
+        // Verify keyword sub-field has doc_values
+        IndexableField[] keywordFields = doc.rootDoc().getFields("field.keyword");
+        assertTrue(keywordFields.length > 0);
     }
server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java (1)
156-190: Unused multiFields parameter in buildFieldType.

The multiFields parameter is added to the method signature for interface consistency with TextFieldMapper.Builder.buildFieldType(), but it's not used in the method body. This is likely intentional since MatchOnlyTextFieldMapper may not need derived-source keyword support.

Consider adding a brief comment explaining why multiFields is unused, or if derived-source keyword support should be added for match_only_text fields in the future.
     @Override
-    protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) {
+    protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) {
+        // Note: multiFields parameter unused - match_only_text does not support derived source keyword optimization
         NamedAnalyzer indexAnalyzer = analyzers.getIndexAnalyzer();
server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (3)
1276-1319: Review the anonymous MappedFieldType implementation for ignored field fetching.

The anonymous MappedFieldType at lines 1286-1308 returns null from valueFetcher() and termQuery(). While this works for the current use case (only StoredFieldFetcher uses isStored()), this could cause issues if these methods are called unexpectedly.

Consider using a minimal implementation that throws UnsupportedOperationException for methods that shouldn't be called, making debugging easier if misuse occurs.
             @Override
             public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
-                return null;
+                throw new UnsupportedOperationException("valueFetcher not supported for derived source ignore field");
             }

             @Override
             public Query termQuery(Object value, QueryShardContext context) {
-                return null;
+                throw new UnsupportedOperationException("termQuery not supported for derived source ignore field");
             }
1334-1338: Consider clarifying the KEYWORD_ANALYZER check.

The condition Lucene.KEYWORD_ANALYZER.equals(keywordFieldMapper.fieldType().normalizer()) allows keywords with KEYWORD_ANALYZER as normalizer. This might be intentional (treating no-op normalizer as acceptable), but could be confusing.

Consider adding a comment explaining this behavior:
     private static boolean isDerivedSourceSupportedKeyword(KeywordFieldMapper keywordFieldMapper) {
+        // A keyword supports derived source if it has no normalizer (or uses the identity KEYWORD_ANALYZER)
+        // and has either stored field or doc_values enabled
         return (keywordFieldMapper.fieldType().normalizer() == null
             || Lucene.KEYWORD_ANALYZER.equals(keywordFieldMapper.fieldType().normalizer()))
             && (keywordFieldMapper.fieldType().isStored() || keywordFieldMapper.fieldType().hasDocValues());
     }
1340-1350: Default value of -1 for ignoredLength may cause unexpected behavior.

If no keyword sub-fields support derived source, getIgnoredLengthForDerivedSourceSupportedKeyword returns -1. This is later compared with value.length() in parseCreateField. Since any string length is > -1, this would trigger storing in the ignore field, which is correct behavior.

However, the use of -1 as a sentinel value is implicit. Consider documenting this or using a named constant.
+    // Sentinel value indicating no derived-source-supported keyword exists
+    private static final int NO_KEYWORD_IGNORE_LENGTH = -1;
+
     private static int getIgnoredLengthForDerivedSourceSupportedKeyword(MultiFields multiFields) {
-        int ignoredLength = -1;
+        int ignoredLength = NO_KEYWORD_IGNORE_LENGTH;
         for (Mapper mapper : multiFields) {

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f8d381 and 88ac318.

📒 Files selected for processing (17)

CHANGELOG.md (1 hunks)
server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java (3 hunks)
server/src/internalClusterTest/java/org/opensearch/recovery/FullRollingRestartIT.java (3 hunks)
server/src/internalClusterTest/java/org/opensearch/search/simple/SimpleSearchIT.java (1 hunks)
server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1 hunks)
server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java (1 hunks)
server/src/main/java/org/opensearch/index/mapper/FieldValueFetcher.java (1 hunks)
server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (6 hunks)
server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java (1 hunks)
server/src/main/java/org/opensearch/index/mapper/StringFieldType.java (2 hunks)
server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (8 hunks)
server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (4 hunks)
server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java (1 hunks)
server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (5 hunks)
server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java (6 hunks)
server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java (3 hunks)
server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (3)

server/src/main/java/org/opensearch/common/lucene/Lucene.java (1)

Lucene (114-992)

server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (1)

DerivedSourceHelper (1321-1366)

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (1)

DerivedSourceHelper (960-1005)

server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (2)

server/src/main/java/org/opensearch/indices/IndicesModule.java (1)

IndicesModule (108-315)

server/src/main/java/org/opensearch/script/ScriptModule.java (1)

ScriptModule (54-127)

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (3)

server/src/main/java/org/opensearch/common/lucene/Lucene.java (1)

Lucene (114-992)

server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (1)

DerivedSourceHelper (1321-1366)

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)

DerivedSourceHelper (963-1015)

server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (2)

server/src/main/java/org/opensearch/indices/IndicesModule.java (1)

IndicesModule (108-315)

server/src/main/java/org/opensearch/script/ScriptModule.java (1)

ScriptModule (54-127)

🔇 Additional comments (27)

server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1)

934-936: Verify text field behavior and mapping completeness for derived source optimization.

This test's text_field mapping (lines 934-936) removed explicit storage after this change but lacks a sub-keyword field, which the PR objectives indicate should provide doc_values for derived source. The test assertions (lines 969, 983, 1000, 1015, 1039, 1049) continue to expect text_field values to be retrievable.

Clarify:

Whether the implementation provides a fallback mechanism for text fields without sub-keyword fields in derived source mode

Whether this test should include a sub-keyword field to properly validate the optimization (e.g., "fields": {"keyword": {"type": "keyword"}})

server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (1)

558-579: Good test coverage for derived source with ignore_above.

The test properly validates that values exceeding ignore_above are stored in the derived source ignore field and can be correctly retrieved during source derivation.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (2)

216-222: Approve derived source storage for ignored values.

The logic correctly stores values exceeding ignore_above in a special derived source ignore field when the normalizer condition is satisfied and derived source is enabled. This enables proper source reconstruction.

924-929: Good relaxation of canDeriveSourceInternal guards.

The updated condition now permits deriving source when the normalizer is null or is the keyword analyzer, which aligns with the PR objectives. The early return on line 926-927 appropriately throws for unsupported normalizers.

server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java (1)

27-132: Good test coverage for CompositeFieldValueFetcher.

The tests comprehensively cover the key behaviors:

Priority-based fetching (first non-empty wins)

Early termination when values are found

Null/empty handling from all fetchers

Empty fetcher list edge case

IOException propagation

The mock setup with conversion validation ("CONVERTED:<value>") effectively verifies the conversion flow through inner fetchers.

server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (1)

382-436: Good test coverage for wildcard field derived source scenarios.

The tests properly validate:

Values exceeding ignore_above are stored in the derived source ignore field

Values below threshold produce normal fields without the ignore field

Derived source can be reconstructed from the ignore field

server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java (1)

32-48: Verify callers handle null return value.

The fetch() method returns null when no fetcher yields values (line 47). Ensure that all callers of this method handle the null case appropriately to avoid NullPointerException.

server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java (5)

1093-1117: LGTM! Good test coverage for stored field derivation.

The test properly verifies that when store: true is set, the stored field is correctly retrieved via deriveSource(). The test setup, document parsing, and assertions are well-structured.

1119-1131: LGTM! Validates keyword sub-field detection for derived source.

Test correctly verifies that a text field with a keyword sub-field sets hasDerivedSourceSupportedKeyword to true and defaults keywordIgnoredLengthForDerivedSource to Integer.MAX_VALUE when no ignore_above is specified.

1147-1161: LGTM! Tests max ignore_above selection across multiple keywords.

Correctly verifies that when multiple keyword sub-fields exist with different ignore_above values, the maximum value (20) is selected, which aligns with the PR objective to minimize duplication.

1163-1175: LGTM! Tests normalized keyword exclusion.

Correctly verifies that a keyword sub-field with a normalizer is not considered a valid derived-source-supported keyword, which is necessary since normalization would alter the original value.

1177-1195: LGTM! Validates ignore field storage for long text exceeding ignore_above.

Test correctly verifies that when text length exceeds the keyword's ignore_above threshold, a stored field is added to the special derived-source ignore field for fallback retrieval.

server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java (1)

140-154: LGTM! Consistent wiring of MultiFields.

The refactoring correctly builds MultiFields once and passes it through to both buildFieldType and the constructor, avoiding redundant builds and maintaining consistency with TextFieldMapper.

server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (4)

399-428: LGTM! Proper integration of derived source keyword detection during field type build.

The logic correctly:

Checks if derived source is enabled via index settings

Detects if any keyword sub-field supports derived source

Captures the maximum ignore_above value for threshold comparison

1079-1087: Potential performance concern: stored field added on every parse when conditions met.

The condition fieldType().getKeywordIgnoredLengthForDerivedSource() < value.length() correctly identifies values exceeding the threshold. However, consider caching fieldType().getHasDerivedSourceSupportedKeyword() and fieldType().getKeywordIgnoredLengthForDerivedSource() as local variables if this path is hot, though the current approach is readable and correct.

1321-1366: LGTM! Well-structured helper class for derived source keyword detection.

The DerivedSourceHelper inner class cleanly encapsulates:

Detection of supported keyword sub-fields (no normalizer, has stored or doc_values)

Finding maximum ignore_above across keywords

Creating value fetchers for supported keywords

The logic correctly excludes normalized keywords since normalization would alter the original text value.

1313-1318: Verify FieldValueType.DOC_VALUES is appropriate here.

The getDerivedFieldPreference() returns DOC_VALUES, but the composite fetcher includes stored fields (from the ignore field and potentially from keyword stored fields). Ensure this preference correctly reflects the primary source for value retrieval, as the choice between doc_values and stored fields can impact performance and behavior.

server/src/internalClusterTest/java/org/opensearch/recovery/FullRollingRestartIT.java (1)

419-420: Removing store: true from text_field mappings is consistent with the new derived-source model

These derived-source rolling-restart tests only validate _source contents via getSourceAsMap and do not rely on text_field being a stored field. Dropping "store": true on text_field in all three mappings reduces redundant stored data and matches the direction of deriving text values from other field structures rather than from a stored text field itself.

Also applies to: 461-462, 546-547

server/src/internalClusterTest/java/org/opensearch/search/simple/SimpleSearchIT.java (1)

780-784: Derived-source mapping cleanup for text_field and wildcard_field looks sound

Dropping "store": true from text_field avoids redundant stored payloads now that derived source no longer depends on a stored text field here.

Relying on the wildcard field’s default doc_values instead of forcing "doc_values": true keeps the test aligned with the mapper’s defaults while preserving the intended derived-source behavior.

server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java (1)

827-833: Good targeted coverage of ignore_above via text-field keyword sub-field in derived source

The updated mapping, document value, and validateDeriveSource expectations together exercise the new behavior where:

text_field is not stored itself but has a keyword_field sub-field with ignore_above: 20.

The indexed text exceeds that threshold, ensuring the derived-source pipeline must handle ignored keyword values correctly while still reconstructing the full text_field string.

Re-checking before/after refresh and flush verifies the behavior across translog and on-disk retrieval.

This is a solid regression/behavioral test for the new derived-source strategy.

Also applies to: 856-856, 1087-1097

CHANGELOG.md (1)

77-77: Changelog entry accurately reflects the derived-source enhancements

The new bullet concisely documents the ignore_above support for keyword/wildcard and the text-field optimization under derived source, correctly linked to this PR.

server/src/main/java/org/opensearch/index/mapper/FieldValueFetcher.java (1)

53-56: Null-safe guard in FieldValueFetcher.write is appropriate

Changing the early return to values == null || values.isEmpty() makes the writer robust to fetchers that may return null instead of an empty list, which is particularly relevant now that composite fetchers are in play. No behavior change for existing non-null callers.

server/src/main/java/org/opensearch/index/mapper/StringFieldType.java (1)

69-69: Centralized ignore-value field naming for derived source looks good

Defining IGNORED_VALUE_FIELD_SUFFIX and exposing derivedSourceIgnoreFieldName() on StringFieldType standardizes the internal ignore-field naming and is reused by the derived-source helpers. This keeps the pattern consistent across keyword/wildcard (and any other string-like) mappers.

Also applies to: 260-262

server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java (1)

695-825: Derived-source mapper validation tests now align well with the new rules

The updated testDeriveSourceMapperValidation covers the key constraints:

Nested fields and copy_to correctly rejected under index.derived_source.enabled = true.

Mappings with both doc_values and store disabled fail fast.

A broad multi-type mapping (including text and wildcard without extra overrides) validates the supported set.

Unsupported types like geo_shape are explicitly asserted to fail.

This gives good regression coverage for the refined derived-source eligibility logic.

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (3)

274-279: Derived-source eligibility and composite fetcher wiring for keyword fields are coherent

canDeriveSourceInternal now rejects fields with a non-default normalizer (anything other than null/KEYWORD_ANALYZER), which matches the requirement that we only derive source when we can faithfully reconstruct the original string.

derivedFieldGenerator() builds a CompositeFieldValueFetcher that:

Uses doc values when available, falling back to stored fields otherwise.

Adds a second fetcher targeting the ignore-value field, so values dropped by ignore_above remain reachable for derived source.

Declares a DOC_VALUES preference, which is consistent with the overall design.

The split between eligibility checks and the composite fetcher keeps the responsibilities clear.

Also applies to: 295-312

884-899: parseCreateField correctly persists ignored values for derived source while preserving ignore_above semantics

The updated parseCreateField logic:

Early-returns on value == null, avoiding unnecessary work.

For value.length() > ignoreAbove, still skips indexing (maintaining the ignore_above contract) but, when:

the normalizer is null or KEYWORD_ANALYZER,

derived source is enabled on the index, and

we’re not inside a multi-field context,

it writes the raw value to a dedicated stored field named via fieldType().derivedSourceIgnoreFieldName().

This ensures that:

Queries and aggregations still ignore over-long values as before.

Derived source can reconstruct those values from the dedicated stored field on eligible top-level keyword fields without introducing behavior changes for other cases.

Also applies to: 904-917

960-1005: DerivedSourceHelper cleanly encapsulates primary and fallback fetchers for keyword-derived source

The new DerivedSourceHelper:

Produces the primary FieldValueFetcher using doc values when available or stored fields otherwise, and allows the caller to choose the logical “simple name” (e.g., mapping a keyword sub-field back to its parent text field).

Builds a dedicated MappedFieldType for the ignore-value field (using derivedSourceIgnoreFieldName()), with:

stored = true,

no search/query capabilities,

valueForDisplay decoding BytesRef to String,
and wraps it in a StoredFieldFetcher.

This abstraction is shared both by keyword’s own derived-field generation and by text-field helpers that derive text source from keyword sub-fields, keeping the ignore-field wiring consistent.

coderabbitai · 2025-12-09T05:51:01Z

server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java

+            Settings.EMPTY,
+            getPlugins().stream().filter(p -> p instanceof ScriptPlugin).map(p -> (ScriptPlugin) p).collect(toList())
+        );
+        ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts);


⚠️ Potential issue | 🟡 Minor

Potential mismatch in ScriptService initialization.

getIndexSettings() returns settings from the base MapperTestCase, which may differ from the indexSettings variable being constructed with derived source enabled. This could lead to inconsistent behavior in tests.

Consider using the locally constructed indexSettings instead:

- ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts); + ScriptService scriptService = new ScriptService(indexSettings.getSettings(), scriptModule.engines, scriptModule.contexts);

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java around line 632, the ScriptService is being constructed with getIndexSettings() (from the base test) which can differ from the locally built indexSettings that has derived source enabled; replace the call to getIndexSettings() with the locally constructed indexSettings variable so the ScriptService uses the same settings as the test, i.e., pass indexSettings to the ScriptService constructor and ensure the indexSettings variable is in scope at that line.

mgodwan · 2025-12-10T12:28:21Z

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java

        return new Builder(simpleName(), indexAnalyzers).init(this);
    }
+
+    static final class DerivedSourceHelper {


We don't need this separate namespace class

Removed this from all three mappers.

mgodwan · 2025-12-10T12:50:24Z

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java

+
+        // Explicitly add value as a stored field if value is getting ignored, to be able to derive the source
+        if (value.length() > ignoreAbove) {
+            if ((normalizer == null || Lucene.KEYWORD_ANALYZER.equals(normalizer))


Would this not work if splitting queries on whitespace is set as well? Should we just check for default?

Normalizer is not being controlled by splitting queries on whitespace mapping parameter, if normalizer is not configured explicitly then default keyword(doesn't change anything on the string) one is being used.

mgodwan · 2025-12-10T12:51:40Z

server/src/main/java/org/opensearch/index/mapper/StringFieldType.java

    }
+
+    public String derivedSourceIgnoreFieldName() {
+        return name() + IGNORED_VALUE_FIELD_SUFFIX;


This can cause issues for existing mappings where users may have created field with this name? Can we only support this feature for index created above the new version?

Also, we should block users from creating field with these naming conventions for indices created beyond this version to avoid naming conflicts.

Added version check as well as, added mapping validation in DocumentMapper to not allow fields matching with internal field name .ingored_value.

mgodwan · 2025-12-10T12:54:06Z

server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java


        @Override
-        protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, BuilderContext context) {
+        protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) {


Can this break any place where this method is used in plugins?

Validated only MatchOnlyTextFieldMapper is extending TextFieldMapper across packages.

mgodwan · 2025-12-10T12:55:52Z

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java

+
+                @Override
+                public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
+                    return null;


Is this not expected to be called ever? Can we throw UnsupportedOperationException instead?

mgodwan · 2025-12-10T13:03:55Z

server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java

+            this,
+            simpleName()
+        );
+        final FieldValueFetcher fallbackFieldValueFetcher = KeywordFieldMapper.DerivedSourceHelper.getFallbackFieldValueFetcher(this);


Do we always need a fallbackValueFetcher? If normalizer/ignore_above is not set, would it only need primary?

Updated it to not always consider fallbackValueFetcher and only for normalizer/ignore_above case.

opensearch-trigger-bot · 2026-01-09T15:23:33Z

This PR is stalled because it has been open for 30 days with no activity.

…nore_above for keyword field Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

…eld mappers Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

github-actions · 2026-02-18T12:29:42Z

❌ Gradle check result for 659de98: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

tanik98 changed the title ~~Improve derived source storage footprint for text field and handle ig…~~ Improve storage footprint for text field under derived source and handle ig… Nov 28, 2025

This was referenced Nov 28, 2025

[AUTOCUT] Gradle Check Flaky Test Report for ReindexBasicTests #15826

Open

[AUTOCUT] Gradle Check Flaky Test Report for InternalEngineTests #15838

Open

tanik98 changed the title ~~Improve storage footprint for text field under derived source and handle ig…~~ Support ignore_above for keyword/wildcard field and optimise text field under derived source Dec 8, 2025

tanik98 force-pushed the derived-source-text-field branch from 9a2ff84 to de290a0 Compare December 8, 2025 19:09

tanik98 force-pushed the derived-source-text-field branch from de290a0 to 007aa20 Compare December 9, 2025 03:49

tanik98 force-pushed the derived-source-text-field branch from 007aa20 to 88ac318 Compare December 9, 2025 04:16

tanik98 marked this pull request as ready for review December 9, 2025 05:39

tanik98 requested review from Bukhtawar, CEHENKLE, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh and owaiskazi19 as code owners December 9, 2025 05:39

tanik98 requested review from a team, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners December 9, 2025 05:39

coderabbitai bot reviewed Dec 9, 2025

View reviewed changes

mgodwan requested changes Dec 10, 2025

View reviewed changes

opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Jan 9, 2026

tanik98 force-pushed the derived-source-text-field branch from 88ac318 to 68ae5c7 Compare February 18, 2026 10:20

tanik98 changed the title ~~Support ignore_above for keyword/wildcard field and optimise text field under derived source~~ [Derived Source] Support ignore_above/normalizer for keyword/wildcard field and optimise text field lookup Feb 18, 2026

tanik98 added 5 commits February 18, 2026 16:03

Improve derived source storage footprint for text field and handle ig…

1b67967

…nore_above for keyword field Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Fix build

4ec8e88

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Add unit tests for ignore_above handling for keyword/wildcard/text fi…

2fccd39

…eld mappers Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Fix text field mapper

d109ead

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

Handling normalizer

659de98

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>

tanik98 force-pushed the derived-source-text-field branch from 68ae5c7 to 659de98 Compare February 18, 2026 11:05

tanik98 requested a review from mgodwan February 19, 2026 03:57

neetikasinghal mentioned this pull request Mar 19, 2026

[FEATURE] Converge OpenSearch core and k-NN Derived Source feature opensearch-project/k-NN#3058

Open

Conversation

tanik98 commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Nov 28, 2025

Uh oh!

github-actions bot commented Nov 28, 2025

Uh oh!

github-actions bot commented Dec 8, 2025

Uh oh!

github-actions bot commented Dec 8, 2025

Uh oh!

github-actions bot commented Dec 8, 2025

Uh oh!

github-actions bot commented Dec 9, 2025

Uh oh!

github-actions bot commented Dec 9, 2025

Uh oh!

codecov bot commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

opensearch-trigger-bot bot commented Jan 9, 2026

Uh oh!

github-actions bot commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tanik98 commented Nov 28, 2025 •

edited

Loading

coderabbitai bot commented Nov 28, 2025 •

edited

Loading

codecov bot commented Dec 9, 2025 •

edited

Loading