[Derived Source] Support ignore_above/normalizer for keyword/wildcard field and optimise text field lookup#20113
Conversation
📝 WalkthroughWalkthroughThis PR extends support for the Changes
Sequence DiagramsequenceDiagram
participant Client
participant Parser as Field Parser
participant IgnoreField as Ignore Field Storage
participant Derived as Derived Source Generator
participant Composite as CompositeFieldValueFetcher
participant Primary as Primary Fetcher
participant Fallback as Fallback Fetcher
Client->>Parser: Index document with text exceeding ignore_above
Parser->>Parser: Check if derived source enabled & value > ignore_above
alt Value exceeds ignore_above threshold
Parser->>IgnoreField: Store value in derivedSourceIgnore field
Parser->>Derived: Skip normal indexing, return early
else Value within threshold
Parser->>Parser: Normal field indexing
end
Note over Client,Fallback: Later: Deriving source for document
Client->>Derived: Request derived source
Derived->>Composite: Create fetcher chain
Composite->>Primary: Try primary fetcher (doc values or stored)
alt Primary has value
Primary-->>Composite: Return value
Composite-->>Derived: Return converted value
else Primary empty
Composite->>Fallback: Try fallback fetcher (ignore field)
Fallback->>IgnoreField: Read from derivedSourceIgnore field
IgnoreField-->>Fallback: Return stored value
Fallback-->>Composite: Return converted value
Composite-->>Derived: Return value
end
Derived-->>Client: Return derived source with retrieved value
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas requiring extra attention:
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
|
❌ Gradle check result for e4707a1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 6ad1447: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for c152793: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 9a2ff84: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
9a2ff84 to
de290a0
Compare
|
❌ Gradle check result for de290a0: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
de290a0 to
007aa20
Compare
|
❌ Gradle check result for 007aa20: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
007aa20 to
88ac318
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20113 +/- ##
============================================
- Coverage 73.30% 73.26% -0.04%
- Complexity 71732 71761 +29
============================================
Files 5793 5794 +1
Lines 328056 328228 +172
Branches 47245 47285 +40
============================================
+ Hits 240476 240490 +14
- Misses 68264 68442 +178
+ Partials 19316 19296 -20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)
232-238: Dead code in else branch.The condition
if (fieldType().hasDocValues() == false)at line 235 is always true within thiselsebranch (since we reach it whenfieldType().hasDocValues()returns false at line 232). This makes the inner condition redundant.if (fieldType().hasDocValues()) { context.doc().add(new SortedSetDocValuesField(fieldType().name(), binaryValue)); } else { - if (fieldType().hasDocValues() == false) { - createFieldNamesField(context); - } + createFieldNamesField(context); }
🧹 Nitpick comments (11)
server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (2)
614-647: Consider extracting the helper to a shared test utility.The
getMapperServiceForDerivedSource()method is nearly identical to the one inWildcardFieldMapperTests.java(lines 438-471). This code duplication could be consolidated into a shared test utility class or a common base class method to improve maintainability.
689-695: CustomassertDoesNotThrowhelper is duplicated.This helper is also present in
WildcardFieldMapperTests.java. Consider using JUnit's built-in capabilities or consolidating this into a shared test utility. Additionally, the current implementation catchesExceptionbut notThrowable, which meansAssertionErrorand other errors won't be caught.server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java (1)
27-30: Consider defensive copy of the fetchers list.The constructor stores a direct reference to the passed list. If the caller modifies the list after construction, it could lead to unexpected behavior.
public CompositeFieldValueFetcher(String simpleName, List<FieldValueFetcher> fieldValueFetchers) { super(simpleName); - this.fieldValueFetchers = fieldValueFetchers; + this.fieldValueFetchers = List.copyOf(fieldValueFetchers); }server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)
963-1015: DerivedSourceHelper is duplicated across mappers.The
DerivedSourceHelperinner class is nearly identical to the one inKeywordFieldMapper.java(lines 959-1004 in the relevant code snippet). Consider extracting this to a shared utility class to reduce code duplication and improve maintainability.server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (2)
438-471: Duplicated helper method across test classes.
getMapperServiceForDerivedSource()is nearly identical to the one inKeywordFieldMapperTests.java. This should be consolidated into a shared test utility to avoid maintenance burden.Additionally, line 456 has the same issue noted in
KeywordFieldMapperTests.java-getIndexSettings()may return different settings than the locally constructedindexSettingswith derived source enabled.- ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts); + ScriptService scriptService = new ScriptService(indexSettings.getSettings(), scriptModule.engines, scriptModule.contexts);
500-506: DuplicatedassertDoesNotThrowhelper.Same as in
KeywordFieldMapperTests.java- consider consolidating into a shared test utility.server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java (1)
1197-1213: Consider adding assertion for sub-keyword doc_values presence.The test verifies that no stored field is added when the value fits within
ignore_above, but it doesn't verify that the keyword sub-field's doc_values would actually contain the value. Consider adding an assertion to confirm the keyword field is indexed properly.// Verify stored field is not added TextFieldMapper textFieldMapper = (TextFieldMapper) mapperService.documentMapper().mappers().getMapper("field"); IndexableField[] storedFields = doc.rootDoc().getFields(textFieldMapper.fieldType().derivedSourceIgnoreFieldName()); assertEquals(0, storedFields.length); + + // Verify keyword sub-field has doc_values + IndexableField[] keywordFields = doc.rootDoc().getFields("field.keyword"); + assertTrue(keywordFields.length > 0); }server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java (1)
156-190: UnusedmultiFieldsparameter inbuildFieldType.The
multiFieldsparameter is added to the method signature for interface consistency withTextFieldMapper.Builder.buildFieldType(), but it's not used in the method body. This is likely intentional sinceMatchOnlyTextFieldMappermay not need derived-source keyword support.Consider adding a brief comment explaining why
multiFieldsis unused, or if derived-source keyword support should be added formatch_only_textfields in the future.@Override - protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) { + protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) { + // Note: multiFields parameter unused - match_only_text does not support derived source keyword optimization NamedAnalyzer indexAnalyzer = analyzers.getIndexAnalyzer();server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (3)
1276-1319: Review the anonymousMappedFieldTypeimplementation for ignored field fetching.The anonymous
MappedFieldTypeat lines 1286-1308 returnsnullfromvalueFetcher()andtermQuery(). While this works for the current use case (onlyStoredFieldFetcherusesisStored()), this could cause issues if these methods are called unexpectedly.Consider using a minimal implementation that throws
UnsupportedOperationExceptionfor methods that shouldn't be called, making debugging easier if misuse occurs.@Override public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) { - return null; + throw new UnsupportedOperationException("valueFetcher not supported for derived source ignore field"); } @Override public Query termQuery(Object value, QueryShardContext context) { - return null; + throw new UnsupportedOperationException("termQuery not supported for derived source ignore field"); }
1334-1338: Consider clarifying theKEYWORD_ANALYZERcheck.The condition
Lucene.KEYWORD_ANALYZER.equals(keywordFieldMapper.fieldType().normalizer())allows keywords withKEYWORD_ANALYZERas normalizer. This might be intentional (treating no-op normalizer as acceptable), but could be confusing.Consider adding a comment explaining this behavior:
private static boolean isDerivedSourceSupportedKeyword(KeywordFieldMapper keywordFieldMapper) { + // A keyword supports derived source if it has no normalizer (or uses the identity KEYWORD_ANALYZER) + // and has either stored field or doc_values enabled return (keywordFieldMapper.fieldType().normalizer() == null || Lucene.KEYWORD_ANALYZER.equals(keywordFieldMapper.fieldType().normalizer())) && (keywordFieldMapper.fieldType().isStored() || keywordFieldMapper.fieldType().hasDocValues()); }
1340-1350: Default value of -1 forignoredLengthmay cause unexpected behavior.If no keyword sub-fields support derived source,
getIgnoredLengthForDerivedSourceSupportedKeywordreturns -1. This is later compared withvalue.length()inparseCreateField. Since any string length is > -1, this would trigger storing in the ignore field, which is correct behavior.However, the use of -1 as a sentinel value is implicit. Consider documenting this or using a named constant.
+ // Sentinel value indicating no derived-source-supported keyword exists + private static final int NO_KEYWORD_IGNORE_LENGTH = -1; + private static int getIgnoredLengthForDerivedSourceSupportedKeyword(MultiFields multiFields) { - int ignoredLength = -1; + int ignoredLength = NO_KEYWORD_IGNORE_LENGTH; for (Mapper mapper : multiFields) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
CHANGELOG.md(1 hunks)server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java(3 hunks)server/src/internalClusterTest/java/org/opensearch/recovery/FullRollingRestartIT.java(3 hunks)server/src/internalClusterTest/java/org/opensearch/search/simple/SimpleSearchIT.java(1 hunks)server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java(1 hunks)server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java(1 hunks)server/src/main/java/org/opensearch/index/mapper/FieldValueFetcher.java(1 hunks)server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java(6 hunks)server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java(1 hunks)server/src/main/java/org/opensearch/index/mapper/StringFieldType.java(2 hunks)server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java(8 hunks)server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java(4 hunks)server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java(1 hunks)server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java(5 hunks)server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java(6 hunks)server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java(3 hunks)server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (3)
server/src/main/java/org/opensearch/common/lucene/Lucene.java (1)
Lucene(114-992)server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (1)
DerivedSourceHelper(1321-1366)server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (1)
DerivedSourceHelper(960-1005)
server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (2)
server/src/main/java/org/opensearch/indices/IndicesModule.java (1)
IndicesModule(108-315)server/src/main/java/org/opensearch/script/ScriptModule.java (1)
ScriptModule(54-127)
server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (3)
server/src/main/java/org/opensearch/common/lucene/Lucene.java (1)
Lucene(114-992)server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (1)
DerivedSourceHelper(1321-1366)server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (1)
DerivedSourceHelper(963-1015)
server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (2)
server/src/main/java/org/opensearch/indices/IndicesModule.java (1)
IndicesModule(108-315)server/src/main/java/org/opensearch/script/ScriptModule.java (1)
ScriptModule(54-127)
🔇 Additional comments (27)
server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1)
934-936: Verify text field behavior and mapping completeness for derived source optimization.This test's
text_fieldmapping (lines 934-936) removed explicit storage after this change but lacks a sub-keyword field, which the PR objectives indicate should provide doc_values for derived source. The test assertions (lines 969, 983, 1000, 1015, 1039, 1049) continue to expecttext_fieldvalues to be retrievable.Clarify:
- Whether the implementation provides a fallback mechanism for text fields without sub-keyword fields in derived source mode
- Whether this test should include a sub-keyword field to properly validate the optimization (e.g.,
"fields": {"keyword": {"type": "keyword"}})server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java (1)
558-579: Good test coverage for derived source with ignore_above.The test properly validates that values exceeding
ignore_aboveare stored in the derived source ignore field and can be correctly retrieved during source derivation.server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java (2)
216-222: Approve derived source storage for ignored values.The logic correctly stores values exceeding
ignore_abovein a special derived source ignore field when the normalizer condition is satisfied and derived source is enabled. This enables proper source reconstruction.
924-929: Good relaxation of canDeriveSourceInternal guards.The updated condition now permits deriving source when the normalizer is null or is the keyword analyzer, which aligns with the PR objectives. The early return on line 926-927 appropriately throws for unsupported normalizers.
server/src/test/java/org/opensearch/index/mapper/CompositeFieldValueFetcherTests.java (1)
27-132: Good test coverage for CompositeFieldValueFetcher.The tests comprehensively cover the key behaviors:
- Priority-based fetching (first non-empty wins)
- Early termination when values are found
- Null/empty handling from all fetchers
- Empty fetcher list edge case
- IOException propagation
The mock setup with conversion validation (
"CONVERTED:<value>") effectively verifies the conversion flow through inner fetchers.server/src/test/java/org/opensearch/index/mapper/WildcardFieldMapperTests.java (1)
382-436: Good test coverage for wildcard field derived source scenarios.The tests properly validate:
- Values exceeding
ignore_aboveare stored in the derived source ignore field- Values below threshold produce normal fields without the ignore field
- Derived source can be reconstructed from the ignore field
server/src/main/java/org/opensearch/index/mapper/CompositeFieldValueFetcher.java (1)
32-48: Verify callers handle null return value.The
fetch()method returnsnullwhen no fetcher yields values (line 47). Ensure that all callers of this method handle the null case appropriately to avoidNullPointerException.server/src/test/java/org/opensearch/index/mapper/TextFieldMapperTests.java (5)
1093-1117: LGTM! Good test coverage for stored field derivation.The test properly verifies that when
store: trueis set, the stored field is correctly retrieved viaderiveSource(). The test setup, document parsing, and assertions are well-structured.
1119-1131: LGTM! Validates keyword sub-field detection for derived source.Test correctly verifies that a text field with a keyword sub-field sets
hasDerivedSourceSupportedKeywordto true and defaultskeywordIgnoredLengthForDerivedSourcetoInteger.MAX_VALUEwhen noignore_aboveis specified.
1147-1161: LGTM! Tests max ignore_above selection across multiple keywords.Correctly verifies that when multiple keyword sub-fields exist with different
ignore_abovevalues, the maximum value (20) is selected, which aligns with the PR objective to minimize duplication.
1163-1175: LGTM! Tests normalized keyword exclusion.Correctly verifies that a keyword sub-field with a normalizer is not considered a valid derived-source-supported keyword, which is necessary since normalization would alter the original value.
1177-1195: LGTM! Validates ignore field storage for long text exceeding ignore_above.Test correctly verifies that when text length exceeds the keyword's
ignore_abovethreshold, a stored field is added to the special derived-source ignore field for fallback retrieval.server/src/main/java/org/opensearch/index/mapper/MatchOnlyTextFieldMapper.java (1)
140-154: LGTM! Consistent wiring of MultiFields.The refactoring correctly builds
MultiFieldsonce and passes it through to bothbuildFieldTypeand the constructor, avoiding redundant builds and maintaining consistency withTextFieldMapper.server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java (4)
399-428: LGTM! Proper integration of derived source keyword detection during field type build.The logic correctly:
- Checks if derived source is enabled via index settings
- Detects if any keyword sub-field supports derived source
- Captures the maximum
ignore_abovevalue for threshold comparison
1079-1087: Potential performance concern: stored field added on every parse when conditions met.The condition
fieldType().getKeywordIgnoredLengthForDerivedSource() < value.length()correctly identifies values exceeding the threshold. However, consider cachingfieldType().getHasDerivedSourceSupportedKeyword()andfieldType().getKeywordIgnoredLengthForDerivedSource()as local variables if this path is hot, though the current approach is readable and correct.
1321-1366: LGTM! Well-structured helper class for derived source keyword detection.The
DerivedSourceHelperinner class cleanly encapsulates:
- Detection of supported keyword sub-fields (no normalizer, has stored or doc_values)
- Finding maximum
ignore_aboveacross keywords- Creating value fetchers for supported keywords
The logic correctly excludes normalized keywords since normalization would alter the original text value.
1313-1318: VerifyFieldValueType.DOC_VALUESis appropriate here.The
getDerivedFieldPreference()returnsDOC_VALUES, but the composite fetcher includes stored fields (from the ignore field and potentially from keyword stored fields). Ensure this preference correctly reflects the primary source for value retrieval, as the choice between doc_values and stored fields can impact performance and behavior.server/src/internalClusterTest/java/org/opensearch/recovery/FullRollingRestartIT.java (1)
419-420: Removingstore: truefromtext_fieldmappings is consistent with the new derived-source modelThese derived-source rolling-restart tests only validate
_sourcecontents viagetSourceAsMapand do not rely ontext_fieldbeing a stored field. Dropping"store": trueontext_fieldin all three mappings reduces redundant stored data and matches the direction of deriving text values from other field structures rather than from a stored text field itself.Also applies to: 461-462, 546-547
server/src/internalClusterTest/java/org/opensearch/search/simple/SimpleSearchIT.java (1)
780-784: Derived-source mapping cleanup fortext_fieldandwildcard_fieldlooks sound
- Dropping
"store": truefromtext_fieldavoids redundant stored payloads now that derived source no longer depends on a stored text field here.- Relying on the wildcard field’s default
doc_valuesinstead of forcing"doc_values": truekeeps the test aligned with the mapper’s defaults while preserving the intended derived-source behavior.server/src/internalClusterTest/java/org/opensearch/get/GetActionIT.java (1)
827-833: Good targeted coverage ofignore_abovevia text-field keyword sub-field in derived sourceThe updated mapping, document value, and
validateDeriveSourceexpectations together exercise the new behavior where:
text_fieldis not stored itself but has akeyword_fieldsub-field withignore_above: 20.- The indexed text exceeds that threshold, ensuring the derived-source pipeline must handle ignored keyword values correctly while still reconstructing the full
text_fieldstring.- Re-checking before/after refresh and flush verifies the behavior across translog and on-disk retrieval.
This is a solid regression/behavioral test for the new derived-source strategy.
Also applies to: 856-856, 1087-1097
CHANGELOG.md (1)
77-77: Changelog entry accurately reflects the derived-source enhancementsThe new bullet concisely documents the
ignore_abovesupport for keyword/wildcard and the text-field optimization under derived source, correctly linked to this PR.server/src/main/java/org/opensearch/index/mapper/FieldValueFetcher.java (1)
53-56: Null-safe guard inFieldValueFetcher.writeis appropriateChanging the early return to
values == null || values.isEmpty()makes the writer robust to fetchers that may returnnullinstead of an empty list, which is particularly relevant now that composite fetchers are in play. No behavior change for existing non-null callers.server/src/main/java/org/opensearch/index/mapper/StringFieldType.java (1)
69-69: Centralized ignore-value field naming for derived source looks goodDefining
IGNORED_VALUE_FIELD_SUFFIXand exposingderivedSourceIgnoreFieldName()onStringFieldTypestandardizes the internal ignore-field naming and is reused by the derived-source helpers. This keeps the pattern consistent across keyword/wildcard (and any other string-like) mappers.Also applies to: 260-262
server/src/test/java/org/opensearch/index/mapper/ObjectMapperTests.java (1)
695-825: Derived-source mapper validation tests now align well with the new rulesThe updated
testDeriveSourceMapperValidationcovers the key constraints:
- Nested fields and
copy_tocorrectly rejected underindex.derived_source.enabled = true.- Mappings with both
doc_valuesandstoredisabled fail fast.- A broad multi-type mapping (including
textandwildcardwithout extra overrides) validates the supported set.- Unsupported types like
geo_shapeare explicitly asserted to fail.This gives good regression coverage for the refined derived-source eligibility logic.
server/src/main/java/org/opensearch/index/mapper/KeywordFieldMapper.java (3)
274-279: Derived-source eligibility and composite fetcher wiring for keyword fields are coherent
canDeriveSourceInternalnow rejects fields with a non-default normalizer (anything other thannull/KEYWORD_ANALYZER), which matches the requirement that we only derive source when we can faithfully reconstruct the original string.derivedFieldGenerator()builds aCompositeFieldValueFetcherthat:
- Uses doc values when available, falling back to stored fields otherwise.
- Adds a second fetcher targeting the ignore-value field, so values dropped by
ignore_aboveremain reachable for derived source.- Declares a DOC_VALUES preference, which is consistent with the overall design.
The split between eligibility checks and the composite fetcher keeps the responsibilities clear.
Also applies to: 295-312
884-899:parseCreateFieldcorrectly persists ignored values for derived source while preservingignore_abovesemanticsThe updated
parseCreateFieldlogic:
Early-returns on
value == null, avoiding unnecessary work.For
value.length() > ignoreAbove, still skips indexing (maintaining theignore_abovecontract) but, when:
- the normalizer is
nullorKEYWORD_ANALYZER,- derived source is enabled on the index, and
- we’re not inside a multi-field context,
it writes the raw value to a dedicated stored field named via
fieldType().derivedSourceIgnoreFieldName().This ensures that:
- Queries and aggregations still ignore over-long values as before.
- Derived source can reconstruct those values from the dedicated stored field on eligible top-level keyword fields without introducing behavior changes for other cases.
Also applies to: 904-917
960-1005:DerivedSourceHelpercleanly encapsulates primary and fallback fetchers for keyword-derived sourceThe new
DerivedSourceHelper:
- Produces the primary
FieldValueFetcherusing doc values when available or stored fields otherwise, and allows the caller to choose the logical “simple name” (e.g., mapping a keyword sub-field back to its parent text field).- Builds a dedicated
MappedFieldTypefor the ignore-value field (usingderivedSourceIgnoreFieldName()), with:
stored = true,- no search/query capabilities,
valueForDisplaydecodingBytesReftoString,
and wraps it in aStoredFieldFetcher.This abstraction is shared both by keyword’s own derived-field generation and by text-field helpers that derive text source from keyword sub-fields, keeping the ignore-field wiring consistent.
| Settings.EMPTY, | ||
| getPlugins().stream().filter(p -> p instanceof ScriptPlugin).map(p -> (ScriptPlugin) p).collect(toList()) | ||
| ); | ||
| ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts); |
There was a problem hiding this comment.
Potential mismatch in ScriptService initialization.
getIndexSettings() returns settings from the base MapperTestCase, which may differ from the indexSettings variable being constructed with derived source enabled. This could lead to inconsistent behavior in tests.
Consider using the locally constructed indexSettings instead:
- ScriptService scriptService = new ScriptService(getIndexSettings(), scriptModule.engines, scriptModule.contexts);
+ ScriptService scriptService = new ScriptService(indexSettings.getSettings(), scriptModule.engines, scriptModule.contexts);Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In server/src/test/java/org/opensearch/index/mapper/KeywordFieldMapperTests.java
around line 632, the ScriptService is being constructed with getIndexSettings()
(from the base test) which can differ from the locally built indexSettings that
has derived source enabled; replace the call to getIndexSettings() with the
locally constructed indexSettings variable so the ScriptService uses the same
settings as the test, i.e., pass indexSettings to the ScriptService constructor
and ensure the indexSettings variable is in scope at that line.
| return new Builder(simpleName(), indexAnalyzers).init(this); | ||
| } | ||
|
|
||
| static final class DerivedSourceHelper { |
There was a problem hiding this comment.
We don't need this separate namespace class
There was a problem hiding this comment.
Removed this from all three mappers.
|
|
||
| // Explicitly add value as a stored field if value is getting ignored, to be able to derive the source | ||
| if (value.length() > ignoreAbove) { | ||
| if ((normalizer == null || Lucene.KEYWORD_ANALYZER.equals(normalizer)) |
There was a problem hiding this comment.
Would this not work if splitting queries on whitespace is set as well? Should we just check for default?
There was a problem hiding this comment.
Normalizer is not being controlled by splitting queries on whitespace mapping parameter, if normalizer is not configured explicitly then default keyword(doesn't change anything on the string) one is being used.
| } | ||
|
|
||
| public String derivedSourceIgnoreFieldName() { | ||
| return name() + IGNORED_VALUE_FIELD_SUFFIX; |
There was a problem hiding this comment.
This can cause issues for existing mappings where users may have created field with this name? Can we only support this feature for index created above the new version?
There was a problem hiding this comment.
Also, we should block users from creating field with these naming conventions for indices created beyond this version to avoid naming conflicts.
There was a problem hiding this comment.
Added version check as well as, added mapping validation in DocumentMapper to not allow fields matching with internal field name .ingored_value.
|
|
||
| @Override | ||
| protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, BuilderContext context) { | ||
| protected MatchOnlyTextFieldType buildFieldType(FieldType fieldType, MultiFields multiFields, BuilderContext context) { |
There was a problem hiding this comment.
Can this break any place where this method is used in plugins?
There was a problem hiding this comment.
Validated only MatchOnlyTextFieldMapper is extending TextFieldMapper across packages.
|
|
||
| @Override | ||
| public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) { | ||
| return null; |
There was a problem hiding this comment.
Is this not expected to be called ever? Can we throw UnsupportedOperationException instead?
| this, | ||
| simpleName() | ||
| ); | ||
| final FieldValueFetcher fallbackFieldValueFetcher = KeywordFieldMapper.DerivedSourceHelper.getFallbackFieldValueFetcher(this); |
There was a problem hiding this comment.
Do we always need a fallbackValueFetcher? If normalizer/ignore_above is not set, would it only need primary?
There was a problem hiding this comment.
Updated it to not always consider fallbackValueFetcher and only for normalizer/ignore_above case.
|
This PR is stalled because it has been open for 30 days with no activity. |
88ac318 to
68ae5c7
Compare
…nore_above for keyword field Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
…eld mappers Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
68ae5c7 to
659de98
Compare
|
❌ Gradle check result for 659de98: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
ignore_aboveby explicitly storing values exceeding that and fornoarmalizerstore value in all cases, and use stored field for using it as a fallbackdoc_values) instead of explicitly storing the value, this way we can avoid storing duplicates(as adoc_valuesunder sub keyword andstored fieldunder text field).Summary by CodeRabbit
Release Notes
New Features
ignore_aboveparameter on keyword and wildcard fields.Bug Fixes
ignore_abovethreshold for keyword and wildcard fields with derived source enabled.✏️ Tip: You can customize this high-level summary in your review settings.