Validation of the `_source` to reject contradicting and ambiguous requests by urmichm · Pull Request #20742 · opensearch-project/OpenSearch

urmichm · 2026-02-27T11:42:54Z

Description

Validation of the _source object added to avoid confusion and reject contradicting and ambiguous requests:

"_source": {} At lease one of includes or excludes must be defined. 🚫
"_source": [] Explicitly defined empty array of includes is not allowed. 🚫
"_source": { "includes": "text", "excludes": ["title", "text"] } The text field is defined in both includes and excludes. Contradiction. 🚫
"_source": { "includes": [], "excludes": ["title"] } or _source: { "includes": ["title"], "excludes": [] } Explicitly defined empty arrays are not allowed to avoid confusion. 🚫

To include the whole _source object or to exclude it completely, the existing boolean logic is encouraged.
"_source": true and "_source": false

Unit tests added to test behaviour of parsing implementation.
Existing unit tests have been extended to cover invalid cases.

Related Issues

Resolves #20612

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-02-27T11:44:18Z

PR Reviewer Guide 🔍

(Review updated until commit `68fd820`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Core _source validation logic and tests Relevant files: server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java server/src/test/java/org/opensearch/search/fetch/subphase/FetchSourceContextTests.java rest-api-spec/src/main/resources/rest-api-spec/test/search/10_source_filtering.yml CHANGELOG.md Sub-PR theme: gRPC proto utils test for ambiguous _source validation Relevant files: modules/transport-grpc/src/test/java/org/opensearch/transport/grpc/proto/request/common/FetchSourceContextProtoUtilsTests.java
⚡ Recommended focus areas for review Breaking Change The `validateAmbiguousFields()` call in the constructor means that any existing serialized `FetchSourceContext` objects deserialized via `StreamInput` (the `StreamInput` constructor) will also trigger validation. If persisted cluster state or in-flight requests contain overlapping includes/excludes (even if previously accepted), deserialization will now throw an `OpenSearchException`, potentially causing node failures or rolling-upgrade issues. public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); } Empty Object Behavior Change Previously, `_source: {}` was silently accepted and treated as fetch-all. Now it throws a `ParsingException`. This is a breaking change for existing users who may have `_source: {}` in their queries. The skip version in the YAML test (`3.5.0`) suggests this is intentional, but there is no deprecation period — it goes straight to rejection. if (includes.isEmpty() && excludes.isEmpty()) { // no valid field names -> empty or unrecognized fields; not allowed throw new ParsingException( parser.getTokenLocation(), "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" ); } Empty Array Breaking Change Previously, `_source: []` was accepted and treated as an empty includes list (fetch all). Now it throws a `ParsingException`. Existing queries using `_source: []` will break without warning. The `parseSourceFieldArray` method enforces at least one element, which is a stricter contract than before. if (sourceArr.isEmpty()) { throw new ParsingException( parser.getTokenLocation(), "Expected at least one value for an array of [" + parseField.getPreferredName() + "]" ); } Unreachable Code The `fromXContent` method uses a `switch` expression that covers all `XContentParser.Token` cases and always either returns or throws. The comment `// MUST never reach here` at line 195 is correct but the method has no return statement after the switch, which will cause a compile error unless the compiler can prove exhaustiveness. If the switch does not cover all enum values (e.g., a future token type), this will fail at runtime with no clear error. public static FetchSourceContext fromXContent(XContentParser parser) throws IOException { XContentParser.Token token = parser.currentToken(); switch (token) { case XContentParser.Token.VALUE_BOOLEAN -> { return new FetchSourceContext(parser.booleanValue()); } case XContentParser.Token.VALUE_STRING -> { String[] includes = new String[] { parser.text() }; return new FetchSourceContext(true, includes, null); } case XContentParser.Token.START_ARRAY -> { String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]); return new FetchSourceContext(true, includes, null); } case XContentParser.Token.START_OBJECT -> { return parseSourceObject(parser); } default -> { throw new ParsingException( parser.getTokenLocation(), "Expected one of [" + XContentParser.Token.VALUE_BOOLEAN + ", " + XContentParser.Token.VALUE_STRING + ", " + XContentParser.Token.START_ARRAY + ", " + XContentParser.Token.START_OBJECT + "] but found [" + token + "]" ); } } // MUST never reach here

github-actions · 2026-02-27T11:44:53Z

PR Code Suggestions ✨

Latest suggestions up to 0b4c2c8

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Add validation to deserialization constructor The `StreamInput` constructor does not call `validateAmbiguousFields()`, unlike the other constructors. This means deserialized `FetchSourceContext` objects (e.g., from inter-node communication) bypass the ambiguity validation. Call `validateAmbiguousFields()` in this constructor as well to ensure consistency. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 7 __ Why: The `StreamInput` constructor bypasses `validateAmbiguousFields()`, creating an inconsistency where deserialized objects skip validation. This is a valid concern for correctness, though in practice the data would have been validated before serialization. Adding the validation call ensures consistency across all construction paths.	Medium
Possible issue	Replace immutable singleton sets with mutable sets The `parseSourceObject` method throws an error when both `includes` and `excludes` are empty, but this check uses `Collections.emptySet()` as the initial value. If a user provides only unrecognized field names (not `includes`/`excludes`), the sets remain empty and the error is thrown correctly. However, if a user provides only `includes` or only `excludes`, the other set remains `Collections.emptySet()` — this is fine. But the check should also guard against the case where both sets are non-empty but were set to `Collections.emptySet()` due to logic flow. The current logic is actually correct, but the condition should be `includes.isEmpty() && excludes.isEmpty()` which is already the case. No change needed here — but note that `Collections.emptySet()` is immutable, so calling `.contains()` on it is safe but calling `.add()` would throw. Since `parseSourceFieldArray` returns a new `LinkedHashSet`, this is fine. The real issue is that `Collections.singleton()` (used for `VALUE_STRING` case) is also immutable — if the same field appears twice (e.g., two `includes` string values), the second assignment would silently overwrite the first without error. Consider using a mutable set instead of `Collections.singleton()`. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [234-252] -if (includes.isEmpty() && excludes.isEmpty()) { - // no valid field names -> empty or unrecognized fields; not allowed - throw new ParsingException( - parser.getTokenLocation(), - "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" - ); +case XContentParser.Token.VALUE_STRING -> { + if (INCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) { + String includeEntry = parser.text(); + if (excludes.contains(includeEntry)) { + throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, includeEntry); + } + Set<String> newIncludes = new LinkedHashSet<>(includes); + newIncludes.add(includeEntry); + includes = newIncludes; + } else if (EXCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) { + String excludeEntry = parser.text(); + if (includes.contains(excludeEntry)) { + throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, excludeEntry); + } + Set<String> newExcludes = new LinkedHashSet<>(excludes); + newExcludes.add(excludeEntry); + excludes = newExcludes; + } else { + throw new ParsingException( + parser.getTokenLocation(), + "Unknown key for a " + token + " in [" + currentFieldName + "]." + ); + } } Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `Collections.singleton()` is immutable and could cause issues if the same field name appears twice in the JSON (the second assignment silently overwrites the first). Using a mutable `LinkedHashSet` would be more robust. However, in practice, JSON parsers typically don't produce duplicate field names, so this is a minor edge case.	Low
General	Reject duplicate entries within the same array The `parseSourceFieldArray` method silently ignores duplicate entries within the same array (e.g., `["field1", "field1"]`) by using a `LinkedHashSet`. While deduplication is handled, there is no warning or error for duplicate entries within the same array. This may hide user mistakes. Consider throwing a `ParsingException` when a duplicate is detected within the same array to provide clearer feedback. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279] -private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite) - throws IOException { - Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields - while (parser.nextToken() != XContentParser.Token.END_ARRAY) { - if (parser.currentToken() == XContentParser.Token.VALUE_STRING) { - String entry = parser.text(); - if (opposite != null && opposite.contains(entry)) { - throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry); - } - sourceArr.add(entry); +String entry = parser.text(); +if (opposite != null && opposite.contains(entry)) { + throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry); +} +if (!sourceArr.add(entry)) { + throw new ParsingException( + parser.getTokenLocation(), + "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source." + ); +} Suggestion importance[1-10]: 3 __ Why: While throwing on duplicates would provide clearer feedback, the current behavior of silently deduplicating via `LinkedHashSet` is a reasonable and common approach. The test in `FetchSourceContextTests.java` at line 68-69 even explicitly tests that duplicates are deduplicated, suggesting this is intentional behavior.	Low

Previous suggestions

Suggestions up to commit 9d97f05

Category	Suggestion	Impact
Possible issue	Fix placeholder format mismatch in error messages The `AMBIGUOUS_FIELD_MESSAGE` uses `{}` as a placeholder (SLF4J/log4j style), but `ParsingException` uses `String.format`-style formatting with `%s`. This means the error message thrown from `parseSourceFieldArray` and `parseSourceObject` will contain the literal `{}` instead of the actual field name. The `OpenSearchException` used in `validateAmbiguousFields` does support `{}` placeholders, so that path is correct. The `ParsingException` calls should use `String.format` or inline the value directly. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279] -private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite) - throws IOException { - Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields - while (parser.nextToken() != XContentParser.Token.END_ARRAY) { - if (parser.currentToken() == XContentParser.Token.VALUE_STRING) { - String entry = parser.text(); - if (opposite != null && opposite.contains(entry)) { - throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry); - } - sourceArr.add(entry); +private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [%s] cannot be both included and excluded in _source."; +// In validateAmbiguousFields, use OpenSearchException with its own formatting: +throw new OpenSearchException("The same entry [{}] cannot be both included and excluded in _source.", exclude); + +// In parseSourceFieldArray and parseSourceObject, use: +throw new ParsingException(parser.getTokenLocation(), String.format(AMBIGUOUS_FIELD_MESSAGE, entry)); + Suggestion importance[1-10]: 8 __ Why: This is a real bug — `ParsingException` uses `String.format`-style `%s` placeholders, not `{}` style, so the error messages from `parseSourceFieldArray` and `parseSourceObject` would contain literal `{}` instead of the actual field name. The test at line 211 (`assertEquals("The same entry [AAA] cannot be both included and excluded in _source.", result.getMessage())`) would fail if this bug exists, confirming the issue is critical.	Medium
Possible issue	Validate deserialized instances for ambiguous fields The `StreamInput` constructor does not call `validateAmbiguousFields()`, unlike the other constructors. This means deserialized instances from the network or storage could bypass the new validation, potentially allowing contradicting includes/excludes to be reconstructed without error. The validation should be called here as well to ensure consistency. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 7 __ Why: The `StreamInput` constructor skips `validateAmbiguousFields()`, creating an inconsistency where deserialized objects bypass the new validation. This is a valid concern for correctness and consistency, though in practice serialized data would have been validated before serialization.	Medium
General	Avoid unnecessarily exposing internal parsing method The check `includes.isEmpty() && excludes.isEmpty()` will also throw when both fields are absent from the object, but it won't catch the case where the object contains only unrecognized field names (since those throw earlier). However, a more critical issue is that a valid object with only `excludes` (no `includes`) would have `includes` as `Collections.emptySet()`, which is fine. But consider that a user could provide `{"excludes": ["field1"]}` with no `includes` — this is a valid use case that should be allowed. The current check correctly allows this, but the error message says "Expected at least one of [includes] or [excludes]" which is accurate. This logic is correct as-is, but the condition should use `\|\|` semantics — actually the `&&` is correct here (both empty means neither was provided). No change needed for this specific logic, but the `parseSourceObject` method is now `public` which exposes internal parsing logic unnecessarily. Consider making it package-private or keeping it private. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [198] -if (includes.isEmpty() && excludes.isEmpty()) { - // no valid field names -> empty or unrecognized fields; not allowed - throw new ParsingException( - parser.getTokenLocation(), - "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" - ); -} +static FetchSourceContext parseSourceObject(XContentParser parser) throws IOException { Suggestion importance[1-10]: 3 __ Why: Making `parseSourceObject` package-private instead of `public` is a minor visibility concern. However, the test file `FetchSourceContextTests.java` directly calls `FetchSourceContext.parseSourceObject(parser)` at line 279, so changing visibility would break the test. The suggestion's `improved_code` doesn't address this dependency.	Low

Suggestions up to commit 68f6ccd

Category	Suggestion	Impact
Possible issue	Validate deserialized instances for ambiguous fields The `StreamInput` constructor does not call `validateAmbiguousFields()`, meaning deserialized instances bypass the new validation. If a node on an older version serializes a `FetchSourceContext` with conflicting includes/excludes and sends it to a newer node, the validation will be silently skipped. Add the validation call here as well. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 6 __ Why: The `StreamInput` constructor bypasses `validateAmbiguousFields()`, which could allow invalid deserialized instances to skip the new validation. Adding the validation call here ensures consistency across all construction paths, though the practical risk is limited since conflicting fields would typically be caught at the source.	Low
Possible issue	Verify empty-object validation allows excludes-only input The validation that rejects an empty object fires even when the object contained unrecognized field names (which already threw a `ParsingException`). More critically, a valid object with only `excludes` (no `includes`) will be incorrectly rejected. The check should allow either `includes` or `excludes` to be non-empty independently, which is already the case, but the error message implies both are required. Consider verifying the logic is correct for the case where only `excludes` is provided. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-266] if (includes.isEmpty() && excludes.isEmpty()) { // no valid field names -> empty or unrecognized fields; not allowed throw new ParsingException( parser.getTokenLocation(), "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" ); } +return new FetchSourceContext(true, includes.toArray(new String[0]), excludes.toArray(new String[0])); Suggestion importance[1-10]: 2 __ Why: The suggestion asks to verify the logic is correct for excludes-only input, but the existing code already handles this correctly - the check `includes.isEmpty() && excludes.isEmpty()` only throws when both are empty. The `improved_code` just adds the `return` statement that already exists on line 266, making this suggestion inaccurate about the actual issue.	Low
General	Improve error message for empty top-level source array When `_source` is specified as a top-level array (not inside an object), `parseSourceFieldArray` is called with `INCLUDES_FIELD` and `null` as the `opposite` argument. In this case, duplicate entries are silently deduplicated by the `LinkedHashSet`. This is fine, but the `parseField` parameter is only used in the empty-array error message, so passing `INCLUDES_FIELD` for a top-level array is misleading. This is a minor concern, but the real issue is that the empty-array check for a top-level `_source: []` will produce the message "Expected at least one value for an array of [includes]" which may confuse users since `includes` was not explicitly used. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279] -private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite) - throws IOException { - Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields - while (parser.nextToken() != XContentParser.Token.END_ARRAY) { +case XContentParser.Token.START_ARRAY -> { + if (parser.nextToken() == XContentParser.Token.END_ARRAY) { + throw new ParsingException( + parser.getTokenLocation(), + "Expected at least one value for [_source] array" + ); + } + Set<String> includes = new LinkedHashSet<>(); + do { if (parser.currentToken() == XContentParser.Token.VALUE_STRING) { - String entry = parser.text(); - if (opposite != null && opposite.contains(entry)) { - throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry); - } - sourceArr.add(entry); + includes.add(parser.text()); + } else { + throw new ParsingException( + parser.getTokenLocation(), + "Unknown key for a " + parser.currentToken() + " in [" + parser.currentName() + "]." + ); + } + } while (parser.nextToken() != XContentParser.Token.END_ARRAY); + return new FetchSourceContext(true, includes.toArray(new String[0]), null); +} Suggestion importance[1-10]: 3 __ Why: The suggestion raises a valid UX concern about the misleading error message "Expected at least one value for an array of [includes]" when `_source: []` is used at the top level. However, the `improved_code` replaces the wrong section (it modifies `fromXContent` switch case instead of `parseSourceFieldArray`), making it inaccurate relative to the `existing_code` snippet.	Low

Suggestions up to commit 47acc2f

Category	Suggestion	Impact
Possible issue	Validate deserialized instances from stream input The `StreamInput` constructor does not call `validateAmbiguousFields()`, meaning deserialized instances from the wire can bypass the ambiguity validation. While data written to the stream should already be validated, adding the validation here ensures consistency and guards against any data written by older nodes or corrupted streams. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 5 __ Why: The `StreamInput` constructor bypasses `validateAmbiguousFields()`, which could allow invalid states when deserializing from older nodes or corrupted streams. Adding validation here improves consistency, though the risk is low since data should already be validated before serialization.	Low
Possible issue	Verify placeholder substitution across exception types The `AMBIGUOUS_FIELD_MESSAGE` uses `{}` as a placeholder, which is the SLF4J/log4j style. `OpenSearchException` uses `{}` for parameter substitution, but `ParsingException` (which extends `OpenSearchException`) may format the message differently. Verify that both `OpenSearchException` and `ParsingException` correctly substitute `{}` placeholders, otherwise the error message shown to users will contain the literal `{}` instead of the actual field name. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [73] +private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [{}] cannot be both included and excluded in _source."; - Suggestion importance[1-10]: 3 __ Why: The `existing_code` and `improved_code` are identical, making this a verification-only suggestion. The test cases in the PR (e.g., `assertEquals("The same entry [AAA] cannot be both included and excluded in _source.", result.getMessage())`) already confirm that `{}` substitution works correctly for both `OpenSearchException` and `ParsingException`.	Low
General	Fix misleading comment on empty object check The check for empty includes and excludes after parsing the object will not catch the case where the object contains unrecognized field names (e.g., `{"unknown":` `"value"}`), since the `default` case in the switch throws a `ParsingException` before reaching this check. However, a more subtle issue is that if both `includes` and `excludes` remain as `Collections.emptySet()` after parsing a non-empty object with only unrecognized fields, the error message says "Expected at least one of [includes] or [excludes]" which is misleading. This is actually fine since unrecognized fields throw earlier, but the comment is inaccurate — consider updating it to reflect that this handles the empty object case only. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264] if (includes.isEmpty() && excludes.isEmpty()) { - // no valid field names -> empty or unrecognized fields; not allowed + // empty object {} - no includes or excludes were provided throw new ParsingException( parser.getTokenLocation(), "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" ); } Suggestion importance[1-10]: 2 __ Why: This suggestion only changes a code comment, not the actual logic. While the comment clarification has minor value, it's a trivial documentation-only change with no functional impact.	Low
General	Improve error clarity for top-level empty array When `_source` is a plain string or array (not an object), it is treated as includes-only with no excludes, so there is no ambiguity possible. However, `parseSourceFieldArray` is called with `null` as the `opposite` parameter, which is handled correctly. The issue is that `parseSourceFieldArray` throws if the array is empty, but a top-level `_source: []` should arguably be treated differently from `_source: { includes: [] }`. This is a behavioral inconsistency — both currently throw the same error message referencing `[includes]`, which may be confusing for the top-level array case. Consider providing a clearer error message for the top-level empty array case. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [167-174] -case XContentParser.Token.VALUE_STRING -> { - String[] includes = new String[] { parser.text() }; - return new FetchSourceContext(true, includes, null); -} case XContentParser.Token.START_ARRAY -> { - String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]); + Set<String> includesSet = parseSourceFieldArray(parser, INCLUDES_FIELD, null); + String[] includes = includesSet.toArray(new String[0]); return new FetchSourceContext(true, includes, null); } Suggestion importance[1-10]: 2 __ Why: The `improved_code` is essentially the same as the `existing_code` (just extracting a variable), and doesn't actually address the stated concern about providing a clearer error message for the top-level empty array case. The suggestion doesn't implement the improvement it describes.	Low

Suggestions up to commit 35b7268

Category	Suggestion	Impact
Possible issue	Fix placeholder format in exception message The `AMBIGUOUS_FIELD_MESSAGE` uses `{}` as a placeholder, which is the SLF4J/log4j style. However, `OpenSearchException` and `ParsingException` constructors use Java's `String.format`-style `%s` placeholders or direct string concatenation. Using `{}` will result in the literal `{}` appearing in the exception message instead of the actual field name when thrown via `new OpenSearchException(AMBIGUOUS_FIELD_MESSAGE,` `exclude)`. Verify that `OpenSearchException` supports `{}` placeholders, or switch to `%s` or direct string concatenation. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [73] -private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [{}] cannot be both included and excluded in _source."; +private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [%s] cannot be both included and excluded in _source."; Suggestion importance[1-10]: 8 __ Why: The test in `FetchSourceContextProtoUtilsTests` asserts `"The same entry [theSameEntry] cannot be both included and excluded in _source."` which implies `{}` is being replaced correctly. However, if `OpenSearchException` uses SLF4J-style formatting with `{}` placeholders, this works, but `ParsingException` may not. The suggestion correctly identifies a potential bug where `{}` might appear literally in messages, which would cause test failures and incorrect error messages.	Medium
Possible issue	Validate deserialized objects for ambiguous fields The `StreamInput` constructor does not call `validateAmbiguousFields()`, unlike the other constructors. This means deserialized `FetchSourceContext` objects (e.g., from inter-node communication) bypass the ambiguity validation. If a node running older code serializes an ambiguous context and sends it to a node running the new code, the validation will be skipped. Add `validateAmbiguousFields()` to this constructor as well. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 6 __ Why: The `StreamInput` constructor skips `validateAmbiguousFields()`, which could allow deserialized objects with ambiguous includes/excludes to bypass validation. This is a valid consistency concern, though the practical impact may be limited since ambiguous contexts should be rejected at the source.	Low
General	Use empty array instead of null for excludes When parsing a top-level `START_ARRAY` for `_source`, `null` is passed as the `opposite` parameter to `parseSourceFieldArray`, which correctly skips ambiguity checking (since there are no excludes). However, the resulting `FetchSourceContext` is constructed with `null` for excludes, which is handled by the constructor. This is fine, but for consistency and clarity, consider passing `Strings.EMPTY_ARRAY` instead of `null`. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [171-173] case XContentParser.Token.START_ARRAY -> { String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]); - return new FetchSourceContext(true, includes, null); + return new FetchSourceContext(true, includes, Strings.EMPTY_ARRAY); } Suggestion importance[1-10]: 2 __ Why: This is a minor style/consistency suggestion. The constructor already handles `null` by converting it to `Strings.EMPTY_ARRAY`, so passing `null` vs `Strings.EMPTY_ARRAY` has no functional difference.	Low
General	Clarify empty object validation logic The check `includes.isEmpty() && excludes.isEmpty()` only throws when both are empty, but it won't catch the case where an unrecognized field name was provided (e.g., `{"unknown": ["field1"]}`). In that scenario, the `default` branch in the `switch` already throws, but if a user provides only unrecognized field names that somehow bypass the switch, the empty check would still pass. More critically, the current logic allows a valid `excludes`-only or `includes`-only object, which is fine, but the condition should be reviewed to ensure it truly only rejects completely empty objects and not valid single-field objects. Actually, the main concern is that `Collections.emptySet()` is used as the initial value and then reassigned — but if only one of `includes` or `excludes` is provided, the other remains `emptySet()`, and the check `includes.isEmpty() && excludes.isEmpty()` would not throw. This is correct behavior. However, if neither recognized field is encountered (e.g., only unknown fields), the `default` case throws before reaching this check, so the check is redundant but harmless. No change needed here, but consider adding a comment clarifying this. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264] if (includes.isEmpty() && excludes.isEmpty()) { - // no valid field names -> empty or unrecognized fields; not allowed + // no valid field names -> empty object; not allowed throw new ParsingException( parser.getTokenLocation(), "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" ); } Suggestion importance[1-10]: 1 __ Why: The suggestion only proposes adding a comment clarification with no actual code change. The `existing_code` and `improved_code` are functionally identical, differing only in a comment. This is a very low-impact suggestion.	Low

Suggestions up to commit 7f7eff2

Category	Suggestion	Impact
Possible issue	Validate deserialized instances for ambiguous fields The `StreamInput` constructor does not call `validateAmbiguousFields()`, meaning deserialized instances can bypass the new validation. If a node running an older version sends a `FetchSourceContext` with conflicting includes/excludes over the wire, the receiving node will accept it silently. Add the validation call here as well to ensure consistency. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95] public FetchSourceContext(StreamInput in) throws IOException { fetchSource = in.readBoolean(); includes = in.readStringArray(); excludes = in.readStringArray(); + validateAmbiguousFields(); } Suggestion importance[1-10]: 6 __ Why: The `StreamInput` constructor bypasses `validateAmbiguousFields()`, which could allow deserialized instances with conflicting includes/excludes to bypass the new validation. Adding the call ensures consistency across all construction paths.	Low
General	Detect duplicate entries within the same source array Duplicate entries within the same array (e.g., `includes: ["field1", "field1"]`) are silently deduplicated by the `LinkedHashSet`. While the test `testFetchSourceArray` shows this is intentional, there is no validation or warning when duplicates appear within the same array. This could mask user mistakes. Consider throwing a `ParsingException` or at least logging a warning when a duplicate entry is detected within the same array. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279] private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite) throws IOException { - Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields + Set<String> sourceArr = new LinkedHashSet<>(); while (parser.nextToken() != XContentParser.Token.END_ARRAY) { if (parser.currentToken() == XContentParser.Token.VALUE_STRING) { String entry = parser.text(); if (opposite != null && opposite.contains(entry)) { throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry); } - sourceArr.add(entry); + if (!sourceArr.add(entry)) { + throw new ParsingException( + parser.getTokenLocation(), + "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source." + ); + } Suggestion importance[1-10]: 3 __ Why: The test `testFetchSourceArray` explicitly verifies that duplicates are silently deduplicated, indicating this is intentional behavior. Throwing an exception for duplicates would be a behavioral change that contradicts the existing test design, making this suggestion potentially disruptive.	Low
General	Clarify empty-object validation logic and comment The check `includes.isEmpty() && excludes.isEmpty()` only throws when both are empty, but it won't catch the case where an unrecognized field name was provided (which would leave both sets empty). However, a more critical issue is that this check will also incorrectly reject a valid case where only `excludes` is provided without `includes` (e.g., `{"excludes": ["field1"]}`). The condition should allow either one to be non-empty, which it does — but the comment is misleading. More importantly, the validation in `validateAmbiguousFields()` is called from the constructor, so if `parseSourceObject` already validates ambiguity during parsing, the constructor will also re-validate, which is redundant but not harmful. Consider verifying that the empty-check logic correctly handles the case where only one of includes/excludes is specified. server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-265] if (includes.isEmpty() && excludes.isEmpty()) { - // no valid field names -> empty or unrecognized fields; not allowed + // neither includes nor excludes were provided -> empty or unrecognized fields; not allowed throw new ParsingException( parser.getTokenLocation(), "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]" ); } Suggestion importance[1-10]: 1 __ Why: The suggestion only changes a comment from "no valid field names -> empty or unrecognized fields; not allowed" to "neither includes nor excludes were provided -> empty or unrecognized fields; not allowed". The `existing_code` and `improved_code` are functionally identical, making this a trivial comment-only change with minimal impact.	Low

github-actions · 2026-02-27T11:53:37Z

❌ Gradle check result for c2718e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-02-27T13:32:50Z

Persistent review updated to latest commit 418c9b7

github-actions · 2026-02-27T13:35:09Z

Persistent review updated to latest commit 8f222c4

github-actions · 2026-02-27T13:40:06Z

Persistent review updated to latest commit 49d359c

github-actions · 2026-03-25T16:29:13Z

Failed to generate code suggestions for PR

github-actions · 2026-03-25T17:35:09Z

❌ Gradle check result for 5848b18: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-25T18:26:03Z

Failed to generate code suggestions for PR

github-actions · 2026-03-25T18:33:14Z

Failed to generate code suggestions for PR

github-actions · 2026-03-25T19:53:27Z

Failed to generate code suggestions for PR

github-actions · 2026-03-25T20:03:37Z

❌ Gradle check result for 51e63e6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-26T07:06:58Z

Failed to generate code suggestions for PR

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

extract array parsing as its own function Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

parseSourceObject: split key-value process into different code-blocks Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

include and exclude collections must not contain the same entries Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

The same entry MUST NOT be present in both inludes AND excludes arrays. This leads to ambiguity. Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Explicitly defined source object '_source={}' not allowed Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

empty object validation updated to cover all possible cases Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

toXContent to create valid object Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

github-actions · 2026-03-26T08:23:10Z

❌ Gradle check result for cabbc32: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-26T13:54:05Z

Failed to generate code suggestions for PR

github-actions · 2026-03-26T15:11:37Z

❌ Gradle check result for fe8b28f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

urmichm requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners February 27, 2026 11:42

github-actions bot added enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers Search:Query Insights labels Feb 27, 2026

urmichm mentioned this pull request Feb 27, 2026

[Feature Request] [BUG] _source include / exclude validation #20612

Open

urmichm force-pushed the 20612-source-validation branch from 68fd820 to 5848b18 Compare March 25, 2026 16:27

opensearch-ci-bot mentioned this pull request Mar 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for SmokeTestMultiNodeClientYamlTestSuiteIT #14408

Open

opensearch-ci-bot mentioned this pull request Mar 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for AzureBlobStoreRepositoryTests #14291

Open

urmichm and others added 16 commits March 26, 2026 08:46

Init marks for ISSUE-20612

004d89f

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

simplification i

d972072

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

switch case in favour of if-else-if

1838d84

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

minor refactor

2fdf32a

extract array parsing as its own function Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Refactor

30004e5

parseSourceObject: split key-value process into different code-blocks Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

Intersection Validation

9dd5fbd

include and exclude collections must not contain the same entries Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

Validation of includes / excludes arrays

52830b7

The same entry MUST NOT be present in both inludes AND excludes arrays. This leads to ambiguity. Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Empty _source not allowed

9b1465f

Explicitly defined source object '_source={}' not allowed Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

FetchSourceContext validate after construction

18231c0

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

unit test i

cf89692

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

spotless apply

73bdbff

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

Consistent Excpetion usage

96a7182

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

FetchSourceContext minor update

51d3bb5

empty object validation updated to cover all possible cases Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

Source toXContent update

a9de5f6

toXContent to create valid object Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

yml tests

0386618

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>

CHANGELOG.md

fe8b28f

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>

urmichm force-pushed the 20612-source-validation branch from cabbc32 to fe8b28f Compare March 26, 2026 13:53

Conversation

urmichm commented Feb 27, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 68fd820)

Uh oh!

github-actions bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Feb 27, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Feb 27, 2026 •

edited

Loading

(Review updated until commit `68fd820`)

github-actions bot commented Feb 27, 2026 •

edited

Loading