Skip to content

Validation of the _source to reject contradicting and ambiguous requests#20742

Open
urmichm wants to merge 16 commits intoopensearch-project:mainfrom
urmichm:20612-source-validation
Open

Validation of the _source to reject contradicting and ambiguous requests#20742
urmichm wants to merge 16 commits intoopensearch-project:mainfrom
urmichm:20612-source-validation

Conversation

@urmichm
Copy link
Copy Markdown
Contributor

@urmichm urmichm commented Feb 27, 2026

Description

Validation of the _source object added to avoid confusion and reject contradicting and ambiguous requests:

  • "_source": {} At lease one of includes or excludes must be defined. 🚫
  • "_source": [] Explicitly defined empty array of includes is not allowed. 🚫
  • "_source": { "includes": "text", "excludes": ["title", "text"] } The text field is defined in both includes and excludes. Contradiction. 🚫
  • "_source": { "includes": [], "excludes": ["title"] } or _source: { "includes": ["title"], "excludes": [] } Explicitly defined empty arrays are not allowed to avoid confusion. 🚫

To include the whole _source object or to exclude it completely, the existing boolean logic is encouraged.
"_source": true and "_source": false

Unit tests added to test behaviour of parsing implementation.
Existing unit tests have been extended to cover invalid cases.

Related Issues

Resolves #20612

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit 68fd820)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core _source validation logic and tests

Relevant files:

  • server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java
  • server/src/test/java/org/opensearch/search/fetch/subphase/FetchSourceContextTests.java
  • rest-api-spec/src/main/resources/rest-api-spec/test/search/10_source_filtering.yml
  • CHANGELOG.md

Sub-PR theme: gRPC proto utils test for ambiguous _source validation

Relevant files:

  • modules/transport-grpc/src/test/java/org/opensearch/transport/grpc/proto/request/common/FetchSourceContextProtoUtilsTests.java

⚡ Recommended focus areas for review

Breaking Change

The validateAmbiguousFields() call in the constructor means that any existing serialized FetchSourceContext objects deserialized via StreamInput (the StreamInput constructor) will also trigger validation. If persisted cluster state or in-flight requests contain overlapping includes/excludes (even if previously accepted), deserialization will now throw an OpenSearchException, potentially causing node failures or rolling-upgrade issues.

public FetchSourceContext(StreamInput in) throws IOException {
    fetchSource = in.readBoolean();
    includes = in.readStringArray();
    excludes = in.readStringArray();
}
Empty Object Behavior Change

Previously, _source: {} was silently accepted and treated as fetch-all. Now it throws a ParsingException. This is a breaking change for existing users who may have _source: {} in their queries. The skip version in the YAML test (3.5.0) suggests this is intentional, but there is no deprecation period — it goes straight to rejection.

if (includes.isEmpty() && excludes.isEmpty()) {
    // no valid field names -> empty or unrecognized fields; not allowed
    throw new ParsingException(
        parser.getTokenLocation(),
        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
    );
}
Empty Array Breaking Change

Previously, _source: [] was accepted and treated as an empty includes list (fetch all). Now it throws a ParsingException. Existing queries using _source: [] will break without warning. The parseSourceFieldArray method enforces at least one element, which is a stricter contract than before.

if (sourceArr.isEmpty()) {
    throw new ParsingException(
        parser.getTokenLocation(),
        "Expected at least one value for an array of [" + parseField.getPreferredName() + "]"
    );
}
Unreachable Code

The fromXContent method uses a switch expression that covers all XContentParser.Token cases and always either returns or throws. The comment // MUST never reach here at line 195 is correct but the method has no return statement after the switch, which will cause a compile error unless the compiler can prove exhaustiveness. If the switch does not cover all enum values (e.g., a future token type), this will fail at runtime with no clear error.

public static FetchSourceContext fromXContent(XContentParser parser) throws IOException {
    XContentParser.Token token = parser.currentToken();
    switch (token) {
        case XContentParser.Token.VALUE_BOOLEAN -> {
            return new FetchSourceContext(parser.booleanValue());
        }
        case XContentParser.Token.VALUE_STRING -> {
            String[] includes = new String[] { parser.text() };
            return new FetchSourceContext(true, includes, null);
        }
        case XContentParser.Token.START_ARRAY -> {
            String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]);
            return new FetchSourceContext(true, includes, null);
        }
        case XContentParser.Token.START_OBJECT -> {
            return parseSourceObject(parser);
        }
        default -> {
            throw new ParsingException(
                parser.getTokenLocation(),
                "Expected one of ["
                    + XContentParser.Token.VALUE_BOOLEAN
                    + ", "
                    + XContentParser.Token.VALUE_STRING
                    + ", "
                    + XContentParser.Token.START_ARRAY
                    + ", "
                    + XContentParser.Token.START_OBJECT
                    + "] but found ["
                    + token
                    + "]"
            );
        }
    }
    // MUST never reach here

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 27, 2026

PR Code Suggestions ✨

Latest suggestions up to 0b4c2c8

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add validation to deserialization constructor

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized FetchSourceContext objects (e.g., from
inter-node communication) bypass the ambiguity validation. Call
validateAmbiguousFields() in this constructor as well to ensure consistency.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 7

__

Why: The StreamInput constructor bypasses validateAmbiguousFields(), creating an inconsistency where deserialized objects skip validation. This is a valid concern for correctness, though in practice the data would have been validated before serialization. Adding the validation call ensures consistency across all construction paths.

Medium
Replace immutable singleton sets with mutable sets

The parseSourceObject method throws an error when both includes and excludes are
empty, but this check uses Collections.emptySet() as the initial value. If a user
provides only unrecognized field names (not includes/excludes), the sets remain
empty and the error is thrown correctly. However, if a user provides only includes
or only excludes, the other set remains Collections.emptySet() — this is fine. But
the check should also guard against the case where both sets are non-empty but were
set to Collections.emptySet() due to logic flow. The current logic is actually
correct, but the condition should be includes.isEmpty() && excludes.isEmpty() which
is already the case. No change needed here — but note that Collections.emptySet() is
immutable, so calling .contains() on it is safe but calling .add() would throw.
Since parseSourceFieldArray returns a new LinkedHashSet, this is fine. The real
issue is that Collections.singleton() (used for VALUE_STRING case) is also immutable
— if the same field appears twice (e.g., two includes string values), the second
assignment would silently overwrite the first without error. Consider using a
mutable set instead of Collections.singleton().

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [234-252]

-if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
-    throw new ParsingException(
-        parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
-    );
+case XContentParser.Token.VALUE_STRING -> {
+    if (INCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
+        String includeEntry = parser.text();
+        if (excludes.contains(includeEntry)) {
+            throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, includeEntry);
+        }
+        Set<String> newIncludes = new LinkedHashSet<>(includes);
+        newIncludes.add(includeEntry);
+        includes = newIncludes;
+    } else if (EXCLUDES_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
+        String excludeEntry = parser.text();
+        if (includes.contains(excludeEntry)) {
+            throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, excludeEntry);
+        }
+        Set<String> newExcludes = new LinkedHashSet<>(excludes);
+        newExcludes.add(excludeEntry);
+        excludes = newExcludes;
+    } else {
+        throw new ParsingException(
+            parser.getTokenLocation(),
+            "Unknown key for a " + token + " in [" + currentFieldName + "]."
+        );
+    }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that Collections.singleton() is immutable and could cause issues if the same field name appears twice in the JSON (the second assignment silently overwrites the first). Using a mutable LinkedHashSet would be more robust. However, in practice, JSON parsers typically don't produce duplicate field names, so this is a minor edge case.

Low
General
Reject duplicate entries within the same array

The parseSourceFieldArray method silently ignores duplicate entries within the same
array (e.g., ["field1", "field1"]) by using a LinkedHashSet. While deduplication is
handled, there is no warning or error for duplicate entries within the same array.
This may hide user mistakes. Consider throwing a ParsingException when a duplicate
is detected within the same array to provide clearer feedback.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
-        if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+String entry = parser.text();
+if (opposite != null && opposite.contains(entry)) {
+    throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
+}
+if (!sourceArr.add(entry)) {
+    throw new ParsingException(
+        parser.getTokenLocation(),
+        "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source."
+    );
+}
Suggestion importance[1-10]: 3

__

Why: While throwing on duplicates would provide clearer feedback, the current behavior of silently deduplicating via LinkedHashSet is a reasonable and common approach. The test in FetchSourceContextTests.java at line 68-69 even explicitly tests that duplicates are deduplicated, suggesting this is intentional behavior.

Low

Previous suggestions

Suggestions up to commit 9d97f05
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix placeholder format mismatch in error messages

The AMBIGUOUS_FIELD_MESSAGE uses {} as a placeholder (SLF4J/log4j style), but
ParsingException uses String.format-style formatting with %s. This means the error
message thrown from parseSourceFieldArray and parseSourceObject will contain the
literal {} instead of the actual field name. The OpenSearchException used in
validateAmbiguousFields does support {} placeholders, so that path is correct. The
ParsingException calls should use String.format or inline the value directly.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
-        if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [%s] cannot be both included and excluded in _source.";
 
+// In validateAmbiguousFields, use OpenSearchException with its own formatting:
+throw new OpenSearchException("The same entry [{}] cannot be both included and excluded in _source.", exclude);
+
+// In parseSourceFieldArray and parseSourceObject, use:
+throw new ParsingException(parser.getTokenLocation(), String.format(AMBIGUOUS_FIELD_MESSAGE, entry));
+
Suggestion importance[1-10]: 8

__

Why: This is a real bug — ParsingException uses String.format-style %s placeholders, not {} style, so the error messages from parseSourceFieldArray and parseSourceObject would contain literal {} instead of the actual field name. The test at line 211 (assertEquals("The same entry [AAA] cannot be both included and excluded in _source.", result.getMessage())) would fail if this bug exists, confirming the issue is critical.

Medium
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized instances from the network or storage
could bypass the new validation, potentially allowing contradicting
includes/excludes to be reconstructed without error. The validation should be called
here as well to ensure consistency.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 7

__

Why: The StreamInput constructor skips validateAmbiguousFields(), creating an inconsistency where deserialized objects bypass the new validation. This is a valid concern for correctness and consistency, though in practice serialized data would have been validated before serialization.

Medium
General
Avoid unnecessarily exposing internal parsing method

The check includes.isEmpty() && excludes.isEmpty() will also throw when both fields
are absent from the object, but it won't catch the case where the object contains
only unrecognized field names (since those throw earlier). However, a more critical
issue is that a valid object with only excludes (no includes) would have includes as
Collections.emptySet(), which is fine. But consider that a user could provide
{"excludes": ["field1"]} with no includes — this is a valid use case that should be
allowed. The current check correctly allows this, but the error message says
"Expected at least one of [includes] or [excludes]" which is accurate. This logic is
correct as-is, but the condition should use || semantics — actually the && is
correct here (both empty means neither was provided). No change needed for this
specific logic, but the parseSourceObject method is now public which exposes
internal parsing logic unnecessarily. Consider making it package-private or keeping
it private.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [198]

-if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
-    throw new ParsingException(
-        parser.getTokenLocation(),
-        "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
-    );
-}
+static FetchSourceContext parseSourceObject(XContentParser parser) throws IOException {
Suggestion importance[1-10]: 3

__

Why: Making parseSourceObject package-private instead of public is a minor visibility concern. However, the test file FetchSourceContextTests.java directly calls FetchSourceContext.parseSourceObject(parser) at line 279, so changing visibility would break the test. The suggestion's improved_code doesn't address this dependency.

Low
Suggestions up to commit 68f6ccd
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), meaning
deserialized instances bypass the new validation. If a node on an older version
serializes a FetchSourceContext with conflicting includes/excludes and sends it to a
newer node, the validation will be silently skipped. Add the validation call here as
well.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6

__

Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow invalid deserialized instances to skip the new validation. Adding the validation call here ensures consistency across all construction paths, though the practical risk is limited since conflicting fields would typically be caught at the source.

Low
Verify empty-object validation allows excludes-only input

The validation that rejects an empty object fires even when the object contained
unrecognized field names (which already threw a ParsingException). More critically,
a valid object with only excludes (no includes) will be incorrectly rejected. The
check should allow either includes or excludes to be non-empty independently, which
is already the case, but the error message implies both are required. Consider
verifying the logic is correct for the case where only excludes is provided.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-266]

 if (includes.isEmpty() && excludes.isEmpty()) {
     // no valid field names -> empty or unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
+return new FetchSourceContext(true, includes.toArray(new String[0]), excludes.toArray(new String[0]));
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify the logic is correct for excludes-only input, but the existing code already handles this correctly - the check includes.isEmpty() && excludes.isEmpty() only throws when both are empty. The improved_code just adds the return statement that already exists on line 266, making this suggestion inaccurate about the actual issue.

Low
General
Improve error message for empty top-level source array

When _source is specified as a top-level array (not inside an object),
parseSourceFieldArray is called with INCLUDES_FIELD and null as the opposite
argument. In this case, duplicate entries are silently deduplicated by the
LinkedHashSet. This is fine, but the parseField parameter is only used in the
empty-array error message, so passing INCLUDES_FIELD for a top-level array is
misleading. This is a minor concern, but the real issue is that the empty-array
check for a top-level _source: [] will produce the message "Expected at least one
value for an array of [includes]" which may confuse users since includes was not
explicitly used.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

-private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
-    throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
-    while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
+case XContentParser.Token.START_ARRAY -> {
+    if (parser.nextToken() == XContentParser.Token.END_ARRAY) {
+        throw new ParsingException(
+            parser.getTokenLocation(),
+            "Expected at least one value for [_source] array"
+        );
+    }
+    Set<String> includes = new LinkedHashSet<>();
+    do {
         if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
-            String entry = parser.text();
-            if (opposite != null && opposite.contains(entry)) {
-                throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
-            }
-            sourceArr.add(entry);
+            includes.add(parser.text());
+        } else {
+            throw new ParsingException(
+                parser.getTokenLocation(),
+                "Unknown key for a " + parser.currentToken() + " in [" + parser.currentName() + "]."
+            );
+        }
+    } while (parser.nextToken() != XContentParser.Token.END_ARRAY);
+    return new FetchSourceContext(true, includes.toArray(new String[0]), null);
+}
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid UX concern about the misleading error message "Expected at least one value for an array of [includes]" when _source: [] is used at the top level. However, the improved_code replaces the wrong section (it modifies fromXContent switch case instead of parseSourceFieldArray), making it inaccurate relative to the existing_code snippet.

Low
Suggestions up to commit 47acc2f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate deserialized instances from stream input

The StreamInput constructor does not call validateAmbiguousFields(), meaning
deserialized instances from the wire can bypass the ambiguity validation. While data
written to the stream should already be validated, adding the validation here
ensures consistency and guards against any data written by older nodes or corrupted
streams.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 5

__

Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow invalid states when deserializing from older nodes or corrupted streams. Adding validation here improves consistency, though the risk is low since data should already be validated before serialization.

Low
Verify placeholder substitution across exception types

The AMBIGUOUS_FIELD_MESSAGE uses {} as a placeholder, which is the SLF4J/log4j
style. OpenSearchException uses {} for parameter substitution, but ParsingException
(which extends OpenSearchException) may format the message differently. Verify that
both OpenSearchException and ParsingException correctly substitute {} placeholders,
otherwise the error message shown to users will contain the literal {} instead of
the actual field name.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [73]

+private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [{}] cannot be both included and excluded in _source.";
 
-
Suggestion importance[1-10]: 3

__

Why: The existing_code and improved_code are identical, making this a verification-only suggestion. The test cases in the PR (e.g., assertEquals("The same entry [AAA] cannot be both included and excluded in _source.", result.getMessage())) already confirm that {} substitution works correctly for both OpenSearchException and ParsingException.

Low
General
Fix misleading comment on empty object check

The check for empty includes and excludes after parsing the object will not catch
the case where the object contains unrecognized field names (e.g., {"unknown":
"value"}), since the default case in the switch throws a ParsingException before
reaching this check. However, a more subtle issue is that if both includes and
excludes remain as Collections.emptySet() after parsing a non-empty object with only
unrecognized fields, the error message says "Expected at least one of [includes] or
[excludes]" which is misleading. This is actually fine since unrecognized fields
throw earlier, but the comment is inaccurate — consider updating it to reflect that
this handles the empty object case only.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264]

 if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
+    // empty object {} - no includes or excludes were provided
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
Suggestion importance[1-10]: 2

__

Why: This suggestion only changes a code comment, not the actual logic. While the comment clarification has minor value, it's a trivial documentation-only change with no functional impact.

Low
Improve error clarity for top-level empty array

When _source is a plain string or array (not an object), it is treated as
includes-only with no excludes, so there is no ambiguity possible. However,
parseSourceFieldArray is called with null as the opposite parameter, which is
handled correctly. The issue is that parseSourceFieldArray throws if the array is
empty, but a top-level _source: [] should arguably be treated differently from
_source: { includes: [] }. This is a behavioral inconsistency — both currently throw
the same error message referencing [includes], which may be confusing for the
top-level array case. Consider providing a clearer error message for the top-level
empty array case.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [167-174]

-case XContentParser.Token.VALUE_STRING -> {
-    String[] includes = new String[] { parser.text() };
-    return new FetchSourceContext(true, includes, null);
-}
 case XContentParser.Token.START_ARRAY -> {
-    String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]);
+    Set<String> includesSet = parseSourceFieldArray(parser, INCLUDES_FIELD, null);
+    String[] includes = includesSet.toArray(new String[0]);
     return new FetchSourceContext(true, includes, null);
 }
Suggestion importance[1-10]: 2

__

Why: The improved_code is essentially the same as the existing_code (just extracting a variable), and doesn't actually address the stated concern about providing a clearer error message for the top-level empty array case. The suggestion doesn't implement the improvement it describes.

Low
Suggestions up to commit 35b7268
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix placeholder format in exception message

The AMBIGUOUS_FIELD_MESSAGE uses {} as a placeholder, which is the SLF4J/log4j
style. However, OpenSearchException and ParsingException constructors use Java's
String.format-style %s placeholders or direct string concatenation. Using {} will
result in the literal {} appearing in the exception message instead of the actual
field name when thrown via new OpenSearchException(AMBIGUOUS_FIELD_MESSAGE,
exclude). Verify that OpenSearchException supports {} placeholders, or switch to %s
or direct string concatenation.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [73]

-private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [{}] cannot be both included and excluded in _source.";
+private static final String AMBIGUOUS_FIELD_MESSAGE = "The same entry [%s] cannot be both included and excluded in _source.";
Suggestion importance[1-10]: 8

__

Why: The test in FetchSourceContextProtoUtilsTests asserts "The same entry [theSameEntry] cannot be both included and excluded in _source." which implies {} is being replaced correctly. However, if OpenSearchException uses SLF4J-style formatting with {} placeholders, this works, but ParsingException may not. The suggestion correctly identifies a potential bug where {} might appear literally in messages, which would cause test failures and incorrect error messages.

Medium
Validate deserialized objects for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), unlike the
other constructors. This means deserialized FetchSourceContext objects (e.g., from
inter-node communication) bypass the ambiguity validation. If a node running older
code serializes an ambiguous context and sends it to a node running the new code,
the validation will be skipped. Add validateAmbiguousFields() to this constructor as
well.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6

__

Why: The StreamInput constructor skips validateAmbiguousFields(), which could allow deserialized objects with ambiguous includes/excludes to bypass validation. This is a valid consistency concern, though the practical impact may be limited since ambiguous contexts should be rejected at the source.

Low
General
Use empty array instead of null for excludes

When parsing a top-level START_ARRAY for _source, null is passed as the opposite
parameter to parseSourceFieldArray, which correctly skips ambiguity checking (since
there are no excludes). However, the resulting FetchSourceContext is constructed
with null for excludes, which is handled by the constructor. This is fine, but for
consistency and clarity, consider passing Strings.EMPTY_ARRAY instead of null.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [171-173]

 case XContentParser.Token.START_ARRAY -> {
     String[] includes = parseSourceFieldArray(parser, INCLUDES_FIELD, null).toArray(new String[0]);
-    return new FetchSourceContext(true, includes, null);
+    return new FetchSourceContext(true, includes, Strings.EMPTY_ARRAY);
 }
Suggestion importance[1-10]: 2

__

Why: This is a minor style/consistency suggestion. The constructor already handles null by converting it to Strings.EMPTY_ARRAY, so passing null vs Strings.EMPTY_ARRAY has no functional difference.

Low
Clarify empty object validation logic

The check includes.isEmpty() && excludes.isEmpty() only throws when both are empty,
but it won't catch the case where an unrecognized field name was provided (e.g.,
{"unknown": ["field1"]}). In that scenario, the default branch in the switch already
throws, but if a user provides only unrecognized field names that somehow bypass the
switch, the empty check would still pass. More critically, the current logic allows
a valid excludes-only or includes-only object, which is fine, but the condition
should be reviewed to ensure it truly only rejects completely empty objects and not
valid single-field objects.
Actually, the main concern is that
Collections.emptySet() is used as the initial value and then reassigned — but if
only one of includes or excludes is provided, the other remains emptySet(), and the
check includes.isEmpty() && excludes.isEmpty() would not throw. This is correct
behavior. However, if neither recognized field is encountered (e.g., only unknown
fields), the default case throws before reaching this check, so the check is
redundant but harmless. No change needed here, but consider adding a comment
clarifying this.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-264]

 if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
+    // no valid field names -> empty object; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
Suggestion importance[1-10]: 1

__

Why: The suggestion only proposes adding a comment clarification with no actual code change. The existing_code and improved_code are functionally identical, differing only in a comment. This is a very low-impact suggestion.

Low
Suggestions up to commit 7f7eff2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate deserialized instances for ambiguous fields

The StreamInput constructor does not call validateAmbiguousFields(), meaning
deserialized instances can bypass the new validation. If a node running an older
version sends a FetchSourceContext with conflicting includes/excludes over the wire,
the receiving node will accept it silently. Add the validation call here as well to
ensure consistency.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [91-95]

 public FetchSourceContext(StreamInput in) throws IOException {
     fetchSource = in.readBoolean();
     includes = in.readStringArray();
     excludes = in.readStringArray();
+    validateAmbiguousFields();
 }
Suggestion importance[1-10]: 6

__

Why: The StreamInput constructor bypasses validateAmbiguousFields(), which could allow deserialized instances with conflicting includes/excludes to bypass the new validation. Adding the call ensures consistency across all construction paths.

Low
General
Detect duplicate entries within the same source array

Duplicate entries within the same array (e.g., includes: ["field1", "field1"]) are
silently deduplicated by the LinkedHashSet. While the test testFetchSourceArray
shows this is intentional, there is no validation or warning when duplicates appear
within the same array. This could mask user mistakes. Consider throwing a
ParsingException or at least logging a warning when a duplicate entry is detected
within the same array.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [269-279]

 private static Set<String> parseSourceFieldArray(XContentParser parser, ParseField parseField, Set<String> opposite)
     throws IOException {
-    Set<String> sourceArr = new LinkedHashSet<>(); // include or exclude lists, LinkedHashSet preserves the order of fields
+    Set<String> sourceArr = new LinkedHashSet<>();
     while (parser.nextToken() != XContentParser.Token.END_ARRAY) {
         if (parser.currentToken() == XContentParser.Token.VALUE_STRING) {
             String entry = parser.text();
             if (opposite != null && opposite.contains(entry)) {
                 throw new ParsingException(parser.getTokenLocation(), AMBIGUOUS_FIELD_MESSAGE, entry);
             }
-            sourceArr.add(entry);
+            if (!sourceArr.add(entry)) {
+                throw new ParsingException(
+                    parser.getTokenLocation(),
+                    "Duplicate entry [" + entry + "] in [" + parseField.getPreferredName() + "] of _source."
+                );
+            }
Suggestion importance[1-10]: 3

__

Why: The test testFetchSourceArray explicitly verifies that duplicates are silently deduplicated, indicating this is intentional behavior. Throwing an exception for duplicates would be a behavioral change that contradicts the existing test design, making this suggestion potentially disruptive.

Low
Clarify empty-object validation logic and comment

The check includes.isEmpty() && excludes.isEmpty() only throws when both are empty,
but it won't catch the case where an unrecognized field name was provided (which
would leave both sets empty). However, a more critical issue is that this check will
also incorrectly reject a valid case where only excludes is provided without
includes (e.g., {"excludes": ["field1"]}). The condition should allow either one to
be non-empty, which it does — but the comment is misleading. More importantly, the
validation in validateAmbiguousFields() is called from the constructor, so if
parseSourceObject already validates ambiguity during parsing, the constructor will
also re-validate, which is redundant but not harmful. Consider verifying that the
empty-check logic correctly handles the case where only one of includes/excludes is
specified.

server/src/main/java/org/opensearch/search/fetch/subphase/FetchSourceContext.java [259-265]

 if (includes.isEmpty() && excludes.isEmpty()) {
-    // no valid field names -> empty or unrecognized fields; not allowed
+    // neither includes nor excludes were provided -> empty or unrecognized fields; not allowed
     throw new ParsingException(
         parser.getTokenLocation(),
         "Expected at least one of [" + INCLUDES_FIELD.getPreferredName() + "] or [" + EXCLUDES_FIELD.getPreferredName() + "]"
     );
 }
Suggestion importance[1-10]: 1

__

Why: The suggestion only changes a comment from "no valid field names -> empty or unrecognized fields; not allowed" to "neither includes nor excludes were provided -> empty or unrecognized fields; not allowed". The existing_code and improved_code are functionally identical, making this a trivial comment-only change with minimal impact.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c2718e8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 418c9b7

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8f222c4

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 49d359c

@urmichm urmichm force-pushed the 20612-source-validation branch from 68fd820 to 5848b18 Compare March 25, 2026 16:27
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 5848b18: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 51e63e6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

urmichm and others added 16 commits March 26, 2026 08:46
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
extract array parsing as its own function

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
parseSourceObject: split key-value process into different code-blocks

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
include and exclude collections must not contain the same entries

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
The same entry MUST NOT be present in both inludes AND excludes arrays.
This leads to ambiguity.

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Explicitly defined source object '_source={}' not allowed

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
empty object validation updated to cover all possible cases

Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
toXContent to create valid object

Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
Signed-off-by: Mikhail Urmich <m.urmich@jobware.de>
Signed-off-by: Mikhail Urmich <urmich.m@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cabbc32: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@urmichm urmichm force-pushed the 20612-source-validation branch from cabbc32 to fe8b28f Compare March 26, 2026 13:53
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for fe8b28f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers Search:Query Insights

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] [BUG] _source include / exclude validation

4 participants