Skip to content

[Backport 2.19] Fix case insensitive and escaped query on wildcard#20870

Merged
andrross merged 1 commit intoopensearch-project:2.19from
ShawnQiang1:backport/backport-16827-to-2.19
Mar 17, 2026
Merged

[Backport 2.19] Fix case insensitive and escaped query on wildcard#20870
andrross merged 1 commit intoopensearch-project:2.19from
ShawnQiang1:backport/backport-16827-to-2.19

Conversation

@ShawnQiang1
Copy link
Copy Markdown
Contributor

Backport of #16827.

@ShawnQiang1
Copy link
Copy Markdown
Contributor Author

ShawnQiang1 commented Mar 14, 2026

@andrross @msfroh please help merge this

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit c08eeb0)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Fix case insensitive and escaped query on wildcard field

Relevant files:

  • server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java
  • rest-api-spec/src/main/resources/rest-api-spec/test/search/270_wildcard_fieldtype_queries.yml

Sub-PR theme: Update CHANGELOG for wildcard fix

Relevant files:

  • CHANGELOG.md

⚡ Recommended focus areas for review

Escape Logic

In performEscape, when a backslash is found followed by a character NOT in targetChars, the backslash itself is still appended to the output (since i is not incremented and sb.append(str.charAt(i)) appends the backslash). This means lone backslashes or backslashes before non-special characters are preserved as-is. Verify this is the intended behavior for both wildcard and regexp modes, especially for edge cases like \\ (escaped backslash) where \\ is in both WILDCARD_SPECIAL and REGEXP_SPECIAL.

private static String performEscape(String str, boolean regexpMode) {
    final StringBuilder sb = new StringBuilder();
    final Set<Character> targetChars = regexpMode ? REGEXP_SPECIAL : WILDCARD_SPECIAL;

    for (int i = 0; i < str.length(); i++) {
        if (str.charAt(i) == '\\' && (i + 1) < str.length()) {
            char c = str.charAt(i + 1);
            if (targetChars.contains(c)) {
                i++;
            }
        }
        sb.append(str.charAt(i));
    }
    return sb.toString();
SINGLE type match

For AUTOMATON_TYPE.SINGLE in wildcardQuery, the match predicate calls performEscape(finalValue, false) which strips escape characters from the query value. However, finalValue may already have been lowercased (via normalizer) before this point, and the comparison is done after lowercasing s. Verify that the escape stripping and case normalization interact correctly when both a normalizer and caseInsensitive=true are active simultaneously.

matchPredicate = s -> {
    if (caseInsensitive) {
        s = s.toLowerCase(Locale.ROOT);
    }
    return s.equals(performEscape(finalValue, false));
};
Wildcard match count

The wildcard match-all test was updated from 6 to 8 hits (adding documents with * and \\* as field values). Verify that a wildcard query value: "*" correctly matches documents whose field value is literally * or \\* — these are actual stored values, not wildcards in the query. The semantics here need careful validation since * in a wildcard query means "match all", so it should match all documents with the field present.

* ['l', 'u']
regexpQuery finalValue

In regexpQuery, the line final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value; has a side-effect assignment (value = ...) inside the ternary expression. While functionally equivalent, this is a code smell and could be confusing. Also, finalValue is used in the SINGLE predicate with performEscape(finalValue, true) — verify that normalizing the value before passing it to RegExp and then using it in performEscape is correct, since the regexp engine may interpret the normalized value differently.

final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value;
final boolean caseInsensitive = matchFlags == RegExp.ASCII_CASE_INSENSITIVE;

RegExp regExp = new RegExp(finalValue, syntaxFlags, matchFlags);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 14, 2026

PR Code Suggestions ✨

Latest suggestions up to c08eeb0

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix case-insensitive comparison for single regexp automaton

For the SINGLE automaton type in regexpQuery, the predicate compares the (possibly
lowercased) string s against performEscape(finalValue, true). However, when
caseInsensitive is true, s is lowercased but finalValue is not, so the comparison
will fail for mixed-case inputs. The escaped final value should also be lowercased
when caseInsensitive is true.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [625-632]

 } else if (compiledAutomaton.type == CompiledAutomaton.AUTOMATON_TYPE.SINGLE) {
     // when type equals SINGLE, #compiledAutomaton.runAutomaton is null
+    final String escapedFinalValue = caseInsensitive
+        ? performEscape(finalValue, true).toLowerCase(Locale.ROOT)
+        : performEscape(finalValue, true);
     regexpPredicate = s -> {
         if (caseInsensitive) {
             s = s.toLowerCase(Locale.ROOT);
         }
-        return s.equals(performEscape(finalValue, true));
+        return s.equals(escapedFinalValue);
     };
 }
Suggestion importance[1-10]: 7

__

Why: When caseInsensitive is true, s is lowercased but performEscape(finalValue, true) is not, causing the equality check to fail for mixed-case inputs. This is a real correctness bug in the case-insensitive regexp SINGLE type handling.

Medium
Avoid double-escaping backslashes in wildcard quoting

WILDCARD_SPECIAL includes '\' itself, so quoteWildcard will escape backslashes in
the input. This means a literal backslash in the value will be double-escaped, which
could cause incorrect matching when the escaped string is later processed by
performEscape. Verify that escaping backslashes here is intentional and consistent
with how performEscape unescapes them.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [592-601]

 private static String quoteWildcard(String str) {
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < str.length(); i++) {
-        if (WILDCARD_SPECIAL.contains(str.charAt(i))) {
+        char c = str.charAt(i);
+        if (c == '?' || c == '*') {
             sb.append('\\');
         }
-        sb.append(str.charAt(i));
+        sb.append(c);
     }
     return sb.toString();
 }
Suggestion importance[1-10]: 6

__

Why: Since WILDCARD_SPECIAL includes '\\', quoteWildcard will escape existing backslashes in the input, potentially causing double-escaping issues when performEscape later processes the result. The suggestion to only escape ? and * (not \\) may be more correct depending on the intended semantics, but the test cases in the PR suggest backslash escaping is intentional.

Low
General
Remove redundant side-effect assignment in ternary

The expression value = normalizer.normalize(name(), value).utf8ToString() assigns to
value as a side effect inside a ternary operator, which is confusing and unnecessary
since value is not used after this point. Simplify the assignment to avoid the
redundant side-effect assignment.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [613]

-final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value;
+final String finalValue = normalizer != null ? normalizer.normalize(name(), value).utf8ToString() : value;
Suggestion importance[1-10]: 6

__

Why: The assignment value = normalizer.normalize(...) inside the ternary is a side-effect that's confusing and unnecessary since value isn't used after this line. Removing it improves readability without changing behavior.

Low

Previous suggestions

Suggestions up to commit edf92a8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix case-insensitive comparison for single regexp match

When compiledAutomaton.type == SINGLE, the automaton matches a single literal
string. For a regexp like AbCd with caseInsensitive=true, finalValue is "AbCd" but s
is lowercased to "abcd", so performEscape(finalValue, true) returns "AbCd" and the
comparison fails. The escaped value should also be lowercased when caseInsensitive
is true.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [627-632]

 regexpPredicate = s -> {
                 if (caseInsensitive) {
                     s = s.toLowerCase(Locale.ROOT);
                 }
-                return s.equals(performEscape(finalValue, true));
+                String target = performEscape(finalValue, true);
+                if (caseInsensitive) {
+                    target = target.toLowerCase(Locale.ROOT);
+                }
+                return s.equals(target);
             };
Suggestion importance[1-10]: 8

__

Why: When caseInsensitive is true, s is lowercased but performEscape(finalValue, true) returns the original case string, causing the equality check to fail for case-insensitive regexp queries. The test case "case insensitive regexp query on wildcard field" with SINGLE automaton type would fail without this fix.

Medium
Fix case-insensitive comparison for single wildcard match

When caseInsensitive is true, s is lowercased but performEscape(finalValue, false)
may return a mixed-case string (e.g., if finalValue is "\ABCD", the result is
"ABCD"), causing the comparison to fail. The unescaped finalValue should also be
lowercased when caseInsensitive is true.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [456-461]

 matchPredicate = s -> {
                 if (caseInsensitive) {
                     s = s.toLowerCase(Locale.ROOT);
                 }
-                return s.equals(performEscape(finalValue, false));
+                String target = performEscape(finalValue, false);
+                if (caseInsensitive) {
+                    target = target.toLowerCase(Locale.ROOT);
+                }
+                return s.equals(target);
             };
Suggestion importance[1-10]: 7

__

Why: When caseInsensitive is true and finalValue contains uppercase letters (e.g., escaped sequences like \\ABCD), the comparison between lowercased s and the non-lowercased result of performEscape(finalValue, false) would fail. However, in practice wildcards with SINGLE type typically don't have mixed-case escape sequences, so the impact may be limited.

Medium
Avoid double-escaping backslashes in term queries

WILDCARD_SPECIAL includes '\', so quoteWildcard will escape backslashes in the
input. This means a term query for the literal string * (stored as \ in the
index) would be double-escaped to \\\
, causing a mismatch. Backslashes that are
already part of an escape sequence should not be double-escaped, or the set should
not include '\' for quoteWildcard.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [592-601]

 private static String quoteWildcard(String str) {
         StringBuilder sb = new StringBuilder();
         for (int i = 0; i < str.length(); i++) {
-            if (WILDCARD_SPECIAL.contains(str.charAt(i))) {
+            char c = str.charAt(i);
+            if (c == '*' || c == '?') {
                 sb.append('\\');
             }
-            sb.append(str.charAt(i));
+            sb.append(c);
         }
         return sb.toString();
     }
Suggestion importance[1-10]: 7

__

Why: Since WILDCARD_SPECIAL includes '\\', quoteWildcard escapes backslashes in the input, which could cause double-escaping issues for strings containing backslashes. However, looking at the test cases, the PR seems to handle \\* correctly, so the actual impact depends on how the stored values and query values interact - this warrants careful review.

Medium
General
Remove unnecessary side-effect assignment in ternary

The expression value = normalizer.normalize(name(), value).utf8ToString() assigns to
value as a side effect inside a ternary operator, which is confusing and unnecessary
since value is not used after this point. Simplify this to a clean conditional
assignment without the side-effect assignment.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [613]

-final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value;
+final String finalValue = normalizer != null ? normalizer.normalize(name(), value).utf8ToString() : value;
Suggestion importance[1-10]: 6

__

Why: The assignment value = normalizer.normalize(...) inside the ternary is a side-effect that's confusing and unnecessary since value isn't used after this point. The improved code is cleaner and more readable.

Low
Suggestions up to commit d871f6a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix case-insensitive comparison for single regexp automaton

For the SINGLE automaton type in regexpQuery, the predicate compares the (possibly
lowercased) string s against performEscape(finalValue, true). However, if
caseInsensitive is true, s is lowercased but finalValue is not, so the comparison
will fail for uppercase patterns. The escaped final value should also be lowercased
when caseInsensitive is true.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [625-632]

 } else if (compiledAutomaton.type == CompiledAutomaton.AUTOMATON_TYPE.SINGLE) {
     // when type equals SINGLE, #compiledAutomaton.runAutomaton is null
+    final String escapedFinalValue = caseInsensitive
+        ? performEscape(finalValue, true).toLowerCase(Locale.ROOT)
+        : performEscape(finalValue, true);
     regexpPredicate = s -> {
         if (caseInsensitive) {
             s = s.toLowerCase(Locale.ROOT);
         }
-        return s.equals(performEscape(finalValue, true));
+        return s.equals(escapedFinalValue);
     };
 }
Suggestion importance[1-10]: 8

__

Why: When caseInsensitive is true, s is lowercased but performEscape(finalValue, true) is not, causing the equality check to fail for uppercase patterns. This is a real correctness bug that would cause case-insensitive single-match regexp queries to return wrong results.

Medium
Prevent double-escaping of backslashes in wildcard quoting

WILDCARD_SPECIAL contains '\', so quoteWildcard escapes backslashes in the input.
If the input already contains escape sequences (e.g., * meaning a literal ), this
will double-escape them. The method should only escape
and ? characters (not
backslash), or handle pre-existing escape sequences to avoid incorrect
double-escaping.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [592-601]

 private static String quoteWildcard(String str) {
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < str.length(); i++) {
-        if (WILDCARD_SPECIAL.contains(str.charAt(i))) {
+        char c = str.charAt(i);
+        if (c == '*' || c == '?') {
             sb.append('\\');
         }
-        sb.append(str.charAt(i));
+        sb.append(c);
     }
     return sb.toString();
 }
Suggestion importance[1-10]: 7

__

Why: Since WILDCARD_SPECIAL includes '\\', quoteWildcard will escape backslashes in term values, but termQuery and termQueryCaseInsensitive receive raw field values (not wildcard patterns), so backslashes should not be escaped. This could cause term queries on values containing backslashes to fail to match.

Medium
General
Remove redundant side-effect assignment in ternary

The expression value = normalizer.normalize(name(), value).utf8ToString() assigns to
value as a side effect inside a ternary operator, which is confusing and unnecessary
since value is not used after this point. Simplify the assignment to avoid the
redundant side-effect assignment.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [613]

-final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value;
+final String finalValue = normalizer != null ? normalizer.normalize(name(), value).utf8ToString() : value;
Suggestion importance[1-10]: 6

__

Why: The value = normalizer.normalize(...) assignment inside the ternary is a side-effect that's unnecessary since value isn't used after this point. Removing it improves clarity without changing behavior.

Low
Suggestions up to commit 654478f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Use automaton term for SINGLE type regexp predicate

When compiledAutomaton.type == SINGLE, the automaton matches a single literal
string. For a regexp, finalValue is the raw regexp pattern (e.g., "AbCd"), not the
literal string it matches. Calling performEscape(finalValue, true) strips regexp
escape sequences from the pattern, but the result may still contain regexp
metacharacters rather than the actual literal value the automaton accepts. The
correct literal to compare against should be derived from the automaton itself
(e.g., compiledAutomaton.term.utf8ToString()) rather than processing the raw regexp
string.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [627-632]

 regexpPredicate = s -> {
                 if (caseInsensitive) {
                     s = s.toLowerCase(Locale.ROOT);
                 }
-                return s.equals(performEscape(finalValue, true));
+                return s.equals(compiledAutomaton.term.utf8ToString());
             };
Suggestion importance[1-10]: 7

__

Why: Using compiledAutomaton.term.utf8ToString() is more semantically correct for the SINGLE automaton type, as it directly gives the literal string the automaton matches rather than processing the raw regexp pattern through performEscape. This could fix incorrect matching for patterns with regexp metacharacters.

Medium
Avoid double-escaping backslashes in wildcard quoting

WILDCARD_SPECIAL includes '\' (backslash), so quoteWildcard will escape existing
backslashes in the input. This means a value like * (literal backslash-star)
becomes \\\ after quoting, which when passed to wildcardQuery and then
performEscape may not correctly round-trip. Only
and ? need to be escaped for
wildcard queries; backslashes that are already part of escape sequences should not
be double-escaped.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [592-601]

 private static String quoteWildcard(String str) {
         StringBuilder sb = new StringBuilder();
         for (int i = 0; i < str.length(); i++) {
-            if (WILDCARD_SPECIAL.contains(str.charAt(i))) {
+            char c = str.charAt(i);
+            if (c == '*' || c == '?') {
                 sb.append('\\');
             }
-            sb.append(str.charAt(i));
+            sb.append(c);
         }
         return sb.toString();
     }
Suggestion importance[1-10]: 7

__

Why: Since WILDCARD_SPECIAL includes '\\', quoteWildcard will escape backslashes in the input, potentially causing double-escaping issues when values containing backslashes are passed through wildcardQuery and performEscape. Only * and ? need escaping for wildcard queries, making this a valid correctness concern.

Medium
General
Remove unnecessary side-effect assignment in ternary

The expression value = normalizer.normalize(name(), value).utf8ToString() assigns to
value as a side effect inside a ternary operator, which is confusing and unnecessary
since value is not used after this point. Simplify this to a clean conditional
assignment without the side-effect assignment.

server/src/main/java/org/opensearch/index/mapper/WildcardFieldMapper.java [613]

-final String finalValue = normalizer != null ? value = normalizer.normalize(name(), value).utf8ToString() : value;
+final String finalValue = normalizer != null ? normalizer.normalize(name(), value).utf8ToString() : value;
Suggestion importance[1-10]: 6

__

Why: The assignment value = normalizer.normalize(...) inside the ternary is a side-effect that's confusing and unnecessary since value isn't used after this line. The improved_code correctly removes this side-effect while preserving the same behavior.

Low

@ShawnQiang1 ShawnQiang1 force-pushed the backport/backport-16827-to-2.19 branch from 654478f to d871f6a Compare March 14, 2026 07:45
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d871f6a

@ShawnQiang1 ShawnQiang1 force-pushed the backport/backport-16827-to-2.19 branch from d871f6a to edf92a8 Compare March 14, 2026 07:57
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit edf92a8

…t#16827)

* fix case insensitive and escaped query on wildcard

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

* add changelog

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>

---------

Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Co-authored-by: gesong.samuel <gesong.samuel@bytedance.com>
Co-authored-by: Michael Froh <froh@amazon.com>
(cherry picked from commit 5afb92f)
Signed-off-by: Shawn Qiang <814238703@qq.com>
@ShawnQiang1 ShawnQiang1 force-pushed the backport/backport-16827-to-2.19 branch from edf92a8 to c08eeb0 Compare March 14, 2026 08:03
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c08eeb0

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c08eeb0: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ShawnQiang1
Copy link
Copy Markdown
Contributor Author

@andrross @msfroh please help merge this missing backport of 2.19

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for c08eeb0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for c08eeb0: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.98%. Comparing base (e582a51) to head (c08eeb0).
⚠️ Report is 4 commits behind head on 2.19.

Files with missing lines Patch % Lines
...g/opensearch/index/mapper/WildcardFieldMapper.java 71.42% 10 Missing and 6 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               2.19   #20870      +/-   ##
============================================
+ Coverage     71.95%   71.98%   +0.03%     
- Complexity    65989    66020      +31     
============================================
  Files          5342     5342              
  Lines        307360   307390      +30     
  Branches      44857    44862       +5     
============================================
+ Hits         221167   221288     +121     
+ Misses        67736    67624     -112     
- Partials      18457    18478      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ShawnQiang1
Copy link
Copy Markdown
Contributor Author

man! what can i say

@ShawnQiang1
Copy link
Copy Markdown
Contributor Author

@andrross

@ShawnQiang1
Copy link
Copy Markdown
Contributor Author

@msfroh

@andrross andrross merged commit 3954541 into opensearch-project:2.19 Mar 17, 2026
51 of 56 checks passed
@ShawnQiang1 ShawnQiang1 deleted the backport/backport-16827-to-2.19 branch March 18, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants