[BUG] Null values matched in query_string with asterisks #21280

@mmippolito

Describe the bug

I used the index template below to create an index, along with a single document for testing. The document is indexed such that only the digits are kept; for example, indexing a value like “aaa123” would only retain the “123” portion.

Likewise, if a user searches for something like “zzz999”, the only token generated should be “999”.
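The two analysis chains from the index template can be approximated outside OpenSearch to show which tokens each one should yield. This is only a rough sketch: the helper names (`strip_nondigits`, `keyword_tokenizer`, `index_tokens`, `search_tokens`) are made up for illustration, Python's `re` stands in for the Java regex engine, and the assumption that the keyword tokenizer emits one empty token for empty input is inferred from the `_analyze` output in this report.

```python
import re


def strip_nondigits(text: str) -> str:
    # Approximates the "strip_nondigits" pattern_replace char filter.
    # re.ASCII keeps \D limited to "anything but 0-9" (assumed to match Java's default).
    return re.sub(r"\D", "", text, flags=re.ASCII)


def keyword_tokenizer(text: str) -> list[str]:
    # The keyword tokenizer emits the whole input as a single token.
    # Assumption: an empty input still yields one empty token.
    return [text]


def index_tokens(text: str) -> list[str]:
    # special_number_analyzer: char filter, keyword tokenizer, length filter (min 1).
    tokens = keyword_tokenizer(strip_nondigits(text))
    return [t for t in tokens if len(t) >= 1]


def search_tokens(text: str) -> list[str]:
    # special_number_analyzer_search: same chain, but an empty token becomes "<NULL>".
    tokens = keyword_tokenizer(strip_nondigits(text))
    return [re.sub(r"^$", "<NULL>", t) for t in tokens]


print(index_tokens("aaa123"))   # ['123']
print(search_tokens("zzz999"))  # ['999']
print(index_tokens("*asdf*"))   # []
print(search_tokens("*asdf*"))  # ['<NULL>']
```

Under this model, neither chain turns `*asdf*` into a token that equals the indexed token `1234`.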

This works for the most part. But if the user enters a query_string that contains an asterisk, then the document is returned regardless of whether it actually matches. For example, searching for:

*asdf*

produces a match with the document below, which has a value of 1234. This is in contrast to the two _analyze requests listed below, which produce tokens that don’t match each other.

Thanks in advance for looking at this issue.

Related component

Search

To Reproduce

Create an index as follows:
PUT ds1
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_nondigits": { "type": "pattern_replace", "pattern": "\\D", "replacement": "" }
      },
      "filter": {
        "remove_empty_tokens": { "type": "length", "min": 1 },
        "replace_empty_with_null": { "type": "pattern_replace", "pattern": "^$", "replacement": "<NULL>" }
      },
      "analyzer": {
        "special_number_analyzer": {
          "type": "custom",
          "char_filter": [ "strip_nondigits" ],
          "tokenizer": "keyword",
          "filter": [ "remove_empty_tokens" ]
        },
        "special_number_analyzer_search": {
          "type": "custom",
          "char_filter": [ "strip_nondigits" ],
          "tokenizer": "keyword",
          "filter": [ "replace_empty_with_null" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "special_number_field": {
        "type": "text",
        "analyzer": "special_number_analyzer",
        "search_analyzer": "special_number_analyzer_search"
      }
    }
  }
}

Add a test document:
POST /_bulk?refresh=true
{ "index": { "_index": "ds1" } }
{ "special_number_field": "1234" }

Test with analyzer:
GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer"
}

correctly produces:
{ "tokens": [] }

Test with search_analyzer:
GET ds1/_analyze
{
  "text": "*asdf*",
  "analyzer": "special_number_analyzer_search"
}

correctly produces:

{
  "tokens": [
    {
      "token": "<NULL>",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 0
    }
  ]
}

The bug occurs when you run this query:
GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer"
    }
  }
}

produces this unwanted hit:
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": { "special_number_field": "1234" }
      }
    ]
  }
}

Likewise, with the search_analyzer:
GET ds1/_search
{
  "query": {
    "query_string": {
      "query": "*asdf*",
      "analyzer": "special_number_analyzer_search"
    }
  }
}

produces this unwanted hit:
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 1,
    "hits": [
      {
        "_index": "ds1",
        "_id": "vB0QnJ0Bpkf8R5zRYeIl",
        "_score": 1,
        "_source": { "special_number_field": "1234" }
      }
    ]
  }
}

Expected behavior

The expected behaviour is that no hits should be produced, since the analyzer does not produce any tokens. If an analyzer either produces no tokens or produces tokens that do not match the tokens of an indexed document, the query_string should not return any hits regardless of whether an asterisk is used in the search.

Additional Details

Plugins
n/a

Screenshots
n/a

Host/Environment (please complete the following information):

  • OpenSearch Version 2.19.4


Labels: Search, bug, untriaged