Description
Describe the bug
My application currently uses AWS Elasticsearch 7.10, but we are working on upgrading it to OpenSearch (ideally to the latest release, 2.8.0). In OpenSearch 2.8.0, when the missing: _last option is used on a sort, the search can omit documents that have a null value for the sort field.
I narrowed the issue down to a short script that reproduces it (included below) and ran it on different versions of OpenSearch. OpenSearch's behavior matched Elasticsearch 7.10 from OpenSearch 1.0.0 through 2.7.0, but 2.8.0 introduces a regression that causes this bug.
To Reproduce
- Put this in opensearch_regression:
#!/usr/bin/env bash
url_root=$1
echo 'Opensearch Version:'
curl -is $url_root | grep number
echo
echo 'Deleting example_index to have a clean slate...'
curl -X DELETE -H 'Content-type: application/json' "$url_root/example_index?ignore_unavailable=true"
echo
echo
echo 'Creating example_index...'
curl -X PUT -H 'Content-type: application/json' $url_root/example_index --data-binary @- << EOF
{"settings":{"index.number_of_shards":10},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"amount_cents":{"type":"integer"}},"_routing":{"required":true}}}
EOF
echo
echo
echo 'Indexing documents...'
curl -X POST -H 'Content-type: application/x-ndjson' "$url_root/_bulk?refresh=true" --data-binary @- << EOF
{"index":{"_index":"example_index","_id":"1","routing":"epmshv"}}
{"id":"1","amount_cents":100}
{"index":{"_index":"example_index","_id":"4","routing":"abowtm"}}
{"id":"4","amount_cents":250}
{"index":{"_index":"example_index","_id":"5","routing":"ewgphx"}}
{"id":"5","amount_cents":300}
{"index":{"_index":"example_index","_id":"3","routing":"weooyf"}}
{"id":"3","amount_cents":200}
{"index":{"_index":"example_index","_id":"2","routing":"pduylo"}}
{"id":"2","amount_cents":150}
{"index":{"_index":"example_index","_id":"with_null","routing":"jhsycu"}}
{"id":"with_null","amount_cents":null}
EOF
echo
echo
echo 'Searching...'
curl -X POST -H 'Content-type: application/json' "$url_root/example_index/_search?filter_path=hits.hits._id" --data-binary @- << EOF
{"size":5,"sort":[{"amount_cents":{"order":"desc","missing":"_last"}},{"id":{"order":"desc","missing":"_last"}}],"search_after":[150,"2"],"track_total_hits":false,"_source":false}
EOF
echo
- Make the file executable (chmod +x opensearch_regression).
- Run it against OpenSearch 2.7.0, passing it the URL for a 2.7.0 cluster. Here's the output when I run it on my machine:
$ opensearch_regression http://localhost:9234
Opensearch Version:
"number" : "2.7.0",
Deleting example_index to have a clean slate...
{"acknowledged":true}
Creating example_index...
{"acknowledged":true,"shards_acknowledged":true,"index":"example_index"}
Indexing documents...
{"took":98,"errors":false,"items":[{"index":{"_index":"example_index","_id":"1","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"4","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"5","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"3","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"2","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"with_null","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}
Searching...
{"hits":{"hits":[{"_id":"1"},{"_id":"with_null"}]}}
Notice that the search results include both id 1 and id with_null.
- Run it against OpenSearch 2.8.0, passing it the URL for a 2.8.0 cluster. Here's the output when I run it on my machine:
Opensearch Version:
"number" : "2.8.0",
Deleting example_index to have a clean slate...
{"acknowledged":true}
Creating example_index...
{"acknowledged":true,"shards_acknowledged":true,"index":"example_index"}
Indexing documents...
{"took":63,"errors":false,"items":[{"index":{"_index":"example_index","_id":"1","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"4","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"5","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"3","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"2","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}},{"index":{"_index":"example_index","_id":"with_null","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}
Searching...
{"hits":{"hits":[{"_id":"1"}]}}
Notice that the with_null result was omitted.
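When running the script against many versions, the comparison above can be turned into a pass/fail check. The following is a minimal sketch and not part of the original script: `ids_from_response`, `includes_null_doc`, and `check` are hypothetical helper names, and the parsing assumes the compact `filter_path=hits.hits._id` response shape shown in the outputs above.

```shell
# Hypothetical helpers for checking the search output of opensearch_regression.

# Print the _id values found in a search response, space-separated, in order.
ids_from_response() {
  printf '%s\n' "$1" | grep -o '"_id":"[^"]*"' | cut -d'"' -f4 | paste -s -d ' ' -
}

# Succeed (exit status 0) only if the response contains the with_null document.
includes_null_doc() {
  case " $(ids_from_response "$1") " in
    *" with_null "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report one line per response: $1 is a label, $2 is the response body.
check() {
  if includes_null_doc "$2"; then
    echo "$1: ok ($(ids_from_response "$2"))"
  else
    echo "$1: with_null missing ($(ids_from_response "$2"))"
  fi
}

# Responses copied from the 2.7.0 and 2.8.0 runs above.
response_270='{"hits":{"hits":[{"_id":"1"},{"_id":"with_null"}]}}'
response_280='{"hits":{"hits":[{"_id":"1"}]}}'

check "2.7.0" "$response_270"
check "2.8.0" "$response_280"
```

In a real run, the captured output of the final search `curl` call would be passed to `check` instead of the hard-coded strings.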
Expected behavior
As on Elasticsearch 7.10 and OpenSearch 1.0.0 through 2.7.0, I expect documents with null values for sort fields to be included in sorted search responses that use missing: _last.
Plugins
Here's the Dockerfile I'm using to boot OpenSearch locally:
ARG VERSION
FROM opensearchproject/opensearch:${VERSION}
RUN /usr/share/opensearch/bin/opensearch-plugin remove opensearch-security
RUN /usr/share/opensearch/bin/opensearch-plugin install --batch mapper-size
It installs the mapper-size plugin and removes the opensearch-security plugin.
Screenshots
N/A
Host/Environment (please complete the following information):
- OS: macOS Ventura
- Version: 13.4 (22F66)
Additional context
When I was narrowing this down to a reproducible script, I could not consistently reproduce the issue on OpenSearch 2.8.0; it depended on my index configuration. Specifically, these bits of index configuration seemed to matter:
- settings.index.number_of_shards: 10 -- when I removed this, the bug wouldn't reproduce.
- _routing.required: true -- when I removed this (and the _routing values in the _bulk call), the bug wouldn't reproduce.
I suspect that reproducing this bug depends on which shards the example documents land on relative to each other, and that changing either of those settings changes the shards the example docs are assigned to.
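One way to check this suspicion is the `_search_shards` API, which reports the shard a given routing value resolves to. The sketch below is an aid under stated assumptions rather than part of the original reproduction: the helper names are hypothetical, it assumes `example_index` already exists (for example, after running `opensearch_regression`), and it pulls the shard number out of the compact JSON response with `grep` instead of a real JSON parser.

```shell
# Routing values used in the _bulk request of opensearch_regression.
routings='epmshv abowtm ewgphx weooyf pduylo jhsycu'

# Build the _search_shards URL for one routing value (hypothetical helper).
# $1 = cluster URL root, $2 = index name, $3 = routing value
search_shards_url() {
  printf '%s/%s/_search_shards?routing=%s\n' "$1" "$2" "$3"
}

# Extract the first "shard":N number from a _search_shards response body.
shard_from_response() {
  printf '%s\n' "$1" | grep -o '"shard":[0-9]*' | head -n 1 | cut -d: -f2
}

# Only contact the cluster when a URL root is passed, as in the original script.
url_root=${1:-}
if [ -n "$url_root" ]; then
  for routing in $routings; do
    body=$(curl -s "$(search_shards_url "$url_root" example_index "$routing")")
    echo "routing=$routing shard=$(shard_from_response "$body")"
  done
fi
```

Comparing the reported shard numbers between a configuration that reproduces the bug and one that does not would show whether the documents are in fact placed differently.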