[Feature Request] Improve performance of range queries #13566

@harshavamsi

Is your feature request related to a problem? Please describe

I've been thinking about how we can improve the performance of certain types of range queries (non-scoring ones to start with). Consider https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/big5/operations/default.json#L32C1-L61C7. These are the kinds of queries a user might run when first loading a dashboard: give me all the events that took place in the last 30 days, the last 24 hours, and so on.

Range queries on timestamps are a common use case for such dashboards. Typically these are non-scoring queries, meaning they have no other clause that forces documents to be scored. For example, if this range query were used in conjunction or disjunction with a text query, we would need to score all the documents and order them by relevance. Scoring plus filtering on a range is time-consuming, and since the introduction of Lucene's IndexOrDocValuesQuery such queries have become much faster thanks to the use of doc_values in certain cases.

For the non-scoring cases, I feel we can do better. Today Lucene scores all documents in a segment but collects only 10. OpenSearch enforces that we collect 10,000 documents. By default, a search without the size attribute returns 10 hits but can be scrolled to get up to 10,000 hits. What if we scored only 10,000 hits instead of all the documents in the segment? This could significantly speed up these non-scoring queries.

Describe the solution you'd like

Similar to how we use other Lucene classes in OpenSearch, we could override the PointRangeQuery class. At search time, we use the searchContext to figure out the query shape and whether it is a simple range query. If it fits the description, we use the size attribute to determine the number of documents to collect. If size > 10,000, we collect size; otherwise we collect min(size, 10,000).

We override Lucene's IntersectVisitor to stop intersecting once it has gathered the number of hits we want, and then start collecting. Then we can override the range queries in the field mappers to point to this new query type.
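
To make the early-termination idea concrete, here is a minimal, self-contained sketch. It does not use Lucene: `Node`, `LimitedVisitor`, and `buildTree` are hypothetical stand-ins for a BKD-style points tree and for a visitor in the spirit of `PointValues.IntersectVisitor`, and each document's value is assumed to equal its doc ID to keep the example small. A real change would need to hook into Lucene's actual intersection code, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminatingRangeSketch {

    /** Simplified tree node: inner nodes carry only bounds, leaves carry (docId, value) pairs. */
    static final class Node {
        final long min;
        final long max;
        final Node left;
        final Node right;
        final int[] docIds;   // non-null only for leaves
        final long[] values;  // non-null only for leaves, sorted ascending

        Node(int[] docIds, long[] values) {   // leaf node
            this.docIds = docIds;
            this.values = values;
            this.min = values[0];
            this.max = values[values.length - 1];
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {         // inner node
            this.left = left;
            this.right = right;
            this.min = Math.min(left.min, right.min);
            this.max = Math.max(left.max, right.max);
            this.docIds = null;
            this.values = null;
        }
    }

    /** Gathers doc IDs whose value lies in [lower, upper], stopping once `limit` hits are found. */
    static final class LimitedVisitor {
        final long lower;
        final long upper;
        final int limit;
        final List<Integer> hits = new ArrayList<>();
        int leavesVisited = 0;

        LimitedVisitor(long lower, long upper, int limit) {
            this.lower = lower;
            this.upper = upper;
            this.limit = limit;
        }

        boolean done() {
            return hits.size() >= limit;
        }

        void intersect(Node node) {
            // Skip the subtree if we already have enough hits or it cannot overlap the range.
            if (done() || node.max < lower || node.min > upper) {
                return;
            }
            if (node.docIds == null) {        // inner node: recurse left, then right
                intersect(node.left);
                intersect(node.right);
                return;
            }
            leavesVisited++;
            for (int i = 0; i < node.values.length && !done(); i++) {
                long v = node.values[i];
                if (v >= lower && v <= upper) {
                    hits.add(node.docIds[i]);
                }
            }
        }
    }

    /** Builds a balanced tree over doc IDs [from, to); assumes value == docId and from < to. */
    static Node buildTree(int from, int to, int leafSize) {
        if (to - from <= leafSize) {
            int[] docIds = new int[to - from];
            long[] values = new long[to - from];
            for (int i = 0; i < docIds.length; i++) {
                docIds[i] = from + i;
                values[i] = from + i;
            }
            return new Node(docIds, values);
        }
        int mid = from + (to - from) / 2;
        return new Node(buildTree(from, mid, leafSize), buildTree(mid, to, leafSize));
    }

    public static void main(String[] args) {
        Node root = buildTree(0, 100_000, 512);

        LimitedVisitor limited = new LimitedVisitor(0, 99_999, 10_000);
        limited.intersect(root);

        LimitedVisitor unlimited = new LimitedVisitor(0, 99_999, Integer.MAX_VALUE);
        unlimited.intersect(root);

        System.out.println("limited:   hits=" + limited.hits.size() + " leaves=" + limited.leavesVisited);
        System.out.println("unlimited: hits=" + unlimited.hits.size() + " leaves=" + unlimited.leavesVisited);
    }
}
```

The limited visitor returns whichever matching documents it reaches first in tree order, so a shortcut like this only seems reasonable when nothing in the request depends on scoring or on seeing every match.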

Related component

Search:Performance

Describe alternatives you've considered

The alternative is to do nothing, but this approach could yield promising results.

Additional context

No response

Metadata

Labels

RFC (Issues requesting major changes), Roadmap:Search (Project-wide roadmap label), Search:Performance, enhancement (Enhancement or improvement to existing feature or request), v2.17.0, v3.0.0 (Issues and PRs related to version 3.0.0)

Projects

Status

✅ Done

Status

New

Status

2.17 (First RC 09/03, Release 09/17)
