Description
Is your feature request related to a problem? Please describe
Background
Coming from #18313 (comment), we identified a regression with the approximation framework, especially with ascending sort queries when the dataset is skewed (for example, http_logs). The nightly benchmark dashboard also shows the regression for asc_sort_timestamp.
While working on #14406 to extend approximation to other numeric types, starting with Integer, I tested the http_logs asc_sort_size operation and noticed that the extension introduced a regression:
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 35 | |
| Min Throughput | asc_sort_size | 0.5 | ops/s |
| Mean Throughput | asc_sort_size | 0.5 | ops/s |
| Median Throughput | asc_sort_size | 0.5 | ops/s |
| Max Throughput | asc_sort_size | 0.5 | ops/s |
| 50th percentile latency | asc_sort_size | 32.4552 | ms |
| 90th percentile latency | asc_sort_size | 35.7239 | ms |
| 99th percentile latency | asc_sort_size | 57.71 | ms |
| 100th percentile latency | asc_sort_size | 58.15 | ms |
| 50th percentile service time | asc_sort_size | 30.0029 | ms |
| 90th percentile service time | asc_sort_size | 33.4095 | ms |
| 99th percentile service time | asc_sort_size | 54.7088 | ms |
| 100th percentile service time | asc_sort_size | 55.0696 | ms |
| error rate | asc_sort_size | 0 | % |
The nightly benchmark dashboards clearly show that the default Lucene optimization is faster.
Overview
The current implementation of ApproximatePointRangeQuery in OpenSearch shows a good performance improvement for workloads like big5, but it does not perform as well on skewed datasets like http_logs.
On skewed datasets like http_logs, Lucene's original PointRangeQuery outperforms ApproximatePointRangeQuery, especially for ASC sort queries, because of Lucene's getInverseIntersectVisitor strategy. That strategy includes optimizations such as a fast path for dense matches that returns DocIdSetIterator.all(maxDoc) and then removes the non-matching docs. With approximation this is not the case: even though we short-circuit, we still have to traverse through the dense nodes rather than returning all docs and removing the non-matching ones, as Lucene does.
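As a rough illustration (not Lucene's actual code), the inverse strategy can be sketched as follows: when the range matches most documents, start from a bitset of all docs and clear the few non-matching ones, instead of visiting every match individually. The class and method names here are made up for the sketch:

```java
import java.util.BitSet;

public class InverseIntersectSketch {
    // Hypothetical sketch of the "inverse" strategy: when most docs match
    // the range, start from all docs and clear the few non-matching ones.
    static BitSet inverseCollect(long[] values, long lower, long upper) {
        int maxDoc = values.length;
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);                 // assume every doc matches...
        for (int doc = 0; doc < maxDoc; doc++) {
            long v = values[doc];
            if (v < lower || v > upper) {
                result.clear(doc);             // ...then remove the outliers
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 20, 30, 999, 40};
        BitSet matches = inverseCollect(sizes, 0, 100);
        System.out.println(matches.cardinality()); // 4
    }
}
```

When matches are dense, this touches only the handful of non-matching docs, which is why the fast path wins on range queries that select most of the index.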
For http_logs, the split ratios below show that the tree is skewed and unbalanced
(I used the BKDTreeInspector from https://github.com/msfroh/lucene-university/pull/28/files to inspect the tree):
Level split values (sampling):
Node 1 (Level 0): Range [897249601000 to 897854400000], Size: 36297158
Split value: 897585682000
Split ratios: left=0.56, right=0.44
Node 2 (Level 1): Range [897249601000 to 897585682000], Size: 19520000
Split value: 897504395000
Split ratios: left=0.76, right=0.24
Node 4 (Level 2): Range [897249601000 to 897504395000], Size: 11131392
Split value: 897485609000
Split ratios: left=0.93, right=0.07
Node 8 (Level 3): Range [897249601000 to 897485609000], Size: 6937088
Split value: 897413980000
Split ratios: left=0.70, right=0.30
Node 16 (Level 4): Range [897249601000 to 897413980000], Size: 4194304
By contrast, notice that for big5 the dataset is not skewed:
Level split values (sampling):
Node 1 (Level 0): Range [1673571600000 to 1673575198000], Size: 372372
Split value: 1673573936000
Split ratios: left=0.65, right=0.35
Node 2 (Level 1): Range [1673571600000 to 1673573936000], Size: 241664
Split value: 1673572867000
Split ratios: left=0.54, right=0.46
Node 4 (Level 2): Range [1673571600000 to 1673572867000], Size: 131072
Split value: 1673572235000
Split ratios: left=0.50, right=0.50
Node 8 (Level 3): Range [1673571600000 to 1673572235000], Size: 65536
Split value: 1673571917000
Split ratios: left=0.50, right=0.50
Node 16 (Level 4): Range [1673571600000 to 1673571917000], Size: 32768
Split value: 1673571759000
Split ratios: left=0.50, right=0.50
Node 32 (Level 5): Range [1673571600000 to 1673571759000], Size: 16384
Split value: 1673571681000
Split ratios: left=0.51, right=0.49
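For reference, the split ratios above are just the fraction of points that fall on each side of the split value. A minimal sketch of how an inspector might derive them (the helper name is hypothetical, not from BKDTreeInspector):

```java
public class SplitRatioSketch {
    // Hypothetical helper: fraction of values <= splitValue (the "left" ratio).
    static double leftRatio(long[] values, long splitValue) {
        int left = 0;
        for (long v : values) {
            if (v <= splitValue) left++;
        }
        return (double) left / values.length;
    }

    public static void main(String[] args) {
        // Skewed data: most values cluster below the split point, so the
        // left/right ratios drift far from a balanced 0.50/0.50.
        long[] skewed = {1, 2, 3, 4, 5, 6, 7, 100, 200, 300};
        double left = leftRatio(skewed, 7);
        System.out.printf("left=%.2f, right=%.2f%n", left, 1.0 - left); // left=0.70, right=0.30
    }
}
```

A balanced tree keeps these ratios near 0.50/0.50 at every level, which is what the big5 output shows and the http_logs output does not.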
Describe the solution you'd like
Example tree
N0 (1000 docs)
/ \
N1 (900) N2 (100)
/ \
N3 (800) L3 (100)
/ \
L1(600) L2(200)
The existing code uses a custom traversal, the intersectLeft method, which is BFS-like in nature. It traverses level by level, always exploring the "left" subtree first at each branch, but it still effectively scans broad portions of the tree before diving deeper.
A DFS-based strategy naturally produces results in sorted order for numeric fields. It immediately descends to the leftmost leaf (the smallest values; for timestamp fields, the oldest) and starts visiting documents. If, say, the query only needs the top 100 results, the DFS can find them in the leftmost leaves before visiting higher-value regions. As soon as the collector has gathered 100 docs, it can trigger early termination, and the traversal can be aborted without ever exploring the large portions of the tree that correspond to higher values. In other words, DFS yields earlier and more efficient short-circuiting because it finds the required values first.
N0
└── N1
└── N3
└── L1 (collect 100 docs)
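The DFS-with-early-termination idea above can be sketched on a toy tree like this (the Node class and collect method are illustrative only, not the actual OpenSearch implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class DfsEarlyTerminationSketch {
    // Toy BKD-like node: either an inner node with two children, or a leaf
    // holding doc ids in ascending value order. Purely illustrative.
    static class Node {
        Node left, right;
        int[] docs;            // non-null only for leaves
        Node(Node left, Node right) { this.left = left; this.right = right; }
        Node(int[] docs) { this.docs = docs; }
    }

    // DFS that always explores the left (smaller-value) child first and
    // aborts as soon as `limit` docs have been collected.
    static void collect(Node node, List<Integer> out, int limit) {
        if (node == null || out.size() >= limit) return;  // early termination
        if (node.docs != null) {
            for (int doc : node.docs) {
                if (out.size() >= limit) return;
                out.add(doc);
            }
            return;
        }
        collect(node.left, out, limit);
        collect(node.right, out, limit);
    }

    public static void main(String[] args) {
        // Mirrors the example tree: the leftmost leaf alone satisfies the query,
        // so N2 and the rest of N1 are never visited.
        Node l1 = new Node(new int[]{1, 2, 3});
        Node l2 = new Node(new int[]{4, 5});
        Node l3 = new Node(new int[]{6});
        Node n3 = new Node(l1, l2);
        Node n1 = new Node(n3, l3);
        Node n0 = new Node(n1, new Node(new int[]{7}));

        List<Integer> hits = new ArrayList<>();
        collect(n0, hits, 2);
        System.out.println(hits); // [1, 2]
    }
}
```

Because leaves are visited in ascending value order, the first `limit` docs collected are exactly the smallest values, so the traversal never needs to touch the higher-value subtrees.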
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response