
[Feature Request] Improve ApproximatePointRangeQuery Traversal for Skewed Datasets with DFS Strategy #18341

@prudhvigodithi

Description


Is your feature request related to a problem? Please describe

Background

Coming from #18313 (comment): we identified a regression in the approximation framework, especially with ascending sorts when the dataset is skewed (for example, http_logs). The nightly benchmark dashboard also shows the regression for asc_sort_timestamp.

While working on #14406 to extend approximation to other numeric types (starting with Integer), I tested http_logs asc_sort_size after extending approximation and noticed that it introduced a regression:

| Metric | Task | Value | Unit |
| --- | --- | ---: | --- |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 35 | |
| Min Throughput | asc_sort_size | 0.5 | ops/s |
| Mean Throughput | asc_sort_size | 0.5 | ops/s |
| Median Throughput | asc_sort_size | 0.5 | ops/s |
| Max Throughput | asc_sort_size | 0.5 | ops/s |
| 50th percentile latency | asc_sort_size | 32.4552 | ms |
| 90th percentile latency | asc_sort_size | 35.7239 | ms |
| 99th percentile latency | asc_sort_size | 57.71 | ms |
| 100th percentile latency | asc_sort_size | 58.15 | ms |
| 50th percentile service time | asc_sort_size | 30.0029 | ms |
| 90th percentile service time | asc_sort_size | 33.4095 | ms |
| 99th percentile service time | asc_sort_size | 54.7088 | ms |
| 100th percentile service time | asc_sort_size | 55.0696 | ms |
| error rate | asc_sort_size | 0 | % |

The nightly benchmark dashboards clearly show that the default Lucene optimization is faster.

Overview

The current implementation of ApproximatePointRangeQuery in OpenSearch shows a good performance improvement for workloads like big5, but it does not perform as well on skewed datasets like http_logs.

In fact, on skewed datasets like http_logs, Lucene's original PointRangeQuery outperformed ApproximatePointRangeQuery, especially for ascending sort queries. This is because of Lucene's getInverseIntersectVisitor strategy, which includes optimizations such as a fast path for dense matches that returns DocIdSetIterator.all(maxDoc). With approximation this is not the case: even though we short-circuit, we still have to traverse the dense nodes, rather than, as Lucene does, returning all docs and removing the non-matching ones.
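The inverse-intersect idea can be illustrated with a minimal, self-contained sketch (this is not Lucene's actual code; the class and method names here are hypothetical). When most documents in a segment match the range, it is cheaper to mark the few non-matching docs and return "all docs minus marked" than to visit every matching doc:

```java
import java.util.BitSet;

public class InverseIntersectSketch {
    /**
     * Returns the docs whose value falls in [min, max], built inversely:
     * mark the (few) NON-matching docs, then subtract them from "all docs",
     * mirroring the DocIdSetIterator.all(maxDoc) fast path for dense matches.
     */
    static BitSet matchDense(long[] values, long min, long max) {
        int maxDoc = values.length;
        BitSet nonMatching = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (values[doc] < min || values[doc] > max) {
                nonMatching.set(doc); // cheap when the range is dense
            }
        }
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);       // analogue of DocIdSetIterator.all(maxDoc)
        all.andNot(nonMatching);  // remove the non-matching docs
        return all;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 100, 4, 5};
        System.out.println(matchDense(values, 1, 10).cardinality()); // 5
    }
}
```

The key point is that the cost scales with the number of non-matching docs, which is small exactly in the dense case where the approximation framework currently still walks the dense nodes.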

For http_logs, the following are the split ratios; notice that the tree is skewed and unbalanced:

(used the BKDTreeInspector from https://github.com/msfroh/lucene-university/pull/28/files to inspect the tree)

Level split values (sampling):
Node 1 (Level 0): Range [897249601000 to 897854400000], Size: 36297158
  Split value: 897585682000
  Split ratios: left=0.56, right=0.44
Node 2 (Level 1): Range [897249601000 to 897585682000], Size: 19520000
  Split value: 897504395000
  Split ratios: left=0.76, right=0.24
Node 4 (Level 2): Range [897249601000 to 897504395000], Size: 11131392
  Split value: 897485609000
  Split ratios: left=0.93, right=0.07
Node 8 (Level 3): Range [897249601000 to 897485609000], Size: 6937088
  Split value: 897413980000
  Split ratios: left=0.70, right=0.30
Node 16 (Level 4): Range [897249601000 to 897413980000], Size: 4194304

But notice that for big5 the dataset is not skewed:

Level split values (sampling):
Node 1 (Level 0): Range [1673571600000 to 1673575198000], Size: 372372
  Split value: 1673573936000
  Split ratios: left=0.65, right=0.35
Node 2 (Level 1): Range [1673571600000 to 1673573936000], Size: 241664
  Split value: 1673572867000
  Split ratios: left=0.54, right=0.46
Node 4 (Level 2): Range [1673571600000 to 1673572867000], Size: 131072
  Split value: 1673572235000
  Split ratios: left=0.50, right=0.50
Node 8 (Level 3): Range [1673571600000 to 1673572235000], Size: 65536
  Split value: 1673571917000
  Split ratios: left=0.50, right=0.50
Node 16 (Level 4): Range [1673571600000 to 1673571917000], Size: 32768
  Split value: 1673571759000
  Split ratios: left=0.50, right=0.50
Node 32 (Level 5): Range [1673571600000 to 1673571759000], Size: 16384
  Split value: 1673571681000
  Split ratios: left=0.51, right=0.49
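A node's split ratio is simply the fraction of its points that fall on each side of the split value. The inspector dump above reports the ratios directly; a hypothetical helper that derives them from a parent size and left-child size (the left-child count below is illustrative, chosen to match the ~0.93/0.07 split of Node 4 in the http_logs dump) looks like:

```java
public class SplitRatioSketch {
    /** Returns {leftRatio, rightRatio} for a BKD node split. */
    static double[] splitRatios(long parentSize, long leftSize) {
        double left = (double) leftSize / parentSize;
        return new double[] { left, 1.0 - left };
    }

    public static void main(String[] args) {
        // Node 4 from the http_logs dump: 11131392 points, ~93% on the left
        double[] r = splitRatios(11131392, 10352194);
        System.out.printf("left=%.2f, right=%.2f%n", r[0], r[1]);
    }
}
```

A balanced tree (like big5's) keeps both ratios near 0.50 at every level, which is what makes level-by-level traversal cheap there and expensive on http_logs.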

Describe the solution you'd like

Example tree

                  N0 (1000 docs)
                 /             \
         N1 (900)              N2 (100)
        /      \
  N3 (800)     L3 (100)
   /     \
L1(600)  L2(200)

The existing code has a custom traversal, the intersectLeft method, which is BFS-like in nature. It traverses level by level, always exploring the "left" subtree first at each branch, but it still effectively scans broad portions of the tree before diving deeper.

A DFS-based strategy naturally produces results in sorted order for numeric fields. It immediately descends to the leftmost leaf (the smallest values; for timestamp fields, the oldest) and starts visiting documents. If, say, the query only needs the top 100 results, the DFS can find those in the leftmost leaves before visiting higher-value regions. As soon as the collector has gathered 100 docs, it can trigger early termination, and the traversal can be aborted without ever exploring the large portions of the tree that correspond to higher timestamp values. In other words, DFS yields earlier and more efficient short-circuiting because it finds the required values first.

N0
└── N1
    └── N3
        └── L1  (collect 100 docs)
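The proposed left-first DFS with early termination can be sketched over a toy BKD-like tree (a simplified model, not the ApproximatePointRangeQuery implementation; class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCollectSketch {
    static class Node {
        Node left, right; // null for leaves
        int[] docs;       // docs in a leaf, ordered by value
        Node(Node left, Node right) { this.left = left; this.right = right; }
        Node(int[] docs) { this.docs = docs; }
    }

    /** Depth-first, left-first collection that stops once `limit` docs are gathered. */
    static void collect(Node node, List<Integer> out, int limit) {
        if (node == null || out.size() >= limit) return; // early termination
        if (node.docs != null) {
            for (int doc : node.docs) {
                if (out.size() >= limit) return;
                out.add(doc);
            }
            return;
        }
        collect(node.left, out, limit);  // smallest values first
        collect(node.right, out, limit); // skipped entirely if limit already hit
    }

    public static void main(String[] args) {
        // Mirrors the example tree: leftmost leaf holds the smallest values
        Node l1 = new Node(new int[]{1, 2, 3});
        Node l2 = new Node(new int[]{4, 5});
        Node l3 = new Node(new int[]{6, 7});
        Node root = new Node(new Node(l1, l2), l3);

        List<Integer> out = new ArrayList<>();
        collect(root, out, 2);
        System.out.println(out); // [1, 2] — right subtrees never visited
    }
}
```

Because the recursion unwinds as soon as the limit is reached, a skewed tree like http_logs' costs only the depth of the leftmost path plus one leaf, instead of a broad level-by-level scan.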

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response
