
[Concurrent Segment Search] Doc count error needs to be computed at the slice level #11680

@jed326

Description

Is your feature request related to a problem? Please describe

For #9246 we forced the doc count error to be 0 during the shard-level reduce phase, since no buckets were being eliminated at that stage. However, #11585 changed this logic to use a slice_size = shard_size * 1.5 + 10 heuristic. This means it is now possible to eliminate bucket candidates during the shard-level reduce, so the doc count error needs to be calculated accordingly in those cases.

As an example, take this agg from the noaa OSB workload:

```json
{
  "size": 0,
  "aggs": {
    "station": {
      "terms": {
        "field": "station.elevation",
        "size": 50
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "size": 1
          },
          "aggs": {
            "max": {
              "max": {
                "field": "TMAX"
              }
            }
          }
        }
      }
    }
  }
}
```

The "date" aggregation uses size = 1, so the computed slice_size will be 26, which is very small compared to the cardinality of the "date" field.
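To show where 26 comes from, here is a minimal sketch of the arithmetic, assuming the terms aggregation's default shard_size of size * 1.5 + 10 (rounded down) feeds into the slice_size heuristic from #11585:

```python
import math


def default_shard_size(size: int) -> int:
    # Terms aggregation default: shard_size = size * 1.5 + 10, rounded down.
    return math.floor(size * 1.5 + 10)


def slice_size(shard_size: int) -> int:
    # Heuristic introduced by #11585: slice_size = shard_size * 1.5 + 10.
    return math.floor(shard_size * 1.5 + 10)


# For the "date" sub-aggregation above, size = 1:
shard = default_shard_size(1)      # 1 * 1.5 + 10 = 11.5 -> 11
print(shard, slice_size(shard))    # 11 26
```

So each slice keeps only its top 26 date buckets, and any bucket pruned there is invisible to the final reduce.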

Attaching the aggregation outputs with concurrent search enabled/disabled:
cs-disabled.txt
cs-enabled.txt

Describe the solution you'd like

The doc count error needs to be calculated in a way that accounts for the buckets eliminated during the slice-level reduce.
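For illustration, here is a sketch of how the standard shard-level error bound would translate to slices. The function name and data shapes are hypothetical, not the actual OpenSearch implementation; the idea is that any term a slice did not return could have had up to that slice's smallest returned doc_count documents in it, so the worst-case error is the sum of each slice's last bucket count:

```python
def doc_count_error_upper_bound(slice_buckets):
    # slice_buckets: for each slice, a list of (term, doc_count) pairs
    # sorted by doc_count descending and already truncated to slice_size.
    # Mirrors the shard-level bound: sum each slice's last (smallest)
    # returned bucket count, since eliminated buckets can be at most
    # that large.
    return sum(buckets[-1][1] for buckets in slice_buckets if buckets)


slices = [
    [("a", 100), ("b", 40), ("c", 25)],
    [("a", 90), ("d", 30), ("b", 20)],
]
print(doc_count_error_upper_bound(slices))  # 25 + 20 = 45
```

With the current behavior the slice-level reduce reports an error of 0 even though buckets like "c" and "d" may have been pruned on the other slice, which is exactly the discrepancy visible in the attached outputs.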

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

Search:Performance · enhancement (Enhancement or improvement to existing feature or request) · v2.12.0 (Issues and PRs related to version 2.12.0) · v3.0.0 (Issues and PRs related to version 3.0.0)

Status

Done