Description
Is your feature request related to a problem? Please describe
In the current concurrent search paradigm we create slice_count times as many collectors as in non-concurrent search, and collect is called on these collectors concurrently. Afterwards, reduce is called sequentially on the search threadpool across all of these collectors.
There are two main scenarios where this sequential behavior can be problematic: [1] when the aggregators are nested and buildAggregation needs to BFS/DFS through the collector tree, and [2] when buildAggregation itself is an expensive operation for the given collector.
One such example is the keyword-terms-numeric-terms query, which is a nested terms aggregation. Even in the non-concurrent search case, nested terms aggregations suffer from a combinatorial explosion of buckets with each additional nesting level, and for concurrent search this problem is essentially multiplied by slice_count because the bucket creation during buildAggregation happens sequentially. The combinatorial explosion was partially addressed by #11585; however, the sequential work is still a bottleneck.
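To make the shape of the problem concrete, here is a minimal sketch of the current flow using plain java.util.concurrent. SliceCollector and ConcurrentSearchFlow are hypothetical stand-ins, not the actual OpenSearch classes: collect fans out across slices, but the per-collector buildAggregation pass afterwards runs on a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical stand-in for a per-slice aggregation collector.
class SliceCollector {
    void collect() { /* per-slice doc collection, runs concurrently */ }

    long buildAggregation() {
        // Stand-in for the expensive bucket-building work; for nested
        // terms aggregations this walks the whole sub-aggregator tree.
        return ThreadLocalRandom.current().nextLong(1000);
    }
}

public class ConcurrentSearchFlow {
    public static void main(String[] args) throws Exception {
        int sliceCount = 4;
        ExecutorService indexSearcherPool = Executors.newFixedThreadPool(sliceCount);
        List<SliceCollector> collectors = new ArrayList<>();
        List<Future<?>> collects = new ArrayList<>();

        // Phase 1: collect() fans out across the index_searcher pool.
        for (int i = 0; i < sliceCount; i++) {
            SliceCollector c = new SliceCollector();
            collectors.add(c);
            collects.add(indexSearcherPool.submit(c::collect));
        }
        for (Future<?> f : collects) f.get();

        // Phase 2 (today): buildAggregation() runs sequentially on the
        // search thread for every slice collector -- the bottleneck this
        // issue describes, costing roughly sliceCount * buildAggregation.
        for (SliceCollector c : collectors) {
            c.buildAggregation();
        }
        indexSearcherPool.shutdown();
    }
}
```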
Here is a basic query profiler breakdown to further illustrate this point:
| Metric | weather-data-2016, concurrent search disabled, size=500 (in ns) | (in ms) | weather-data-2016, slice=2, slice_size=1150 (= 1.5 * shard_size + 10, size=500) (in ns) | (in ms) |
|---|---|---|---|---|
| Took | | 10918 | | 17567 |
| ConstantScoreQuery — time_in_nanos | 3021298801 | 3021.2988 | | 8023 |
| MatchAllDocsQuery — time_in_nanos | 1087697018 | 1087.69702 | | 2338 |
| rewrite_time | 27406 | 0.02741 | 10733 | 0.01073 |
| collector — time_in_nanos | 4758597960 | 4758.59796 | 3919422108 | 3919.42211 |
| GlobalOrdinalsStringTermsAggregator — time_in_nanos | 7655394436 | 7655.39444 | 9238307858 | 9238.30786 |
| collect | 2670211530 | 2670.21153 | 2349533379 | 2349.53338 |
| avg_collect | | | 2223026375 | 2223.02638 |
| collect_count | 33659481 | | 33659481 | |
| build_aggregation | 4976192164 | 4976.19216 | 12500064156 | 12500.06416 |
| avg_build_aggregation | | | 6250026755 | 6250.02676 |
| build_aggregation_count | 1 | | 2 | |
| NumericTermsAggregator — time_in_nanos | 4167152226 | 4167.15223 | 5953806596 | 5953.8066 |
| collect | 281349935 | 281.34994 | 7061608537 | 7061.60854 |
| avg_collect | | | 271087497 | 271.0875 |
| collect_count | 832960 | | 1329446 | |
| build_aggregation | 3885328997 | 3885.329 | 11216763404 | 11216.7634 |
| avg_build_aggregation | | | 5220390427 | 5220.39043 |
| build_aggregation_count | 1 | | 2 | |
| MaxAggregator — time_in_nanos | 540335 | 0.54034 | 2229363 | 2.22936 |
| collect | 458491 | 0.45849 | 5622515353 | 5622.51535 |
| avg_collect | | | 1576283 | 1.57628 |
| collect_count | 8360 | | 59800 | |
| build_aggregation | 69834 | 0.06983 | 5617813957 | 5617.81396 |
| avg_build_aggregation | | | 251549 | 0.25155 |
| build_aggregation_count | 1 | | 2 | |
We can see that for the NumericTermsAggregator, build_aggregation takes roughly 3x as long (11216.76 ms vs. 3885.33 ms) and still happens sequentially, and the combinatorial explosion is visible in the collect_count (1329446 vs. 832960).
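For intuition on where the extra collect_count and bucket work comes from, here is a back-of-the-envelope sketch. The inner bucket count is an illustrative assumption, not a measurement from the workload; only the shard_size/slice_size values follow the formula above.

```java
public class BucketExplosion {
    public static void main(String[] args) {
        // Illustrative assumption, not a workload measurement.
        int innerBucketsPerOuter = 50; // hypothetical numeric sub-buckets per keyword bucket

        int shardSize = (int) (1.5 * 500 + 10);       // default shard_size for size=500 -> 760
        int sliceSize = (int) (1.5 * shardSize + 10); // current slice_size formula -> 1150
        int sliceCount = 2;

        // Non-concurrent: one collector holds up to shard_size outer buckets.
        long nonConcurrent = (long) shardSize * innerBucketsPerOuter;

        // Concurrent: each of slice_count collectors holds up to slice_size
        // outer buckets, and today all of them are built on one thread.
        long concurrent = (long) sliceCount * sliceSize * innerBucketsPerOuter;

        System.out.println("buckets built, non-concurrent: " + nonConcurrent); // 38000
        System.out.println("buckets built, concurrent:     " + concurrent);   // 115000
    }
}
```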
Describe the solution you'd like
The additional combinatorial explosion is due to slice_size > shard_size, which is something we can revisit (i.e. does slice_size really need to be 1.5 * shard_size + 10, or can slice_size == shard_size?). However, the long pole is actually build_aggregation as a whole, which takes a long time because it happens sequentially. I propose that we move the buildAggregation step into processPostCollection so that it can happen in parallel on the index_searcher thread, as sketched below.
I will follow up with a PR for this change, along with some benchmarking data, to discuss further.
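As a rough illustration of the proposed change (again with hypothetical stand-in code rather than the real SearchContext/processPostCollection signatures), each slice would run buildAggregation on the index_searcher pool right after its collect, leaving only a cheap merge on the search thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelBuildAggregation {
    public static void main(String[] args) throws Exception {
        int sliceCount = 4;
        ExecutorService indexSearcherPool = Executors.newFixedThreadPool(sliceCount);
        List<Callable<Long>> perSliceWork = new ArrayList<>();

        for (int i = 0; i < sliceCount; i++) {
            perSliceWork.add(() -> {
                // ... collect() for this slice runs here ...
                // then buildAggregation() immediately after, still on the
                // index_searcher thread (the processPostCollection idea).
                return expensiveBuildAggregation();
            });
        }

        // Each slice's buckets are built concurrently; the wall-clock cost
        // is ~1x buildAggregation instead of sliceCount * buildAggregation.
        List<Future<Long>> built = indexSearcherPool.invokeAll(perSliceWork);

        long merged = 0;
        for (Future<Long> f : built) merged += f.get(); // cheap sequential merge
        System.out.println("merged result: " + merged);
        indexSearcherPool.shutdown();
    }

    private static long expensiveBuildAggregation() {
        // Stand-in for bucket construction over a nested aggregator tree.
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += i;
        return acc;
    }
}
```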
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response