Concurrent Searching #1286

@anasalkouz

Description

Is your feature request related to a problem? Please describe.
Since at least Apache Lucene 6.x, an experimental low-level API has allowed search execution to be parallelized across segments [3]. As of Apache Lucene 8.10.1, the latest release, the API is still marked as experimental [1]. Community feedback on the feature has been positive so far [2], and there is a good chance that, for certain kinds of indices, parallelizing the search over segments would bring performance benefits.

[1] https://lucene.apache.org/core/8_10_1/core/org/apache/lucene/search/IndexSearcher.html#search-org.apache.lucene.search.Query-org.apache.lucene.search.CollectorManager-
[2] https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html
[3] https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html

Describe the solution you'd like
The essentials: since the API is experimental, it should be controlled by a setting and run on a dedicated, configurable thread pool:

  • "search.allow_concurrent_segment_search", default value is false
  • "index_searcher" thread pool (default number of threads == number of cores)
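
To make the two knobs concrete, here is a minimal sketch in plain Java. The setting name and the pool name come from the proposal above; the `SegmentSearchConfig` class, the `Map`-based settings lookup, and the helper method names are hypothetical and only stand in for whatever settings and thread pool infrastructure the engine actually uses.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: illustrates the proposed defaults, not an existing class.
final class SegmentSearchConfig {
    // Setting name as proposed in this issue.
    static final String ALLOW_CONCURRENT = "search.allow_concurrent_segment_search";

    // The feature is opt-in: a missing setting means "false".
    static boolean isConcurrentEnabled(Map<String, String> settings) {
        return Boolean.parseBoolean(settings.getOrDefault(ALLOW_CONCURRENT, "false"));
    }

    // Proposed default size: one thread per available core.
    static int defaultPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Dedicated pool whose threads carry the proposed "index_searcher" name.
    static ExecutorService newIndexSearcherPool(int threads) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, "index_searcher");
            thread.setDaemon(true);
            return thread;
        });
    }
}
```

With the flag left at its default, the search would stay on the calling thread and the pool would never be handed to the searcher.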

The change, although quite complex, is mostly isolated in the QueryPhase and QueryCollectorContext (and surrounding classes).

Describe alternatives you've considered
N/A

Additional context
Currently, the search implementation assumes a sequential flow: the results are accumulated by individual collectors (backed by collector contexts) and post-processed at the end. It has to be changed to use CollectorManagers and reducers to assemble the final query results.
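
To illustrate the shape of that change, below is a simplified, self-contained model in plain Java. It does not use Lucene: the `Manager` interface only mirrors the two responsibilities of Lucene's `CollectorManager` (create one collector per unit of work, then reduce them all into one result), segments are modelled as `int[]` arrays of doc ids, and every name in the block is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified model of manager/reducer based collection; not Lucene's API.
final class ConcurrentSegmentSearchSketch {
    interface SegmentCollector {
        void collect(int docId);
    }

    // Mirrors the idea of a collector manager: a factory plus a reduce step.
    interface Manager<C extends SegmentCollector, T> {
        C newCollector();
        T reduce(List<C> collectors);
    }

    static final class CountCollector implements SegmentCollector {
        int count;
        public void collect(int docId) { count++; }
    }

    static final class CountManager implements Manager<CountCollector, Integer> {
        public CountCollector newCollector() { return new CountCollector(); }
        public Integer reduce(List<CountCollector> collectors) {
            int total = 0;
            for (CountCollector collector : collectors) total += collector.count;
            return total;
        }
    }

    // Each segment gets its own collector and its own task; reduce runs once
    // after every task has finished, so no collector is shared across threads.
    static <C extends SegmentCollector, T> T search(
            List<int[]> segments, Manager<C, T> manager, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<C> collectors = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        for (int[] segment : segments) {
            C collector = manager.newCollector();
            collectors.add(collector);
            pending.add(pool.submit(() -> {
                for (int docId : segment) collector.collect(docId);
            }));
        }
        for (Future<?> task : pending) task.get(); // wait for all segments
        return manager.reduce(collectors);
    }

    // Convenience wrapper that owns the pool and hides checked exceptions.
    static int countHits(List<int[]> segments, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            return search(segments, new CountManager(), pool);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that post-processing moves out of the individual collectors and into `reduce`, which is the only place that sees all per-segment results.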

The impediments: early termination and time-bounded search are exception-driven. This is difficult to replicate as-is, because once the exception is thrown the flow is interrupted and the reducers are not available.
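
One possible way around this, shown here only as a sketch and not as a decided design, is to contain the stop signal inside each per-segment unit of work so that the collector survives and can still be passed to the reducer. The block is plain Java and runs sequentially to keep it short; `TerminatedEarly`, `LimitedCollector`, and the other names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: keep exception-driven termination local to a segment so that the
// reduce step still receives every collector. All names are hypothetical.
final class EarlyTerminationSketch {
    // Stand-in for an exception-driven stop signal (no stack trace needed).
    static final class TerminatedEarly extends RuntimeException {
        TerminatedEarly() { super(null, null, false, false); }
    }

    // Accepts at most `limit` hits, then signals termination by throwing.
    static final class LimitedCollector {
        final int limit;
        final List<Integer> hits = new ArrayList<>();
        LimitedCollector(int limit) { this.limit = limit; }
        void collect(int docId) {
            if (hits.size() >= limit) throw new TerminatedEarly();
            hits.add(docId);
        }
    }

    // The exception is caught here, at the segment boundary, instead of
    // unwinding the whole search; the partial hits stay in the collector.
    static LimitedCollector collectSegment(int[] segment, int limit) {
        LimitedCollector collector = new LimitedCollector(limit);
        try {
            for (int docId : segment) collector.collect(docId);
        } catch (TerminatedEarly stop) {
            // Deliberately ignored: this segment is done, others continue.
        }
        return collector;
    }

    // Reducer: concatenates the per-segment hits in segment order.
    static List<Integer> reduce(List<LimitedCollector> collectors) {
        List<Integer> merged = new ArrayList<>();
        for (LimitedCollector collector : collectors) merged.addAll(collector.hits);
        return merged;
    }
}
```

A time-bounded search could follow the same pattern with a deadline check in place of the hit limit, though whether that matches the existing timeout semantics would need to be verified.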

It would make sense to build benchmarks that compare sequential and parallel segment search and show when each of them is useful. Once such evidence is collected, the engine itself could provide hints at runtime that recommend switching the feature on or off (probably on a per-index basis).
