Limit the number of concurrent requests per node #31206
s1monw merged 5 commits into elastic:master
Conversation
With `max_concurrent_shard_requests` we used to throttle / limit the number of concurrent shard requests a high-level search request can execute per node. This had several problems, since it limited the number on a global level based on the number of nodes. This change now throttles the number of concurrent requests per node while still allowing concurrency across multiple nodes. Closes elastic#31192
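The per-node throttling idea can be sketched as follows. This is a hypothetical, simplified illustration — the class and method names (`PerNodeThrottler`, `NodeQueue`, `submit`) are invented and do not match the PR's actual code: each target node gets its own slot counter and queue, so one slow node queues only its own requests while other nodes proceed concurrently.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (names invented): throttle concurrent shard
// requests per target node instead of globally.
class PerNodeThrottler {
    /** A task that must call onDone exactly once when its response arrives. */
    interface Task { void run(Runnable onDone); }

    private final int maxPerNode;
    private final Map<String, NodeQueue> perNode = new ConcurrentHashMap<>();

    PerNodeThrottler(int maxPerNode) { this.maxPerNode = maxPerNode; }

    /** Runs the task now if the node has a free slot, otherwise queues it. */
    void submit(String nodeId, Task task) {
        NodeQueue q = perNode.computeIfAbsent(nodeId, n -> new NodeQueue());
        q.add(() -> task.run(q::finishAndRunNext));
    }

    private class NodeQueue {
        private final ArrayDeque<Runnable> queue = new ArrayDeque<>();
        private int running; // requests currently in flight to this node

        void add(Runnable r) {
            boolean runNow;
            synchronized (this) {
                runNow = running < maxPerNode;
                if (runNow) running++; else queue.add(r);
            }
            if (runNow) r.run();
        }

        void finishAndRunNext() {
            Runnable next;
            synchronized (this) {
                next = queue.poll();
                if (next == null) running--; // nothing queued: free the slot
            }
            if (next != null) next.run(); // hand the freed slot to a queued task
        }
    }
}
```

The key property: a busy `node1` never blocks requests headed to `node2`, which the old global limit could not guarantee.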
Pinging @elastic/es-search-aggs
…oose it in a race
jpountz left a comment
It looks like you had a search/replace issue, but I like it in general. Given that it is an expert option, I don't think we need sophisticated backward compatibility; let's just add a note about it in the breaking changes for 7.0?
```java
synchronized void add(Runnable runnable) {
    if (queue == null) { // create this lazily
        queue = new LinkedList<>();
    }
```

not that it matters much but ArrayDeque should be better?
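For context on the suggestion above, a minimal standalone demo (the class name `QueueDemo` is invented for illustration): `ArrayDeque` offers the same FIFO `add`/`poll` semantics as `LinkedList`, but is backed by a resizable array, so no node object is allocated per element.

```java
import java.util.ArrayDeque;

// ArrayDeque as a FIFO work queue: same add/poll behavior as
// LinkedList, but array-backed, so no per-element node allocation.
public class QueueDemo {
    public static void main(String[] args) {
        ArrayDeque<Runnable> queue = new ArrayDeque<>();
        queue.add(() -> System.out.println("first"));
        queue.add(() -> System.out.println("second"));
        // poll() removes from the head, preserving insertion order
        queue.poll().run(); // prints "first"
        queue.poll().run(); // prints "second"
    }
}
```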
```java
synchronized void add(Runnable runnable) {
    if (queue == null) { // create this lazily
```

why does it need to be lazy?
```diff
  String[] aliases = aliasFilter.getAliases();
  String[] finalIndices = aliases.length == 0 ? new String[] {shardId.getIndexName()} : aliases;
- // here we have to map the filters to the UUID since from now on we use the uuid for the lookup
+ // here we have to pendingExecutionsPerNode the filters to the UUID since from now on we use the uuid for the lookup
```
```java
try {
    executePhaseOnShard(shardIt, shard, new SearchActionListener<FirstResult>(new SearchShardTarget(shard
        .currentNodeId(),
        shardIt.shardId(), shardIt.getClusterAlias(), shardIt.getOriginalIndices()), shardIndex) {
```

indentation of the two above lines doesn't help readability
```diff
  * each shard. If profiling was not enabled, this will return null
  *
- * @return The profile results or an empty map
+ * @return The profile results or an empty pendingExecutionsPerNode
```

```diff
  /**
   * Returns the profile results for this search response (including all shards).
-  * An empty map is returned if profiling was not enabled
+  * An empty pendingExecutionsPerNode is returned if profiling was not enabled
```

```diff
  /**
-  * Return a map of nodeId to pending number of search requests.
+  * Return a pendingExecutionsPerNode of nodeId to pending number of search requests.
   * This is a snapshot of the current pending search and not a live map.
```
```java
        fork(() -> onShardFailure(shardIndex, shard, shard.currentNodeId(), shardIt, e));
    }
};
if (pendingExecutions == null || pendingExecutions.tryAcquire()) {
```

maybe remove this tryAcquire and go through the else block directly? I think it's simpler if the management of the semaphore permits is contained in PendingExecutions?
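The reviewer's suggestion could look roughly like the sketch below. This is a hypothetical illustration, not the PR's actual implementation — the method names `tryRun` and `finish` are invented: the caller never touches the semaphore; `PendingExecutions` decides internally whether to run or queue.

```java
import java.util.ArrayDeque;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the review suggestion: permit management
// lives entirely inside PendingExecutions, never at the call site.
class PendingExecutions {
    private final Semaphore semaphore;
    private final ArrayDeque<Runnable> queue = new ArrayDeque<>();

    PendingExecutions(int permits) { this.semaphore = new Semaphore(permits); }

    /** Runs the task if a permit is free, otherwise queues it. */
    synchronized void tryRun(Runnable task) {
        if (semaphore.tryAcquire()) {
            task.run();
        } else {
            queue.add(task);
        }
    }

    /** Called when a task finishes: hand its permit to a queued task, or release it. */
    synchronized void finish() {
        Runnable next = queue.poll();
        if (next == null) {
            semaphore.release();
        } else {
            next.run(); // the finished task's permit carries over
        }
    }
}
```

With this shape, the `pendingExecutions == null || pendingExecutions.tryAcquire()` branch at the call site collapses to a single call, which is the simplification the comment asks about.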
@jpountz I addressed your comments
* master:
  Remove RestGetAllAliasesAction (#31308)
  Temporary fix for broken build
  Reenable Checkstyle's unused import rule (#31270)
  Remove remaining unused imports before merging #31270
  Fix non-REST doc snippet
  [DOC] Extend SQL docs
  Immediately flush channel after writing to buffer (#31301)
  [DOCS] Shortens ML API intros
  Use quotes in the call invocation (#31249)
  move security ingest processors to a sub ingest directory (#31306)
  Add 5.6.11 version constant.
  Fix version detection.
  SQL: Whitelist SQL utility class for better scripting (#30681)
  [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299)
  CCS: don't proxy requests for already connected node (#31273)
  Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap
  [test] opensuse packaging turn up debug logging
  Add unreleased version 6.3.1
  Removes experimental tag from scripted_metric aggregation (#31298)
  [Rollup] Metric config parser must use builder so validation runs (#31159)
  [ML] Check licence when datafeeds use cross cluster search (#31247)
  Add notion of internal index settings (#31286)
  Test: Remove broken yml test feature (#31255)
  REST hl client: cluster health to default to cluster level (#31268)
  [ML] Update test thresholds to account for changes to memory control (#31289)
  Log warnings when cluster state publication failed to some nodes (#31233)
  Fix AntFixture waiting condition (#31272)
  Ignore numeric shard count if waiting for ALL (#31265)
  [ML] Implement new rules design (#31110)
  index_prefixes back-compat should test 6.3 (#30951)
  Core: Remove plain execute method on TransportAction (#30998)
  Update checkstyle to 8.10.1 (#31269)
  Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202)
  Modify pipelining handlers to require full requests (#31280)
  Revert upgrade to Netty 4.1.25.Final (#31282)
  Use armored input stream for reading public key (#31229)
  Fix Netty 4 Server Transport tests. Again.
  REST hl client: adjust wait_for_active_shards param in cluster health (#31266)
  REST high-level Client: remove deprecated API methods (#31200)
  [DOCS] Mark SQL feature as experimental
  [DOCS] Updates machine learning custom URL screenshots (#31222)
  Fix naming conventions check for XPackTestCase
  Fix security Netty 4 transport tests
  Fix race in clear scroll (#31259)
  [DOCS] Clarify audit index settings when remote indexing (#30923)
  Delete typos in SAML docs (#31199)
  REST high-level client: add Cluster Health API (#29331)
  [ML][TEST] Mute tests using rules (#31204)
  Support RequestedAuthnContext (#31238)
  SyncedFlushResponse to implement ToXContentObject (#31155)
  Add Get Aliases API to the high-level REST client (#28799)
  Remove some line length supressions (#31209)
  Validate xContentType in PutWatchRequest. (#31088)
  [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024)
  Suppress extras FS on caching directory tests
  Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)"
  Revert "Fix snippets in upgrade docs"
  Fix snippets in upgrade docs
  [DOCS] Added 6.3 info & updated the upgrade table. (#30940)
  LLClient: Support host selection (#30523)
  Upgrade to Netty 4.1.25.Final (#31232)
  Enable custom credentials for core REST tests (#31235)
  Move ESIndexLevelReplicationTestCase to test framework (#31243)
  Encapsulate Translog in Engine (#31220)
  HLRest: Add get index templates API (#31161)
  Remove all unused imports and fix CRLF (#31207)
  [Tests] Fix self-referencing tests
  [TEST] Fix testRecoveryAfterPrimaryPromotion
  [Docs] Remove mention pattern files in Grok processor (#31170)
  Use stronger write-once semantics for Azure repository (#30437)
  Don't swallow exceptions on replication (#31179)
  Limit the number of concurrent requests per node (#31206)
  Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044)
  Move java version checker back to its own jar (#30708)
  [test] add fix for rare virtualbox error (#31212)
Some of the docs were outdated as they did not mention that the limit is not per node. Also, the default value changed. Relates to elastic#31206
With `max_concurrent_shard_requests` we used to throttle / limit
the number of concurrent shard requests a high-level search request
can execute per node. This had several problems since it limited the
number on a global level based on the number of nodes. This change
now throttles the number of concurrent requests per node while still
allowing concurrency across multiple nodes.

Closes #31192