Sending operations concurrently in peer recovery #58018
dnhatn merged 14 commits into elastic:master from
Conversation
Pinging @elastic/es-distributed (:Distributed/Recovery)
server/src/main/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetService.java
DaveCTurner left a comment:
I think this is a little risky; I've seen cases where the load from recovery phase 2 appeared to have a performance impact on indexing into other active shards on the target node. It's certainly nice to have the option of more concurrency in phase 2, but I think we should be cautious about enabling it by default.
I'll link such a case to this PR.
original-brownbear left a comment:
One question on the mechanics of this; maybe I'm missing something here.
server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java
original-brownbear left a comment:
Looks good to me technically :)
I share David's concern though, even more so if we start moving the handler for this to the WRITE pool, which under load could cause additional trouble from more rejections in a case like the one David linked. Maybe we should just default to the current behavior for now and make the concurrency optional?
@DaveCTurner @original-brownbear Thanks for reviewing. I've added a new recovery setting that controls the number of operation chunks sent in parallel. Could you please take another look?
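For context, dynamic recovery settings of this kind are usually updated via the cluster settings API. The setting name below is an assumption about how this option surfaced (it is not quoted from the PR thread), so treat the snippet as illustrative:

```
PUT _cluster/settings
{
  "transient": {
    "indices.recovery.max_concurrent_operations": 1
  }
}
```

With the default of 1, behavior matches the previous sequential sending; raising it allows more operation chunks in flight during phase 2 at the cost of more load on the target node.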
@dnhatn will take a look soon, but FYI there's some random CS failure here in
DaveCTurner left a comment:
Docs & naming LGTM; I haven't reviewed the code change in detail but I think Armin did.
run elasticsearch-ci/2

@elasticmachine update branch

@original-brownbear @DaveCTurner Thanks for reviewing.
Today, we send operations in phase 2 of peer recoveries batch by batch, sequentially. Normally that's okay, as we should have a fairly small number of operations in phase 2 due to the file-based threshold. However, if phase 1 takes a long time and we are actively indexing, then phase 2 can have a lot of operations to replay. With this change, we send multiple batches concurrently (defaults to 1) to reduce the recovery time. Backport of #58018
Relates #58011
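The mechanism described above, sending several operation batches in flight at once while a setting bounds the concurrency, can be sketched with a semaphore that caps in-flight sends. This is a standalone illustration under assumed names (`ConcurrentBatchSender`, `sendBatches`), not the PR's actual `RecoverySourceHandler` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentBatchSender {

    /**
     * Sends batches with at most {@code maxConcurrent} in flight at once.
     * Returns the highest number of concurrent sends actually observed.
     */
    public static int sendBatches(List<String> batches, int maxConcurrent) throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrent); // bounds in-flight sends
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (String batch : batches) {
            permits.acquire(); // block until a send slot frees up
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            pool.submit(() -> {
                try {
                    Thread.sleep(5); // stand-in for sending the batch over the wire
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release(); // free the slot for the next batch
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxObserved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> batches = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            batches.add("batch-" + i);
        }
        System.out.println("max in-flight: " + sendBatches(batches, 4));
    }
}
```

With `maxConcurrent` set to 1 this degenerates to the old sequential behavior, which matches the conservative default the reviewers asked for.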