Fix Concurrent Index Auto Create Failing Bulk Requests #82541
original-brownbear merged 3 commits into elastic:master from original-brownbear:fix-auto-create-system-index-concurrency
Conversation
Batching these requests introduced a bug where auto-create requests for system indices would fail. System indices are always auto-created, so if multiple equal requests are batched together they throw resource-already-exists even though the index does not yet exist in the cluster state, only in the intermediary task executor state. This leads to bulk requests ignoring the exception (assuming the index already exists) in their auto-create callback even though the index does not yet exist. Fixed by deduplicating these requests for now; added a TODO to do this a little more cleanly down the road, but this fix is somewhat urgent as the bug breaks ML integ tests.
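The fix boils down to grouping equal requests in the batch so each distinct index is created once and all tasks that asked for it share that single outcome. A minimal sketch of that idea (the `CreateTask` record and `groupByIndex` helper are illustrative names, not the Elasticsearch API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupBatchSketch {

    // Hypothetical stand-in for a batched auto-create task.
    record CreateTask(String indexName) {}

    // Group equal requests: the executor can then create each distinct index
    // once and notify every task in the group with the same result, instead of
    // letting duplicates race into resource-already-exists within one batch.
    static Map<String, List<CreateTask>> groupByIndex(List<CreateTask> batch) {
        Map<String, List<CreateTask>> grouped = new LinkedHashMap<>();
        for (CreateTask task : batch) {
            grouped.computeIfAbsent(task.indexName(), k -> new ArrayList<>()).add(task);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<CreateTask> batch = List.of(
            new CreateTask(".ml-state"),
            new CreateTask(".ml-state"),        // duplicate from a concurrent bulk request
            new CreateTask(".ml-notifications")
        );
        // One creation per distinct index name; duplicates share the outcome.
        System.out.println(groupByIndex(batch).keySet()); // prints [.ml-state, .ml-notifications]
    }
}
```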
Pinging @elastic/es-data-management (Team:Data Management)
```java
    }
}
state = allocationService.reroute(state, "auto-create");
if (state != currentState) {
```
Small optimization: in case the full batch doesn't do anything, there's no need to reroute.
Could we move this to the clusterStatePublished event and pass it off to ClusterService#rerouteService for batching with other pending reroutes too?
I'm not sure about that. This adds one more CS update if no reroutes are pending and saves a reroute every now and then. A reroute is pretty fast these days though. With all the fixes we added recently I'm not sure it's worth it in 8.1.
Hmm it's not trivially fast in a large cluster still so I think we still prefer batching. Yes it's one more publication in an otherwise-quiet cluster but that's not really where the problems arise.
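The optimization being debated is the identity check in the diff above: the task executor builds a new state only when something actually changes, so a no-op batch hands back the very same instance and the reroute (or publication) can be skipped. A toy sketch of that pattern, with a plain `String` standing in for the immutable cluster state (all names here are illustrative):

```java
import java.util.List;

class NoopBatchSketch {

    // Apply a batch of index creations to an immutable "state". Only builds a
    // new object when something changes; otherwise returns the input as-is.
    static String applyBatch(String currentState, List<String> creations) {
        String state = currentState;
        for (String index : creations) {
            if (!state.contains(index)) {
                state = state + "," + index; // new immutable state instance
            }
        }
        return state;
    }

    public static void main(String[] args) {
        String current = "indices:a";
        String updated = applyBatch(current, List.of("a"));
        // The whole batch was a no-op, so the exact same instance comes back
        // and the comparatively expensive follow-up work can be skipped.
        System.out.println(updated == current); // prints true
    }
}
```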
```java
ClusterTasksResult.Builder<CreateIndexTask> builder = ClusterTasksResult.builder();
ClusterState state = currentState;
for (AckedClusterStateUpdateTask task : tasks) {
final Map<CreateIndexRequest, CreateIndexTask> successfulRequests = new HashMap<>(tasks.size());
```
NIT: this and line 109 are good candidates for using `var` to simplify the declaration.
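For context on the NIT: since Java 10, local-variable type inference lets a declaration like the `Map` above drop the repeated type on the left-hand side while keeping the same static type. A tiny illustration (the helper name is made up for this sketch):

```java
import java.util.HashMap;
import java.util.Map;

class VarSketch {

    static Map<String, Integer> build() {
        // Explicit form, as in the diff:
        Map<String, Integer> explicit = new HashMap<>();
        // The reviewer's suggestion: `var` infers HashMap<String, Integer>
        // from the right-hand side, with no loss of type safety.
        var inferred = new HashMap<String, Integer>();
        inferred.put("ok", 1);
        explicit.putAll(inferred);
        return explicit;
    }

    public static void main(String[] args) {
        System.out.println(build().get("ok")); // prints 1
    }
}
```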
DaveCTurner left a comment:
Looks ok but I left some thoughts and suggestions.
```java
private final class CreateIndexTask extends AckedClusterStateUpdateTask {

    final CreateIndexRequest request;
    final AtomicReference<String> indexNameRef;
```
Does this need to be an AtomicReference or could it just be a volatile field?
Annoyingly enough, yes: with the weird way the listener is constructed here, it currently seems like we need this.
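The constraint being described, as I read it: the listener is built before the task object exists, so the value holder has to be an object that both can share up front, which a `volatile` field on the task cannot provide. A minimal sketch of that shape (class and method names are invented for illustration, not the Elasticsearch code):

```java
import java.util.concurrent.atomic.AtomicReference;

class HolderSketch {

    // A volatile field would be enough for safe publication, but it only
    // exists once its owning object does.
    final AtomicReference<String> indexNameRef;

    HolderSketch(AtomicReference<String> ref) {
        this.indexNameRef = ref;
    }

    static String demo() {
        AtomicReference<String> ref = new AtomicReference<>();
        // The listener captures the holder *before* the task is constructed...
        Runnable listener = () -> ref.set("my-index");
        // ...and the task, created afterwards, shares the same holder.
        HolderSketch task = new HolderSketch(ref);
        listener.run();
        return task.indexNameRef.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints my-index
    }
}
```

This is also why the refactoring suggested below (implementing the listener interface directly on the task) would make the `AtomicReference` unnecessary.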
```java
    request.masterNodeTimeout(),
    request.timeout(),
    false
private final class CreateIndexTask extends AckedClusterStateUpdateTask {
```
Are we getting much benefit from the extends AckedClusterStateUpdateTask here? I think if we dropped that and just implemented AckedClusterStateTaskListener directly we wouldn't need to construct the listener up front and could probably solve the TODO about deduplicating the listeners.
++ I agree, the main motivation of this PR is to unblock some broken tests quickly (also in ML QA). Are we good doing that kind of refactoring in a follow-up? :)
Sure, maybe just leave a TODO here tho.
TODO added!
…stem-index-concurrency
Thanks David + Ievgen!
Non-issue as this hasn't been released yet.
relates #82159