Set optimizer.force-single-node-output to false by default#13217
Set optimizer.force-single-node-output to false by default#13217arhimondr merged 2 commits intotrinodb:masterfrom
Conversation
b1d550f to
87056bb
Compare
Stage execution could remain in the same SCHEDULING state while already running some tasks and not transition the state further until more tasks are scheduled.
With this setting enabled the optimizer will insert an additional exchange to make sure the coordinator always consumes query results from a single task. The exchange client used by the coordinator can consume results from as many tasks and stages as needed. The setting was introduced in 2017 by trinodb@6143485 as a transitional setting to migrate from introducing an additional exchange but has never been set to false by default.
87056bb to
32d1f29
Compare
|
Will it cause coordinator to perform actual computations (e.g. final aggregations?). If so, then IMO we should keep it as is. |
I agree. @arhimondr can you verify that after the change, no computations (aggregation, non-identity projection, filter, etc.) get scheduled on the coordinator? |
|
@sopel39 @findepi When the However coordinator can consume query results from any number of nodes with no issues. Here's how the plan looks like for an identical query when For queries with the top most stage of Here is the plan when the |
|
The |
| ImmutableMultimap.of()); | ||
| stageExecution.schedulingComplete(); | ||
| remoteTask.ifPresent(task -> coordinatorTaskManager.addSourceTaskFailureListener(task.getTaskId(), failureReporter)); | ||
| if (queryStateMachine.getQueryState() == STARTING && remoteTask.isPresent()) { |
There was a problem hiding this comment.
The order of states is STARTING -> RUNNING -> FINSHING -> FINISHED | FAILED. If it's already in a later stage there's no need to transition
There was a problem hiding this comment.
Technically you do not need if (queryStateMachine.getQueryState() == STARTING) at all as queryStateMachine.transitionToRunning does the check (it does queryState.setIf(RUNNING, currentState -> currentState.ordinal() < RUNNING.ordinal());)
losipiuk
left a comment
There was a problem hiding this comment.
second commit ok. I need some explanation on the first one.
| ScheduleResult result = stageSchedulers.get(stageExecution.getStageId()) | ||
| .schedule(); | ||
|
|
||
| if (stateMachine.getState() == DistributedStagesSchedulerState.PLANNED && stageExecution.getAllTasks().size() > 0) { |
There was a problem hiding this comment.
same here: if (stateMachine.getState() == DistributedStagesSchedulerState.PLANNED is not needed
There was a problem hiding this comment.
I wanted to avoid calling stageExecution.getAllTasks() every time as it has to acquire a list of tasks under a lock and made the similar code in the coordinator scheduler consistent with this. Likely it won't be a problem, but at the same time I wonder if it introduces enough confusion to justify extra CPU cycles?
Description
With this setting enabled the optimizer will insert an additional exchange
to make sure the coordinator always consumes query results from a single
task. The exchange client used by the coordinator can consume results from
as many tasks and stages as needed. The setting was introduced in 2017
by 6143485
as a transitional setting to migrate from introducing an additional
exchange but has never been set to false by default.
Improvement
Core engine
Non user visiblle
Related issues, pull requests, and links
6143485
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: