Skip to content

Finish query only after the results are fully consumed#13055

Merged
arhimondr merged 1 commit intotrinodb:masterfrom
arhimondr:finish-query-after-results-consumed
Jul 15, 2022
Merged

Finish query only after the results are fully consumed#13055
arhimondr merged 1 commit intotrinodb:masterfrom
arhimondr:finish-query-after-results-consumed

Conversation

@arhimondr
Copy link
Contributor

This commit makes it consistent with QueryState#FINISHED documentation.

This change is needed to allow spooling exchange to be used for query
results delivery. Spooling exchanges in fault tolerant execution are
closed on query completion. It is important not to close an exchange
before the results are fully consumed.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine

How would you describe this change to a non-technical end user or system administrator?

With this change a query is considered finished successfully only when the query results are successfully consumed by the client. Before it was possible that a query might've been marked as successful even when client experienced a failure or dropped a connection.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Core
* Queries are now marked as finished successfully only when the query results are successfully consumed by the client

@cla-bot cla-bot bot added the cla-signed label Jul 1, 2022
@arhimondr arhimondr requested a review from a team July 1, 2022 00:30
@arhimondr
Copy link
Contributor Author

Why this change is needed?

In fault tolerant execution we've been using a streaming (direct) exchange to deliver query results to coordinator. With #13025 query results will be delivered using a spooling exchange. Spooling exchanges are closed (wiped out) upon query completion (either success or a failure). It is important not to close a spooling exchange before the query results are fully consumed. To achieve that it is required to only transition a query to a FINISHED state when the results are fully consumed.

This change also makes the behavior consistent with the QueryState#FINISHED documentation:

* Query has finished executing and all output has been consumed.

@losipiuk
Copy link
Member

losipiuk commented Jul 1, 2022

What would happen if a client on purpose reads only a fraction of the result. I think this may be a common use case that a UI tool interacting with Trino (over JDBC or ODBC) just reads a fixed number of rows, up to some limit, even if the result is larger. Would query be marked as FAILED then in Trino? It should not be.

@arhimondr
Copy link
Contributor Author

@losipiuk it will be marked as CANCELED. The only difference is that with this change when results are not read completely the query will be marked as CANCELED always, and before this change it would be marked as FINISHED if all the stages finished running but there are still query results in the buffer.

@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch 4 times, most recently from 38c3d4c to 2827f7d Compare July 7, 2022 19:32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this should be separate commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I guess I do not fully understand this change. Can you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a tricky one. For queries with no output stage (e.g.: DDL operations such as DROP TABLE) the exchange is still created. It has to be closed to allow query to finish, otherwise the client will be waiting of ExchangeClient#isBlocked and the query will never finish. The client is closed by the closeExchangeClientIfNecessary. However with this change the query will not transition to the FINISHED state before the results are consumed. So we have to close the ExchangeClient when a query is in FINISHING state to allow the client to pool the results and mark the query as resultsConsumed to allow it to finish.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but let's sync so I am sure I understand it correctly.

@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch from 2827f7d to ffc6dea Compare July 8, 2022 19:42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I prefered previous shape with explanation you put in the PR; maybe just convert it to comment in the code.
No we are closing exchange on ANY state change - which is weird. The question which arises why do we need to open it at all in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we are closing exchange on ANY state change - which is weird.

Yeah, it is weird. For DDL operations it can only be closed once query info is available (to figure out there is no output stage). Essentially this is simply to wait for the query info to become available.

The question which arises why do we need to open it at all in the first place.

I think it is mostly for convenience (creating ExchangeClient lazily is likely to add more complexity)

@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch from ffc6dea to 9caf607 Compare July 14, 2022 17:09
@arhimondr
Copy link
Contributor Author

Added a comment

@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch from 9caf607 to 9b212b4 Compare July 14, 2022 20:25
@arhimondr
Copy link
Contributor Author

CI: #1671

@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch from 9b212b4 to 7dd8cb5 Compare July 15, 2022 14:54
@losipiuk
Copy link
Member

please squash before merging.

This commit makes it consistent with QueryState#FINISHED documentation.

This change is needed to allow spooling exchange to be used for query
results delivery. Spooling exchanges in fault tolerant execution are
closed on query completion. It is important not to close an exchange
before the results are fully consumed.
@arhimondr arhimondr force-pushed the finish-query-after-results-consumed branch from 7dd8cb5 to 48ceab5 Compare July 15, 2022 16:33
@arhimondr arhimondr merged commit b3d38c3 into trinodb:master Jul 15, 2022
@arhimondr arhimondr deleted the finish-query-after-results-consumed branch July 15, 2022 18:01
@github-actions github-actions bot added this to the 391 milestone Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants