Skip to content

Conversation

@davidpiegza
Copy link
Contributor

@davidpiegza davidpiegza commented Jun 17, 2025

Description

When using MAX_EXECUTION_TIME in a select query which includes a subquery, the optimizer hint gets lost when Vitess splits the query.

Example query:

select
  /*+ MAX_EXECUTION_TIME(1) */
  *
from
  customer
where
  id IN (
    select
      customer.id
    from
      customer
    where
      id > 10
      and id < 3000
  );

vexplain output:

{
	"OperatorType": "Route",
	"Variant": "Scatter",
	"Keyspace": {
		"Name": "main",
		"Sharded": true
	},
	"FieldQuery": "select id, fullname, nationalid, country from customer where 1 != 1",
	"Query": "select id, fullname, nationalid, country from customer where id in (select customer.id from customer where id \u003e 10 and id \u003c 3000)",
	"Table": "customer"
}

Query is missing the optimizer hint.

This PR sets the parsed comments in tryMergeSubqueryWithOuter in the operator so that the optimizer hint is not lost.

Please note: This is an initial attempt to address the issue described for the example query mentioned above. There may be additional cases where parsed comments are not properly set.

See also the added test case that demonstrates this issue.

Open questions

  • Should the change only set the specific MAX_EXECUTION_TIME comment?

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Copy link
Contributor

vitess-bot bot commented Jun 17, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jun 17, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Jun 17, 2025
@GrahamCampbell
Copy link
Contributor

GrahamCampbell commented Jun 17, 2025

I'm not so sure this change is right. Vitess forwards the timeout information already, based on the remaining time, and the query gets terminated. See #16735.

@arthurschreiber
Copy link
Member

I'm not so sure this change is right. Vitess forwards the timeout information already, based on the remaining time, and the query gets terminated. See #16735.

@GrahamCampbell I think you're mixing this up with QUERY_TIMEOUT_MS which is a Vitess specific directive. Vitess currently has no handling for the MySQL native MAX_EXECUTION_TIME from what I can see.

The problem this pull request tries to partially address is that we started using MAX_EXECUTION_TIME in many parts of our application to limit the execution time of database queries, and this often works as expected for Vitess and non Vitess clusters. But we noticed that on Vitess clusters, this only works fine as long as Vitess does not perform any bigger query rewrites.

There are a few issues with the way this is implemented. This currently copies all query comments, no matter whether they're applicable to the rewritten / split query or not, and as you mentioned, it does not keep track of the query timeout so queries that "execute later" will still have the same max execution time as the first query executed, which can lead to the query execution time not being honoured correctly.

I think a change we definitely should do is to only copy down the MAX_EXECUTION_TIME hint, and no other hints that might have been specified on the original query. That should ensure we don't accidentally copy hints that don't apply to the rewritten query.

And then eventually, Vitess should correctly handle MAX_EXECUTION_TIME by actually reducing the time set in later queries, and I guess we'll want to normalize the handling of MAX_EXECUTION_TIME and QUERY_TIMEOUT_MS.

@systay systay added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jun 24, 2025
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my thinking here is that we should push down all the comments, not just the MAX_EXECUTION_TIME one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a test for a plan that is not merged into a single route. we want the comment to be there for the subquery if it's sent separately. this test is not showing how we handle comments on the subquery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@systay Thanks for the review! I added another test for a plan with a "complex" expression.

@codecov
Copy link

codecov bot commented Jun 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.52%. Comparing base (dedea93) to head (729549c).
Report is 32 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18369      +/-   ##
==========================================
+ Coverage   67.48%   67.52%   +0.03%     
==========================================
  Files        1603     1607       +4     
  Lines      262391   262687     +296     
==========================================
+ Hits       177067   177367     +300     
+ Misses      85324    85320       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@systay
Copy link
Collaborator

systay commented Jun 24, 2025

For the query test, you need to not compare id on both sides to make sure that the plan is not merged into a single query. The query I'm interested in testing would look something like this:

select /* comment */ user.col from user where foo IN (select id from user where id > 1 and id < 10)

notice that I'm doing where foo IN and not where id IN

@systay
Copy link
Collaborator

systay commented Jun 24, 2025

I forgot to start everything with -

Hi @davidpiegza!

Thank you for your contribution!

@davidpiegza
Copy link
Contributor Author

@systay Could I get another review? Not sure about the failing test, it also fails on main.

Signed-off-by: David Piegza <[email protected]>
This reverts commit 1d146f17fd60159073e681fd48ba04908284a451.

Signed-off-by: David Piegza <[email protected]>
@davidpiegza davidpiegza force-pushed the set-parsed-comments-in-op-for-subqueries branch from 7e3bb41 to 729549c Compare July 1, 2025 07:59
@systay systay merged commit efae039 into vitessio:main Jul 1, 2025
103 of 111 checks passed
@harshit-gangal
Copy link
Member

I'm not so sure this change is right. Vitess forwards the timeout information already, based on the remaining time, and the query gets terminated. See #16735.

@GrahamCampbell I think you're mixing this up with QUERY_TIMEOUT_MS which is a Vitess specific directive. Vitess currently has no handling for the MySQL native MAX_EXECUTION_TIME from what I can see.

The problem this pull request tries to partially address is that we started using MAX_EXECUTION_TIME in many parts of our application to limit the execution time of database queries, and this often works as expected for Vitess and non Vitess clusters. But we noticed that on Vitess clusters, this only works fine as long as Vitess does not perform any bigger query rewrites.

There are a few issues with the way this is implemented. This currently copies all query comments, no matter whether they're applicable to the rewritten / split query or not, and as you mentioned, it does not keep track of the query timeout so queries that "execute later" will still have the same max execution time as the first query executed, which can lead to the query execution time not being honoured correctly.

I think a change we definitely should do is to only copy down the MAX_EXECUTION_TIME hint, and no other hints that might have been specified on the original query. That should ensure we don't accidentally copy hints that don't apply to the rewritten query.

And then eventually, Vitess should correctly handle MAX_EXECUTION_TIME by actually reducing the time set in later queries, and I guess we'll want to normalize the handling of MAX_EXECUTION_TIME and QUERY_TIMEOUT_MS.

I agree to this. Vitess should just handle MAX_EXECUTION_TIME and treat it like QUERY_TIMEOUT_MS

morgo added a commit to morgo/vitess that referenced this pull request Jul 7, 2025
…tests

* origin/master: (32 commits)
  test: Fix race condition in TestStreamRowsHeartbeat (vitessio#18414)
  VReplication: Improve permission check logic on external tablets on SwitchTraffic (vitessio#18348)
  Perform post copy actions in atomic copy (vitessio#18411)
  Update `operator.yaml` (vitessio#18364)
  Feature(onlineddl): Add shard-specific completion to online ddl (vitessio#18331)
  Set parsed comments in operator for subqueries (vitessio#18369)
  `vtorc`: move shard primary timestamp to time type (vitessio#18401)
  `vtorc`: rename `isClusterWideRecovery` -> `isShardWideRecovery` (vitessio#18351)
  `vtorc`: remove dupe keyspace/shard in replication analysis (vitessio#18395)
  Topo: Add NamedLock test for zk2 and consul and get them passing (vitessio#18407)
  Handle MySQL 9.x as New Flavor in getFlavor() (vitessio#18399)
  Add support for sending grpc server backend metrics via ORCA (vitessio#18282)
  asthelpergen: add design documentation (vitessio#18403)
  `vtorc`: add keyspace/shard labels to recoveries stats (vitessio#18304)
  `vtorc`: cleanup `database_instance` location fields (vitessio#18339)
  avoid derived tables for UNION when possible (vitessio#18393)
  [Bugfix] Broken Heartbeat system in Row Streamer (vitessio#18390)
  Update MAINTAINERS.md (vitessio#18394)
  move vmg to emeritus (vitessio#18388)
  Make sure to check if the server is closed in etcd2topo (vitessio#18352)
  ...
davidpiegza added a commit to github/vitess-gh that referenced this pull request Jul 16, 2025
arthurschreiber added a commit to github/vitess-gh that referenced this pull request Aug 1, 2025
…thub

Set parsed comments in operator for subqueries (vitessio#18369)
mhamza15 pushed a commit to github/vitess-gh that referenced this pull request Aug 7, 2025
mhamza15 pushed a commit to github/vitess-gh that referenced this pull request Aug 7, 2025
mhamza15 pushed a commit to github/vitess-gh that referenced this pull request Sep 3, 2025
mhamza15 added a commit to github/vitess-gh that referenced this pull request Sep 3, 2025
* Set parsed comments in operator for subqueries (vitessio#18369)

Signed-off-by: David Piegza <[email protected]>

* rerun ci

Signed-off-by: Mohamed Hamza <[email protected]>

* remove skip_e2e

Signed-off-by: Mohamed Hamza <[email protected]>

* fix test cases

---------

Signed-off-by: David Piegza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Co-authored-by: David Piegza <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants