Skip to content

Conversation

@siddharth16396
Copy link
Contributor

@siddharth16396 siddharth16396 commented Jul 15, 2025

Description

This PR introduces a comprehensive incoming query throttling system for Vitess tablets outlined in RFC #18412 with the following key components:

  • Core throttling framework - IncomingQueryThrottler with strategy pattern for pluggable throttling algorithms
  • Configuration system - File-based config loader reading from /config/throttler-config.json with enabled/disabled flag and strategy selection
  • Strategy implementations - NoOp strategy (default) and Cinnamon strategy placeholder with lifecycle management.
  • Background config refresh - Automatic configuration reloading every minute with strategy switching.
  • Comprehensive testing - Full test coverage for all components including error scenarios and lifecycle management.

The system supports multiple throttling strategies (TabletThrottler, Cinnamon) and can be dynamically reconfigured without restart. The implementation follows Vitess patterns and integrates with the existing throttle client infrastructure.

Tests

  • Unit tests for all new components (IncomingQueryThrottler, FileBasedConfigLoader, strategies)
  • Test strategy lifecycle management and switching
  • Test configuration loading error scenarios
  • Test concurrent access patterns with mutex protection
  • Verify integration with existing throttle client

Ref: #18412

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

…ry throttler

This commit introduces the core interfaces and configuration structures
for the incoming query throttler system as outlined in RFC vitessio#18412.

Key additions:
- ThrottlingStrategyHandler interface for implementing throttling
strategies
- Config struct with support for multiple throttling strategies:
    - TabletThrottler (Vitess native tablet throttler)
    - Cinnamon (Uber's load-shedding system)
- ConfigLoader interface for dynamic configuration loading

This establishes the architectural foundation for query-level throttling that will be built upon in subsequent commits.

Ref: vitessio#18412

Signed-off-by: siddharth16396 <[email protected]>
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Jul 15, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jul 15, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Jul 15, 2025
@codecov
Copy link

codecov bot commented Jul 15, 2025

Codecov Report

❌ Patch coverage is 90.05525% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.54%. Comparing base (770ae74) to head (13cb472).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
...let/tabletserver/querythrottler/query_throttler.go 88.31% 9 Missing ⚠️
go/vt/vttablet/tabletserver/query_executor.go 33.33% 4 Missing ⚠️
...t/tabletserver/querythrottler/registry/registry.go 92.50% 3 Missing ⚠️
go/vt/vttablet/tabletserver/tabletserver.go 80.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18449      +/-   ##
==========================================
+ Coverage   67.51%   67.54%   +0.03%     
==========================================
  Files        1607     1613       +6     
  Lines      263357   263742     +385     
==========================================
+ Hits       177815   178155     +340     
- Misses      85542    85587      +45     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add file-based configuration loader for the incoming query throttler
system.
This implementation reads throttler configuration from a JSON file at
`/config/throttler-config.json` and supports the Config struct with
enabled flag and throttling strategy selection.

Key features:
- Implements ConfigLoader interface for file-based config loading
- Reads from fixed path /config/throttler-config.json
- Supports JSON unmarshaling of Config struct with enabled/strategy
  fields
  - Comprehensive error handling for file read and JSON parsing failures
  - Extensive test coverage with stubbed dependencies using gostub
      library

      The FileBasedConfigLoader provides a foundation for loading
      throttler  configuration from external files, supporting the
      broader incoming query  throttling system architecture.

      Related to: vitessio#18412

Signed-off-by: siddharth16396 <[email protected]>
Add file-based configuration loader for the incoming query throttler
system.
This implementation reads throttler configuration from a JSON file at
`/config/throttler-config.json` and supports the Config struct with
enabled flag and throttling strategy selection.

Key features:
- Implements ConfigLoader interface for file-based config loading
- Reads from fixed path /config/throttler-config.json
- Supports JSON unmarshaling of Config struct with enabled/strategy fields
- Comprehensive error handling for file read and JSON parsin failures
- Extensive test coverage with stubbed dependencies using gostub library

The FileBasedConfigLoader provides a foundation for loading throttler  configuration from external files, supporting the broader incoming query  throttling system architecture.

Related to: vitessio#18412

Signed-off-by: siddharth16396 <[email protected]>
This commit introduces a new incoming query throttling system for Vitess
tablets with the following key components:

Core Implementation:

* IncomingQueryThrottler: Main throttler with configurable strategies and
automatic config refresh
* Strategy pattern with ThrottlingStrategyHandler interface for pluggable throttling algorithms
* NoOpStrategy: Default no-operation strategy
* CinnamonStrategy: Placeholder for future Cinnamon throttling implementation

Key Features:

* Automatic configuration refresh every minute via background goroutine
* Thread-safe strategy switching with proper lifecycle management (start/stop)
* Integration with existing Vitess throttle infrastructure via throttle.Client
* Graceful shutdown with resource cleanup

Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
type FileBasedConfigLoader struct{}

// NewFileBasedConfigLoader creates a new instance of FileBasedConfigLoader with the given file path.
func NewFileBasedConfigLoader() *FileBasedConfigLoader {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a TopoServerConfig loader as well, but that will be a seperate PR by another team mate

case ThrottlingStrategyCinnamon:
return &CinnamonStrategy{}
case ThrottlingStrategyTabletThrottler:
fallthrough
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be implemented as a seperate PR by me.

@siddharth16396 siddharth16396 changed the title [IncomingQueryThrottler] Add foundational interfaces for throttler [IncomingQueryThrottler] Add foundation code for throttling incoming queries Jul 20, 2025
@siddharth16396 siddharth16396 changed the title [IncomingQueryThrottler] Add foundation code for throttling incoming queries [IncomingQueryThrottler] Implement Incoming Query Throttler with Strategy Pattern Jul 20, 2025

var (
_ ConfigLoader = (*FileBasedConfigLoader)(nil)
_osReadFile = os.ReadFile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not update global state like this. Similar to below, we should pass in configuration like this and also then we don't need to mock / change _osReadFile.

That we add a dependency in go.mod here is a smell I think and also points at these problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've incorporated the changes and used dependency injection for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this? There is no implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it, the implementation was uber specific code that couldn't be open sourced.

Comment on lines 5 to 31
type ThrottlingStrategy string

// Predefined throttling strategies for the IncomingQueryThrottler.
const (
// ThrottlingStrategyTabletThrottler uses Vitess Tablet Throttler to shed load
// from incoming queries when the tablet is under pressure.
// Reference: https://vitess.io/docs/21.0/reference/features/tablet-throttler/
ThrottlingStrategyTabletThrottler ThrottlingStrategy = "TabletThrottler"

// ThrottlingStrategyCinnamon uses Uber's Cinnamon load-shedding system
// to regulate incoming queries under high load conditions.
// Reference: https://www.uber.com/en-IN/blog/cinnamon-using-century-old-tech-to-build-a-mean-load-shedder/
ThrottlingStrategyCinnamon ThrottlingStrategy = "Cinnamon"

// ThrottlingStrategyUnknown is used when the strategy is not known.
ThrottlingStrategyUnknown ThrottlingStrategy = "Unknown"
)

// Config defines the runtime configuration for the IncomingQueryThrottler.
// It specifies whether throttling is enabled and which strategy to use.
type Config struct {
// Enabled indicates whether the throttler should actively apply throttling logic.
Enabled bool `json:"enabled"`

// Strategy selects which throttling strategy should be used.
Strategy ThrottlingStrategy `json:"strategy"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this we can do factory based. where each implementation registers to the factory and then based on the config will load that strategy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done, this strategy is just a enum now.

Comment on lines 16 to 24
var (
_configRefreshTicket = time.NewTicker(1 * time.Minute)
)

type IncomingQueryThrottler struct {
ctx context.Context
throttleClient *throttle.Client
mu sync.RWMutex
// cfg holds the current configuration for the throttler.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use the global ticker here and pass the timer in constructor.
This way we will also remove the usage of gostub in the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taken care of now.

Comment on lines 104 to 107
newCfg, err := i.cfgLoader.Load(i.ctx)
if err != nil {
log.Errorf("Error loading config: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error loading the config, we cannot continue with this. It will panic with nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, i will be taking care of this

@harshit-gangal
Copy link
Member

"EnforceThrottlingIfNodeOverloaded" is called from
tabletserver.Execute. I will add this to this diff. We can plug this into QueryExecutor as well but we've plugged it into tabletserver.execute for now. I would love to know your thoughts on it

Both places have pros and cons. The way you call it now in execute() means that if the query is throttled, then you're saving all the query executor/plan computation.

On the other hand, the query plan parses the SQL statement, something that you're going to do as well. So there's waste of computation if the query does not get throttled.

I'm thinking the best solution would be to generate the plan, then use TabletPlan.FullStmt which is a sqlparser.Statement, as input to EnforceThrottlingIfNodeOverloaded. @harshit-gangal your opinion?

I think it make sense that we perform query throttling in query_executor. We anyways cache the plan so we would not be planning the queries under load as well.

@siddharth16396
Copy link
Contributor Author

siddharth16396 commented Sep 8, 2025

I think it make sense that we perform query throttling in query_executor. We anyways cache the plan so we would not be planning the queries under load as well.

@harshit-gangal : Just to be clear, are you saying we move the call to EnforceThrottlingIfNodeOverloaded inside of query_executor.go ?

Or can we keep it in tabletserver.go with the TabletPlan.FullStmt change and call it before calling the qre.Execute() ?

@harshit-gangal
Copy link
Member

stmtRules

Moving this to query_executor.go makes sense, and checking to throttle there is fine.
See how we use TxThrottler over there. Can we just name the method Throttle? The extra descriptive naming doesn’t seem necessary; the throttler implementation should determine when to throttle.

Regarding your Evaluate code: that logic should be done outside the hot path. The hot path call should return from a cached value instead of evaluating on the fly.

@siddharth16396
Copy link
Contributor Author

I checked the Plan struct

// Plan contains the parameters for executing a request.
type Plan struct {
	PlanID PlanType
	// When the query indicates a single table
	Table *schema.Table
	// This indicates all the tables that are accessed in the query.
	AllTables []*schema.Table

	// Permissions stores the permissions for the tables accessed in the query.
	Permissions []Permission

	**// FullQuery will be set for all plans.
	FullQuery *sqlparser.ParsedQuery**

	// NextCount stores the count for "select next".
	NextCount evalengine.Expr

	// WhereClause is set for DMLs. It is used by the hot row protection
	// to serialize e.g. UPDATEs going to the same row.
	WhereClause *sqlparser.ParsedQuery

	**// FullStmt can be used when the query does not operate on tables
	FullStmt sqlparser.Statement**

	// NeedsReservedConn indicates at a reserved connection is needed to execute this plan
	NeedsReservedConn bool
}

This says // FullStmt can be used when the query does not operate on tables

So should we be using FullQuery instead of FullStmt? @shlomi-noach

I'm thinking the best solution would be to generate the plan, then use TabletPlan.FullStmt which is a sqlparser.Statement, as input to EnforceThrottlingIfNodeOverloaded. @harshit-gangal your opinion?

@siddharth16396
Copy link
Contributor Author

Also, to move the EnforceThrottlingIfNodeOverloaded function to query_executor, i will have to add ctx , transactionID to QueryExecutor struct, which seems safe but still giving a heads-up

@siddharth16396
Copy link
Contributor Author

Incorporated all the requested changes @harshit-gangal | @shlomi-noach

return nil, err
}

if reqThrottledErr := qre.tsv.queryThrottler.Throttle(qre.ctx, qre.targetTabletType, qre.plan.FullQuery, qre.logStats.TransactionID, qre.options); reqThrottledErr != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use connID

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connID and TransactionID is same

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor last styling comment, otherwise good to go!

Comment on lines 94 to 95
tCfg := qt.cfg
tStrategy := qt.strategy
Copy link
Contributor

@shlomi-noach shlomi-noach Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an intentional reason why these two values are copied?

  • qt.cfg is not a pointer, the entire config is duplicated. But I don't see the need.
  • qt.strategy is an interface so there's no additional cost, but then why not just decision := qt.strategy.Evaluate(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a reason initially because we used to have read locks on this, but that caused performance degradations so we removed the locks and so we can remove the variable assignments. let me incorp that.

Thanks for pointing out.

removing unnecessary variable assignments

Co-authored-by: Shlomi Noach <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
@shlomi-noach shlomi-noach enabled auto-merge (squash) September 14, 2025 17:20
auto-merge was automatically disabled September 14, 2025 17:29

Head branch was pushed to by a user without write access

@siddharth16396
Copy link
Contributor Author

@shlomi-noach : bd07b5a added the same comment twice, so had to remove it 😅

@siddharth16396 siddharth16396 changed the title [QueryThrottler] Implement Query Throttler with Strategy Pattern [QueryThrottler] Implement Query Throttler Part #1 Sep 14, 2025
@siddharth16396 siddharth16396 changed the title [QueryThrottler] Implement Query Throttler Part #1 [QueryThrottler] Implement Query Throttler [Part 1] Sep 14, 2025
@siddharth16396 siddharth16396 changed the title [QueryThrottler] Implement Query Throttler [Part 1] [QueryThrottler] Implement Query Throttler Sep 14, 2025
@shlomi-noach shlomi-noach enabled auto-merge (squash) September 15, 2025 05:10
@shlomi-noach shlomi-noach merged commit 384b553 into vitessio:main Sep 15, 2025
103 of 106 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Throttler Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants