Skip to content

feat(matching): Provide DynamicConfig option to override RPS of a specific TaskList#7557

Merged
natemort merged 5 commits intocadence-workflow:masterfrom
joannalauu:feat/override-tasklist-rps
Jan 2, 2026
Merged

feat(matching): Provide DynamicConfig option to override RPS of a specific TaskList#7557
natemort merged 5 commits intocadence-workflow:masterfrom
joannalauu:feat/override-tasklist-rps

Conversation

@joannalauu
Copy link
Contributor

@joannalauu joannalauu commented Dec 22, 2025

Provide new DynamicConfig option matching.overrideTaskListRps to override the RPS of a specific TaskList in TaskListLimiter

Why?

  • During incident mitigation we may want to rapidly increase/decrease the RPS value of a TaskList. Providing a DynamicConfig to manipulate RPS can reduce remediation time.

Part of: #7296

How did you test it?

  • Unit tests

Potential risks

  • The system cannot adapt dynamically to changing load while the override is active, so it may impact system reliability if not carefully managed

Release notes

Documentation Changes

@joannalauu joannalauu force-pushed the feat/override-tasklist-rps branch from 40756e3 to 1e627cb Compare December 22, 2025 19:37
@joannalauu joannalauu changed the title Provide DynamicConfig option to override RPS of a specific TaskList feat(matching): Provide DynamicConfig option to override RPS of a specific TaskList Dec 22, 2025
Copy link
Member

@natemort natemort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! I'm going to have one other colleague review it, but this is great. The tests are solid


// Try to update if rps < current, ttl elapsed,
// or overrideRPS is set and different from current
if !(overrideRPS != 0 && overrideRPS == current) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to:

overrideRps == 0 || overrideRps != current

which is a bit simpler to read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we don't expect, but shouldn't we protect against negative values? Instead of using == 0 maybe using <= 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Treating negative values the same as zero seems reasonable, yeah. Good call.

// new value
if rps < current || (ttlElapsed && !l.lastReceivedValue.Load().After(now.Add(-l.ttl))) {
newRPS := current
if overrideRPS != 0 && overrideRPS != current {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use overrideRPS > 0

}

func (l *taskListLimiter) ReportLimit(rps float64) {
overrideRPS := l.overrideRPS()
Copy link
Member

@davidporter-id-au davidporter-id-au Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to more or less echo what Felipe and Nate were saying, may I suggest flattening the choice of using the passed in RPS or the dynconfig override RPS?

One way to do this is to override the rps float:

func (l *taskListLimiter) ReportLimit(rps float64) {
    // at the start, just figure out what the RPS value should be
	overrideRPS := l.overrideRPS()
    if overrideRPS > 0 {
        rps = overrideRPS
     }

     // rest of the function remains the same I think?

The reason for doing so is that if/else branching multiplies the codepaths that each execution could take, making it harder for programmers - looking at the code- to know which one it took, or which one is even valid.

I say this with some acknowledgement upfront that I've not looked at this code for a while and I might be missing some important context, so feel free to disagree.

Copy link
Member

@davidporter-id-au davidporter-id-au Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah... I did a little more reading of the code here and I noticed a possible problem: The setting of the limit here will only take effect if the maxDispatchPersecond is already set by the customer, implying that this limit cannot be applied if they've added that config already.

May I suggest, a second thing, which, I know is annoying, but which would address this:

If you pull this override check up one level, into taskListManagerImpl, perhaps the getTask() method, there you can check the override pass it in without adding too much complexity to the RPS values.

eg:

    rps := 0
	if maxDispatchPerSecond != nil {
		rps = maxDispatchPerSecond
	}
   rpsOverride := l.overrideRPS()
   if rpsOverride > 0 {
       rps = rpsOverride
   }
  if tasklistRPS > 0 {
     c.limiter.ReportLimit(rps)  // unrelated to your change, the name 'Report' here is really confusing 😿. I guess we should change it at some point. 
   }

Copy link
Member

@natemort natemort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your work on this, I'll approve this without the changes to task_list_limiter.go.

if now.Sub(l.lastUpdate.Load()) < l.ttl {
current := l.value.Load()
if rps > current {
if rps > current && (l.overrideRPS() <= 0 || rps != l.overrideRPS()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your change here is the most correct way to implement this, but I think for the sake of simplicity we don't need to add the override check here. The override applying to the incoming values is good enough and defines a very clear contract of how the components work.

If the override is smaller than the current RPS then it'll immediately take, and if it's larger then it'll take after the TTL.

Having a delay of up to a minute (the default TTL) isn't a big deal, the time for a user to react and make the config change is significantly higher.

scope metrics.Scope
ttl time.Duration
minBurst int
overrideRPS func() float64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'll be using it only within task_list_manager.go we no longer need to add this here. I think this file ultimately will be unchanged, apologies for the iteration. I think David's proposal (applying it to incoming values) for how to structure it is better than mine.

return result
}

func TestTaskListUsesOverrideRPS(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, nice work on the tests. This code is not easy to test.

overrideRPS: 25.0,
maxDispatchPerSecond: common.Float64Ptr(100.0),
expectedRPS: 25.0,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could also consider testing the default case (no maxDispatchPerSecond, no overrideRps) and the override only case (no maxDispatchPerSecond, overrideRps > 0).

… `taskListManagerImpl`

Signed-off-by: Joanna Lau <118241363+joannalauu@users.noreply.github.com>
Signed-off-by: Joanna Lau <118241363+joannalauu@users.noreply.github.com>
…tRPS`

Signed-off-by: Joanna Lau <118241363+joannalauu@users.noreply.github.com>
Signed-off-by: Joanna Lau <118241363+joannalauu@users.noreply.github.com>
Signed-off-by: Joanna Lau <118241363+joannalauu@users.noreply.github.com>
@joannalauu joannalauu force-pushed the feat/override-tasklist-rps branch from 590996f to 0dcb767 Compare January 1, 2026 20:42
@joannalauu joannalauu requested a review from natemort January 2, 2026 09:04
@natemort natemort merged commit 89362f9 into cadence-workflow:master Jan 2, 2026
41 of 42 checks passed
@joannalauu joannalauu deleted the feat/override-tasklist-rps branch January 3, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants