ai/live: Store and forward parameter updates #3682
Conversation
This allows us to correctly handle cases where:

- We haven't selected an orchestrator yet but the client sent a control API update.
- We've swapped orchestrators.
- We're mid-swap without an orchestrator.

In all these cases, the newly selected orchestrator will receive the most recent parameter update as soon as it's selected.

Also add a 1 MB limit for reading parameters.
Codecov Report
Additional details and impacted files

```
@@             Coverage Diff              @@
##              master      #3682        +/-  ##
================================================
- Coverage   31.97787%  31.94626%  -0.03161%
================================================
  Files           156        156
  Lines         47364      47414        +50
================================================
+ Hits          15146      15147         +1
- Misses        31325      31374        +49
  Partials        893        893
```

... and 1 file with indirect coverage changes.
```go
}

// Send any cached control params in a goroutine outside the lock.
msg := sess.Params
```
Don't we need to lock when reading this as it gets updated in the UpdateLiveVideo() function? Also wondering if there's a race condition if UpdateLiveVideo() is called while we're applying the cached params here and we end up applying older cached params rather than the latest.
> Don't we need to lock when reading this as it gets updated in the UpdateLiveVideo() function?
The lock is acquired earlier in the function, but we don't want to hold it while doing the network write, hence the goroutine.
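That pattern — snapshot under the lock, then do the network write in a goroutine outside it — can be sketched like this (names such as `session` and `forwardCached` are illustrative, not the real code):

```go
package main

import (
	"fmt"
	"sync"
)

// session holds the cached control params; mu guards them.
// (Hypothetical names, modeling the discussion above.)
type session struct {
	mu     sync.Mutex
	Params []byte
}

// forwardCached snapshots the cached params while holding the lock, then
// performs the (potentially slow) send in a goroutine so the network write
// never happens while the lock is held. The WaitGroup here is only so the
// demo is deterministic; the real pattern is fire-and-forget.
func (s *session) forwardCached(send func([]byte)) {
	s.mu.Lock()
	msg := s.Params // snapshot taken under the lock
	s.mu.Unlock()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		send(msg) // network write happens outside the lock
	}()
	wg.Wait()
}

func main() {
	s := &session{Params: []byte("bitrate=3000")}
	var got string
	s.forwardCached(func(b []byte) { got = string(b) })
	fmt.Println(got)
}
```

The trade-off is exactly the one discussed below: between the snapshot and the send, another update can land, so the goroutine may forward a slightly stale value.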
> Also wondering if there's a race condition if UpdateLiveVideo() is called
Yep, it's not watertight. The raciness is also there within UpdateLiveVideo, since we could swap orchestrators and it could end up sending to the wrong orch.
We could:

- Block all other trickle control messages (for all streams on the gateway, not just this one) while we send the message. That was the old behavior.
- Add another per-stream mutex within the LivePipelines struct, but that could still block forward progress within the same stream.
- Re-create the LivePipeline entry so we're not dealing with a shared object and a per-stream lock. (We'd still need the global LiveMu lock though.) That doesn't solve the fundamental problem, which is that there could be a param update while swapping orchs, or an orch swap while updating params.
- Do it like we do now and ignore the problem 🙈 Not great, but we can at least make forward progress, even if the result is that the param update didn't get applied. For our case, I think that is better than not being able to send params at all because we're blocked by another orchestrator.
It should be rare for this to happen ... it will probably happen at scale, but we're not there yet.
This is the shape of the problem:
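A deterministic replay of that interleaving (all names hypothetical): the swap path snapshots the cached params, an update lands mid-swap, and the stale snapshot is what gets sent to the new orchestrator.

```go
package main

import "fmt"

// replay reproduces the race shape step by step, single-threaded so the
// outcome is deterministic. It returns what was sent and what the latest
// cached value actually was.
func replay() (sent, latest string) {
	cached := "P1"     // UpdateLiveVideo stores P1
	snapshot := cached // orch-swap path snapshots the cache (outside any lock)
	cached = "P2"      // UpdateLiveVideo stores P2 mid-swap
	sent = snapshot    // swap path sends the stale snapshot to the new orch
	return sent, cached
}

func main() {
	sent, latest := replay()
	// The orchestrator ends up with P1 even though P2 is the latest update.
	fmt.Printf("orchestrator got %s, latest is %s\n", sent, latest)
}
```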
IMO, it's ok to accept this "race condition" for now. At this point, I think we have tons of other bugs, which are causing more issues. At some point, we'll need to address this one, but not a super high priority right now.
Yep totally ok with that. Thanks for the detail Josh, will be useful for later on
```go
	},
}

registerControl(ctx, params)
```
With your change, I see that we'll mark params.node.LivePipelines sooner in the process: it will be set before we even start the Orchestrator discovery, versus previously, when we marked it after the discovery during the trickle subscription startup.
This will make inputStreamExists() return true sooner. It will also mean that if the discovery fails, we'll need to clean up params.node.LivePipelines. Have you checked these 2 scenarios? Off the top of my head, the first scenario may be ok, but the 2nd was a problem in the past, because we used to not clean up params.node.LivePipelines correctly when the segmentation failed before the Orchestrator discovery.
> It will be set before we even start the Orchestrator discovery vs before we marked it after the discovery during the trickle subscription startup.
Correct, because orch selection could be a bit delayed, but we still want to be able to receive parameter updates as soon as the stream is connected.
> This will make inputStreamExists() return true sooner.
Correct ... that's technically more accurate behavior. Do you suspect an issue there? I am not sure that I see anything, but could be missing something.
> And it will also mean that if the discovery fails, we'll need to clean up params.node.LivePipelines.

Yes, that behavior has not changed. We do register sooner, but we haven't changed how cleanup works. We still clean up LivePipelines after both of the following happen:
- After orch selection completes (whether successfully or not)
- After the input stream disconnects
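That two-event rule can be sketched with a WaitGroup counting down once per event (a hedged sketch with hypothetical names; the real code may structure this differently):

```go
package main

import (
	"fmt"
	"sync"
)

// pipelineCleanup models the rule above: the LivePipelines entry for a
// stream is removed only after BOTH orch selection completes (success or
// failure) AND the input stream disconnects.
type pipelineCleanup struct {
	mu        sync.Mutex
	pipelines map[string]bool
	pending   sync.WaitGroup
}

func newPipelineCleanup(id string) *pipelineCleanup {
	c := &pipelineCleanup{pipelines: map[string]bool{id: true}}
	c.pending.Add(2) // one for orch selection, one for stream disconnect
	return c
}

// eventDone is called once when orch selection finishes and once when the
// input disconnects; cleanup can only proceed after both calls.
func (c *pipelineCleanup) eventDone() { c.pending.Done() }

// waitAndClean blocks until both events fired, then removes the entry.
func (c *pipelineCleanup) waitAndClean(id string) {
	c.pending.Wait()
	c.mu.Lock()
	delete(c.pipelines, id)
	c.mu.Unlock()
}

func main() {
	c := newPipelineCleanup("stream-1")
	go c.eventDone() // orch selection completes (or fails)
	go c.eventDone() // input stream disconnects
	c.waitAndClean("stream-1")
	fmt.Println(len(c.pipelines))
}
```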
BTW I just tested this to be sure and ... we do have a problem! It's not from this PR though. Basically, processStream never returns if there is an error selecting the first orchestrator, so we'd be stuck here forever and not make any progress in cleaning up the rest of the stack. Here's a fix for that: 0ca0c9e
Ahh, thanks for the fix.
I looked again at the code and I think it should be fine.