Skip to content

fix(ai): fix combined orchestrator capacity management#3389

Merged
ad-astra-video merged 10 commits intomasterfrom
fix-combined-orchestrator-capacity-management
Jun 10, 2025
Merged

fix(ai): fix combined orchestrator capacity management#3389
ad-astra-video merged 10 commits intomasterfrom
fix-combined-orchestrator-capacity-management

Conversation

@ad-astra-video
Copy link
Collaborator

What does this pull request do? Explain your changes. (required)

Fixes capacity managment for combined Orchestrator/AI workers using external containers.

Specific updates (required)

  • Updates CheckAICapacity to reserve the capacity at the Orchestrator
  • returns channel to release the capacity when the request is completed

How did you test each of these updates (required)

built go-livepeer and sent requests to use all capacity for combined and separate ai-workers.

Does this pull request close any open issues?

Checklist:

@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Feb 13, 2025

if !hasCapacity {
return false, nil
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since we're doing a return on the line above then we can remove the else and the extra indentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines +420 to +424
if pipeline == "live-video-to-video" {
orch.node.ReleaseAICapability(pipeline, modelID)
close(releaseCapacity)
return true, nil
}
Copy link
Member

@mjh1 mjh1 May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just remove this because we already track capacity with this containers map. @victorges could you check?
We don't have any call to ReserveAICapability for live video so this call to ReleaseAICapability would cause the Capacity value to always increase and never decrease.
In my PR I was planning to just set the Capacity field based to the number of entries in the containers map for live video.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked the updates in this function to keep same functionality for live-video-to-video using local ai-worker. This allowed removing this short circuit code noted above.

@mjh1
Copy link
Member

mjh1 commented May 8, 2025

@ad-astra-video ad-astra-video force-pushed the fix-combined-orchestrator-capacity-management branch from 0a9227d to c27a376 Compare May 29, 2025 05:09
@codecov
Copy link

codecov bot commented May 29, 2025

Codecov Report

Attention: Patch coverage is 46.66667% with 24 lines in your changes missing coverage. Please review.

Project coverage is 30.88542%. Comparing base (bf709ec) to head (032612e).

Files with missing lines Patch % Lines
server/ai_http.go 0.00000% 13 Missing ⚠️
core/ai_orchestrator.go 70.00000% 8 Missing and 1 partial ⚠️
server/rpc.go 0.00000% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3389         +/-   ##
===================================================
+ Coverage   30.83933%   30.88542%   +0.04609%     
===================================================
  Files            154         154                 
  Lines          45977       46012         +35     
===================================================
+ Hits           14179       14211         +32     
- Misses         30973       30975          +2     
- Partials         825         826          +1     
Files with missing lines Coverage Δ
server/rpc.go 66.74877% <0.00000%> (-0.16481%) ⬇️
core/ai_orchestrator.go 32.30944% <70.00000%> (+1.27900%) ⬆️
server/ai_http.go 9.60961% <0.00000%> (-0.13164%) ⬇️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf709ec...032612e. Read the comment docs.

Files with missing lines Coverage Δ
server/rpc.go 66.74877% <0.00000%> (-0.16481%) ⬇️
core/ai_orchestrator.go 32.30944% <70.00000%> (+1.27900%) ⬆️
server/ai_http.go 9.60961% <0.00000%> (-0.13164%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ad-astra-video
Copy link
Collaborator Author

@ad-astra-video could you check the compile errors? https://github.com/livepeer/go-livepeer/actions/runs/14384181138/job/40335296309?pr=3389

I fixed the issue specific to this PR. Looks like the runners are having other issues now tho.

@ad-astra-video
Copy link
Collaborator Author

After re-running a few times all actions are now passing.

@ad-astra-video ad-astra-video requested a review from mjh1 May 29, 2025 19:23
@ad-astra-video ad-astra-video merged commit f4e2234 into master Jun 10, 2025
17 of 18 checks passed
@ad-astra-video ad-astra-video deleted the fix-combined-orchestrator-capacity-management branch June 10, 2025 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Issues and PR related to the AI-video branch. go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants