INF-314 AI prometheus metrics to set correct "pipeline" and "model_name" labels #3699
Conversation
Force-pushed 20729c6 to 358e217
Force-pushed 0e4b4ce to c3a5874
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3699 +/- ##
===================================================
- Coverage 31.92117% 31.90880% -0.01237%
===================================================
Files 156 156
Lines 47445 47454 +9
===================================================
- Hits 15145 15142 -3
- Misses 31405 31417 +12
Partials 895 895
Continue to review full report in Codecov by Sentry.
Force-pushed c3a5874 to 93f598f
leszko
left a comment
Looks good. My only concern is that we're assuming all the time that the O is running a single pipeline. It doesn't need to be in this PR, but we can think about how to make it work for multiple pipelines. In other words, how to make GetCapacity() work with the pipeline param.
ai/worker/docker.go
Outdated
monitor.AIContainersInUse(capacity.ContainersInUse, "", "")
monitor.AIContainersIdle(capacity.ContainersIdle, "", "")
monitor.AIGPUsIdle(len(m.gpus) - len(m.gpuContainers)) // Indicates a misconfiguration so we should alert on this
monitor.AIContainersInUse(capacity.ContainersInUse, pipeline)
Note that this will only work if a given Orchestrator is running one single pipeline. It won't work if the O is running 2 separate pipelines. I think it may be ok for now, because in our infra we're using only a single pipeline for an orchestrator. But at least add a comment and some // TODO. We should work on supporting multiple pipelines in the metrics, because I believe that soon we'll have more pipelines.
I think that each pipeline+model would have its own DockerManager reporting metrics separately, right?
No, unfortunately, DockerManager is shared. So you can have 1 O running different pipelines.
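Since the DockerManager is shared, one way to make the metrics multi-pipeline-aware is to aggregate container counts per (pipeline, modelID) pair before reporting, so each pair gets its own metric series. A minimal sketch; the `containerInfo` type, its field names, and `capacityByPipeline` are hypothetical illustrations, not the actual ai-worker API:

```go
package main

import "fmt"

// containerInfo mirrors, hypothetically, what a shared DockerManager
// tracks per running container: which pipeline/model it serves and
// whether it is currently handling a request.
type containerInfo struct {
	Pipeline string
	ModelID  string
	InUse    bool
}

type capacity struct {
	InUse int
	Idle  int
}

// capacityByPipeline groups containers by (pipeline, modelID), so one
// shared manager can report a separate series per pipeline+model
// instead of assuming a single pipeline per orchestrator.
func capacityByPipeline(containers []containerInfo) map[[2]string]capacity {
	out := make(map[[2]string]capacity)
	for _, c := range containers {
		key := [2]string{c.Pipeline, c.ModelID}
		agg := out[key]
		if c.InUse {
			agg.InUse++
		} else {
			agg.Idle++
		}
		out[key] = agg
	}
	return out
}

func main() {
	containers := []containerInfo{
		{"live-video-to-video", "noop", true},
		{"live-video-to-video", "noop", false},
		{"text-to-image", "sd-turbo", true},
	}
	// In real code each entry would feed monitor.AIContainersInUse /
	// monitor.AIContainersIdle with pipeline and model_name labels.
	for key, agg := range capacityByPipeline(containers) {
		fmt.Printf("pipeline=%s model=%s in_use=%d idle=%d\n",
			key[0], key[1], agg.InUse, agg.Idle)
	}
}
```

With this shape, GetCapacity() could also take the pipeline param and return just the matching map entry.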
Force-pushed 93f598f to feb6065
server/ai_mediaserver.go
Outdated
})

clog.V(common.VERBOSE).Infof(ctx, "AI Live video attempt")
monitor.AILiveVideoAttempt(pipeline)
This pipeline is actually our modelID 🤦🏻
Force-pushed 033e559 to 8786b0e
Force-pushed 81d9d25 to 1b3534c
Force-pushed 1b3534c to 3e2dc61
monitor.AIContainersIdle(capacity.ContainersIdle, "", "")
monitor.AIGPUsIdle(len(m.gpus) - len(m.gpuContainers)) // Indicates a misconfiguration so we should alert on this
monitor.AIContainersInUse(capacity.ContainersInUse, pipeline, modelID)
monitor.AIContainersIdle(capacity.ContainersIdle, pipeline, modelID, "")
So here (worker) ai_container_idle is produced with pipeline and model_name, and here (orch.) with just model and orchestratorUri. Should we have separate metrics from O and W?
Now I think it would be better to have a separate metric from the Gateway and a separate one from the Orchestrator, because it's confusing right now.
Anyway, I'm OK if it's done later, as a separate PR.
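One way to realise the split discussed above is to publish distinct metric families from the worker and the orchestrator, so each keeps a single consistent label set. A hedged sketch rendering Prometheus exposition-format lines; the `worker_`/`orch_` prefixes, label values, and `gaugeLine` helper are hypothetical naming, not what this PR ships:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gaugeLine renders one Prometheus exposition-format sample. Prefixing
// the metric name by source keeps the label set uniform within each
// family, instead of mixing the worker's pipeline/model_name labels
// with the orchestrator's orchestratorUri label on one metric.
func gaugeLine(source, name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s_%s{%s} %g", source, name, strings.Join(pairs, ","), value)
}

func main() {
	// Worker-side series: pipeline + model_name.
	fmt.Println(gaugeLine("worker", "ai_container_idle",
		map[string]string{"pipeline": "live-video-to-video", "model_name": "noop"}, 2))
	// Orchestrator-side series: model_name + orchestratorUri.
	fmt.Println(gaugeLine("orch", "ai_container_idle",
		map[string]string{"model_name": "noop", "orchestratorUri": "https://orch:8935"}, 2))
}
```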
What does this pull request do? Explain your changes. (required)
Normalising AI metrics to have correct pipeline, model_name and orchestratorUri tags.
Specific updates (required)
- ai_container_in_use to set correct model_name label (orchestratorUri got removed)
- ai_container_idle to set correct model_name label
- ai_gpus_idle to set model_name label (orchestratorUri got removed)
- ai_current_live_pipelines to set model_name label (orchestratorUri got removed)
- ai_live_attempt to introduce model_name label
How did you test each of these updates (required)
...
Does this pull request close any open issues?
no
Checklist:
- make runs successfully
- ./test.sh pass