INF-314 AI prometheus metrics to set correct "pipeline" and "model_name" labels #3699

Merged
pwilczynskiclearcode merged 1 commit into master from pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics
Aug 11, 2025

Contributor

@pwilczynskiclearcode pwilczynskiclearcode commented Aug 7, 2025

What does this pull request do? Explain your changes. (required)
Normalising AI metrics to have correct pipeline, model_name and orchestratorUri tags.

Specific updates (required)

  • ai_container_in_use to set correct model_name label (orchestratorUri got removed)
  • ai_container_idle to set correct model_name label
  • ai_gpus_idle to set model_name label (orchestratorUri got removed)
  • ai_current_live_pipelines to set model_name label (orchestratorUri got removed)
  • ai_live_attempt to introduce model_name label

How did you test each of these updates (required)
...

Does this pull request close any open issues?
no

Checklist:


@github-actions github-actions bot added the "go" (Pull requests that update Go code) and "AI" (Issues and PR related to the AI-video branch.) labels Aug 7, 2025
@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch 4 times, most recently from 20729c6 to 358e217 Compare August 7, 2025 11:01
@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch 3 times, most recently from 0e4b4ce to c3a5874 Compare August 7, 2025 13:16

codecov bot commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 28.94737% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.90880%. Comparing base (531a043) to head (c3a5874).

Files with missing lines     Patch %      Lines
monitor/census.go            20.83333%    19 Missing ⚠️
server/ai_mediaserver.go     0.00000%     5 Missing ⚠️
ai/worker/docker.go          66.66667%    3 Missing ⚠️
Additional details and impacted files

@@                 Coverage Diff                 @@
##              master       #3699         +/-   ##
===================================================
- Coverage   31.92117%   31.90880%   -0.01237%     
===================================================
  Files            156         156                 
  Lines          47445       47454          +9     
===================================================
- Hits           15145       15142          -3     
- Misses         31405       31417         +12     
  Partials         895         895                 
Files with missing lines     Coverage Δ
ai/worker/docker.go          72.79412% <66.66667%> (-0.18382%) ⬇️
server/ai_mediaserver.go     4.66853% <0.00000%> (ø)
monitor/census.go            61.04053% <20.83333%> (-0.45582%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pwilczynskiclearcode pwilczynskiclearcode marked this pull request as ready for review August 7, 2025 14:41
@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch from c3a5874 to 93f598f Compare August 8, 2025 07:03
Contributor

@leszko leszko left a comment

Looks good. My only concern is that we keep assuming that the O is running a single pipeline. It doesn't need to be addressed in this PR, but we should think about how to make it work for multiple pipelines. In other words, how to make GetCapacity() work with a pipeline param.

monitor.AIContainersInUse(capacity.ContainersInUse, "", "")
monitor.AIContainersIdle(capacity.ContainersIdle, "", "")
monitor.AIGPUsIdle(len(m.gpus) - len(m.gpuContainers)) // Indicates a misconfiguration so we should alert on this
monitor.AIContainersInUse(capacity.ContainersInUse, pipeline)
Contributor

Note that this will only work if a given Orchestrator is running one single pipeline. It won't work if the O is running 2 separate pipelines. I think it may be ok for now, because in our infra we're using only a single pipeline for an orchestrator. But at least add a comment and some // TODO. We should work on supporting multiple pipelines in the metrics, because I believe that soon we'll have more pipelines.

Contributor Author

I think that each pipeline+model would have its own DockerManager reporting metrics separately, right?

Contributor

No, unfortunately, DockerManager is shared. So you can have 1 O running different pipelines.

@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch from 93f598f to feb6065 Compare August 8, 2025 07:08
})

clog.V(common.VERBOSE).Infof(ctx, "AI Live video attempt")
monitor.AILiveVideoAttempt(pipeline)
Contributor Author

This pipeline is actually our modelID 🤦🏻

@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch 2 times, most recently from 033e559 to 8786b0e Compare August 8, 2025 07:13
@livepeer livepeer deleted a comment from leszko Aug 8, 2025
@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch 2 times, most recently from 81d9d25 to 1b3534c Compare August 8, 2025 07:26
@pwilczynskiclearcode pwilczynskiclearcode changed the title INF-314 AI prometheus metrics to include "pipeline" label INF-314 AI prometheus metrics to set correct "pipeline" and "model_name" labels Aug 8, 2025
@pwilczynskiclearcode pwilczynskiclearcode force-pushed the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch from 1b3534c to 3e2dc61 Compare August 8, 2025 13:20
monitor.AIContainersIdle(capacity.ContainersIdle, "", "")
monitor.AIGPUsIdle(len(m.gpus) - len(m.gpuContainers)) // Indicates a misconfiguration so we should alert on this
monitor.AIContainersInUse(capacity.ContainersInUse, pipeline, modelID)
monitor.AIContainersIdle(capacity.ContainersIdle, pipeline, modelID, "")
Contributor Author

So in the worker, ai_container_idle is produced with the pipeline and model_name labels, while in the orchestrator it is produced with just model and orchestratorUri. Should we have separate metrics from O and W?

Contributor

Now I think it would be better to have one metric from the Gateway and a separate one from the Orchestrator, because it's confusing right now.

Anyway, I'm ok if it's done later, as a separate PR.

@pwilczynskiclearcode pwilczynskiclearcode merged commit 1238d1b into master Aug 11, 2025
16 checks passed
@pwilczynskiclearcode pwilczynskiclearcode deleted the pawel/inf-314-implement-missing-pipeline-label-in-go-livepeer-metrics branch August 11, 2025 08:02
Labels

AI Issues and PR related to the AI-video branch. go Pull requests that update Go code

2 participants