Support fetching PSI stats for cgroupv2 containers#3358
Support fetching PSI stats for cgroupv2 containers#3358dqminh wants to merge 2 commits intoopencontainers:mainfrom
Conversation
| const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833 | ||
| full avg10=1.00 avg60=1.01 avg300=1.00 total=157622356` | ||
|
|
||
| func TestStatCPUPsi(t *testing.T) { |
There was a problem hiding this comment.
Can we have integration tests too?
There was a problem hiding this comment.
added one for runc events
|
Can we please have |
31e4197 to
5699010
Compare
kolyshkin
left a comment
There was a problem hiding this comment.
a few typos in helpers.bash, otherwise LGTM
|
ping @opencontainers/runc-maintainers |
|
ping @opencontainers/runc-maintainers again |
|
@dqminh can you please rebase? We have changed something in the tests and they need to be re-run. |
a07e8f8 to
521a41d
Compare
|
@kolyshkin thanks, rebase is done. I had to add |
|
ping @kolyshkin, this would be great to get in, especially for use downstream in cAdvisor / k8s |
| if err := statCpu(m.dirPath, st); err != nil && !os.IsNotExist(err) { | ||
| errs = append(errs, err) | ||
| } | ||
| // psi ( since kernel 4.20) |
There was a problem hiding this comment.
nit: extra space after (
| } | ||
|
|
||
| func parsePSIData(psi []string) (data cgroups.PSIData, err error) { | ||
| for i := 0; i < len(psi); i++ { |
There was a problem hiding this comment.
I think it'll be more readable to use
for _, f := range psi {(and use f instead of psi[i] below).
| "github.com/opencontainers/runc/libcontainer/cgroups" | ||
| ) | ||
|
|
||
| const examplePsiData = `some avg10=1.71 avg60=2.36 avg300=2.57 total=230548833 |
kolyshkin
left a comment
There was a problem hiding this comment.
A few nits, otherwise looks good.
1f36e91 to
c495b2b
Compare
We read output from the following files if they exists: - cpu.pressure - memory.pressure - io.pressure Each are in format: ``` some avg10=0.00 avg60=0.00 avg300=0.00 total=0 full avg10=0.00 avg60=0.00 avg300=0.00 total=0 ``` Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
|
@kolyshkin updated the nits, also added a fix to check for ENOTSUP when kernel compiled with PSI_DEFAULT_DISABLED, so reading *.pressure file return ENOTSUP as seen in https://cirrus-ci.com/task/4929935776153600 |
|
ping @kolyshkin for another round of reviews :) |
| // open *.pressure file returns | ||
| // - ErrNotExist when kernel < 4.20 or CONFIG_PSI is disabled | ||
| // - ENOTSUP when we requires psi=1 in kernel command line to enable PSI support | ||
| if err := statPSI(m.dirPath, "cpu.pressure", &st.CpuStats.PSI); err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, syscall.ENOTSUP) { |
There was a problem hiding this comment.
It's probably better to use unix.ENOTSUP from golang.org/x/sys/unix here (as syscall is considered an obsoleted package).
| psi) | ||
| # If PSI is not compiled in the kernel, the file will not exist. | ||
| # If PSI is compiled, but not enabled, read will fail with ENOTSUPP. | ||
| if [[ ! $(cat /sys/fs/cgroup/cpu.pressure) ]]; then |
There was a problem hiding this comment.
No need to use [[ here.
if ! cat /sys/fs/cgroup/cpu.pressure &>/dev/null; thenshould be sufficient.
|
gentle ping to @dqminh regarding the updates mentioned. would be great to get this in so we can consume PSI metrics downstream in cAdvisor / k8s |
|
@kolyshkin Maybe we can address those issues after merging? |
|
Hey @dqminh this is a great PR. There has been no activity for some time so I am wondering if you are still interested? Otherwise I would be willing to take over and make the requested changes. |
|
@kolyshkin Given that this PR has stalled, how would you like to proceed? I can make the requested changes. Would be great if we could make PSI metrics available. |
| Some PSIData `json:"some,omitempty"` | ||
| Full PSIData `json:"full,omitempty"` |
There was a problem hiding this comment.
Wondering if it would make sense to make these pointers, as it looks like that's how they're consumed in various places, plus it would allow parsePSIData to return nil, err (which may be slightly cleaner in in tent.
| for _, f := range psi { | ||
| kv := strings.SplitN(f, "=", 2) | ||
| if len(kv) != 2 { | ||
| return data, fmt.Errorf("invalid psi data: %q", f) |
There was a problem hiding this comment.
nit: these should probably use PSI (capitalised) as well
| type PSIData struct { | ||
| Avg10 float64 `json:"avg10"` | ||
| Avg60 float64 `json:"avg60"` | ||
| Avg300 float64 `json:"avg300"` | ||
| Total uint64 `json:"total"` | ||
| } | ||
|
|
||
| type PSIStats struct { | ||
| Some PSIData `json:"some,omitempty"` | ||
| Full PSIData `json:"full,omitempty"` | ||
| } |
There was a problem hiding this comment.
Wondering if we could just make these an alias for cgroups.PSIData and cgroups.PSIStat. That way we wouldn't need the conversion function.
| case "avg10": | ||
| v, err := strconv.ParseFloat(kv[1], 64) | ||
| if err != nil { | ||
| return data, fmt.Errorf("invalid psi value: %q", f) | ||
| } | ||
| data.Avg10 = v | ||
| case "avg60": | ||
| v, err := strconv.ParseFloat(kv[1], 64) | ||
| if err != nil { | ||
| return data, fmt.Errorf("invalid psi value: %q", f) | ||
| } | ||
| data.Avg60 = v | ||
| case "avg300": |
There was a problem hiding this comment.
These are quire repetitive; perhaps we could combine them 🤔
| type CpuStats struct { | ||
| CpuUsage CpuUsage `json:"cpu_usage,omitempty"` | ||
| ThrottlingData ThrottlingData `json:"throttling_data,omitempty"` | ||
| PSI PSIStats `json:"psi,omitempty"` |
There was a problem hiding this comment.
I think we need this to be a pointer for the omitempty to actually do something; https://go.dev/play/p/R68xTF5hreC
|
@thaJeztah i'm really sorry about the delay. We have been running with the patch for so long (!) that i forgot it hasn't been merged. I left a few comments in the PR on my fork. Generally i think we can try to make the refactoring of existing practices separately from the current change. |
|
@dqminh I read the comments and tried to fix everything and opened dqminh#37 . I have here issue I can't compile it and test wait for fetching containers since 15 minutes. Tests run and fail for some reason, that might be problem of my system, not the code, but let's see what the test suite says. I hope this helps to get it merged. :) |
|
Stray comment: As of linux 6.1 there's an interface called |
|
FYI: I created #3679 as follow up to get this merged |
This supports fetching PSI stats for cgroupv2 containers. We read the PSI metrics if they are available from:
See more about PSI at https://www.kernel.org/doc/html/latest/accounting/psi.html