Skip to content

feat: secure kepler's /metrics endpoint with kube rbac proxy#554

Merged
sthaha merged 1 commit intosustainable-computing-io:v1alpha1from
KaiyiLiu1234:kube-rbac-proxy
Jul 9, 2025
Merged

feat: secure kepler's /metrics endpoint with kube rbac proxy#554
sthaha merged 1 commit intosustainable-computing-io:v1alpha1from
KaiyiLiu1234:kube-rbac-proxy

Conversation

@KaiyiLiu1234
Copy link
Copy Markdown
Collaborator

Power Monitor CR has been updated with a Security Mode to protect kepler's /metrics endpoint. When Mode is switched to 'rbac', the power monitor daemonset will be updated with kube rbac proxy sidecar which will enforce which service accounts can access /metrics based on static configuration and clusterrole/role rules. TLS is also enabled using kube rbac proxy and certs produced by openshift. Power Monitor API has been updated to include 'allowedSANames' which will allow users to request certain service accounts to have access.

@github-actions github-actions bot added the feat A new feature or enhancement label Jul 3, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 18.78558% with 856 lines in your changes missing coverage. Please review.

Project coverage is 28.35%. Comparing base (664edad) to head (44a6e70).
Report is 4 commits behind head on v1alpha1.

Files with missing lines Patch % Lines
pkg/utils/test/framework.go 0.39% 509 Missing ⚠️
pkg/reconciler/security.go 0.00% 261 Missing ⚠️
pkg/components/power-monitor/deployment.go 78.75% 47 Missing and 4 partials ⚠️
pkg/utils/k8s/k8s.go 29.16% 17 Missing ⚠️
pkg/utils/test/power_monitor_internal_builder.go 0.00% 9 Missing ⚠️
pkg/reconciler/service_monitor.go 0.00% 6 Missing ⚠️
pkg/reconciler/power-monitor.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           v1alpha1     #554      +/-   ##
============================================
- Coverage     29.80%   28.35%   -1.46%     
============================================
  Files            23       25       +2     
  Lines          2124     3164    +1040     
============================================
+ Hits            633      897     +264     
- Misses         1478     2248     +770     
- Partials         13       19       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@KaiyiLiu1234
Copy link
Copy Markdown
Collaborator Author

@sthaha Unit tests have been added in deployment_test.go. e2e tests have been added. Only change left is moving kube-rbac-proxy-image to the csv which will be added tomorrow asap.

@KaiyiLiu1234 KaiyiLiu1234 force-pushed the kube-rbac-proxy branch 13 times, most recently from e75df84 to 45110c4 Compare July 3, 2025 14:39
Copy link
Copy Markdown
Collaborator

@vimalk78 vimalk78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick 2 comments

- image: quay.io/sustainable_computing_io/kepler-reboot:v0.0.10
name: kepler-reboot
replaces: kepler-operator.v0.17.0
replaces: kepler-operator.v0.0.3-refactorsecurity
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont add this change

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func newKubeRBACProxyContainer(pmi *v1alpha1.PowerMonitorInternal) corev1.Container {
return corev1.Container{
Name: KubeRBACProxyContainerName,
Image: "quay.io/brancz/kube-rbac-proxy:v0.19.0",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont hard code image

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Image can be added via csv just like the kepler-reboot image

@KaiyiLiu1234 KaiyiLiu1234 force-pushed the kube-rbac-proxy branch 7 times, most recently from 608d711 to a402e36 Compare July 5, 2025 00:36
Image string `json:"image"`

// +kubebuilder:validation:MinLength=3
// +kubebuilder:default="quay.io/brancz/kube-rbac-proxy:v0.19.0"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the default for here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}

pmis := &v1alpha1.PowerMonitorInternalList{}
err := r.List(ctx, pmis)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err:= r.Client.List(ctx,pmis)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they both work the same but the precommit checks fail if using r.Client instead of r.List. I believe it's because r.List is more idiomatic than go given the fact client.Client is embedded field. So I am forced by precommit to use r.List.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal ... and consider these comments as nit

}

pmis := &v1alpha1.PowerMonitorInternalList{}
err := r.List(ctx, pmis)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err := r.Client.List(ctx,pmis)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with r.Client.List() but like @KaiyiLiu1234 said, lets fix the pre-commit ( in a totally separate pr) :)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's golangci-lint which is expects r.List instead of r.Client.List

c = c.Watches(&corev1.Secret{},
handler.EnqueueRequestsFromMapFunc(r.mapSecretToPowerMonitorRequests),
builder.WithPredicates(
predicate.GenerationChangedPredicate{},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have GenerationChanged ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in is the preference to switch to ResourceVersionChanged (which should cover all changes)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait didn't we want to use Generationchanged so we did not have to use Resourceversionchanged which may result in many reconciliations for being to sensitive?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, lets keep it this way .. I didn't realize secrets (certs) may be updated externally. Could you please add a comment why GenerationChanged is needed despite of having AnnotationChanged ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment

}
config.Authorization.StaticRules = append(config.Authorization.StaticRules, newRule)
}
yamlData, _ := yaml.Marshal(&config)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not ignore (any) error .. can we propagate it or log it some how if it isn't easy, lets file an issue and the address this in a generic way.

Copy link
Copy Markdown
Collaborator

@sthaha sthaha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Good job @KaiyiLiu1234 !!

Note: minor comments to be be fixed in follow up PRs ..
@KaiyiLiu1234 could you please create tracking issue for all unresolved comments?

Args: []string{
fmt.Sprintf("--secure-listen-address=0.0.0.0:%d", SecurePort),
fmt.Sprintf("--upstream=http://127.0.0.1:%d", PowerMonitorDSPort),
fmt.Sprintf("--auth-token-audiences=%s.%s.svc", pmi.Name, pmi.Namespace()),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need auth-token-audiences? With this kuberbac expects user token's with audience power-monitor.power-monitor.svc to be present in order to access

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is important to have. The purpose of the audience is to deliberately limit the token to the intended service (in this case our power-monitor service). We do not want the token to be used for any other purposes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it make sense to have this if we had more than one service? And we want SA to be for a specific service

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can provide many service accounts they want kepler to allow access to. Each service account will need to pass a token (for authorization). Users can have many service accounts for many different target services (not just power-monitor service). This can help with accidental use of a token or with stolen tokens on the user side.

Power Monitor CR has been updated with a Security Mode to protect kepler's /metrics endpoint.
When Mode is switched to 'rbac', the power monitor daemonset will be updated with kube rbac proxy
sidecar which will enforce which service accounts can access /metrics based on static
configuration and clusterrole/role rules. TLS is also enabled using kube rbac proxy and certs
produced by openshift. Power Monitor API has been updated to include 'allowedSANames' which will
allow users to request certain service accounts to have access.

Signed-off-by: Kaiyi Liu <kaliu@redhat.com>
@sthaha sthaha merged commit 718e00f into sustainable-computing-io:v1alpha1 Jul 9, 2025
17 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat A new feature or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants