anik120 (Member) commented Sep 23, 2025

Description of the change:

Implements native metrics authentication and authorization for OLM and catalog operators using controller-runtime
filters. Adds TLS support with automatic certificate management via cert-manager, replacing unprotected HTTP metrics
endpoints with authenticated HTTPS endpoints on port 8443.

Motivation for the change:

Current metrics endpoints are unprotected and accessible to anyone with cluster access, creating potential security
risks. This change secures metrics access by requiring proper Kubernetes RBAC authentication and authorization,
following the same pattern used by operator-controller for production deployments.

Architectural changes:

  • Integrates controller-runtime's WithAuthenticationAndAuthorization filter for metrics endpoints
  • Adds cert-manager integration for automatic TLS certificate lifecycle management
  • Implements dynamic certificate watching and reloading using existing filemonitor package
  • Disables HTTP/2 to mitigate known CVEs, enforcing HTTP/1.1 only
  • Updates both operators to use HTTPS (port 8443) with client certificate authentication
  • Maintains fallback to unprotected metrics when TLS is disabled for development scenarios
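A minimal sketch of the HTTP/2 bullet above, using only the Go standard library. controller-runtime's metrics server accepts TLS option functions of the form `func(*tls.Config)`, and restricting `NextProtos` to `http/1.1` is the usual way to keep that server from negotiating HTTP/2. The helper names `disableHTTP2` and `newMetricsTLSConfig` are hypothetical and are not taken from this PR's diff.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// disableHTTP2 restricts the ALPN protocols advertised by this TLS config
// to HTTP/1.1, so "h2" is never offered to clients.
// Hypothetical helper name; the PR's actual function is not shown here.
func disableHTTP2(cfg *tls.Config) {
	cfg.NextProtos = []string{"http/1.1"}
}

// newMetricsTLSConfig starts from an empty TLS config and applies each
// option in order, mirroring the []func(*tls.Config) option style.
func newMetricsTLSConfig(opts ...func(*tls.Config)) *tls.Config {
	cfg := &tls.Config{}
	for _, opt := range opts {
		opt(cfg)
	}
	return cfg
}

func main() {
	cfg := newMetricsTLSConfig(disableHTTP2)
	fmt.Println(cfg.NextProtos) // prints [http/1.1]
}
```

Outside controller-runtime, a plain `net/http` server started with `ListenAndServeTLS` may need extra configuration (for example a non-nil, empty `TLSNextProto` map) to keep HTTP/2 off, so treat this as an illustration of the option pattern rather than a complete hardening recipe.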

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied by evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@anik120 anik120 requested a review from joelanford September 23, 2025 19:15
@anik120 anik120 requested review from tmshort and removed request for ankitathomas and perdasilva September 23, 2025 19:15
@anik120 anik120 force-pushed the native-metrics-authnz branch 3 times, most recently from a62b02f to f63b885 Compare September 23, 2025 20:28
tmshort (Contributor) commented Sep 23, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2025
anik120 (Member, Author) commented Sep 23, 2025

Looks like I have to change the metrics e2e tests: the current ones don't authenticate themselves, which is why they're failing. That's a good sign that the changes are working. I'm working on the e2e test modifications now.
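For context, here is a rough sketch of what "authenticating" a metrics request could look like from a test client, assuming the client presents a ServiceAccount bearer token (the usual input to controller-runtime's authentication and authorization filter). The function name, host name, and token below are placeholders, not values from this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// newMetricsRequest builds a GET request for <baseURL>/metrics and attaches
// the given token as a bearer credential in the Authorization header.
// Hypothetical helper; the e2e tests in this PR may do this differently.
func newMetricsRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/metrics", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Placeholder service address and token, for illustration only.
	req, err := newMetricsRequest("https://olm-metrics.example.invalid:8443", "example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

A request built without the `Authorization` header is what the unmodified tests were effectively sending, which would explain the failures once the endpoint started requiring authentication.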

@anik120 anik120 force-pushed the native-metrics-authnz branch from f63b885 to 7704b39 Compare September 24, 2025 13:44
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2025
@anik120 anik120 force-pushed the native-metrics-authnz branch from 7704b39 to 9bac784 Compare September 24, 2025 13:51
@anik120 anik120 force-pushed the native-metrics-authnz branch from 9bac784 to 63ab287 Compare September 24, 2025 14:32
  KIND_CLUSTER_NAME="kind-olmv0-${i}" \
  KIND_CREATE_OPTS="--kubeconfig=${E2E_KUBECONFIG_ROOT}/kubeconfig-${i}" \
- HELM_INSTALL_OPTS="--kubeconfig ${E2E_KUBECONFIG_ROOT}/kubeconfig-${i}" \
+ HELM_INSTALL_OPTS="--kubeconfig ${E2E_KUBECONFIG_ROOT}/kubeconfig-${i} --set certManager.enabled=false" \
anik120 (Member, Author) commented Sep 24, 2025

This presented itself as the easiest way to run all the tests we have for metrics, since these tests are about the metrics emitted (e.g. "creating a subscription emits these metrics") and not about the security aspect of the endpoints.

path: /healthz
port: {{ .Values.olm.service.internalPort }}
scheme: {{ if .Values.olm.tlsSecret }}HTTPS{{ else }}HTTP{{end}}
port: {{ if .Values.certManager.enabled }}{{ .Values.olm.service.internalPortHttps }}{{ else }}{{ .Values.olm.service.internalPort }}{{ end }}
anik120 (Member, Author) commented

Which means the templates had to be updated to configure different endpoints based on whether cert-manager is enabled.

e2e-local: e2e-build kind-create e2e-local-deploy e2e

.PHONY: e2e-local-deploy
e2e-local-deploy: $(KIND) $(HELM) #HELP Deploy OLM for e2e testing (without cert-manager)
anik120 (Member, Author) commented

Also had to add a new deploy target that deploys OLM without cert-manager for e2e testing.

tmshort (Contributor) commented Sep 24, 2025

/lgtm

openshift-ci bot commented Oct 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 589a5d0 into operator-framework:master Oct 10, 2025
14 checks passed
camilamacedo86 added a commit to camilamacedo86/operator-lifecycle-manager that referenced this pull request Oct 23, 2025
anik120 added a commit to anik120/operator-lifecycle-manager that referenced this pull request Nov 11, 2025
**Problem:**

PR operator-framework#3660 introduced cert-manager as a hard dependency for OLM deployments, causing installation failures when
cert-manager CRDs are not present:

error getting resource "olm/olm-cert" with GVK "cert-manager.io/v1, Kind=Certificate":
no matches for kind "Certificate" in version "cert-manager.io/v1"

This is a breaking change for existing users who don't have cert-manager installed.

**Solution:**

Make secured metrics endpoints an opt-in feature by setting `certManager.enabled: false` by default in Helm values.
Users who want authenticated metrics must explicitly enable cert-manager.

**Changes:**

- Set `certManager.enabled: false` in `deploy/chart/values.yaml`
- Remove `cert-manager-install` dependency from `make run-local`
- Remove `--set certManager.enabled=true` override from `make deploy`
- Remove automatic cert-manager cleanup from `make undeploy`

**Behavior:**

- Default (cert-manager disabled): HTTP metrics on port 8080, no authentication
- Opt-in (`certManager.enabled: true`): HTTPS metrics on port 8443 with authentication/authorization

Fixes the breaking change introduced in operator-framework#3660 while preserving the secured metrics feature for users who want it.
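
The two behaviors described in this commit message can be summarized as a small decision function. This is only an illustrative sketch: the type and function names are hypothetical, and the real selection happens in the Helm templates and operator flags rather than in code like this.

```go
package main

import "fmt"

// metricsEndpoint describes how the metrics endpoint is exposed.
// Hypothetical type, used only to illustrate the two modes.
type metricsEndpoint struct {
	Scheme        string
	Port          int
	Authenticated bool
}

// metricsEndpointFor maps the certManager.enabled Helm value to the
// endpoint described above: plain HTTP on 8080 by default, or HTTPS on
// 8443 with authentication/authorization when cert-manager is enabled.
func metricsEndpointFor(certManagerEnabled bool) metricsEndpoint {
	if certManagerEnabled {
		return metricsEndpoint{Scheme: "https", Port: 8443, Authenticated: true}
	}
	return metricsEndpoint{Scheme: "http", Port: 8080, Authenticated: false}
}

func main() {
	fmt.Printf("%+v\n", metricsEndpointFor(false))
	fmt.Printf("%+v\n", metricsEndpointFor(true))
}
```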
openshift-merge-bot bot pushed a commit that referenced this pull request Nov 11, 2025
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

4 participants