gatewayclass: Enable Horizontal Pod Autoscaling by Miciah · Pull Request #1326 · openshift/cluster-ingress-operator

Miciah · 2025-12-16T21:51:26Z

Enable Horizontal Pod Autoscaling (HPA) on Istio. Hard-code the autoscaling parameters based on the cluster infrastructure config:

If the infrastructure topology is "SingleReplica", set minimum to 1.
Otherwise, set minimum to 2.
In any case, set maximum to 10.

Summary by CodeRabbit

New Features
- Added infrastructure topology awareness to gateway configuration, enabling automatic HPA (Horizontal Pod Autoscaling) settings based on cluster topology (single-replica or highly-available modes).
Refactor
- Improved controller initialization to apply manager-level defaults consistently across gateway controllers.

openshift-ci · 2025-12-16T21:51:30Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Miciah · 2025-12-16T21:51:33Z

/test e2e-aws-operator

openshift-ci · 2025-12-16T21:51:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gcs278 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Miciah · 2025-12-17T00:47:35Z

e2e-aws-operator passed. deployments.json shows that the istiod pod has 2 replicas:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/deployments.json' | jq '.items.[]|select(.metadata.name=="istiod-openshift-gateway")|.spec.replicas'
2
%

However, there are some errors due to missing metrics:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and .source.component=="horizontal-pod-autoscaler")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
istiod-openshift-gateway: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
%

Moreover, the proxy pods do not get scaled out more than 1 replica:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and (.metadata.name|startswith("test-gateway")) and .source.component=="deployment-controller")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-74dff54c89 from 0 to 1
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-577b5f5db7 from 0 to 1
test-gateway-openshift-default: Scaled down replica set test-gateway-openshift-default-74dff54c89 from 1 to 0
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-75c4dbd84f from 0 to 1
%

rikatz · 2025-12-17T14:00:00Z

pkg/operator/controller/gatewayclass/controller.go


+	// Watch the cluster infrastructure config in case the infrastructure
+	// topology changes.
+	if err := c.Watch(source.Kind[client.Object](operatorCache, &configv1.Infrastructure{}, reconciler.enqueueRequestForSomeGatewayClass())); err != nil {


question here is, if you change Sail Operator parameter will it reflect on Pilot config?
Also, in case this parameter is also used for Gateway replica deployment, will Sail reconcile all the gateways and HPA?

(Giving some thoughts for test)

answering myself:
the change of HPA parameter from pilot does not impact on the GatewayClass, it needs its own definition from spec.infrastructure.paramRefs

rikatz · 2025-12-17T14:11:20Z

testing on my own cluster (AWS, HA):

kubectl get hpa -A
NAMESPACE           NAME                       REFERENCE                             TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
openshift-ingress   istiod-openshift-gateway   Deployment/istiod-openshift-gateway   cpu: <unknown>/80%   2         10        2          36s

It is scaling pilot, but not sure about the gateways, let me test

rikatz · 2025-12-17T17:00:47Z

IIUC we can pass to sail the configs of the Gateway class on the Istio resource, and it will create the configmap for us (even the gateway class being created by the user): https://github.com/openshift-service-mesh/sail-operator/blob/f3a3c6d7e6f2c2ae412b09ce1a78bc93258b4db4/resources/v1.28.0/charts/istiod/templates/gateway-class-configmap.yaml

We can make CIO set this property and the configmap will be created for us (this includes even setting the annotations for internal only clusters!), the problem is that today the json.RawMessage used by Sail API is not accepting the content correctly. I have asked the sail operator team about it and if they have been using it, otherwise we can improve this workflow on Sail Operator and consume on CIO

Miciah · 2025-12-22T03:21:32Z

/test e2e-aws-operator

Enable Horizontal Pod Autoscaling (HPA) on Istio. Hard-code the autoscaling parameters based on the cluster infrastructure config: - If the infrastructure topology is "SingleReplica", set minimum to 1. - Otherwise, set minimum to 2. - In any case, set maximum to 10. * pkg/operator/controller/gatewayclass/controller.go (gatewayclassControllerIndexFieldName): New const. (NewUnmanaged): Watch infrastructures. Initialize fieldIndexer in the reconciler so that it can be used to create an index over gatewayclasses later. The index cannot be created directly in NewUnmanaged as the gatewayclasses resource might not yet exist when NewUnmanaged is called. (reconciler): Add fieldIndexer and startGatewayclassControllerIndex. (Reconcile): Get the cluster infrastructure object and pass it to ensureIstio. Create an index over gatewayclasses by spec.controllerName, using fieldIndexer from the reconciler, gatewayclassControllerIndexFieldName for the index field name, and startGatewayclassControllerIndex to ensure the index is only created once, on first reconciliation. * pkg/operator/controller/gatewayclass/controller_test.go (Test_Reconcile): Add test cases for missing cluster infrastructure config. Add a test case for SingleReplica topology mode. Add a test case with multiple gatewayclasses. Add the cluster infrastructure config object to existingObjects for the existing test cases. Add the expected HPA configuration to the expected Istio CRs in test cases. Initialize fieldIndexer with a fake indexer in the reconciler. (FakeIndexer, (FakeIndexer).IndexField): New type and method, used in Test_Reconcile. * pkg/operator/controller/gatewayclass/istio.go (ensureIstio): Add an infraConfig parameter, and pass the argument to desiredIstio. Use the new index to list gatewayclasses with the OpenShift gateway controller name, and pass the list of gatewayclasses to desiredIstio as well. * pkg/operator/controller/gatewayclass/istio.go (desiredIstio): Add an infraConfig parameter and a gatewayclasses parameter, and use the arguments to look up the infrastructure topology mode and and configure HPA accordingly.

Miciah · 2025-12-22T14:28:17Z

/test e2e-aws-operator

Miciah · 2025-12-22T21:19:59Z

/test e2e-aws-operator

Don't overwrite the gateway variable's value from createGatewayWithListeners with a possibly nil value from assertGatewaySuccessful. The gateway variable is used in a cleanup handler, which would panic if the variable had a nil value. The gateway value from assertGatewaySuccessful isn't really needed, so it can be safely ignored. * test/e2e/gateway_api_test.go (testGatewayAPIDNSListenerWithNoHostname) (testGatewayAPIDNSListenerUpdate): Ignore the return value from assertGatewaySuccessful.

Miciah · 2025-12-23T02:46:11Z

/test e2e-aws-operator

Set default values, such as the logger, in the controller options for unmanaged controllers, namely the gateway-labeler, gateway-service-dns, and gatewayclass controllers. Before controller-runtime v0.21.0, controller-runtime implicitly set default values for controller options for managed and unmanaged controllers alike. Controller-runtime internally used the DefaultFromConfig method to do so. Since controller-runtime v0.21.0, these default values are not implicitly set for unmanaged controllers[1]. In particular, this means that the logger is not initialized, and so the controller initialization and any reconciliation errors are not logged. This commit explicitly sets default values by calling DefaultFromConfig for unmanaged controllers. Follow-up to commit 66485b8. 1. kubernetes-sigs/controller-runtime@d9ff283 * pkg/operator/controller/gateway-labeler/controller.go (NewUnmanaged): * pkg/operator/controller/gateway-service-dns/controller.go (NewUnmanaged): * pkg/operator/controller/gatewayclass/controller.go (NewUnmanaged): Use DefaultFromConfig to set default values for controller options.

coderabbitai · 2026-03-04T17:03:55Z

Walkthrough

Three controller implementations adopt manager-supplied default options for consistent configuration. The gatewayclass controller gains infrastructure topology awareness, dynamic field indexing of GatewayClass objects, and infrastructure-driven Istio configuration generation. Tests expand to cover multiple infrastructure topologies and field indexer integration.

Changes

Cohort / File(s)	Summary
Gateway Controller Default Options `pkg/operator/controller/gateway-labeler/controller.go`, `pkg/operator/controller/gateway-service-dns/controller.go`	Both controllers switch from direct controller construction to a two-step approach: create Options with reconciler, apply manager defaults via `DefaultFromConfig()`, then pass to `controller.NewUnmanaged()`. No functional changes to reconciliation logic.
GatewayClass Controller Infrastructure Integration `pkg/operator/controller/gatewayclass/controller.go`	Adds infrastructure topology awareness: imports configv1, introduces field indexer for GatewayClass controller names, watches cluster infrastructure config, creates dynamic index on first reconciliation, fetches infrastructure config during reconciliation, and gates index creation with synchronized startup. Extends Reconcile to handle infrastructure-driven logic.
GatewayClass Controller Test Expansion `pkg/operator/controller/gatewayclass/controller_test.go`	Expands test coverage with infraConfig helper, supports multiple topology modes (SingleReplica, HighlyAvailable), introduces FakeIndexer implementation with field indexing for GatewayClass spec.controllerName, and adds test cases for various gatewayclass configurations.
Istio Configuration Generation `pkg/operator/controller/gatewayclass/istio.go`	Updates `ensureIstio()` and `desiredIstio()` function signatures to accept infrastructure config and gatewayclass list; generates gateway class-specific HPA configuration and serializes to JSON; returns error from `desiredIstio()` for error handling.
E2E Test Adjustment `test/e2e/gateway_api_test.go`	Changes two test assertions to discard gateway return value instead of assigning to variable; only error result is used.

Sequence Diagram(s)

sequenceDiagram
    participant Client as GatewayClass Reconciler
    participant Cache as Field Index Cache
    participant API as Kubernetes API
    participant Infra as Infrastructure Config
    participant Istio as Istio Resource

    Client->>Client: Start reconciliation
    Client->>Cache: Check if index created
    alt Index not created
        Client->>Cache: Create index on spec.controllerName
    end
    Client->>API: Fetch infrastructure config
    API-->>Client: Return infrastructure topology
    Client->>API: List all GatewayClass objects
    API-->>Client: Return gatewayclass list
    Client->>Client: Generate HPA config per topology
    Client->>Client: Serialize gatewayclass config to JSON
    Client->>Istio: Create/update Istio with config
    Istio-->>Client: Confirm creation

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Test assertions lack meaningful failure messages to aid diagnosis, and E2E changes discard return values that could mask bugs by preventing variable updates with freshly fetched data.	Add explicit assertion messages to all calls, structure setup using BeforeEach/AfterEach blocks, and reconsider E2E changes where return values are discarded to capture gateway objects or justify using stale references.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of the pull request - enabling Horizontal Pod Autoscaling for GatewayClass controllers managing Istio resources.
Stable And Deterministic Test Names	✅ Passed	Test files use static test function names without dynamic information that changes between runs.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Miciah · 2026-03-04T17:09:18Z

#1350 merged, so Istio.spec.values.gatewayClasses should be functional now.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

pkg/operator/controller/gatewayclass/istio.go (1)

111-116: Consider adding nil-safety for infraConfig parameter.

The function accesses infraConfig.Status.InfrastructureTopology directly at line 114. While the current call site in ensureIstio ensures a non-nil infraConfig is passed, adding a defensive check would prevent potential panics if this function is called from additional locations in the future.

🛡️ Proposed defensive check

 func desiredIstio(name types.NamespacedName, ownerRef metav1.OwnerReference, istioVersion string, enableInferenceExtension bool, infraConfig *configv1.Infrastructure, gatewayclasses []gatewayapiv1.GatewayClass) (*sailv1.Istio, error) {
 	var minReplicas = 2
 	const maxReplicas = 10
+	if infraConfig == nil {
+		return nil, fmt.Errorf("infraConfig must not be nil")
+	}
 	if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode {
 		minReplicas = 1
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/istio.go` around lines 111 - 116, The
function desiredIstio reads infraConfig.Status.InfrastructureTopology without
nil-checking; add a defensive nil check at the start of desiredIstio to handle a
nil infraConfig (e.g., treat as non-single-replica/default behavior or return an
error), so the access to infraConfig.Status is safe; update logic around
minReplicas (currently set to 2 then changed to 1 if InfrastructureTopology ==
SingleReplicaTopologyMode) to only inspect infraConfig when infraConfig != nil,
and ensure callers such as ensureIstio still work with the chosen default or
propagated error.

pkg/operator/controller/gatewayclass/controller_test.go (1)

134-142: JSON key ordering assumption may cause test flakiness.

The test helper generates JSON by iterating over the gatewayclasses slice in the order provided. The production code in istio.go uses json.Marshal on a map[string]any, which sorts keys alphabetically. The test at line 268 passes []string{"openshift-default", "openshift-internal"} which happens to be alphabetically sorted, so it works.

If a test case passes gatewayclasses in non-alphabetical order, the byte comparison would fail despite semantically equivalent JSON.

Consider using json.Marshal in the test helper for consistency, or add a JSON-aware comparison in the test assertions.

♻️ Suggested approach using json.Marshal for consistency

-	ret.Spec.Values.GatewayClasses = []byte(fmt.Sprintf(`{%s}`, strings.Join(func() []string {
-		var result []string
-
-		for _, name := range gatewayclasses {
-			result = append(result, fmt.Sprintf(`"%s":{"horizontalPodAutoscaler":{"spec":{"maxReplicas":10,"minReplicas":%d}}}`, name, minReplicas))
-		}
-
-		return result
-	}(), ",")))
+	gatewayclassConfig := map[string]any{
+		"horizontalPodAutoscaler": map[string]any{
+			"spec": map[string]any{
+				"minReplicas": minReplicas,
+				"maxReplicas": 10,
+			},
+		},
+	}
+	gatewayclassesConfig := map[string]any{}
+	for _, name := range gatewayclasses {
+		gatewayclassesConfig[name] = gatewayclassConfig
+	}
+	ret.Spec.Values.GatewayClasses, _ = json.Marshal(gatewayclassesConfig)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/controller_test.go` around lines 134 -
142, The test helper builds ret.Spec.Values.GatewayClasses by manually joining
strings from the gatewayclasses slice which assumes a stable key order and can
cause flakiness versus production istio.go that json.Marshal's a map; change the
helper to construct a map[string]any (keys = names from gatewayclasses, values =
the HPA spec/minReplicas structure) and call json.Marshal on that map to produce
GatewayClasses bytes (or alternatively, keep the helper but update the test
assertion to perform a JSON-aware comparison of ret.Spec.Values.GatewayClasses
using json.Unmarshal into a map and compare maps) so ordering differences no
longer break tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/controller/gatewayclass/controller.go`:
- Around line 263-279: The current use of
r.startGatewayclassControllerIndex.Do(...) prevents retries if
fieldIndexer.IndexField fails; change the pattern so the index is only marked
"done" after a successful IndexField call (e.g., replace sync.Once.Do with a
small protected boolean or atomic flag plus a mutex or atomic.CompareAndSwap):
run gatewayclassControllerIndexFn and call r.fieldIndexer.IndexField(...) and if
it returns nil set the success flag (e.g., r.gatewayclassIndexInitialized =
true) so future reconciliations skip creation, but if it returns an error do not
set the flag and return/append the error so the reconcile can retry; keep
references to gatewayclassControllerIndexFn,
gatewayclassControllerIndexFieldName and r.fieldIndexer.IndexField when
implementing this change.

---

Nitpick comments:
In `@pkg/operator/controller/gatewayclass/controller_test.go`:
- Around line 134-142: The test helper builds ret.Spec.Values.GatewayClasses by
manually joining strings from the gatewayclasses slice which assumes a stable
key order and can cause flakiness versus production istio.go that json.Marshal's
a map; change the helper to construct a map[string]any (keys = names from
gatewayclasses, values = the HPA spec/minReplicas structure) and call
json.Marshal on that map to produce GatewayClasses bytes (or alternatively, keep
the helper but update the test assertion to perform a JSON-aware comparison of
ret.Spec.Values.GatewayClasses using json.Unmarshal into a map and compare maps)
so ordering differences no longer break tests.

In `@pkg/operator/controller/gatewayclass/istio.go`:
- Around line 111-116: The function desiredIstio reads
infraConfig.Status.InfrastructureTopology without nil-checking; add a defensive
nil check at the start of desiredIstio to handle a nil infraConfig (e.g., treat
as non-single-replica/default behavior or return an error), so the access to
infraConfig.Status is safe; update logic around minReplicas (currently set to 2
then changed to 1 if InfrastructureTopology == SingleReplicaTopologyMode) to
only inspect infraConfig when infraConfig != nil, and ensure callers such as
ensureIstio still work with the chosen default or propagated error.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b377a103-d8f8-48fe-a7ec-6462842dcb0c

📥 Commits

Reviewing files that changed from the base of the PR and between 81c961d and 09b0c49.

📒 Files selected for processing (6)

pkg/operator/controller/gateway-labeler/controller.go
pkg/operator/controller/gateway-service-dns/controller.go
pkg/operator/controller/gatewayclass/controller.go
pkg/operator/controller/gatewayclass/controller_test.go
pkg/operator/controller/gatewayclass/istio.go
test/e2e/gateway_api_test.go

coderabbitai · 2026-03-04T17:10:42Z

pkg/operator/controller/gatewayclass/controller.go

+	// Create the Index gatewayclasses once on first reconciliation.  The
+	// index cannot be created in NewUnmanaged as the CRD might not exist
+	// when NewUnmanaged is called, and so we create the index here.
+	r.startGatewayclassControllerIndex.Do(func() {
+		gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string {
+			gatewayclass, ok := o.(*gatewayapiv1.GatewayClass)
+			if !ok {
+				return []string{}
+			}
+
+			return []string{string(gatewayclass.Spec.ControllerName)}
+		})
+		if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil {
+			log.Error(err, "failed to create index for gatewayclasses", "request", request)
+			errs = append(errs, err)
+		}
+	})


⚠️ Potential issue | 🟡 Minor

Index creation failure is not retriable due to sync.Once.

If IndexField fails transiently (e.g., temporary API server issue), the sync.Once prevents any retry. Subsequent reconciliations will consistently fail when ensureIstio attempts to list with the non-existent index.

While the error is logged and the reconciliation will fail with an error (surfacing the issue), the system cannot self-heal without operator restart.

Consider using a different synchronization pattern that allows retry on failure, such as tracking success state explicitly.

🔧 Alternative pattern allowing retry on failure

- // startGatewayclassControllerIndex ensures we create the gatewayclasses index at - // most once. - startGatewayclassControllerIndex sync.Once + // gatewayclassControllerIndexCreated tracks whether the index was successfully created. + gatewayclassControllerIndexCreated bool + gatewayclassControllerIndexMu sync.Mutex } // In Reconcile: - r.startGatewayclassControllerIndex.Do(func() { + r.gatewayclassControllerIndexMu.Lock() + if !r.gatewayclassControllerIndexCreated { gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string { gatewayclass, ok := o.(*gatewayapiv1.GatewayClass) if !ok { return []string{} } return []string{string(gatewayclass.Spec.ControllerName)} }) if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil { log.Error(err, "failed to create index for gatewayclasses", "request", request) errs = append(errs, err) + } else { + r.gatewayclassControllerIndexCreated = true } - }) + } + r.gatewayclassControllerIndexMu.Unlock()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/controller.go` around lines 263 - 279, The current use of r.startGatewayclassControllerIndex.Do(...) prevents retries if fieldIndexer.IndexField fails; change the pattern so the index is only marked "done" after a successful IndexField call (e.g., replace sync.Once.Do with a small protected boolean or atomic flag plus a mutex or atomic.CompareAndSwap): run gatewayclassControllerIndexFn and call r.fieldIndexer.IndexField(...) and if it returns nil set the success flag (e.g., r.gatewayclassIndexInitialized = true) so future reconciliations skip creation, but if it returns an error do not set the flag and return/append the error so the reconcile can retry; keep references to gatewayclassControllerIndexFn, gatewayclassControllerIndexFieldName and r.fieldIndexer.IndexField when implementing this change.

This is a fair point. On the one hand, the current logic for initializing the index follows pattern we established with the logic for initializing the watch on istios. On the other hand, this seems simple enough to fix, and I expect the watch on istios will go away in a couple releases. I would like to allow the current CI run to finish, but I can update the logic after that.

@Miciah, sounds good! The suggested mutex+boolean pattern is straightforward to apply whenever you're ready. Since the startIstioWatch watch initialization has the same limitation, you may want to apply a similar retry-capable pattern there as well — though as you noted, that code is expected to be removed in a future release, so it may not be worth the effort.

openshift-ci · 2026-03-04T20:35:13Z

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-azure-operator	`09b0c49`	link	true	`/test e2e-azure-operator`
ci/prow/hypershift-e2e-aks	`09b0c49`	link	true	`/test hypershift-e2e-aks`
ci/prow/e2e-gcp-operator	`09b0c49`	link	true	`/test e2e-gcp-operator`
ci/prow/e2e-hypershift	`09b0c49`	link	true	`/test e2e-hypershift`
ci/prow/e2e-aws-ovn-serial-2of2	`09b0c49`	link	true	`/test e2e-aws-ovn-serial-2of2`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Miciah · 2026-03-05T03:43:09Z

The results from the last e2e-aws-operator run look better—replica sets for proxies are getting scaled to 2 replicas:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2029241388389896192/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and (.metadata.name|startswith("test-gateway")) and .source.component=="deployment-controller")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-6b647f479b from 0 to 1
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-6b647f479b from 1 to 2
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-7f4bb66dc4 from 0 to 1
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-7f4bb66dc4 from 1 to 2
test-gateway-infra-annotations-openshift-default: Scaled up replica set test-gateway-infra-annotations-openshift-default-7c5db78c9c from 0 to 1
test-gateway-infra-annotations-openshift-default: Scaled up replica set test-gateway-infra-annotations-openshift-default-7c5db78c9c from 1 to 2
test-gateway-internal-lb-openshift-default: Scaled up replica set test-gateway-internal-lb-openshift-default-6665d79d55 from 0 to 1
%

I'm not sure why the events only report scale-out to 1 replica for the "test-gateway-internal-lb-openshift-default" gateway.

There are some did not receive metrics for targeted pods (pods might be unready) errors; I don't know whether these are transient failures or something we should be concerned about:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2029241388389896192/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and .source.component=="horizontal-pod-autoscaler")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-openshift-default: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw1-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw2-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw3-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw4-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw4-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw4-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw3-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw3-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-update-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
test-gateway-update-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-update-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-nohost-gateway-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
test-nohost-gateway-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-nohost-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-infra-annotations-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
%

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2025

rikatz reviewed Dec 17, 2025

View reviewed changes

Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from 1ea51fc to f407b29 Compare December 22, 2025 03:20

Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from f407b29 to 3e6e8c5 Compare December 22, 2025 14:28

Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from fad49ff to c566d6a Compare December 23, 2025 02:45

Miciah marked this pull request as ready for review March 4, 2026 17:03

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2026

openshift-ci bot requested review from Thealisyed and alebedev87 March 4, 2026 17:04

coderabbitai bot reviewed Mar 4, 2026

View reviewed changes

Miciah mentioned this pull request Mar 13, 2026

OCPBUGS-78523: gatewayapi_controller: Replace sync.Once with retry for GatewayClass field indexer setup #1382

Open

Conversation

Miciah commented Dec 16, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci bot commented Dec 16, 2025

Uh oh!

Miciah commented Dec 16, 2025

Uh oh!

openshift-ci bot commented Dec 16, 2025

Uh oh!

Miciah commented Dec 17, 2025

Uh oh!

rikatz Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

rikatz Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

rikatz commented Dec 17, 2025

Uh oh!

rikatz commented Dec 17, 2025

Uh oh!

Miciah commented Dec 22, 2025

Uh oh!

Miciah commented Dec 22, 2025

Uh oh!

Miciah commented Dec 22, 2025

Uh oh!

Miciah commented Dec 23, 2025

Uh oh!

coderabbitai bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

Miciah commented Mar 4, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Miciah Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Mar 4, 2026

Uh oh!

Miciah commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Miciah commented Dec 16, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 4, 2026 •

edited

Loading

coderabbitai bot Mar 4, 2026 •

edited

Loading