Skip to content

OCPBUGS-77056: Make external cert validation asynchronous#745

Open
bentito wants to merge 8 commits intoopenshift:masterfrom
bentito:OCPBUGS-77056-async-sar
Open

OCPBUGS-77056: Make external cert validation asynchronous#745
bentito wants to merge 8 commits intoopenshift:masterfrom
bentito:OCPBUGS-77056-async-sar

Conversation

@bentito
Copy link
Contributor

@bentito bentito commented Mar 3, 2026

This PR implements asynchronous and cached SubjectAccessReview checks for spec.tls.externalCertificate during router startup.

Previously, each external certificate triggered synchronous API validations that blocked the main thread. When scaling to thousands of external cert routes, the router hit O(N * API_latency) delays and would hit CrashLoopBackOff from failing liveness probes.

This fix maintains full RBAC security checks but executes them asynchronously in the background. A global sync.Map acts as an asyncSARCache to drastically speed up repeated checks for the same underlying secret.

Fixes: OCPBUGS-77056

Summary by CodeRabbit

  • New Features

    • External certificate validation is now asynchronous, cached and rate‑limited; routes auto re‑evaluate when checks complete and emit SAR-related status updates (SAR completed, secret loaded).
  • Tests

    • Tests updated for concurrency, async SAR completion, cache clearing, and adjusted expected status/update/rejection sequences to reflect additional SAR-related events.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 3, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Mar 3, 2026
@openshift-ci-robot
Copy link
Contributor

@bentito: This pull request references Jira Issue OCPBUGS-77056, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This PR implements asynchronous and cached SubjectAccessReview checks for spec.tls.externalCertificate during router startup.

Previously, each external certificate triggered synchronous API validations that blocked the main thread. When scaling to thousands of external cert routes, the router hit O(N * API_latency) delays and would hit CrashLoopBackOff from failing liveness probes.

This fix maintains full RBAC security checks but executes them asynchronously in the background. A global sync.Map acts as an asyncSARCache to drastically speed up repeated checks for the same underlying secret.

Fixes: OCPBUGS-77056

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2026
@openshift-ci openshift-ci bot requested a review from lihongan March 3, 2026 23:38
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign grzpiotrowski for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bentito bentito changed the title WIP: OCPBUGS-77056: Make external cert validation asynchronous OCPBUGS-77056: Make external cert validation asynchronous Mar 3, 2026
@bentito bentito force-pushed the OCPBUGS-77056-async-sar branch from 325f313 to 214b603 Compare March 3, 2026 23:41
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2026
@bentito bentito marked this pull request as ready for review March 3, 2026 23:41
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 3, 2026
@bentito
Copy link
Contributor Author

bentito commented Mar 3, 2026

/hold

@openshift-ci openshift-ci bot requested review from Miciah and grzpiotrowski March 3, 2026 23:42
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds asynchronous, per-secret SAR validation and caching for external TLS certificates, two new status reasons (SARCompleted, SecretLoaded), secret-event driven cache invalidation and SAR-completion emission, and test updates for concurrency and async completion.

Changes

Cohort / File(s) Summary
Status Reason Constants
pkg/router/controller/contention.go
Added ExtCrtStatusReasonSARCompleted and ExtCrtStatusReasonSecretLoaded; included them in ignoreIngressConditionReason.
Route Secret Manager
pkg/router/controller/route_secret_manager.go
Invalidate async SAR cache on secret add/update/delete; pass an onComplete callback into TLS external cert validation; handle async SAR completion by fetching the latest Route and emitting SARCompleted updates or rejections.
Validation API & Async Cache
pkg/router/routeapihelpers/validation.go
Introduced asyncSARCache and asyncSARResult with per-entry mutex and callbacks; added InvalidateAsyncSARCache and ClearAsyncSARCacheForTest; added MaxConcurrentSecretSyncs semaphore; refactored ValidateTLSExternalCertificate to support onComplete, return pending results, register callbacks, and spawn a single async validation goroutine per secret.
Tests & Test Helpers
pkg/router/controller/route_secret_manager_test.go, pkg/router/routeapihelpers/validation_test.go
Made statusRecorder concurrency-safe (Mutex, GetRejections, GetUpdates); added locking around recorder mutations; updated tests to wait for async SAR completion, clear async cache between rounds, and adjust expected status updates/rejections to include SecretLoaded/SARCompleted events and changed validation error expectations.

Sequence Diagram

sequenceDiagram
    participant SM as RouteSecretManager
    participant VS as ValidationModule
    participant Cache as AsyncSARCache
    participant SAR as SARClient
    participant Secret as SecretsGetter
    participant Status as StatusRecorder
    participant Callback as onComplete\n(namespace, name)

    SM->>VS: ValidateTLSExternalCertificate(route, fldPath, sarc, secretsGetter, Callback)
    VS->>VS: Check ExternalCertificate presence
    alt has ExternalCertificate
        VS->>Cache: Lookup (namespace/secretName)
        alt cache miss or not done
            VS->>Cache: Insert pending asyncSARResult (done=false)
            VS->>VS: Spawn async goroutine
            rect rgba(100,150,200,0.5)
                VS->>SAR: Perform SubjectAccessReview(s)
                VS->>Secret: Fetch secret
                VS->>VS: Validate keypair (tls.X509KeyPair)
                VS->>Cache: Update asyncSARResult (errs, done=true)
                VS->>Callback: Invoke(namespace, secretName)
            end
            Callback->>SM: Trigger route re-evaluation
            SM->>Status: Record RouteUpdate or RouteRejection (SARCompleted / SecretLoaded)
        else cache hit & done
            VS->>Callback: Invoke immediately (or return cached errs)
        end
    else no ExternalCertificate
        VS->>VS: Return early (no async work)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: making external certificate validation asynchronous, which is the core focus across all modified files (validation.go, route_secret_manager.go, and supporting test files).
Stable And Deterministic Test Names ✅ Passed The pull request modifies standard Go test files using the testing package, not Ginkgo BDD-style tests. No Ginkgo test definitions exist in the modified files, so the custom check targeting Ginkgo test names is not applicable and passes.
Test Structure And Quality ✅ Passed Test files use standard Go table-driven testing with proper mocking, fixtures, and meaningful assertions following Go testing conventions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/controller/route_secret_manager.go`:
- Around line 288-298: The current Add handler unconditionally calls
RecordRouteRejection when a secret is replayed; change it to mirror the
UpdateFunc/SAR-completion logic: check the route's admission (via
populateRouteTLSFromSecret result or route.Status.Conditions) and if the route
is already admitted emit a normal update event instead of calling
RecordRouteRejection, otherwise record the rejection; extract the string
"ExternalCertificateSecretLoaded" into a shared constant (e.g.,
ExtCrtStatusReasonSecretLoaded) and add that constant to the
ignoreIngressConditionReason set in contention.go so the admission-ignoring
logic treats this replay reason as benign. Ensure you update references to
RecordRouteRejection, populateRouteTLSFromSecret, UpdateFunc,
ExternalCertificateSecretLoaded, and ignoreIngressConditionReason accordingly.

In `@pkg/router/routeapihelpers/validation.go`:
- Around line 503-556: The async SAR flow stores only a single pending error
(pendingErr) in asyncSARCache for a cacheKey, so subsequent calls (in validate
flow) that hit the cache drop their onComplete callback and never get
re-enqueued; change the cache value to include a list of waiting callbacks (or a
small struct with errs + []onComplete callbacks) keyed by asyncSARCache so that
when you detect a pending entry you append the current onComplete for
route.Namespace/secretName, and in the goroutine after
asyncSARCache.Store(cacheKey, errs) iterate and invoke all stored callbacks (not
just the first) and then replace the cache entry with errs only; update code
paths that read the cache (the cache-hit branch and the goroutine completion) to
handle this new struct and ensure onComplete is invoked for every waiting route.
- Around line 485-490: The global asyncSARCache currently stores completed
validation results forever; change it to avoid permanent sticky entries by (a)
replacing the raw sync.Map value with a small struct that includes the cached
field.ErrorList plus an expiration timestamp or a source type flag, (b) only
writing non-transient successful validation results (or results with a short
TTL) into asyncSARCache, and (c) adding explicit invalidation helpers (e.g.,
InvalidateAsyncSARCache(namespace, secretName) and
InvalidateAllAsyncSARCacheForSecret(namespace, secretName)) and call them from
the secret add/update/delete/recreate paths in the secret manager
(route_secret_manager.go) so secret changes force revalidation; also update
ClearAsyncSARCacheForTest to clear the new structure. Ensure references to
asyncSARCache and ClearAsyncSARCacheForTest (and the code paths mentioned around
the 503-556 region) are updated to use the new value type and invalidation APIs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e07f375c-945a-4e58-b5ec-3347eaff109c

📥 Commits

Reviewing files that changed from the base of the PR and between d8ed355 and 093ad83.

📒 Files selected for processing (5)
  • pkg/router/controller/contention.go
  • pkg/router/controller/route_secret_manager.go
  • pkg/router/controller/route_secret_manager_test.go
  • pkg/router/routeapihelpers/validation.go
  • pkg/router/routeapihelpers/validation_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
pkg/router/routeapihelpers/validation.go (1)

485-497: ⚠️ Potential issue | 🟠 Major

Completed SAR entries are still permanent.

Only secret add/update/delete clears this cache. A finished allow/deny result for namespace/secretName survives route deletion, later RBAC grants/revocations, and transient SAR/API failures, so a new route that reuses the same secret name can inherit a stale decision without performing a fresh check. Please put completed entries behind a TTL/generation and avoid caching transient internal errors indefinitely.

Also applies to: 593-605

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/routeapihelpers/validation.go` around lines 485 - 497, The cache
currently stores completed asyncSARResult entries forever; update asyncSARResult
(and uses of asyncSARCache) to include a timestamp or generation counter and
enforce a TTL/generation check before returning cached results so that completed
allow/deny decisions expire after a short interval or when generation changes.
Ensure only stable allow/deny outcomes are cached long-term; if
asyncSARResult.errs contains transient/internal errors, set a much shorter TTL
(or do not mark done) so they trigger fresh SARs. Update the cache read path to
validate timestamp/generation and the write path to record it, and keep
InvalidateAsyncSARCache(namespace, secretName) as-is to support manual
invalidation.
pkg/router/controller/route_secret_manager.go (1)

262-304: ⚠️ Potential issue | 🟠 Major

Don't clear the fresh SAR result on the initial SecretLoaded replay.

RegisterRoute() is only reached after validate() has already completed successfully, so this non-deleted AddFunc path is replaying an already-validated secret. Clearing asyncSARCache here throws away that fresh result, defeats the per-secret cache for shared secrets, and can bounce the route back into a second "authorization check pending" / ExternalCertificateValidationFailed cycle on the SecretLoaded requeue. Keep invalidation on actual secret changes (recreated/updated/deleted), but not on the initial load replay.

Suggested shape
 		AddFunc: func(obj interface{}) {
 			secret := obj.(*kapi.Secret)
 			log.V(4).Info("Secret added for route", "namespace", namespace, "secret", secret.Name, "route", routeName)
-			routeapihelpers.InvalidateAsyncSARCache(namespace, secret.Name)
 
 			// Secret re-creation scenario
 			// Check if the route key exists in the deletedSecrets map, indicating that the secret was previously deleted for this route.
 			// If it exists, it means the secret is being recreated. Remove the key from the map and proceed with handling the route.
@@
 			key := generateKey(namespace, routeName)
 			if _, deleted := p.deletedSecrets.LoadAndDelete(key); deleted {
+				routeapihelpers.InvalidateAsyncSARCache(namespace, secret.Name)
 				log.V(4).Info("Secret recreated for route", "namespace", namespace, "secret", secret.Name, "route", routeName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/controller/route_secret_manager.go` around lines 262 - 304, The
AddFunc for secret events currently calls
routeapihelpers.InvalidateAsyncSARCache unconditionally, which discards fresh
per-secret async SAR results during the initial cache-sync replay; to fix,
remove the unconditional call to routeapihelpers.InvalidateAsyncSARCache from
the top of the AddFunc in RegisterRoute's secret handler and only invoke
routeapihelpers.InvalidateAsyncSARCache when the secret is actually
changed/recreated/deleted (e.g., inside the p.deletedSecrets.LoadAndDelete
"recreated" branch and in the Update/Delete handlers), using
generateKey(namespace, routeName) and p.deletedSecrets to determine real
recreation so you don't invalidate the SAR cache on the initial load.
🧹 Nitpick comments (1)
pkg/router/controller/route_secret_manager_test.go (1)

1355-1372: Put a timeout around these async waits.

These bare channel receives will hang the whole suite on any missed informer callback. Now that the tests intentionally depend on async completion, a tiny helper with select + time.After will turn those deadlocks into normal test failures.

Suggested helper
+func waitForRecorderEvent(t *testing.T, ch <-chan struct{}) {
+	t.Helper()
+	select {
+	case <-ch:
+	case <-time.After(2 * time.Second):
+		t.Fatal("timeout waiting for recorder event")
+	}
+}

Then replace each bare receive with:

-<-recorder.doneCh
+waitForRecorderEvent(t, recorder.doneCh)

Also applies to: 1434-1444, 1496-1516

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/controller/route_secret_manager_test.go` around lines 1355 - 1372,
Replace bare channel receives on recorder.doneCh with a timed wait helper to
avoid test hangs: add a small helper (e.g., waitForDone(ctxTimeout, ch) or
waitWithTimeout(t, ch, duration)) that uses select with the channel and
time.After to fail the test on timeout, then update every occurrence of
"<-recorder.doneCh" in route_secret_manager_test.go (including the occurrences
around the Secret Update flow that use kubeClient.CoreV1().Secrets(...).Update
and the initial Add wait) to call that helper instead so missed informer
callbacks produce a deterministic test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/router/controller/route_secret_manager.go`:
- Around line 262-304: The AddFunc for secret events currently calls
routeapihelpers.InvalidateAsyncSARCache unconditionally, which discards fresh
per-secret async SAR results during the initial cache-sync replay; to fix,
remove the unconditional call to routeapihelpers.InvalidateAsyncSARCache from
the top of the AddFunc in RegisterRoute's secret handler and only invoke
routeapihelpers.InvalidateAsyncSARCache when the secret is actually
changed/recreated/deleted (e.g., inside the p.deletedSecrets.LoadAndDelete
"recreated" branch and in the Update/Delete handlers), using
generateKey(namespace, routeName) and p.deletedSecrets to determine real
recreation so you don't invalidate the SAR cache on the initial load.

In `@pkg/router/routeapihelpers/validation.go`:
- Around line 485-497: The cache currently stores completed asyncSARResult
entries forever; update asyncSARResult (and uses of asyncSARCache) to include a
timestamp or generation counter and enforce a TTL/generation check before
returning cached results so that completed allow/deny decisions expire after a
short interval or when generation changes. Ensure only stable allow/deny
outcomes are cached long-term; if asyncSARResult.errs contains
transient/internal errors, set a much shorter TTL (or do not mark done) so they
trigger fresh SARs. Update the cache read path to validate timestamp/generation
and the write path to record it, and keep InvalidateAsyncSARCache(namespace,
secretName) as-is to support manual invalidation.

---

Nitpick comments:
In `@pkg/router/controller/route_secret_manager_test.go`:
- Around line 1355-1372: Replace bare channel receives on recorder.doneCh with a
timed wait helper to avoid test hangs: add a small helper (e.g.,
waitForDone(ctxTimeout, ch) or waitWithTimeout(t, ch, duration)) that uses
select with the channel and time.After to fail the test on timeout, then update
every occurrence of "<-recorder.doneCh" in route_secret_manager_test.go
(including the occurrences around the Secret Update flow that use
kubeClient.CoreV1().Secrets(...).Update and the initial Add wait) to call that
helper instead so missed informer callbacks produce a deterministic test
failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c800f527-b5e1-4386-b6cf-e20eb4726a31

📥 Commits

Reviewing files that changed from the base of the PR and between 093ad83 and 8c6f16a.

📒 Files selected for processing (4)
  • pkg/router/controller/contention.go
  • pkg/router/controller/route_secret_manager.go
  • pkg/router/controller/route_secret_manager_test.go
  • pkg/router/routeapihelpers/validation.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/router/controller/contention.go

@lihongan
Copy link
Contributor

/retest-required

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 561-568: The semaphore (asyncSARSemaphore) is being acquired on
the caller path before launching the goroutine, causing the caller to block when
the limit is reached; move the acquire into the goroutine so the caller can
spawn all per-secret goroutines without blocking, then release the token inside
that goroutine (use defer to ensure release). Concretely: start the goroutine
immediately, perform asyncSARSemaphore <- struct{}{} at the top of the goroutine
body, and keep the existing defer func() { <-asyncSARSemaphore }() to guarantee
release; ensure any early returns in the goroutine still release the token.
- Around line 532-535: The current guard that returns nil when sarc or
secretsGetter is nil hides wiring bugs and skips RBAC/secret validation;
instead, change the conditional in validation.go so that if sarc == nil ||
secretsGetter == nil you return an internal error (e.g., fmt.Errorf or the
package's internal/server error type) describing "missing validation clients"
rather than nil, or alternatively gate this behind a test-only seam/flag so
production never silently succeeds; update callers to propagate/handle the
returned error as needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31462f9b-48bb-466f-8d90-5c399b5fab8b

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6f16a and 871f184.

📒 Files selected for processing (1)
  • pkg/router/routeapihelpers/validation.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 526-529: The code currently returns a synthetic pending error on
cache miss without registering a callback if onComplete is nil; change the logic
in the validation path (the block that manipulates cached.callbacks and returns
cached.errs) so that onComplete must never be nil: either (A) make onComplete a
required parameter at the API boundary, or (B) when onComplete == nil execute
the async validation synchronously (drive the same validation routine on the
current goroutine, wait for completion, populate cached.errs, and then return
the real errors) instead of just returning the pending placeholder; if you
choose (B) ensure you reuse the same validation function that enqueues callbacks
(so cached state is updated) and that cached.callbacks is appended only when
onComplete is non-nil or after synchronous completion you return the finalized
errors.
- Around line 572-599: Wrap each SAR and secret GET with a per-request timeout
context (use context.WithTimeout and defer cancel) instead of context.TODO();
for the SAR checks stop using authorizationutil.Authorize and call the SAR
client Create() directly (e.g., sarClient.Create) with that timeout context so
you can inspect both (response, err) separately, treat API/transport errors as
retryable/internal (append field.InternalError or a distinct cached error) and
only append field.Forbidden when the SAR response explicitly denies, and call
secretsGetter.Secrets(route.Namespace).Get with the same bounded context and map
Get errors to NotFound vs InternalError accordingly. Ensure you still reference
routerServiceAccount, secretName, fldPath and preserve existing
field.Forbidden/field.InternalError/field.NotFound usages when classifying the
outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bbebc97e-5cff-40e0-80d7-bff7bf16a68c

📥 Commits

Reviewing files that changed from the base of the PR and between 871f184 and c2f6381.

📒 Files selected for processing (1)
  • pkg/router/routeapihelpers/validation.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/router/routeapihelpers/validation.go (1)

637-639: Consider recovering from callback panics to ensure all callbacks are invoked.

If a callback panics, the remaining callbacks in the slice won't be invoked. This is unlikely but could leave some routes stuck in a pending state if one route's callback handler has a bug.

🛡️ Proposed defensive callback invocation
 		for _, cb := range callbacks {
-			cb(route.Namespace, secretName)
+			func() {
+				defer func() {
+					if r := recover(); r != nil {
+						// Log panic but continue invoking remaining callbacks
+					}
+				}()
+				cb(route.Namespace, secretName)
+			}()
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/routeapihelpers/validation.go` around lines 637 - 639, Wrap each
callback invocation in a deferred recover to prevent a single panic from
aborting the loop: inside the loop over callbacks (callbacks, cb) call each cb
via a small closure that uses defer + recover to catch any panic, log or report
the panic along with the callback context (e.g., route.Namespace and secretName)
and continue to the next cb; ensure the recovered panic does not re-panic so all
callbacks are invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/router/routeapihelpers/validation.go`:
- Around line 637-639: Wrap each callback invocation in a deferred recover to
prevent a single panic from aborting the loop: inside the loop over callbacks
(callbacks, cb) call each cb via a small closure that uses defer + recover to
catch any panic, log or report the panic along with the callback context (e.g.,
route.Namespace and secretName) and continue to the next cb; ensure the
recovered panic does not re-panic so all callbacks are invoked.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9c78ea5-dba5-483e-8d75-13dc6f9905d7

📥 Commits

Reviewing files that changed from the base of the PR and between c2f6381 and dc6b383.

📒 Files selected for processing (1)
  • pkg/router/routeapihelpers/validation.go

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2026
bentito added 5 commits March 17, 2026 16:14
Include a concurrency governor (semaphore) to limit the number of
simultaneous SubjectAccessReview checks and secret syncs to 50.
This prevents hitting the API server with thousands of concurrent
requests during router startup.
…eouts

- Enforce synchronous execution if onComplete is nil
- Add 10s timeout to SAR and Secret GET requests
- Call SAR client directly instead of using authorizationutil.Authorize
  to distinguish between internal errors and explicit denials
- Handle DeletedFinalStateUnknown tombstones in secret manager to prevent panics during namespace deletion
- Update library-go dependency to include async secret registration fix (library-go PR #2132)
- Use /tmp for fake haproxy unix sockets to avoid macOS path length limits in tests
@bentito bentito force-pushed the OCPBUGS-77056-async-sar branch from 716e042 to 0a99ee0 Compare March 17, 2026 20:15
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 17, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 17, 2026

@bentito: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-agnostic 0a99ee0 link true /test e2e-agnostic

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants