Skip to content

Add auth 401 recovery observability to client bug reports#14611

Merged
ccy-oai merged 14 commits intomainfrom
ccy/codex-cli-auth-observability
Mar 14, 2026
Merged

Add auth 401 recovery observability to client bug reports#14611
ccy-oai merged 14 commits intomainfrom
ccy/codex-cli-auth-observability

Conversation

@ccy-oai
Copy link
Copy Markdown
Collaborator

@ccy-oai ccy-oai commented Mar 13, 2026

CXC-392

With 401
401 auth tags in Sentry

- auth_401_*: preserved facts from the last unauthorized response we cared about
- auth_*: latest auth-related request facts for the uploaded session when they are not intentionally suppressed to preserve 401 context
- auth_recovery_*: what the client did after unauthorized, plus whether the follow-up succeeded

Without 401
happy-path auth tags in Sentry

Summary
  • Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request correlation ids.
  • Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
  • Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
  • Preserve original unauthorized evidence under auth_401_* while keeping follow-up result tags separate.
Rationale (from spec findings)
  • The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
  • Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
  • Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
  • The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
Scope
  • Add auth 401 and unauthorized-recovery observability in codex-rs/core, codex-rs/codex-api, and codex-rs/otel, including Sentry feedback tag surfacing.
  • Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
Trade-offs
  • This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
  • This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
  • Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
Client follow-up
  • PR 2 will add misroute/provider and geo-denial observability plus the matching Sentry surfacing.
  • A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
  • device_id remains intentionally separate until there is a safe existing source on the feedback upload path.
Testing
  • cargo test -p codex-core emit_feedback_request_tags_
  • cargo test -p codex-core emit_feedback_auth_recovery_tags_
  • cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase
  • cargo test -p codex-core websocket_session_tracks_connection_reuse
  • cargo test -p codex-core extract_response_debug_context_decodes_identity_headers
  • cargo test -p codex-core identity_auth_details
  • cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability
  • cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability
  • cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability

@ccy-oai ccy-oai requested a review from etraut-openai March 13, 2026 17:13
@ccy-oai ccy-oai force-pushed the ccy/codex-cli-auth-observability branch 3 times, most recently from a8d6e5e to ca2ad71 Compare March 13, 2026 18:58
@ccy-oai ccy-oai changed the title Add client auth observability signals Add auth 401 recovery observability to client bug reports Mar 13, 2026
ccy-oai and others added 5 commits March 13, 2026 14:04
Add client-visible auth observability for 401 recovery, endpoint classification, and geo-denial diagnosis without changing auth behavior.

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
@ccy-oai ccy-oai force-pushed the ccy/codex-cli-auth-observability branch from 7407e72 to 0dca1e8 Compare March 13, 2026 21:22
ccy-oai and others added 3 commits March 13, 2026 14:34
Keep auth_401_* as the preserved original unauthorized context while still recording the follow-up request's own ids and auth classification in auth_* fields.

Co-authored-by: Codex <noreply@openai.com>
Update auth_401 feedback tags as a complete latest-401 snapshot so missing fields from a later unauthorized response clear older session metadata.

Co-authored-by: Codex <noreply@openai.com>
Keep the latest unauthorized feedback fields together as one snapshot in core, then flatten them only at the Sentry feedback tag boundary.

Co-authored-by: Codex <noreply@openai.com>
@ccy-oai ccy-oai marked this pull request as ready for review March 13, 2026 23:07
ccy-oai and others added 2 commits March 13, 2026 16:29
Keep the collab tool spec test aligned with current main by including the wait tool in the expected set.

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
ccy-oai and others added 2 commits March 13, 2026 18:56
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Codex <noreply@openai.com>
@etraut-openai
Copy link
Copy Markdown
Collaborator

@codex review

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a6eedfda9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

error_message.as_deref(),
connection_reused,
);
crate::feedback_tags!(auth_connection_reused = connection_reused);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit complete feedback tags for websocket requests

ApiTelemetry::on_ws_request only writes auth_connection_reused. On reused sockets, connect_websocket is not called, so endpoint and other auth fields are left at whatever previous request wrote (e.g. /models or /responses/compact). This can misclassify websocket bug-report tags and hide the true failing route.

Useful? React with 👍 / 👎.

Comment on lines 400 to +405
let api_provider = self.provider.to_api_provider(auth_mode)?;
let api_auth = auth_provider_from_auth(auth.clone(), &self.provider)?;
let transport = ReqwestTransport::new(build_reqwest_client());
let client = ModelsClient::new(transport, api_provider, api_auth);
let request_telemetry: Arc<dyn RequestTelemetry> = Arc::new(ModelsRequestTelemetry {
auth_mode: auth_mode.map(|mode| TelemetryAuthMode::from(mode).to_string()),
auth_header_attached: api_auth.auth_header_attached(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Derive /models auth.mode from actual request auth

fetch_and_update_models sets telemetry auth_mode from self.auth_manager.auth_mode(), but the actual auth used is api_auth from auth_provider_from_auth(...). When provider API-key auth is active without matching auth-manager mode, /models telemetry/feedback can report empty or wrong auth.mode despite an attached auth header.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@etraut-openai etraut-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good. The telemetry additions to the auth module, in particular, are going to come in handy for not only the 401 issue but also some of the token refresh bugs we've been playing whack-a-mole with for a while.

Before committing, there are some clippy errors to fix. Plus, there are some codex code review errors that look like legit (but perhaps edge-case) problems. I recommend running "codex review" a few more times to see if it finds anything more.

Thanks for doing this!

@ccy-oai
Copy link
Copy Markdown
Collaborator Author

ccy-oai commented Mar 14, 2026

Overall, looks good. The telemetry additions to the auth module, in particular, are going to come in handy for not only the 401 issue but also some of the token refresh bugs we've been playing whack-a-mole with for a while.

Before committing, there are some clippy errors to fix. Plus, there are some codex code review errors that look like legit (but perhaps edge-case) problems. I recommend running "codex review" a few more times to see if it finds anything more.

Thanks for doing this!

Fixed the Codex Review and Clippy. Thank you for the review, Eric!

@ccy-oai ccy-oai merged commit d692b74 into main Mar 14, 2026
32 checks passed
@ccy-oai ccy-oai deleted the ccy/codex-cli-auth-observability branch March 14, 2026 22:38
@github-actions github-actions bot locked and limited conversation to collaborators Mar 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants