x509 otel certificate refresh events in cert_refresher library. #3146

Merged
havetisyan merged 6 commits into AthenZ:master from balamanova:ATHENS-8722-x509_otel
Dec 19, 2025

@balamanova
Contributor

@balamanova balamanova commented Dec 9, 2025

Description

Added OpenTelemetry metrics for X.509 certificate refresh events in the cert_refresher library.

Implementation follows SIA OTel pattern

This implementation mirrors the existing SIA (Go) OTel metrics pattern from libs/go/sia/otel/metricset.go:

Metrics (aligned with SIA)

| This PR (Java) | SIA (Go) |
| --- | --- |
| athenz_cert_refresher.refresh.result_total{function,result} | sia.agent_command.result_total{function,result} |
| athenz_cert_refresher.service_cert.validity.remaining_secs{name} | sia.service_cert.validity.remaining_secs{cname} |
| athenz_cert_refresher.refresh.result_last_timestamp{function,result} | New - tracks when context was last updated |

Attributes (same as SIA)

  • function - identifies the operation (e.g., "cert_refresh")
  • result - "success" or "failure" (same values as SIA)
  • name - certificate subject name (similar to SIA's cname)
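
To make the attribute model concrete, here is a minimal sketch of how a counter with the `function` and `result` attributes splits into one running total per attribute combination. It is a plain-Java stand-in, not the OTel API and not the code in this PR: the class `RefreshResultCounterSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for an attribute-keyed counter; not the OTel LongCounter API.
final class RefreshResultCounterSketch {
    // Metric name taken from the PR description.
    static final String METRIC = "athenz_cert_refresher.refresh.result_total";

    // One running total per distinct (function, result) attribute set.
    private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

    // Builds a series key such as metric{function="cert_refresh",result="success"}.
    static String seriesKey(String function, String result) {
        return METRIC + "{function=\"" + function + "\",result=\"" + result + "\"}";
    }

    // Increments the series that matches the outcome of one refresh attempt.
    void record(String function, boolean success) {
        String key = seriesKey(function, success ? "success" : "failure");
        series.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Returns the current total for one attribute set, or 0 if it was never recorded.
    long total(String function, String result) {
        LongAdder adder = series.get(seriesKey(function, result));
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        RefreshResultCounterSketch counter = new RefreshResultCounterSketch();
        counter.record("cert_refresh", true);
        counter.record("cert_refresh", false);
        System.out.println(seriesKey("cert_refresh", "success") + " "
                + counter.total("cert_refresh", "success"));
        System.out.println(seriesKey("cert_refresh", "failure") + " "
                + counter.total("cert_refresh", "failure"));
    }
}
```

Because successes and failures land in separate series, a dashboard or alert can select failures by filtering on `result="failure"` alone.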

Configuration

  • Disable with: -Dathenz.cert_refresher.otel_disabled=true
  • Component name: -Dathenz.cert_refresher.otel_component_name=my-service
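
As a rough illustration of how these two system properties could be read, the sketch below uses only `System.getProperty`. The property names come from this description; the class name `CertRefresherOtelConfigSketch` and the fallback component name are assumptions, since the PR text does not state the library's actual default.

```java
// Hypothetical helper; the real library may read these properties differently.
final class CertRefresherOtelConfigSketch {
    // Property names as given in the PR description.
    static final String PROP_DISABLED = "athenz.cert_refresher.otel_disabled";
    static final String PROP_COMPONENT = "athenz.cert_refresher.otel_component_name";

    // Assumed fallback value; the PR text does not state the real default.
    static final String ASSUMED_DEFAULT_COMPONENT = "cert_refresher";

    // Metrics stay enabled unless the property is set to "true" (case-insensitive).
    static boolean isOtelDisabled() {
        return Boolean.parseBoolean(System.getProperty(PROP_DISABLED, "false"));
    }

    // Returns the configured component name, or the assumed fallback if unset.
    static String componentName() {
        return System.getProperty(PROP_COMPONENT, ASSUMED_DEFAULT_COMPONENT);
    }

    public static void main(String[] args) {
        System.out.println("otel disabled: " + isOtelDisabled());
        System.out.println("component name: " + componentName());
    }
}
```

Running this with `-Dathenz.cert_refresher.otel_disabled=true -Dathenz.cert_refresher.otel_component_name=my-service` would report the metrics as disabled and the component name as `my-service`.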

Contribution Checklist:

  • The pull request does not introduce any breaking changes
  • I have read the contribution guidelines.
  • Create an issue and link to the pull request.

Attach Screenshots (Optional)

@gemini-code-assist
Contributor

Summary of Changes

Hello @balamanova, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the cert_refresher library by integrating OpenTelemetry for comprehensive monitoring of X.509 certificate refresh operations. The changes introduce new metrics to track the outcomes of certificate reloads (success or failure) and provide real-time insights into the remaining validity and expiry timestamps of the currently loaded certificates, thereby improving the operational visibility and reliability of certificate management within Athenz.

Highlights

  • OpenTelemetry Integration: Introduces OpenTelemetry (OTel) metrics to the cert_refresher library for improved observability of certificate management.
  • New Metric Emitter: Adds OpenTelemetryCertReloadEventEmitter to manage and emit various certificate refresh and validity metrics.
  • Configurable Metrics: Allows disabling OTel metrics via a system property (athenz.cert_refresher.otel_disabled) and configuring the component name.
  • Enhanced Certificate Reload Logic: Modifies KeyRefresher to record successful/failed certificate reloads and export certificate validity metrics using the new OTel emitter.
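
For readers unfamiliar with the validity metric, the value behind `athenz_cert_refresher.service_cert.validity.remaining_secs` can be thought of as the time between now and the certificate's expiry. The sketch below shows that arithmetic under the assumption that the expiry comes from `X509Certificate.getNotAfter()`; the class and method names are hypothetical and this is not the emitter code from the PR.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the remaining-validity arithmetic.
final class CertValiditySketch {
    // In real code, notAfter would typically be cert.getNotAfter().toInstant()
    // for a java.security.cert.X509Certificate; it is passed in here so the
    // sketch stays self-contained.
    static long remainingSecs(Instant notAfter, Instant now) {
        // Negative once the certificate has expired.
        return Duration.between(now, notAfter).getSeconds();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-12-09T00:00:00Z");
        Instant notAfter = Instant.parse("2025-12-10T00:00:00Z");
        System.out.println(remainingSecs(notAfter, now)); // 86400
    }
}
```

A value that keeps shrinking without resetting suggests that refreshes are not replacing the loaded certificate.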

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces OpenTelemetry metrics for X.509 certificate refresh events by adding a new OpenTelemetryCertReloadEventEmitter class and integrating it into the KeyRefresher. The implementation is solid, adding valuable observability. My review includes several suggestions for the new OpenTelemetryCertReloadEventEmitter class to improve metric consistency, code clarity, and the utility of the emitted metrics. These changes will make the new telemetry data more robust and easier to consume.

@balamanova balamanova force-pushed the ATHENS-8722-x509_otel branch 2 times, most recently from bc60e29 to c9e32ba on December 9, 2025 17:46
ATHENS-8722 adding cert refresh metrics

Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
@balamanova balamanova force-pushed the ATHENS-8722-x509_otel branch from c9e32ba to 522931a on December 9, 2025 17:46
@balamanova balamanova requested a review from psasidhar December 11, 2025 18:07
.build();

refreshResultCounter.add(1, attrs);
resultLastTimestampGauge.set(timestamp, attrs);
Contributor

Do we need to send the timestamp explicitly? Won't that automatically be available with refreshResultCounter?

Contributor Author

@balamanova balamanova Dec 12, 2025

I'm using the OTel Metrics API (counter + gauge), not the OTel Events/Logs API. Prometheus is a pull-based time-series database.

Prometheus only stores scrape timestamps, not event timestamps. I can detect that a refresh happened using increase(), but the timestamp is approximate (within the scrape interval).

With this gauge, I record the exact timestamp at which the refresh occurred.
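
The difference between the two approaches can be sketched numerically. The example below assumes, purely for simplicity, that scrapes happen at exact multiples of the scrape interval; the class `ScrapeWindowSketch` is hypothetical and models only the timing argument, not Prometheus or the OTel SDK.

```java
// Hypothetical model of scrape timing; real scrapes are not aligned this neatly.
final class ScrapeWindowSketch {
    // Returns {lastScrapeAtOrBeforeEvent, nextScrapeAfterEvent} in epoch seconds,
    // assuming scrapes occur at exact multiples of intervalSecs.
    static long[] scrapeWindow(long eventEpochSecs, long intervalSecs) {
        long before = Math.floorDiv(eventEpochSecs, intervalSecs) * intervalSecs;
        return new long[] {before, before + intervalSecs};
    }

    public static void main(String[] args) {
        long refreshAt = 1000L;   // when the refresh actually happened
        long interval = 30L;      // assumed scrape interval in seconds

        // Counter only: the increase shows up between two scrapes, so the event
        // time is known only to within this window.
        long[] window = scrapeWindow(refreshAt, interval);
        System.out.println("counter places the refresh in ("
                + window[0] + ", " + window[1] + "]"); // (990, 1020]

        // Timestamp gauge: the value itself carries the event time.
        long gaugeValue = refreshAt;
        System.out.println("gauge reports " + gaugeValue); // 1000
    }
}
```

Whether the extra precision justifies a separate metric depends on how the data will be consumed.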

Collaborator

There is no need to generate a timestamp attribute. The metric already tells you that there is a failure in the given time period; the exact timestamp is not really needed and will not be used.

@balamanova balamanova changed the title from "Athens 8722 x509 otel" to "x509 otel certificate refresh events in cert_refresher library." on Dec 12, 2025
…tMetric

Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
@balamanova balamanova force-pushed the ATHENS-8722-x509_otel branch from 9b54ae6 to 8adce7d on December 12, 2025 20:38
Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
@balamanova balamanova force-pushed the ATHENS-8722-x509_otel branch from 5b4d5d8 to 920430e on December 18, 2025 22:58
Signed-off-by: abalamanova <assem.balamanova@yahooinc.com>
@havetisyan havetisyan merged commit d546cf0 into AthenZ:master Dec 19, 2025
3 checks passed
3 participants