[WIP] Local OTel Buffer & /perf command router (GSoC Idea 5 PoC) #21316
anthonychen000 wants to merge 9 commits into google-gemini:main
Conversation
Wires up the new /perf slash command to the core OpenTelemetry localBuffer. Bypasses the global React SessionContext in favor of a point-in-time snapshot to prevent UI freezing during heavy P99 latency calculations. Resolves monorepo type exports.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a proof of concept for a performance monitoring and optimization dashboard. It establishes an in-memory OpenTelemetry buffer to collect performance metrics locally, which can then be accessed via a new `/perf` slash command.
Code Review
This pull request introduces a local OpenTelemetry buffer and a /perf command router, a significant step towards the Performance Monitoring and Optimization Dashboard (GSoC Idea 5). It integrates a local metric reader into the OpenTelemetry SDK and provides /perf subcommands for latency, memory, and startup metrics. However, the implementation of the local buffer has two significant security vulnerabilities: a memory leak due to the InMemoryMetricExporter never being reset, which could lead to Denial of Service (DoS), and a Prototype Pollution vulnerability in the simplification logic if a metric with a reserved name like __proto__ is registered. Additionally, there are opportunities to improve code maintainability by reducing duplication in perfCommand.ts and to enhance type safety in localBuffer.ts.
- Resolves memory leak by resetting `InMemoryMetricExporter` after snapshot.
- Prevents prototype pollution by initializing snapshot objects with `Object.create(null)`.
- Implements strict runtime type guards for OTel `DataPointType.HISTOGRAM` values.
- Applies DRY principle to `perfCommand` router logic.
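The `Object.create(null)` fix mentioned in the commit above can be sketched as follows. This is an illustrative stand-in, not the real `localBuffer.ts` code: `SimplePoint` and this `simplifyMetrics` signature are simplified assumptions.

```typescript
// Minimal stand-in for an aggregated OTel data point: metric name plus totals.
interface SimplePoint {
  name: string;
  count: number;
  sum: number;
}

// Build a name -> totals map. Object.create(null) yields an object with no
// prototype, so a metric named "__proto__" becomes an ordinary own property
// instead of polluting Object.prototype.
function simplifyMetrics(
  points: SimplePoint[],
): Record<string, { count: number; sum: number }> {
  const out: Record<string, { count: number; sum: number }> = Object.create(null);
  for (const p of points) {
    const entry = out[p.name] ?? { count: 0, sum: 0 };
    entry.count += p.count;
    entry.sum += p.sum;
    out[p.name] = entry;
  }
  return out;
}

const snap = simplifyMetrics([
  { name: '__proto__', count: 1, sum: 5 },
  { name: 'tool.latency', count: 2, sum: 30 },
]);
```

With a plain `{}` literal instead, assigning to the `__proto__` key would silently mutate the prototype chain rather than store data.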
aishop-lab left a comment
Interesting approach — intercepting the existing OTel pipeline is architecturally cleaner than building custom tracking from scratch. A few things I noticed:
1. /perf won't work when telemetry is disabled. The PR modifies initializeTelemetry() to always include localMetricReader, but the caller in config.ts (line 1044) still gates the entire call behind this.telemetrySettings.enabled. When telemetry is off, initializeTelemetry is never called, the reader is never bound to a MeterProvider, and forceFlush() throws "MetricReader is not bound to a MetricProducer". This is arguably the primary use case — users who want local perf visibility without remote export.
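The decoupling this point asks for can be sketched as a tiny decision function. All names here (`planTelemetry`, `TelemetrySettings`) are illustrative stand-ins for the real `config.ts` wiring, not its actual API.

```typescript
// Illustrative sketch: initialize the OTel engine unconditionally so the
// local /perf reader is always bound to a MeterProvider, and gate only the
// *remote* exporters on the user's telemetry setting.
interface TelemetrySettings {
  enabled: boolean; // the user-facing remote-telemetry toggle
}

interface TelemetryPlan {
  localReaderBound: boolean;    // must be true even when telemetry is off
  remoteExportEnabled: boolean; // the only thing the setting should gate
}

function planTelemetry(settings: TelemetrySettings): TelemetryPlan {
  return {
    localReaderBound: true,                // always: /perf works offline
    remoteExportEnabled: settings.enabled, // user-gated remote export only
  };
}

const offlinePlan = planTelemetry({ enabled: false });
```

The point being: the `forceFlush()` crash disappears once binding the local reader no longer depends on the remote-export flag.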
2. Module-level singleton breaks re-initialization. localMetricReader is created once at module load. When shutdownTelemetry() runs (e.g., during re-auth), sdk.shutdown() shuts down the reader but doesn't clear its _metricProducer. On re-init, setMetricProducer() throws "MetricReader can not be bound to a MeterProvider again." This breaks the CLI auth flow where telemetry is deferred until credentials arrive.
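The re-binding failure and the factory-pattern fix can be demonstrated with a stub. `FakeMetricReader` below only mimics the bind-once behavior of an OTel metric reader; the factory function names mirror the ones later adopted in this PR but the bodies are illustrative.

```typescript
// Stand-in for an OTel metric reader: it may be bound to a producer exactly
// once, mirroring the "can not be bound to a MeterProvider again" error.
class FakeMetricReader {
  private bound = false;
  bind(): void {
    if (this.bound) {
      throw new Error('MetricReader can not be bound to a MeterProvider again.');
    }
    this.bound = true;
  }
}

// Factory pattern: each telemetry (re-)initialization gets a fresh reader,
// so re-auth flows never try to re-bind a shut-down instance.
let currentReader: FakeMetricReader | null = null;

function setupLocalMetricsReader(): FakeMetricReader {
  currentReader = new FakeMetricReader();
  return currentReader;
}

function teardownLocalMetrics(): void {
  currentReader = null;
}

const first = setupLocalMetricsReader();
first.bind();
teardownLocalMetrics();
const second = setupLocalMetricsReader();
second.bind(); // fresh instance: binds without throwing
```

A module-level `const localMetricReader = new ...` would be the `first` instance forever, and the second `bind()` would throw.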
3. Per-tool granularity is lost. simplifyMetrics sums all data points per metric name, but existing metrics like gemini_cli.tool.call.latency record data with attributes ({ function_name: 'read_file' } vs { function_name: 'shell' }). After aggregation you get one { count: 47, sum: 12345 } with no way to tell which tool is slow. Per-tool and per-model breakdowns are essential for a performance dashboard.
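Attribute-preserving aggregation can be sketched as grouping by a stable serialization of the attribute set instead of collapsing everything under the metric name. The types below are simplified stand-ins for OTel data points, not the real `simplifyMetrics` signature.

```typescript
// A data point carrying its attributes, e.g. { function_name: 'read_file' }.
interface AttrPoint {
  attributes: Record<string, string>;
  count: number;
  sum: number;
}

// Group per attribute set so per-tool latency survives simplification.
function groupByAttributes(
  points: AttrPoint[],
): Map<string, { count: number; sum: number }> {
  const out = new Map<string, { count: number; sum: number }>();
  for (const p of points) {
    // Stable key: attribute entries sorted by name, serialized as JSON.
    const key = JSON.stringify(
      Object.entries(p.attributes).sort(([a], [b]) => a.localeCompare(b)),
    );
    const entry = out.get(key) ?? { count: 0, sum: 0 };
    entry.count += p.count;
    entry.sum += p.sum;
    out.set(key, entry);
  }
  return out;
}

const grouped = groupByAttributes([
  { attributes: { function_name: 'read_file' }, count: 40, sum: 1000 },
  { attributes: { function_name: 'shell' }, count: 7, sum: 11345 },
  { attributes: { function_name: 'shell' }, count: 1, sum: 200 },
]);
```

Summing across attribute sets, as the original `simplifyMetrics` did, would leave one undifferentiated bucket and no way to see that `shell` dominates the latency sum.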
4. On the memory concern — the bot flagged InMemoryMetricExporter as never being reset, but getLocalMetricsSnapshot() does call reset() after reading. The real issue is the 1-second exportIntervalMillis — between /perf invocations, cumulative snapshots accumulate every second. With CUMULATIVE temporality, only the latest matters. A longer interval or on-demand-only flush would avoid the unnecessary churn.
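The on-demand flush this point suggests can be sketched with a stub exporter. `FakeInMemoryExporter` only mimics the `getMetrics()`/`reset()` shape of the real `InMemoryMetricExporter`; the flush callback stands in for `forceFlush()`.

```typescript
// Stand-in exporter that accumulates one batch per flush, mirroring
// InMemoryMetricExporter's getMetrics()/reset() API shape.
class FakeInMemoryExporter {
  private batches: number[][] = [];
  export(batch: number[]): void {
    this.batches.push(batch);
  }
  getMetrics(): number[][] {
    return this.batches;
  }
  reset(): void {
    this.batches = [];
  }
}

// On-demand snapshot: flush, keep only the latest batch (with CUMULATIVE
// temporality, earlier batches are subsumed by the newest), then reset so
// the buffer cannot grow between /perf invocations.
function getLocalMetricsSnapshot(
  exporter: FakeInMemoryExporter,
  flush: () => void, // in the real code: await localMetricReader.forceFlush()
): number[] {
  flush();
  const all = exporter.getMetrics();
  const latest = all.length > 0 ? all[all.length - 1] : [];
  exporter.reset();
  return latest;
}

const exp = new FakeInMemoryExporter();
const snapshot = getLocalMetricsSnapshot(exp, () => exp.export([1, 2, 3]));
```

With a 1-second background interval, dozens of redundant cumulative batches would pile up between invocations; flushing only on demand keeps the buffer at a single batch.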
The OTel interception idea has real merit though — looking forward to seeing it evolve.
This commit addresses the architectural feedback regarding the local OpenTelemetry interception for the /perf dashboard:

1. Telemetry Gating: Unconditionally initialized the OTel engine in config.ts. The local metric reader now functions for /perf even when remote telemetry is disabled by the user.
2. Re-initialization Crashes: Refactored localMetricReader from a module-level singleton to a factory pattern (setupLocalMetricsReader / teardownLocalMetrics). The SDK can now safely shut down and re-bind during CLI re-auth flows.
3. Attribute Granularity: Updated PerfSnapshot and simplifyMetrics to preserve data point attributes as arrays. Tool-specific latency (e.g., function_name) and model-specific token usage are no longer lost to aggregation.
4. Memory/CPU Churn: Increased the background exportIntervalMillis to 24 hours. The local buffer now relies strictly on the on-demand forceFlush() when the CLI requests the snapshot, eliminating idle processing overhead.

With these backend plumbing fixes in place, the granular OTel metrics are now successfully streaming directly to the Ink terminal UI.
Thanks for these insights @aishop-lab. I've just pushed a backend commit that addresses all four of your architectural points. With these plumbing fixes in place, I was actually able to successfully stream the OTel metrics directly to the Ink TUI.
Hi there! Thank you for your interest in contributing to Gemini CLI. To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383). We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'. This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

This Draft PR serves as a Proof of Concept for the Performance Monitoring and Optimization Dashboard (GSoC Idea 5).
Instead of building custom tracking arrays or percentile calculators in the core engine, this PoC safely intercepts the existing OpenTelemetry pipeline by attaching an `InMemoryMetricExporter` (`localBuffer.ts`) directly to the OTel `NodeSDK`.

Key Architectural Decisions:

- Leverages existing `DataPointType.HISTOGRAM` bindings to calculate sum, min, max, and percentiles automatically. No rewrites to existing `record...` functions.
- `/perf` command: Isolates heavy telemetry payloads from the lightweight billing/quota logic in the existing `/stats` command, preventing UI bloat.
- Normalizes `ResourceMetrics` into a standardized `PerfSnapshot`. This cleanly bridges the backend data to both the future React/Ink UI and the headless JSON exports required for CI integration.

Fixes #20313
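The percentile derivation from histogram bindings described above can be sketched as a walk over explicit-bucket counts. The `HistogramData` shape here is a simplified assumption modeled on OTel explicit-bucket histograms, not the real `PerfSnapshot` type.

```typescript
// OTel-style explicit-bucket histogram: boundaries[i] is the upper bound of
// bucket i, and counts has one extra entry for the overflow bucket.
interface HistogramData {
  boundaries: number[]; // e.g. latency bucket upper bounds in ms
  counts: number[];     // counts.length === boundaries.length + 1
}

// Estimate a percentile by accumulating bucket counts until the cumulative
// total crosses the target rank; return that bucket's upper bound (Infinity
// for the overflow bucket). Coarse, but enough for a /perf readout.
function estimatePercentile(h: HistogramData, p: number): number {
  const total = h.counts.reduce((a, b) => a + b, 0);
  if (total === 0) return 0;
  const rank = Math.ceil((p / 100) * total);
  let cumulative = 0;
  for (let i = 0; i < h.counts.length; i++) {
    cumulative += h.counts[i];
    if (cumulative >= rank) {
      return i < h.boundaries.length ? h.boundaries[i] : Infinity;
    }
  }
  return Infinity;
}

const latency: HistogramData = {
  boundaries: [50, 100, 250, 500, 1000],
  counts: [10, 50, 30, 8, 1, 1], // 100 samples total
};
const p99 = estimatePercentile(latency, 99);
```

Because the buckets already carry the distribution, no raw sample array needs to be retained in the core engine, which is the point of reusing the existing `HISTOGRAM` bindings.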