feat: Add OpenTelemetry distributed tracing support (#263) #264

Merged: Flux159 merged 15 commits into main from feature/observability-opentelemetry on Mar 1, 2026

@rr-paras-patel (Collaborator) commented on Jan 31, 2026

Description

This PR adds OpenTelemetry distributed tracing as a new observability feature for the Kubernetes MCP server. Each tool execution is traced, which supports monitoring, performance tracking, and error analysis.

New Feature: OpenTelemetry Integration

What's Added

  • Distributed tracing with OpenTelemetry SDK
  • 📊 Automatic instrumentation via middleware (zero code changes in tools)
  • 🔍 Request metadata capture (tool name, arguments, K8s context)
  • 📈 Response metadata capture (item counts, sizes, types)
  • 🎯 OTLP export to any observability backend (Jaeger, Tempo, Datadog, etc.)
  • ⚙️ Configurable sampling for cost control
  • 🔒 Privacy controls for sensitive environments
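
The middleware idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in src/middleware/telemetry-middleware.ts: `ToolHandler`, `RecordedSpan`, `withTracing`, and the `record` callback are hypothetical names, and an in-memory callback stands in for the OpenTelemetry tracer. Only the span name format (`tools/call <tool>`) and the `OK` status come from the example trace in this PR.

```typescript
// Hypothetical shapes for illustration; not taken from the PR's source files.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface RecordedSpan {
  name: string;
  status: "OK" | "ERROR";
  durationMs: number;
}

// Wraps a tool handler so every call is timed and recorded without
// changing the handler's own code.
function withTracing(
  toolName: string,
  handler: ToolHandler,
  record: (span: RecordedSpan) => void, // stand-in for a real tracer
): ToolHandler {
  const name = `tools/call ${toolName}`;
  return async (args) => {
    const start = Date.now();
    try {
      const result = await handler(args);
      record({ name, status: "OK", durationMs: Date.now() - start });
      return result;
    } catch (err) {
      record({ name, status: "ERROR", durationMs: Date.now() - start });
      throw err; // the caller still sees the original failure
    }
  };
}
```

Because the wrapper has the same signature as the handler it wraps, a tool can be registered through it without any change to the tool itself.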

Captured Telemetry

Request Attributes:

  • Tool name and operation type
  • Arguments (count and keys, not values)
  • Kubernetes context, namespace, resource type
  • Execution duration
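
As a rough sketch of how the request attributes above might be assembled: the attribute names `gen_ai.tool.name`, `gen_ai.operation.name`, `tool.argument_count`, `k8s.namespace`, and `k8s.resource_type` appear in the example trace in this PR, while `tool.argument_keys` and the argument names `namespace` and `resourceType` are assumptions made for illustration. The function name `buildRequestAttributes` is also hypothetical.

```typescript
type Attributes = Record<string, string | number>;

// Builds span attributes for a tool call. Only the argument count and key
// names are recorded; argument values are left out, apart from the
// Kubernetes identifiers listed above (namespace, resource type).
function buildRequestAttributes(
  toolName: string,
  args: Record<string, unknown>,
): Attributes {
  const keys = Object.keys(args);
  const attrs: Attributes = {
    "gen_ai.tool.name": toolName,
    "gen_ai.operation.name": "execute_tool",
    "tool.argument_count": keys.length,
    "tool.argument_keys": keys.join(","), // hypothetical attribute name
  };
  // `namespace` and `resourceType` are assumed argument names.
  const namespace = args["namespace"];
  if (typeof namespace === "string") attrs["k8s.namespace"] = namespace;
  const resourceType = args["resourceType"];
  if (typeof resourceType === "string") attrs["k8s.resource_type"] = resourceType;
  return attrs;
}
```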

Response Attributes:

  • response.content_items - Number of content blocks
  • response.content_type - Content type (text, json, etc.)
  • response.text_size_bytes - Response size in bytes
  • response.k8s_items_count - Number of K8s resources returned
  • response.k8s_kind - Resource kind (PodList, NodeList, etc.)
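
One plausible way to derive the response attributes above is sketched here. The attribute names are the ones listed in this PR; everything else is an assumption: the `ToolResult` shape, the function name `extractResponseAttributes`, and the idea that the Kubernetes fields are found by parsing the first text block as JSON and reading its `kind` and `items`.

```typescript
// Hypothetical result shape: a list of content blocks, each with a type.
interface ContentItem { type: string; text?: string }
interface ToolResult { content: ContentItem[] }
type ResponseAttributes = Record<string, string | number>;

function extractResponseAttributes(result: ToolResult): ResponseAttributes {
  const attrs: ResponseAttributes = {
    "response.content_items": result.content.length,
  };
  const first = result.content[0];
  if (!first) return attrs;
  attrs["response.content_type"] = first.type;
  if (typeof first.text !== "string") return attrs;
  // Size in UTF-8 bytes, not in characters.
  attrs["response.text_size_bytes"] = new TextEncoder().encode(first.text).length;
  try {
    const parsed: unknown = JSON.parse(first.text);
    if (typeof parsed === "object" && parsed !== null) {
      const obj = parsed as { kind?: unknown; items?: unknown };
      if (typeof obj.kind === "string") attrs["response.k8s_kind"] = obj.kind;
      if (Array.isArray(obj.items)) attrs["response.k8s_items_count"] = obj.items.length;
    }
  } catch {
    // Not JSON (for example, plain table output): skip the K8s fields.
  }
  return attrs;
}
```

Only sizes, counts, and kinds are recorded here; the response body itself never becomes an attribute.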

Configuration

Enable Tracing

export ENABLE_TELEMETRY=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Optional Controls

# Sampling (for production)
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05  # 5% sampling

# Privacy control
export OTEL_CAPTURE_RESPONSE_METADATA=false  # Disable response metadata

# Custom attributes
export OTEL_RESOURCE_ATTRIBUTES="environment=production,team=platform"
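
To make the effect of these variables concrete, here is a small sketch of reading them into a typed object. It is illustrative only: the PR's src/config/telemetry-config.ts may work differently, and the OpenTelemetry SDK normally interprets the standard OTEL_* variables itself. `TelemetryConfig` and `loadTelemetryConfig` are hypothetical names. Treating response metadata capture as on unless set to `false`, and falling back to a ratio of 1 when the sampler argument is missing or out of range, are assumptions.

```typescript
interface TelemetryConfig {
  enabled: boolean;
  endpoint: string | undefined;
  sampler: string | undefined;
  samplerRatio: number;
  captureResponseMetadata: boolean;
  resourceAttributes: Record<string, string>;
}

function loadTelemetryConfig(
  env: Record<string, string | undefined>,
): TelemetryConfig {
  // Sampling ratio: fall back to 1 (sample everything) if unset or invalid.
  const rawRatio = env["OTEL_TRACES_SAMPLER_ARG"];
  const ratio = rawRatio === undefined || rawRatio.trim() === "" ? 1 : Number(rawRatio);

  // Parse "key1=value1,key2=value2" into an object, skipping malformed pairs.
  const resourceAttributes: Record<string, string> = {};
  for (const pair of (env["OTEL_RESOURCE_ATTRIBUTES"] ?? "").split(",")) {
    const idx = pair.indexOf("=");
    if (idx > 0) {
      resourceAttributes[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
    }
  }

  return {
    enabled: env["ENABLE_TELEMETRY"] === "true", // opt-in: off unless "true"
    endpoint: env["OTEL_EXPORTER_OTLP_ENDPOINT"],
    sampler: env["OTEL_TRACES_SAMPLER"],
    samplerRatio: Number.isFinite(ratio) && ratio >= 0 && ratio <= 1 ? ratio : 1,
    captureResponseMetadata: env["OTEL_CAPTURE_RESPONSE_METADATA"] !== "false",
    resourceAttributes,
  };
}
```

In a Node.js process this would typically be called as `loadTelemetryConfig(process.env)`.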

Example Trace

{
  "spanName": "tools/call kubectl_get",
  "duration": "1915ms",
  "attributes": {
    "gen_ai.tool.name": "kubectl_get",
    "gen_ai.operation.name": "execute_tool",
    "k8s.resource_type": "deployments",
    "k8s.namespace": "default",
    "tool.duration_ms": 1915,
    "tool.argument_count": 3,
    "response.k8s_items_count": 92,
    "response.text_size_bytes": 16851,
    "response.content_type": "text"
  },
  "status": "OK"
}

Benefits

  • 🐛 Debugging: Trace request flows, identify failures with full context
  • 📊 Performance: Identify slow tools and bottlenecks
  • 📈 Monitoring: Track response sizes, item counts, data growth
  • 🔍 Observability: Export to enterprise platforms (Jaeger, Grafana, Datadog)
  • 💰 Cost control: Configurable sampling strategies

Supported Backends

  • Jaeger (open source)
  • Grafana Tempo (open source)
  • Grafana Cloud (managed)
  • Datadog, New Relic, Honeycomb (commercial)
  • Any OTLP-compatible backend

Testing

  • ✅ Build succeeds
  • ✅ All tests pass
  • ✅ Traces exported to Jaeger successfully
  • ✅ Response metadata captured correctly
  • ✅ Privacy flag works
  • ✅ No breaking changes

Implementation

Files Modified

  • src/config/telemetry-config.ts - Configuration and SDK initialization
  • src/middleware/telemetry-middleware.ts - Automatic instrumentation
  • docs/OBSERVABILITY.md - Complete documentation

Backward Compatibility

100% backward compatible

  • Disabled by default (opt-in)
  • No changes to existing deployments
  • Graceful degradation
  • No performance impact when disabled
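
The opt-in behavior above could be achieved with a guard along these lines. This is a sketch under assumptions, not the PR's implementation: `Handler` and `applyTelemetry` are hypothetical names, and falling back to the untraced handler when wrapping fails is one possible reading of "graceful degradation".

```typescript
type Handler = (args: Record<string, unknown>) => Promise<unknown>;

// Returns the handler untouched when telemetry is disabled, so the
// disabled path adds no wrapper and no per-call work.
function applyTelemetry(
  enabled: boolean,
  handler: Handler,
  wrap: (h: Handler) => Handler, // e.g. a tracing middleware
): Handler {
  if (!enabled) return handler; // same function object, no wrapper
  try {
    return wrap(handler);
  } catch {
    // Assumed degradation strategy: if instrumentation setup fails,
    // keep serving requests with the untraced handler.
    return handler;
  }
}
```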

Roadmap

Current: Distributed tracing
Coming soon: Metrics and logs

Closes #263

@rr-paras-patel changed the title from "feat: Add response metadata capture to observability (#263)" to "feat: Add OpenTelemetry distributed tracing support (#263)" on Jan 31, 2026

@Flux159 (Owner) left a comment

Code looks fine; just update or answer the questions and then this can merge.

@Flux159 merged commit deafc4f into main on Mar 1, 2026
1 check passed