Labels
MUST, P1 (non-negotiable, critical requirements without which the product is non-functional or unsafe), chore, manual-testing, observability, ready, testing
[TESTING][OBSERVABILITY]: Metrics Accuracy, Tracing Completeness, and Dashboard Validation
Goal
Produce a comprehensive manual test plan validating that the observability stack provides accurate and complete operational visibility, including Prometheus metrics, distributed tracing, log correlation, and Grafana dashboards.
Why Now?
Observability is essential for production operations:
- Incident Response: Accurate metrics enable fast debugging
- SLA Monitoring: Need reliable data for SLO tracking
- Capacity Planning: Metrics inform scaling decisions
- Audit Trail: Traces provide request lifecycle visibility
- Dashboard Reliability: Broken dashboards waste engineering time
User Stories
US-1: SRE - Metrics Accuracy
As an SRE
I want Prometheus metrics to accurately reflect system state
So that I can trust alerts and dashboards
Acceptance Criteria:
Feature: Metrics Accuracy
Scenario: Request counter accuracy
Given I make 100 API requests
When I query the request counter metric
Then the counter should show exactly 100
And the increment should happen atomically

US-2: Developer - Request Tracing
As a developer
I want complete distributed traces for requests
So that I can debug issues across services
Acceptance Criteria:
Feature: Distributed Tracing
Scenario: Full request trace
Given a tool invocation request
When I view the trace in Jaeger
Then I should see spans for gateway, backend, and tool
And parent-child relationships should be correct

Architecture
OBSERVABILITY STACK
+------------------------------------------------------------------------+
| |
| Gateway Collectors Visualization |
| ------- ---------- ------------- |
| |
| +-----------+ +------------+ +-------------+ |
| | Gateway |-------->| Prometheus |-------->| Grafana | |
| | /metrics | +------------+ +-------------+ |
| +-----------+ |
| | |
| | +------------+ +-------------+ |
| +--------------->| Jaeger |-------->| Jaeger UI | |
| | (traces) +------------+ +-------------+ |
| | |
| | +------------+ +-------------+ |
| +--------------->| Loki |-------->| Grafana | |
| | (logs) +------------+ +-------------+ |
| |
| Correlation: |
| request_id links logs <-> traces <-> metrics |
| |
+------------------------------------------------------------------------+
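Correlation works only if every log line carries both the request ID and the trace ID. A hypothetical structured log entry (field names illustrative, not taken from the codebase) might look like:

```json
{
  "timestamp": "2025-01-15T10:32:01.123Z",
  "level": "info",
  "message": "tool invocation completed",
  "request_id": "req-8f3a2b",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7"
}
```

With both IDs present, Loki queries can pivot from a log line to the matching Jaeger trace and back.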
Test Environment Setup
# Environment variables
export GATEWAY_URL="http://localhost:8000"
export PROMETHEUS_URL="http://localhost:9090"
export JAEGER_URL="http://localhost:16686"
export GRAFANA_URL="http://localhost:3000"
export LOKI_URL="http://localhost:3100"
# Start observability stack
docker-compose -f docker-compose.observability.yml up -d
# Verify services
curl -s "$PROMETHEUS_URL/-/healthy"
curl -s "$JAEGER_URL/api/services" | jq '.data'
curl -s "$GRAFANA_URL/api/health" | jq '.database'
# Generate test token
export TOKEN=$(python -m mcpgateway.utils.create_jwt_token \
    --username admin@example.com --secret "$JWT_SECRET")

Manual Test Cases
| Case | Component | Validation | Expected Result |
|---|---|---|---|
| OBS-01 | Metrics endpoint | /metrics accessible | All metrics present |
| OBS-02 | Counter accuracy | Request counts | Exact match |
| OBS-03 | Histogram buckets | Latency distribution | Accurate percentiles |
| OBS-04 | Trace propagation | Parent-child spans | Correct hierarchy |
| OBS-05 | Log correlation | Request ID linking | Logs findable by trace |
| OBS-06 | Dashboard queries | Grafana panels | No broken panels |
| OBS-07 | Alerting rules | Test alerts | Fire correctly |
OBS-01: Metrics Endpoint Validation
Steps:
# Fetch metrics
curl -s "$GATEWAY_URL/metrics" > /tmp/metrics.txt
# Check for required metrics
REQUIRED_METRICS=(
"http_requests_total"
"http_request_duration_seconds"
"mcp_tool_invocations_total"
"mcp_tool_duration_seconds"
"gateway_active_connections"
"redis_connection_pool_size"
"database_connection_pool_size"
)
for metric in "${REQUIRED_METRICS[@]}"; do
grep -q "$metric" /tmp/metrics.txt && \
echo "PASS: $metric present" || \
echo "FAIL: $metric missing"
done

Validation:
# Verify metric format (Prometheus text format)
promtool check metrics < /tmp/metrics.txt && echo "PASS: Valid Prometheus format"

Expected Result:
- All documented metrics present
- Metrics in valid Prometheus format
- HELP and TYPE annotations correct
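The HELP/TYPE check can be scripted too. A minimal sketch (the `check_annotations` helper name is hypothetical) that verifies a metric in a Prometheus text-format dump has both annotations:

```shell
# Hypothetical helper: verify that a metric in a Prometheus text-format
# dump has both "# HELP" and "# TYPE" annotation lines.
check_annotations() {
  file="$1"; metric="$2"
  grep -q "^# HELP $metric " "$file" || { echo "FAIL: $metric missing HELP"; return 1; }
  grep -q "^# TYPE $metric " "$file" || { echo "FAIL: $metric missing TYPE"; return 1; }
  echo "PASS: $metric annotated"
}
```

Run it per metric, e.g. `check_annotations /tmp/metrics.txt http_requests_total`, or inside the `REQUIRED_METRICS` loop above.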
OBS-02: Counter Accuracy Test
Steps:
# Get initial counter value (values may be floats like "100.0"; make sure
# the grep matches the series for the endpoint you are about to exercise)
INITIAL=$(curl -s "$GATEWAY_URL/metrics" | grep 'http_requests_total{' | head -1 | awk '{print $2}')
# Make exactly 100 requests
for i in {1..100}; do
    curl -s "$GATEWAY_URL/health" > /dev/null
done
# Get final counter value from the same series
FINAL=$(curl -s "$GATEWAY_URL/metrics" | grep 'http_requests_total{' | head -1 | awk '{print $2}')
# Calculate difference with awk (bash integer arithmetic fails on float values)
DIFF=$(awk -v f="$FINAL" -v i="$INITIAL" 'BEGIN { printf "%d", f - i }')
echo "Requests made: 100, Counter increment: $DIFF"
[ "$DIFF" -eq 100 ] && echo "PASS" || echo "FAIL"

Expected Result:
- Counter increments by exactly 100
- No missed or duplicate counts
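Note that `grep | head -1` reads only one label series; requests attributed to other label combinations (status codes, methods, paths) would be missed. A sketch of a helper (the `counter_total` name is hypothetical) that sums a counter across all its series in a metrics dump:

```shell
# Hypothetical helper: sum a counter across ALL label sets in a
# Prometheus text-format dump, skipping comment lines and metrics
# that merely share a prefix with the requested name.
counter_total() {
  file="$1"; metric="$2"
  awk -v m="$metric" '
    substr($0, 1, 1) != "#" && (index($0, m"{") == 1 || index($0, m" ") == 1) {
      s += $NF                      # values may be floats like "60.0"
    }
    END { printf "%d\n", s }' "$file"
}
```

Usage: `counter_total /tmp/metrics.txt http_requests_total`. Comparing totals before and after the 100 requests avoids undercounting when traffic lands on several series.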
OBS-03: Histogram Accuracy Test
Steps:
# Make requests with known latency (use slow endpoint if available)
for i in {1..50}; do
curl -s "$GATEWAY_URL/health" > /dev/null
done
# Query histogram buckets
curl -s "$PROMETHEUS_URL/api/v1/query?query=http_request_duration_seconds_bucket" | jq '.data.result'
# Calculate percentiles (the PromQL contains characters that must be URL-encoded)
curl -s -G "$PROMETHEUS_URL/api/v1/query" \
    --data-urlencode 'query=histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))' \
    | jq '.data.result[0].value[1]'

Expected Result:
- Histogram buckets populated correctly
- P95 calculation returns reasonable value
- Bucket boundaries match documented values
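"Buckets populated correctly" has one cheap invariant: cumulative `_bucket` counts for a single series must never decrease as `le` grows. A sketch of a check (the `buckets_monotone` helper name is hypothetical; it assumes the input is one series in exposition order, as `/metrics` emits it):

```shell
# Sanity check: cumulative histogram bucket counts for a single series
# must be non-decreasing in exposition order.
buckets_monotone() {
  awk '/_bucket\{/ {
    n = $NF + 0
    if (seen && n < prev) bad = 1
    prev = n; seen = 1
  }
  END { print (bad ? "FAIL: buckets not monotone" : "PASS: buckets monotone") }'
}
```

Usage: `grep 'http_request_duration_seconds_bucket{' /tmp/metrics.txt | buckets_monotone` (filter to one label set first if the metric has several).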
OBS-04: Trace Propagation Test
Steps:
# Make request and capture trace ID
RESPONSE=$(curl -s -D /tmp/headers.txt -X POST "$GATEWAY_URL/mcp/http" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "test-tool", "arguments": {}}}')
# Extract trace ID from the traceparent header (format: 00-<trace-id>-<span-id>-<flags>);
# strip carriage returns, since curl writes headers with CRLF line endings
TRACE_ID=$(grep -i "traceparent" /tmp/headers.txt | tr -d '\r' | awk -F'-' '{print $2}')
echo "Trace ID: $TRACE_ID"
# Query Jaeger for trace
curl -s "$JAEGER_URL/api/traces/$TRACE_ID" | jq '.data[0].spans | length'

Validation:
# Verify span hierarchy
curl -s "$JAEGER_URL/api/traces/$TRACE_ID" | jq '.data[0].spans[] | {operationName, spanID, references}'

Expected Result:
- Trace ID present in response headers
- Multiple spans in trace (gateway -> backend -> tool)
- Parent-child references correct
- All spans have same trace ID
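The last criterion ("all spans have same trace ID") can be asserted directly from the Jaeger response. A sketch (the `same_trace_id` helper name is hypothetical; it reads the `/api/traces/<id>` JSON on stdin):

```shell
# Assert that every span in a Jaeger trace carries the same trace ID.
same_trace_id() {
  n=$(jq -r '[.data[0].spans[].traceID] | unique | length')
  [ "$n" = "1" ] && echo "PASS: single trace ID" || echo "FAIL: $n distinct trace IDs"
}
```

Usage: `curl -s "$JAEGER_URL/api/traces/$TRACE_ID" | same_trace_id`.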
OBS-05: Log Correlation Test
Steps:
# Make request with known request ID
REQUEST_ID="test-correlation-$(date +%s)"
curl -s -X POST "$GATEWAY_URL/mcp/http" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "X-Request-ID: $REQUEST_ID" \
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}'
# Search logs for request ID
docker logs mcpgateway 2>&1 | grep "$REQUEST_ID"
# Or query Loki (assumes LOKI_URL is set, e.g. http://localhost:3100;
# the LogQL must be URL-encoded)
curl -s -G "$LOKI_URL/loki/api/v1/query" \
    --data-urlencode "query={job=\"mcpgateway\"} |= \"$REQUEST_ID\"" \
    | jq '.data.result'

Expected Result:
- Request ID appears in all log entries for that request
- Can find logs by request ID
- Log entries include trace ID for correlation
OBS-06: Dashboard Validation
Steps:
# List all dashboards
curl -s "$GRAFANA_URL/api/search?type=dash-db" \
-H "Authorization: Bearer $GRAFANA_TOKEN" | jq '.[].title'
# Test each dashboard for broken queries
DASHBOARDS=$(curl -s "$GRAFANA_URL/api/search?type=dash-db" \
-H "Authorization: Bearer $GRAFANA_TOKEN" | jq -r '.[].uid')
for uid in $DASHBOARDS; do
echo "Testing dashboard: $uid"
# Get dashboard JSON
DASHBOARD=$(curl -s "$GRAFANA_URL/api/dashboards/uid/$uid" \
-H "Authorization: Bearer $GRAFANA_TOKEN")
    # Extract and test each panel query; URL-encode, tolerate panels
    # without targets (note: Grafana template variables such as $interval
    # will not resolve in raw Prometheus queries)
    echo "$DASHBOARD" | jq -r '.dashboard.panels[]?.targets[]?.expr // empty' 2>/dev/null | while read -r query; do
        [ -n "$query" ] && curl -s -G "$PROMETHEUS_URL/api/v1/query" --data-urlencode "query=$query" | jq '.status'
    done
done

Expected Result:
- All dashboards load without errors
- All panel queries return data
- No "No data" or error states
OBS-07: Alerting Rules Test
Steps:
# List configured alerts
curl -s "$PROMETHEUS_URL/api/v1/rules" | jq '.data.groups[].rules[] | {name, state}'
# Trigger test alert (e.g., stop gateway to trigger InstanceDown)
docker stop mcpgateway
# Wait at least the alert rule's `for` duration plus one scrape interval
sleep 60
# Check for firing alert
curl -s "$PROMETHEUS_URL/api/v1/alerts" | jq '.data.alerts[] | {alertname, state}'
# Restart gateway
docker start mcpgateway

Expected Result:
- Alerts fire when conditions met
- Alert resolves when condition clears
- Alert manager receives notifications (if configured)
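A fixed `sleep 60` is fragile if the rule's `for` duration is longer. A sketch of a bounded polling helper (the `wait_until` name is hypothetical) that retries a command until it succeeds or a timeout elapses:

```shell
# Hypothetical polling helper: retry a command every 5s until it
# succeeds or the timeout (in seconds) elapses.
wait_until() {
  timeout="$1"; shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && { echo "TIMEOUT after ${timeout}s"; return 1; }
    sleep 5
    elapsed=$((elapsed + 5))
  done
}
```

For example, `wait_until 180 sh -c "curl -s \"$PROMETHEUS_URL/api/v1/alerts\" | jq -e '.data.alerts[] | select(.state == \"firing\")' > /dev/null"` waits up to three minutes for any alert to fire, and the same pattern (negated) can confirm resolution after `docker start`.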
Test Matrix
| Component | Metric/Feature | Validation Method | Pass Criteria |
|---|---|---|---|
| Metrics | http_requests_total | Counter accuracy | Exact match |
| Metrics | latency histograms | Percentile calc | P95 accurate |
| Tracing | Span propagation | Jaeger query | Full trace |
| Tracing | Context headers | Header inspection | traceparent present |
| Logging | Request ID | Log search | ID in all logs |
| Dashboards | All panels | Query validation | No errors |
| Alerting | Instance down | Stop service | Alert fires |
Success Criteria
- All documented metrics present in /metrics endpoint
- Request counter accuracy verified (100%)
- Histogram percentiles match actual latency distribution
- Distributed traces capture full request lifecycle
- Log correlation works via request ID
- All Grafana dashboard panels show data
- Alerting rules fire correctly under test conditions
Related Files
- mcpgateway/middleware/metrics.py - Metrics middleware
- mcpgateway/middleware/tracing.py - Tracing middleware
- mcpgateway/utils/logging.py - Structured logging
- deployment/grafana/dashboards/ - Dashboard definitions
- deployment/prometheus/rules/ - Alerting rules
Related Issues
- [TESTING][FUNCTIONALITY]: Observability manual test plan (metrics, logging, tracing, health) #2435 - Observability (functionality)
- [TESTING][FUNCTIONALITY]: Metrics system manual test plan (buffering, rollup, cleanup, queries) #2450 - Metrics system
- [TESTING][PERFORMANCE]: Load Testing, Stress Testing, and Benchmarks #2473 - Performance testing