[TESTING][FUNCTIONALITY]: Observability manual test plan (metrics, logging, tracing, health) #2435

# 📊 [TESTING][FUNCTIONALITY]: Observability Manual Test Plan

## Goal

Produce a **comprehensive manual test plan** for observability including metrics, logging, tracing, and health endpoints.

## Why Now?

Observability enables operational insight:

1. **Metrics**: Measure system performance
2. **Logging**: Debug issues and audit
3. **Tracing**: Track requests across services
4. **Health**: Monitor system status

---

## 📖 User Stories

<details>
<summary>US-1: SRE - System Monitoring</summary>

**As an** SRE
**I want** comprehensive observability across metrics, logs, traces, and health endpoints
**So that** I can monitor the gateway and troubleshoot issues

**Acceptance Criteria:**

```gherkin
Feature: System Monitoring

  Scenario: Access metrics
    Given Prometheus is configured
    When I scrape /metrics
    Then I should see gateway metrics

  Scenario: Access logs
    Given structured logging is enabled
    When I query logs
    Then I should find request details
```

</details>

---

## 🏗 Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         OBSERVABILITY ARCHITECTURE                          │
└─────────────────────────────────────────────────────────────────────────────┘

      GATEWAY               EXPORTERS               BACKENDS
      ───────               ─────────               ────────

  ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
  │   Gateway    │       │   /metrics   │──────▶│  Prometheus  │
  │              │       └──────────────┘       └──────────────┘
  │              │       ┌──────────────┐       ┌──────────────┐
  │              │──────▶│     Logs     │──────▶│   Loki/ELK   │
  │              │       └──────────────┘       └──────────────┘
  │              │       ┌──────────────┐       ┌──────────────┐
  │              │       │    Traces    │──────▶│    Jaeger    │
  └──────────────┘       └──────────────┘       └──────────────┘
```

---

## 📋 Test Environment Setup

```bash
export GATEWAY_URL="http://localhost:8000"
export LOG_LEVEL=DEBUG
export STRUCTURED_LOGGING_ENABLED=true
# JWT_SECRET must already be set to the gateway's signing secret
export TOKEN="$(python -m mcpgateway.utils.create_jwt_token \
  --username tester@example.com --secret "$JWT_SECRET")"
```
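
The setup above reads `JWT_SECRET` without defining it, so an empty value would produce an unusable token without any visible error. A minimal preflight sketch, assuming a hypothetical helper named `require_vars` (not part of the gateway), could catch that before any test case runs:

```bash
# Hypothetical helper: fail early if any named environment variable
# is unset or empty. Prints each missing name to stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this POSIX-compatible.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch:
# require_vars GATEWAY_URL JWT_SECRET TOKEN || exit 1
```

Checking by name keeps the helper reusable if later test cases need additional variables.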

---

## 🧪 Manual Test Cases

### Section 1: Metrics

| Case | Scenario | Endpoint | Expected | Validation |
|------|----------|----------|----------|------------|
| MT-01 | Prometheus metrics | /metrics | Metrics returned | Prometheus exposition format |
| MT-02 | Request counters | /metrics (after requests) | Counters incremented | Count increases between scrapes |
| MT-03 | Latency histograms | /metrics (after requests) | Histogram buckets populated | Bucket counts reflect the latency distribution |

<details>
<summary>MT-01: Prometheus Metrics Endpoint</summary>

**Steps:**

```bash
# Step 1: Fetch metrics
curl -s "$GATEWAY_URL/metrics" | head -50

# Step 2: Check for gateway-specific metrics
curl -s "$GATEWAY_URL/metrics" | grep "mcpgateway_"

# Step 3: Verify Prometheus format
curl -s "$GATEWAY_URL/metrics" | grep "# HELP"
curl -s "$GATEWAY_URL/metrics" | grep "# TYPE"
```

**Expected Result:**
- Prometheus exposition format
- Gateway-specific metrics present
- HELP and TYPE annotations

</details>
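
MT-02 has no detailed steps. One possible way to check it is to total a counter before and after a request and compare the two values. The sketch below assumes the metric is exposed without trailing timestamps, and `METRIC` is a placeholder for a counter name picked from the MT-01 output, not a name confirmed by the gateway:

```bash
# Sum every sample of one metric across all label sets.
# Reads Prometheus text exposition format on stdin.
# Assumes no optional timestamp follows the sample value.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)           # drop the {label="..."} part
      if (metric == name) total += $NF   # sample value is the last field
    }
    END { printf "%d\n", total }
  '
}

# Usage sketch (METRIC is a placeholder):
# before=$(curl -s "$GATEWAY_URL/metrics" | sum_metric "$METRIC")
# curl -s "$GATEWAY_URL/gateways" -H "Authorization: Bearer $TOKEN" > /dev/null
# after=$(curl -s "$GATEWAY_URL/metrics" | sum_metric "$METRIC")
# [ "$after" -gt "$before" ] && echo "MT-02 pass"
```

Summing across label sets avoids having to guess which status code or path label the request lands under.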

### Section 2: Logging

| Case | Scenario | Log type | Expected | Validation |
|------|----------|-------|----------|------------|
| LG-01 | Structured logs | JSON | Valid JSON | Parseable |
| LG-02 | Request logging | Request | Request ID | Traceable |
| LG-03 | Error logging | Error | Stack trace | Debuggable |

<details>
<summary>LG-01: Structured JSON Logging</summary>

**Steps:**

```bash
# Step 1: Make request
curl -s "$GATEWAY_URL/gateways" \
 -H "Authorization: Bearer $TOKEN" > /dev/null

# Step 2: Check logs (assuming stdout or file)
# Logs should be JSON formatted
docker logs mcpgateway 2>&1 | tail -5 | jq .

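# Step 3: Verify log fields
# Should have: timestamp, level, message, request_id (null means the field is missing)
docker logs mcpgateway 2>&1 | tail -5 | jq -c '{timestamp, level, message, request_id}'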
```

**Expected Result:**
- Logs in JSON format
- Contains expected fields
- Request ID for correlation

</details>

### Section 3: Health Endpoints

| Case | Scenario | Endpoint | Expected | Validation |
|------|----------|----------|----------|------------|
| HE-01 | Liveness | /health/live | 200 OK | Alive |
| HE-02 | Readiness | /health/ready | 200/503 | Ready state |
| HE-03 | Dependencies | /health | Details | Component status |

<details>
<summary>HE-03: Health with Dependencies</summary>

**Steps:**

```bash
# Step 1: Get full health
curl -s "$GATEWAY_URL/health" | jq .

# Step 2: Check dependency status
curl -s "$GATEWAY_URL/health" | jq '.dependencies'

# Expected: database, redis, etc. with status
```

**Expected Result:**
- Overall health status is reported
- Each dependency reports its own status
- Overall status is degraded if any dependency is down

</details>
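
HE-01 and HE-02 depend on the HTTP status code rather than the response body. A small sketch of how the code could be compared against the allowed values from the table, using a hypothetical helper named `status_in`:

```bash
# Succeeds when the actual HTTP status is one of the allowed codes.
# First argument: actual status; remaining arguments: allowed statuses.
status_in() {
  actual="$1"
  shift
  for allowed in "$@"; do
    [ "$actual" = "$allowed" ] && return 0
  done
  return 1
}

# Usage sketch:
# code=$(curl -s -o /dev/null -w '%{http_code}' "$GATEWAY_URL/health/live")
# status_in "$code" 200 && echo "HE-01 pass"
# code=$(curl -s -o /dev/null -w '%{http_code}' "$GATEWAY_URL/health/ready")
# status_in "$code" 200 503 && echo "HE-02 pass"
```

For HE-02, both 200 and 503 count as correct responses. Which one is expected depends on whether the dependencies are up at the time of the test.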

### Section 4: Tracing

| Case | Scenario | Header | Expected | Validation |
|------|----------|--------|----------|------------|
| TR-01 | Trace propagation | X-Request-ID | Propagated | Same ID in logs |
| TR-02 | Span creation | Request | Spans | In Jaeger |

<details>
<summary>TR-01: Trace ID Propagation</summary>

**Steps:**

```bash
# Step 1: Make request with trace ID
curl -s "$GATEWAY_URL/gateways" \
 -H "Authorization: Bearer $TOKEN" \
 -H "X-Request-ID: test-trace-12345" > /dev/null

# Step 2: Check logs for trace ID
docker logs mcpgateway 2>&1 | grep "test-trace-12345"
```

**Expected Result:**
- Trace ID appears in logs
- All related log entries have same ID

</details>
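
A fixed ID such as `test-trace-12345` can also match log lines left over from earlier runs, which may make TR-01 appear to pass when it has not. One possible mitigation is to generate a fresh ID per run. The helper name `new_request_id` is hypothetical:

```bash
# Build a request ID that is unlikely to collide with earlier runs:
# fixed prefix, epoch seconds, and the shell's process ID.
new_request_id() {
  printf 'test-trace-%s-%s\n' "$(date +%s)" "$$"
}

# Usage sketch:
# RID=$(new_request_id)
# curl -s "$GATEWAY_URL/gateways" \
#   -H "Authorization: Bearer $TOKEN" \
#   -H "X-Request-ID: $RID" > /dev/null
# docker logs mcpgateway 2>&1 | grep -c "$RID"
```

A non-zero count from the final `grep -c` then indicates that the ID sent in this run reached the logs.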

---

## 📊 Test Matrix

| Test Case | Metrics | Logging | Health | Tracing |
|-----------|---------|---------|--------|---------|
| MT-01 | ✓ | | | |
| MT-02 | ✓ | | | |
| MT-03 | ✓ | | | |
| LG-01 | | ✓ | | |
| LG-02 | | ✓ | | ✓ |
| LG-03 | | ✓ | | |
| HE-01 | | | ✓ | |
| HE-02 | | | ✓ | |
| HE-03 | | | ✓ | |
| TR-01 | | | | ✓ |
| TR-02 | | | | ✓ |

---

## ✅ Success Criteria

- [ ] All 11 test cases pass
- [ ] Metrics endpoint works
- [ ] Structured logging enabled
- [ ] Health endpoints respond correctly
- [ ] Trace IDs propagate

---

## 🔗 Related Files

- `mcpgateway/middleware/logging.py`
- `mcpgateway/services/health_service.py`

---

## 🔗 Related Issues

- #2462 - Health Monitoring
- #2476 - Observability accuracy
