[TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)

# 📊 [TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)

## Goal

Produce a **comprehensive manual test plan** for metrics maintenance operations including hourly rollup aggregation, retention-based cleanup, and the admin maintenance UI functionality.

## Why Now?

Metrics accumulate rapidly in production environments:

1. **Storage Management**: Prevent unbounded database growth from raw metrics
2. **Query Performance**: Ensure metrics queries remain fast with proper rollups
3. **Compliance**: Validate data retention policies are correctly enforced
4. **Operational Visibility**: Confirm the maintenance UI provides accurate controls

---

## 📖 User Stories

<details>
<summary>US-1: Platform Admin - Metrics Rollup Operations</summary>

**As a** Platform Administrator
**I want** automated hourly metrics aggregation
**So that** raw metrics are summarized for efficient querying

**Acceptance Criteria:**

```gherkin
Feature: Metrics Rollup

 Scenario: Hourly rollup aggregation
 Given raw metrics data for the past hour
 When the rollup job executes
 Then hourly aggregates should be created
 And aggregates should include: count, sum, avg, min, max
 And raw data should be preserved until retention expires

 Scenario: Rollup idempotency
 Given a completed rollup for hour H
 When rollup runs again for hour H
 Then existing aggregates should not be duplicated
 And counts should remain accurate

 Scenario: Rollup for all entity types
 Given metrics for tools, resources, prompts, servers, and agents
 When rollup executes
 Then all entity types should have hourly summaries
 And per-entity statistics should be accurate
```

**Technical Requirements:**
- Rollup creates `*_metrics_hourly` records
- Aggregation includes: invocation_count, total_duration, avg_duration, min_duration, max_duration
- Rollup is idempotent (safe to rerun)
- Rollup handles timezone correctly (UTC)

</details>

<details>
<summary>US-2: DBA - Metrics Cleanup Operations</summary>

**As a** Database Administrator
**I want** automated metrics cleanup based on retention policy
**So that** database size is controlled

**Acceptance Criteria:**

```gherkin
Feature: Metrics Cleanup

 Scenario: Raw metrics cleanup
 Given METRICS_RAW_RETENTION_DAYS=7
 And raw metrics older than 7 days exist
 When cleanup job executes
 Then metrics older than 7 days should be deleted
 And metrics within retention should be preserved
 And cleanup should report deleted count

 Scenario: Hourly metrics cleanup
 Given METRICS_HOURLY_RETENTION_DAYS=90
 And hourly rollups older than 90 days exist
 When cleanup job executes
 Then old hourly data should be deleted
 And recent rollups should be preserved

 Scenario: Batch cleanup for large datasets
 Given 10 million raw metric records
 When cleanup executes
 Then deletion should happen in batches
 And database should not lock for extended periods
 And progress should be trackable
```

**Technical Requirements:**
- Configurable retention: `METRICS_RAW_RETENTION_DAYS`, `METRICS_HOURLY_RETENTION_DAYS`
- Batch deletion (default 10,000 per batch)
- Cleanup reports: deleted count, duration, errors
- Safe to run during active operations

</details>

<details>
<summary>US-3: Operations Engineer - Maintenance UI</summary>

**As an** Operations Engineer
**I want** a maintenance UI for manual metric operations
**So that** I can trigger rollup/cleanup and monitor status

**Acceptance Criteria:**

```gherkin
Feature: Maintenance UI

 Scenario: View maintenance status
 When I access /admin/#maintenance
 Then I should see current metrics statistics
 And last rollup timestamp
 And last cleanup timestamp
 And database size information

 Scenario: Trigger manual rollup
 Given pending raw metrics
 When I click "Run Rollup"
 Then rollup should execute
 And UI should show progress
 And completion status should be displayed

 Scenario: Trigger manual cleanup
 Given metrics beyond retention
 When I click "Run Cleanup"
 Then cleanup should execute
 And deleted count should be displayed
 And errors should be reported if any
```

</details>

---

## 🏗 Architecture

### Metrics Data Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ METRICS DATA LIFECYCLE │
└─────────────────────────────────────────────────────────────────────────────┘

 COLLECTION AGGREGATION RETENTION
 ────────── ─────────── ─────────

 ┌────────────┐ ┌────────────┐ ┌────────────┐
 │ Tool │ │ │ │ │
 │ Invocation │──────────▶│ Metrics │──────────▶│ Raw │
 │ │ │ Buffer │ │ Storage │
 └────────────┘ │ Service │ │ (7 days) │
 └─────┬──────┘ └─────┬──────┘
 ┌────────────┐ │ │
 │ Resource │ │ │
 │ Access │─────────────────┤ │
 └────────────┘ │ │
 │ │
 ┌────────────┐ │ ┌─────────────┐ │
 │ Prompt │ │ │ Hourly │ │
 │ Execution │─────────────────┴────▶│ Rollup │◀───┘
 └────────────┘ │ Job │
 └──────┬──────┘
 │
 ┌──────▼──────┐
 │ Hourly │
 │ Storage │
 │ (90 days) │
 └──────┬──────┘
 │
 ┌──────▼──────┐
 │ Cleanup │
 │ Job │
 └─────────────┘
```

### Database Tables

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ METRICS TABLES │
└─────────────────────────────────────────────────────────────────────────────┘

 RAW METRICS (tool_metrics, resource_metrics, prompt_metrics, etc.)
 ────────────────────────────────────────────────────────────────────
 ┌─────────────────────────────────────────────────────────────────┐
 │ id | entity_id | timestamp | duration_ms | success | user_email │
 ├─────────────────────────────────────────────────────────────────┤
 │ High volume, short retention (7 days default) │
 │ Indexed on: entity_id, timestamp │
 └─────────────────────────────────────────────────────────────────┘

 HOURLY ROLLUPS (tool_metrics_hourly, etc.)
 ────────────────────────────────────────────────────────────────────
 ┌─────────────────────────────────────────────────────────────────┐
 │ id | entity_id | hour | count | sum_ms | avg_ms | min_ms | max │
 ├─────────────────────────────────────────────────────────────────┤
 │ Aggregated, longer retention (90 days default) │
 │ Indexed on: entity_id, hour │
 └─────────────────────────────────────────────────────────────────┘
```

---

## 📋 Test Environment Setup

### Prerequisites

```bash
# Environment configuration
export METRICS_RAW_RETENTION_DAYS=7
export METRICS_HOURLY_RETENTION_DAYS=90
export METRICS_CLEANUP_BATCH_SIZE=10000
export GATEWAY_URL="http://localhost:8000"

# Generate test metrics data
python3 << 'EOF'
import requests
import time

# Generate tool invocations to create metrics
for i in range(100):
 requests.post(
 f"{GATEWAY_URL}/tools/echo/invoke",
 headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
 json={"message": f"test {i}"}
 )
 time.sleep(0.1)
EOF
```

### Create Historical Metrics

```sql
-- Insert old metrics for cleanup testing (PostgreSQL)
INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success, user_email)
SELECT
 'test-tool-id',
 NOW() - INTERVAL '30 days' + (random() * INTERVAL '25 days'),
 random() * 1000,
 true,
 'test@example.com'
FROM generate_series(1, 10000);
```

---

## 🧪 Manual Test Cases

### Section 1: Metrics Rollup

| Case | Scenario | Input | Expected | Validation |
|------|----------|-------|----------|------------|
| MR-01 | Basic hourly rollup | 100 raw records | 1 hourly record | Counts match |
| MR-02 | Multi-hour rollup | 500 records, 5 hours | 5 hourly records | Per-hour accuracy |
| MR-03 | Rollup idempotency | Run twice | No duplicates | Same counts |
| MR-04 | Empty hour handling | No data in hour | No rollup created | No errors |

<details>
<summary>MR-01: Basic Hourly Rollup</summary>

**Preconditions:**
- Raw metrics exist for the current hour
- No existing rollup for this hour

**Steps:**

```bash
# Step 1: Check current raw metrics count
curl -s "$GATEWAY_URL/api/admin/metrics/raw/count" \
 -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 2: Trigger rollup via API
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
 -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 3: Verify hourly rollup created
curl -s "$GATEWAY_URL/api/admin/metrics/hourly" \
 -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.[-1]'

# Step 4: Validate aggregation accuracy
# Sum of raw durations should equal rollup sum_duration
psql $DATABASE_URL -c "
 SELECT
 COUNT(*) as raw_count,
 SUM(duration_ms) as raw_sum,
 AVG(duration_ms) as raw_avg
 FROM tool_metrics
 WHERE timestamp >= date_trunc('hour', NOW());
"
```

**Expected Result:**
- Rollup record created for current hour
- Count matches raw record count
- Sum/avg/min/max values accurate
- API returns success

</details>

<details>
<summary>MR-03: Rollup Idempotency</summary>

**Preconditions:**
- Rollup already executed for current hour

**Steps:**

```bash
# Step 1: Get current rollup count
BEFORE=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")

# Step 2: Run rollup again
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
 -H "Authorization: Bearer $ADMIN_TOKEN"

# Step 3: Check rollup count unchanged
AFTER=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")

echo "Before: $BEFORE, After: $AFTER"
[ "$BEFORE" = "$AFTER" ] && echo "PASS: Idempotent" || echo "FAIL: Duplicates created"
```

**Expected Result:**
- Rollup count unchanged
- No duplicate records created
- No errors reported

</details>

### Section 2: Metrics Cleanup

| Case | Scenario | Retention | Records | Expected |
|------|----------|-----------|---------|----------|
| MC-01 | Raw cleanup | 7 days | 10k old | All old deleted |
| MC-02 | Preserve recent | 7 days | Mixed | Recent preserved |
| MC-03 | Batch cleanup | 7 days | 100k old | Batched deletion |
| MC-04 | Hourly cleanup | 90 days | 1k old | Old rollups deleted |

<details>
<summary>MC-01: Raw Metrics Cleanup</summary>

**Preconditions:**
- METRICS_RAW_RETENTION_DAYS=7
- Old metrics (>7 days) exist

**Steps:**

```bash
# Step 1: Check old metrics count
OLD_COUNT=$(psql $DATABASE_URL -t -c "
 SELECT COUNT(*) FROM tool_metrics
 WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Old metrics to delete: $OLD_COUNT"

# Step 2: Trigger cleanup
RESULT=$(curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup" \
 -H "Authorization: Bearer $ADMIN_TOKEN")
echo $RESULT | jq .

# Step 3: Verify deletion
REMAINING=$(psql $DATABASE_URL -t -c "
 SELECT COUNT(*) FROM tool_metrics
 WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Remaining old metrics: $REMAINING"

# Step 4: Verify recent preserved
RECENT=$(psql $DATABASE_URL -t -c "
 SELECT COUNT(*) FROM tool_metrics
 WHERE timestamp >= NOW() - INTERVAL '7 days';
")
echo "Recent metrics preserved: $RECENT"
```

**Expected Result:**
- All old metrics deleted
- Recent metrics preserved
- Cleanup reports deleted count
- No errors

</details>

<details>
<summary>MC-03: Batch Cleanup for Large Datasets</summary>

**Preconditions:**
- 100,000+ old metrics records
- METRICS_CLEANUP_BATCH_SIZE=10000

**Steps:**

```bash
# Step 1: Create large dataset
psql $DATABASE_URL -c "
 INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success)
 SELECT 'bulk-test', NOW() - INTERVAL '30 days', random()*100, true
 FROM generate_series(1, 100000);
"

# Step 2: Monitor cleanup progress
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup?verbose=true" \
 -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 3: Verify batch behavior (check logs)
# Should see multiple batch deletions, not one large delete
grep "cleanup" /var/log/mcpgateway/app.log | tail -20
```

**Expected Result:**
- Cleanup completes in batches
- No long database locks
- All old records eventually deleted
- Progress reported

</details>

### Section 3: Maintenance UI

| Case | UI Element | Action | Expected |
|------|------------|--------|----------|
| UI-01 | Dashboard | View | Shows stats |
| UI-02 | Rollup button | Click | Triggers rollup |
| UI-03 | Cleanup button | Click | Triggers cleanup |
| UI-04 | Status refresh | Auto | Updates status |

<details>
<summary>UI-01: Maintenance Dashboard</summary>

**Preconditions:**
- Admin user logged in
- Gateway running with metrics

**Steps:**

```
1. Navigate to http://localhost:8080/admin/#maintenance
2. Verify dashboard displays:
 - Current database size
 - Raw metrics count
 - Hourly rollup count
 - Last rollup timestamp
 - Last cleanup timestamp
 - Retention settings
3. Verify statistics are accurate
```

**Expected Result:**
- All statistics displayed
- Values match database
- UI is responsive

</details>

<details>
<summary>UI-02: Manual Rollup via UI</summary>

**Preconditions:**
- Maintenance UI accessible
- Pending raw metrics

**Steps:**

```
1. Navigate to /admin/#maintenance
2. Note current rollup count
3. Click "Run Rollup" button
4. Observe progress indicator
5. Wait for completion
6. Verify rollup count increased
```

**Expected Result:**
- Progress shown during rollup
- Success message displayed
- Rollup count updated
- No page refresh required

</details>

---

## 📊 Test Matrix

| Test Case | Rollup | Cleanup | UI | PostgreSQL | SQLite |
|-----------|--------|---------|-----|------------|--------|
| MR-01 | ✓ | | | ✓ | ✓ |
| MR-02 | ✓ | | | ✓ | ✓ |
| MR-03 | ✓ | | | ✓ | ✓ |
| MR-04 | ✓ | | | ✓ | ✓ |
| MC-01 | | ✓ | | ✓ | ✓ |
| MC-02 | | ✓ | | ✓ | ✓ |
| MC-03 | | ✓ | | ✓ | |
| MC-04 | | ✓ | | ✓ | ✓ |
| UI-01 | | | ✓ | ✓ | ✓ |
| UI-02 | ✓ | | ✓ | ✓ | ✓ |
| UI-03 | | ✓ | ✓ | ✓ | ✓ |
| UI-04 | | | ✓ | ✓ | ✓ |

---

## ✅ Success Criteria

- [ ] All 12 test cases pass
- [ ] Hourly rollup aggregates correctly
- [ ] Rollup is idempotent
- [ ] Cleanup respects retention policies
- [ ] Batch cleanup handles large datasets
- [ ] Maintenance UI functional
- [ ] No data loss during operations
- [ ] Performance acceptable under load

---

## 🔗 Related Files

- `mcpgateway/services/metrics_rollup_service.py`
- `mcpgateway/services/metrics_cleanup_service.py`
- `mcpgateway/services/metrics_buffer_service.py`
- `mcpgateway/routers/metrics_maintenance.py`
- `mcpgateway/admin.py` (maintenance UI)

---

## 🔗 Related Issues

- #2450 - Metrics system functionality testing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention) #2460

📊 [TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)

Goal

Why Now?

📖 User Stories

🏗 Architecture

Metrics Data Flow

Database Tables

📋 Test Environment Setup

Prerequisites

Create Historical Metrics

🧪 Manual Test Cases

Section 1: Metrics Rollup

Section 2: Metrics Cleanup

Section 3: Maintenance UI

📊 Test Matrix

✅ Success Criteria

🔗 Related Files

🔗 Related Issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Case	Scenario	Input	Expected	Validation
MR-01	Basic hourly rollup	100 raw records	1 hourly record	Counts match
MR-02	Multi-hour rollup	500 records, 5 hours	5 hourly records	Per-hour accuracy
MR-03	Rollup idempotency	Run twice	No duplicates	Same counts
MR-04	Empty hour handling	No data in hour	No rollup created	No errors

Case	Scenario	Retention	Records	Expected
MC-01	Raw cleanup	7 days	10k old	All old deleted
MC-02	Preserve recent	7 days	Mixed	Recent preserved
MC-03	Batch cleanup	7 days	100k old	Batched deletion
MC-04	Hourly cleanup	90 days	1k old	Old rollups deleted

Case	UI Element	Action	Expected
UI-01	Dashboard	View	Shows stats
UI-02	Rollup button	Click	Triggers rollup
UI-03	Cleanup button	Click	Triggers cleanup
UI-04	Status refresh	Auto	Updates status

[TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention) #2460

Description

📊 [TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)

Goal

Why Now?

📖 User Stories

🏗 Architecture

Metrics Data Flow

Database Tables

📋 Test Environment Setup

Prerequisites

Create Historical Metrics

🧪 Manual Test Cases

Section 1: Metrics Rollup

Section 2: Metrics Cleanup

Section 3: Maintenance UI

📊 Test Matrix

✅ Success Criteria

🔗 Related Files

🔗 Related Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions