-
Notifications
You must be signed in to change notification settings - Fork 596
Description
📊 [TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)
Goal
Produce a comprehensive manual test plan for metrics maintenance operations including hourly rollup aggregation, retention-based cleanup, and the admin maintenance UI functionality.
Why Now?
Metrics accumulate rapidly in production environments:
- Storage Management: Prevent unbounded database growth from raw metrics
- Query Performance: Ensure metrics queries remain fast with proper rollups
- Compliance: Validate data retention policies are correctly enforced
- Operational Visibility: Confirm the maintenance UI provides accurate controls
📖 User Stories
US-1: Platform Admin - Metrics Rollup Operations
As a Platform Administrator
I want automated hourly metrics aggregation
So that raw metrics are summarized for efficient querying
Acceptance Criteria:
Feature: Metrics Rollup
Scenario: Hourly rollup aggregation
Given raw metrics data for the past hour
When the rollup job executes
Then hourly aggregates should be created
And aggregates should include: count, sum, avg, min, max
And raw data should be preserved until retention expires
Scenario: Rollup idempotency
Given a completed rollup for hour H
When rollup runs again for hour H
Then existing aggregates should not be duplicated
And counts should remain accurate
Scenario: Rollup for all entity types
Given metrics for tools, resources, prompts, servers, and agents
When rollup executes
Then all entity types should have hourly summaries
And per-entity statistics should be accurateTechnical Requirements:
- Rollup creates
*_metrics_hourlyrecords - Aggregation includes: invocation_count, total_duration, avg_duration, min_duration, max_duration
- Rollup is idempotent (safe to rerun)
- Rollup handles timezone correctly (UTC)
US-2: DBA - Metrics Cleanup Operations
As a Database Administrator
I want automated metrics cleanup based on retention policy
So that database size is controlled
Acceptance Criteria:
Feature: Metrics Cleanup
Scenario: Raw metrics cleanup
Given METRICS_RAW_RETENTION_DAYS=7
And raw metrics older than 7 days exist
When cleanup job executes
Then metrics older than 7 days should be deleted
And metrics within retention should be preserved
And cleanup should report deleted count
Scenario: Hourly metrics cleanup
Given METRICS_HOURLY_RETENTION_DAYS=90
And hourly rollups older than 90 days exist
When cleanup job executes
Then old hourly data should be deleted
And recent rollups should be preserved
Scenario: Batch cleanup for large datasets
Given 10 million raw metric records
When cleanup executes
Then deletion should happen in batches
And database should not lock for extended periods
And progress should be trackableTechnical Requirements:
- Configurable retention:
METRICS_RAW_RETENTION_DAYS,METRICS_HOURLY_RETENTION_DAYS - Batch deletion (default 10,000 per batch)
- Cleanup reports: deleted count, duration, errors
- Safe to run during active operations
US-3: Operations Engineer - Maintenance UI
As an Operations Engineer
I want a maintenance UI for manual metric operations
So that I can trigger rollup/cleanup and monitor status
Acceptance Criteria:
Feature: Maintenance UI
Scenario: View maintenance status
When I access /admin/#maintenance
Then I should see current metrics statistics
And last rollup timestamp
And last cleanup timestamp
And database size information
Scenario: Trigger manual rollup
Given pending raw metrics
When I click "Run Rollup"
Then rollup should execute
And UI should show progress
And completion status should be displayed
Scenario: Trigger manual cleanup
Given metrics beyond retention
When I click "Run Cleanup"
Then cleanup should execute
And deleted count should be displayed
And errors should be reported if any🏗 Architecture
Metrics Data Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│ METRICS DATA LIFECYCLE │
└─────────────────────────────────────────────────────────────────────────────┘
COLLECTION AGGREGATION RETENTION
────────── ─────────── ─────────
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Tool │ │ │ │ │
│ Invocation │──────────▶│ Metrics │──────────▶│ Raw │
│ │ │ Buffer │ │ Storage │
└────────────┘ │ Service │ │ (7 days) │
└─────┬──────┘ └─────┬──────┘
┌────────────┐ │ │
│ Resource │ │ │
│ Access │─────────────────┤ │
└────────────┘ │ │
│ │
┌────────────┐ │ ┌─────────────┐ │
│ Prompt │ │ │ Hourly │ │
│ Execution │─────────────────┴────▶│ Rollup │◀───┘
└────────────┘ │ Job │
└──────┬──────┘
│
┌──────▼──────┐
│ Hourly │
│ Storage │
│ (90 days) │
└──────┬──────┘
│
┌──────▼──────┐
│ Cleanup │
│ Job │
└─────────────┘
Database Tables
┌─────────────────────────────────────────────────────────────────────────────┐
│ METRICS TABLES │
└─────────────────────────────────────────────────────────────────────────────┘
RAW METRICS (tool_metrics, resource_metrics, prompt_metrics, etc.)
────────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ id | entity_id | timestamp | duration_ms | success | user_email │
├─────────────────────────────────────────────────────────────────┤
│ High volume, short retention (7 days default) │
│ Indexed on: entity_id, timestamp │
└─────────────────────────────────────────────────────────────────┘
HOURLY ROLLUPS (tool_metrics_hourly, etc.)
────────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ id | entity_id | hour | count | sum_ms | avg_ms | min_ms | max │
├─────────────────────────────────────────────────────────────────┤
│ Aggregated, longer retention (90 days default) │
│ Indexed on: entity_id, hour │
└─────────────────────────────────────────────────────────────────┘
📋 Test Environment Setup
Prerequisites
# Environment configuration
export METRICS_RAW_RETENTION_DAYS=7
export METRICS_HOURLY_RETENTION_DAYS=90
export METRICS_CLEANUP_BATCH_SIZE=10000
export GATEWAY_URL="http://localhost:8000"
# Generate test metrics data
python3 << 'EOF'
import requests
import time
# Generate tool invocations to create metrics
for i in range(100):
requests.post(
f"{GATEWAY_URL}/tools/echo/invoke",
headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
json={"message": f"test {i}"}
)
time.sleep(0.1)
EOFCreate Historical Metrics
-- Insert old metrics for cleanup testing (PostgreSQL)
INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success, user_email)
SELECT
'test-tool-id',
NOW() - INTERVAL '30 days' + (random() * INTERVAL '25 days'),
random() * 1000,
true,
'test@example.com'
FROM generate_series(1, 10000);🧪 Manual Test Cases
Section 1: Metrics Rollup
| Case | Scenario | Input | Expected | Validation |
|---|---|---|---|---|
| MR-01 | Basic hourly rollup | 100 raw records | 1 hourly record | Counts match |
| MR-02 | Multi-hour rollup | 500 records, 5 hours | 5 hourly records | Per-hour accuracy |
| MR-03 | Rollup idempotency | Run twice | No duplicates | Same counts |
| MR-04 | Empty hour handling | No data in hour | No rollup created | No errors |
MR-01: Basic Hourly Rollup
Preconditions:
- Raw metrics exist for the current hour
- No existing rollup for this hour
Steps:
# Step 1: Check current raw metrics count
curl -s "$GATEWAY_URL/api/admin/metrics/raw/count" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq .
# Step 2: Trigger rollup via API
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq .
# Step 3: Verify hourly rollup created
curl -s "$GATEWAY_URL/api/admin/metrics/hourly" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.[-1]'
# Step 4: Validate aggregation accuracy
# Sum of raw durations should equal rollup sum_duration
psql $DATABASE_URL -c "
SELECT
COUNT(*) as raw_count,
SUM(duration_ms) as raw_sum,
AVG(duration_ms) as raw_avg
FROM tool_metrics
WHERE timestamp >= date_trunc('hour', NOW());
"Expected Result:
- Rollup record created for current hour
- Count matches raw record count
- Sum/avg/min/max values accurate
- API returns success
MR-03: Rollup Idempotency
Preconditions:
- Rollup already executed for current hour
Steps:
# Step 1: Get current rollup count
BEFORE=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")
# Step 2: Run rollup again
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
-H "Authorization: Bearer $ADMIN_TOKEN"
# Step 3: Check rollup count unchanged
AFTER=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")
echo "Before: $BEFORE, After: $AFTER"
[ "$BEFORE" = "$AFTER" ] && echo "PASS: Idempotent" || echo "FAIL: Duplicates created"Expected Result:
- Rollup count unchanged
- No duplicate records created
- No errors reported
Section 2: Metrics Cleanup
| Case | Scenario | Retention | Records | Expected |
|---|---|---|---|---|
| MC-01 | Raw cleanup | 7 days | 10k old | All old deleted |
| MC-02 | Preserve recent | 7 days | Mixed | Recent preserved |
| MC-03 | Batch cleanup | 7 days | 100k old | Batched deletion |
| MC-04 | Hourly cleanup | 90 days | 1k old | Old rollups deleted |
MC-01: Raw Metrics Cleanup
Preconditions:
- METRICS_RAW_RETENTION_DAYS=7
- Old metrics (>7 days) exist
Steps:
# Step 1: Check old metrics count
OLD_COUNT=$(psql $DATABASE_URL -t -c "
SELECT COUNT(*) FROM tool_metrics
WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Old metrics to delete: $OLD_COUNT"
# Step 2: Trigger cleanup
RESULT=$(curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup" \
-H "Authorization: Bearer $ADMIN_TOKEN")
echo $RESULT | jq .
# Step 3: Verify deletion
REMAINING=$(psql $DATABASE_URL -t -c "
SELECT COUNT(*) FROM tool_metrics
WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Remaining old metrics: $REMAINING"
# Step 4: Verify recent preserved
RECENT=$(psql $DATABASE_URL -t -c "
SELECT COUNT(*) FROM tool_metrics
WHERE timestamp >= NOW() - INTERVAL '7 days';
")
echo "Recent metrics preserved: $RECENT"Expected Result:
- All old metrics deleted
- Recent metrics preserved
- Cleanup reports deleted count
- No errors
MC-03: Batch Cleanup for Large Datasets
Preconditions:
- 100,000+ old metrics records
- METRICS_CLEANUP_BATCH_SIZE=10000
Steps:
# Step 1: Create large dataset
psql $DATABASE_URL -c "
INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success)
SELECT 'bulk-test', NOW() - INTERVAL '30 days', random()*100, true
FROM generate_series(1, 100000);
"
# Step 2: Monitor cleanup progress
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup?verbose=true" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq .
# Step 3: Verify batch behavior (check logs)
# Should see multiple batch deletions, not one large delete
grep "cleanup" /var/log/mcpgateway/app.log | tail -20Expected Result:
- Cleanup completes in batches
- No long database locks
- All old records eventually deleted
- Progress reported
Section 3: Maintenance UI
| Case | UI Element | Action | Expected |
|---|---|---|---|
| UI-01 | Dashboard | View | Shows stats |
| UI-02 | Rollup button | Click | Triggers rollup |
| UI-03 | Cleanup button | Click | Triggers cleanup |
| UI-04 | Status refresh | Auto | Updates status |
UI-01: Maintenance Dashboard
Preconditions:
- Admin user logged in
- Gateway running with metrics
Steps:
1. Navigate to http://localhost:8080/admin/#maintenance
2. Verify dashboard displays:
- Current database size
- Raw metrics count
- Hourly rollup count
- Last rollup timestamp
- Last cleanup timestamp
- Retention settings
3. Verify statistics are accurate
Expected Result:
- All statistics displayed
- Values match database
- UI is responsive
UI-02: Manual Rollup via UI
Preconditions:
- Maintenance UI accessible
- Pending raw metrics
Steps:
1. Navigate to /admin/#maintenance
2. Note current rollup count
3. Click "Run Rollup" button
4. Observe progress indicator
5. Wait for completion
6. Verify rollup count increased
Expected Result:
- Progress shown during rollup
- Success message displayed
- Rollup count updated
- No page refresh required
📊 Test Matrix
| Test Case | Rollup | Cleanup | UI | PostgreSQL | SQLite |
|---|---|---|---|---|---|
| MR-01 | ✓ | ✓ | ✓ | ||
| MR-02 | ✓ | ✓ | ✓ | ||
| MR-03 | ✓ | ✓ | ✓ | ||
| MR-04 | ✓ | ✓ | ✓ | ||
| MC-01 | ✓ | ✓ | ✓ | ||
| MC-02 | ✓ | ✓ | ✓ | ||
| MC-03 | ✓ | ✓ | |||
| MC-04 | ✓ | ✓ | ✓ | ||
| UI-01 | ✓ | ✓ | ✓ | ||
| UI-02 | ✓ | ✓ | ✓ | ✓ | |
| UI-03 | ✓ | ✓ | ✓ | ✓ | |
| UI-04 | ✓ | ✓ | ✓ |
✅ Success Criteria
- All 12 test cases pass
- Hourly rollup aggregates correctly
- Rollup is idempotent
- Cleanup respects retention policies
- Batch cleanup handles large datasets
- Maintenance UI functional
- No data loss during operations
- Performance acceptable under load
🔗 Related Files
mcpgateway/services/metrics_rollup_service.pymcpgateway/services/metrics_cleanup_service.pymcpgateway/services/metrics_buffer_service.pymcpgateway/routers/metrics_maintenance.pymcpgateway/admin.py(maintenance UI)
🔗 Related Issues
- [TESTING][FUNCTIONALITY]: Metrics system manual test plan (buffering, rollup, cleanup, queries) #2450 - Metrics system functionality testing