Skip to content

[TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention) #2460

@crivetimihai

Description

@crivetimihai

📊 [TESTING][OPERATIONS]: Metrics Maintenance Manual Test Plan (Rollup, Cleanup, Retention)

Goal

Produce a comprehensive manual test plan for metrics maintenance operations including hourly rollup aggregation, retention-based cleanup, and the admin maintenance UI functionality.

Why Now?

Metrics accumulate rapidly in production environments:

  1. Storage Management: Prevent unbounded database growth from raw metrics
  2. Query Performance: Ensure metrics queries remain fast with proper rollups
  3. Compliance: Validate data retention policies are correctly enforced
  4. Operational Visibility: Confirm the maintenance UI provides accurate controls

📖 User Stories

US-1: Platform Admin - Metrics Rollup Operations

As a Platform Administrator
I want automated hourly metrics aggregation
So that raw metrics are summarized for efficient querying

Acceptance Criteria:

Feature: Metrics Rollup

  Scenario: Hourly rollup aggregation
    Given raw metrics data for the past hour
    When the rollup job executes
    Then hourly aggregates should be created
    And aggregates should include: count, sum, avg, min, max
    And raw data should be preserved until retention expires

  Scenario: Rollup idempotency
    Given a completed rollup for hour H
    When rollup runs again for hour H
    Then existing aggregates should not be duplicated
    And counts should remain accurate

  Scenario: Rollup for all entity types
    Given metrics for tools, resources, prompts, servers, and agents
    When rollup executes
    Then all entity types should have hourly summaries
    And per-entity statistics should be accurate

Technical Requirements:

  • Rollup creates *_metrics_hourly records
  • Aggregation includes: invocation_count, total_duration, avg_duration, min_duration, max_duration
  • Rollup is idempotent (safe to rerun)
  • Rollup handles timezone correctly (UTC)
US-2: DBA - Metrics Cleanup Operations

As a Database Administrator
I want automated metrics cleanup based on retention policy
So that database size is controlled

Acceptance Criteria:

Feature: Metrics Cleanup

  Scenario: Raw metrics cleanup
    Given METRICS_RAW_RETENTION_DAYS=7
    And raw metrics older than 7 days exist
    When cleanup job executes
    Then metrics older than 7 days should be deleted
    And metrics within retention should be preserved
    And cleanup should report deleted count

  Scenario: Hourly metrics cleanup
    Given METRICS_HOURLY_RETENTION_DAYS=90
    And hourly rollups older than 90 days exist
    When cleanup job executes
    Then old hourly data should be deleted
    And recent rollups should be preserved

  Scenario: Batch cleanup for large datasets
    Given 10 million raw metric records
    When cleanup executes
    Then deletion should happen in batches
    And database should not lock for extended periods
    And progress should be trackable

Technical Requirements:

  • Configurable retention: METRICS_RAW_RETENTION_DAYS, METRICS_HOURLY_RETENTION_DAYS
  • Batch deletion (default 10,000 per batch)
  • Cleanup reports: deleted count, duration, errors
  • Safe to run during active operations
US-3: Operations Engineer - Maintenance UI

As an Operations Engineer
I want a maintenance UI for manual metric operations
So that I can trigger rollup/cleanup and monitor status

Acceptance Criteria:

Feature: Maintenance UI

  Scenario: View maintenance status
    When I access /admin/#maintenance
    Then I should see current metrics statistics
    And last rollup timestamp
    And last cleanup timestamp
    And database size information

  Scenario: Trigger manual rollup
    Given pending raw metrics
    When I click "Run Rollup"
    Then rollup should execute
    And UI should show progress
    And completion status should be displayed

  Scenario: Trigger manual cleanup
    Given metrics beyond retention
    When I click "Run Cleanup"
    Then cleanup should execute
    And deleted count should be displayed
    And errors should be reported if any

🏗 Architecture

Metrics Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                         METRICS DATA LIFECYCLE                               │
└─────────────────────────────────────────────────────────────────────────────┘

    COLLECTION                AGGREGATION              RETENTION
    ──────────               ───────────              ─────────

  ┌────────────┐           ┌────────────┐           ┌────────────┐
  │ Tool       │           │            │           │            │
  │ Invocation │──────────▶│  Metrics   │──────────▶│  Raw       │
  │            │           │  Buffer    │           │  Storage   │
  └────────────┘           │  Service   │           │  (7 days)  │
                           └─────┬──────┘           └─────┬──────┘
  ┌────────────┐                 │                        │
  │ Resource   │                 │                        │
  │ Access     │─────────────────┤                        │
  └────────────┘                 │                        │
                                 │                        │
  ┌────────────┐                 │     ┌─────────────┐    │
  │ Prompt     │                 │     │   Hourly    │    │
  │ Execution  │─────────────────┴────▶│   Rollup    │◀───┘
  └────────────┘                       │   Job       │
                                       └──────┬──────┘
                                              │
                                       ┌──────▼──────┐
                                       │   Hourly    │
                                       │   Storage   │
                                       │  (90 days)  │
                                       └──────┬──────┘
                                              │
                                       ┌──────▼──────┐
                                       │   Cleanup   │
                                       │   Job       │
                                       └─────────────┘

Database Tables

┌─────────────────────────────────────────────────────────────────────────────┐
│                         METRICS TABLES                                       │
└─────────────────────────────────────────────────────────────────────────────┘

    RAW METRICS (tool_metrics, resource_metrics, prompt_metrics, etc.)
    ────────────────────────────────────────────────────────────────────
    ┌─────────────────────────────────────────────────────────────────┐
    │ id | entity_id | timestamp | duration_ms | success | user_email │
    ├─────────────────────────────────────────────────────────────────┤
    │ High volume, short retention (7 days default)                   │
    │ Indexed on: entity_id, timestamp                                │
    └─────────────────────────────────────────────────────────────────┘

    HOURLY ROLLUPS (tool_metrics_hourly, etc.)
    ────────────────────────────────────────────────────────────────────
    ┌─────────────────────────────────────────────────────────────────┐
    │ id | entity_id | hour | count | sum_ms | avg_ms | min_ms | max  │
    ├─────────────────────────────────────────────────────────────────┤
    │ Aggregated, longer retention (90 days default)                  │
    │ Indexed on: entity_id, hour                                     │
    └─────────────────────────────────────────────────────────────────┘

📋 Test Environment Setup

Prerequisites

# Environment configuration
export METRICS_RAW_RETENTION_DAYS=7
export METRICS_HOURLY_RETENTION_DAYS=90
export METRICS_CLEANUP_BATCH_SIZE=10000
export GATEWAY_URL="http://localhost:8000"

# Generate test metrics data
python3 << 'EOF'
import requests
import time

# Generate tool invocations to create metrics
for i in range(100):
    requests.post(
        f"{GATEWAY_URL}/tools/echo/invoke",
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
        json={"message": f"test {i}"}
    )
    time.sleep(0.1)
EOF

Create Historical Metrics

-- Insert old metrics for cleanup testing (PostgreSQL)
INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success, user_email)
SELECT
    'test-tool-id',
    NOW() - INTERVAL '30 days' + (random() * INTERVAL '25 days'),
    random() * 1000,
    true,
    'test@example.com'
FROM generate_series(1, 10000);

🧪 Manual Test Cases

Section 1: Metrics Rollup

Case Scenario Input Expected Validation
MR-01 Basic hourly rollup 100 raw records 1 hourly record Counts match
MR-02 Multi-hour rollup 500 records, 5 hours 5 hourly records Per-hour accuracy
MR-03 Rollup idempotency Run twice No duplicates Same counts
MR-04 Empty hour handling No data in hour No rollup created No errors
MR-01: Basic Hourly Rollup

Preconditions:

  • Raw metrics exist for the current hour
  • No existing rollup for this hour

Steps:

# Step 1: Check current raw metrics count
curl -s "$GATEWAY_URL/api/admin/metrics/raw/count" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 2: Trigger rollup via API
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 3: Verify hourly rollup created
curl -s "$GATEWAY_URL/api/admin/metrics/hourly" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.[-1]'

# Step 4: Validate aggregation accuracy
# Sum of raw durations should equal rollup sum_duration
psql $DATABASE_URL -c "
  SELECT
    COUNT(*) as raw_count,
    SUM(duration_ms) as raw_sum,
    AVG(duration_ms) as raw_avg
  FROM tool_metrics
  WHERE timestamp >= date_trunc('hour', NOW());
"

Expected Result:

  • Rollup record created for current hour
  • Count matches raw record count
  • Sum/avg/min/max values accurate
  • API returns success
MR-03: Rollup Idempotency

Preconditions:

  • Rollup already executed for current hour

Steps:

# Step 1: Get current rollup count
BEFORE=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")

# Step 2: Run rollup again
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/rollup" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Step 3: Check rollup count unchanged
AFTER=$(psql $DATABASE_URL -t -c "SELECT COUNT(*) FROM tool_metrics_hourly;")

echo "Before: $BEFORE, After: $AFTER"
[ "$BEFORE" = "$AFTER" ] && echo "PASS: Idempotent" || echo "FAIL: Duplicates created"

Expected Result:

  • Rollup count unchanged
  • No duplicate records created
  • No errors reported

Section 2: Metrics Cleanup

Case Scenario Retention Records Expected
MC-01 Raw cleanup 7 days 10k old All old deleted
MC-02 Preserve recent 7 days Mixed Recent preserved
MC-03 Batch cleanup 7 days 100k old Batched deletion
MC-04 Hourly cleanup 90 days 1k old Old rollups deleted
MC-01: Raw Metrics Cleanup

Preconditions:

  • METRICS_RAW_RETENTION_DAYS=7
  • Old metrics (>7 days) exist

Steps:

# Step 1: Check old metrics count
OLD_COUNT=$(psql $DATABASE_URL -t -c "
  SELECT COUNT(*) FROM tool_metrics
  WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Old metrics to delete: $OLD_COUNT"

# Step 2: Trigger cleanup
RESULT=$(curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup" \
  -H "Authorization: Bearer $ADMIN_TOKEN")
echo $RESULT | jq .

# Step 3: Verify deletion
REMAINING=$(psql $DATABASE_URL -t -c "
  SELECT COUNT(*) FROM tool_metrics
  WHERE timestamp < NOW() - INTERVAL '7 days';
")
echo "Remaining old metrics: $REMAINING"

# Step 4: Verify recent preserved
RECENT=$(psql $DATABASE_URL -t -c "
  SELECT COUNT(*) FROM tool_metrics
  WHERE timestamp >= NOW() - INTERVAL '7 days';
")
echo "Recent metrics preserved: $RECENT"

Expected Result:

  • All old metrics deleted
  • Recent metrics preserved
  • Cleanup reports deleted count
  • No errors
MC-03: Batch Cleanup for Large Datasets

Preconditions:

  • 100,000+ old metrics records
  • METRICS_CLEANUP_BATCH_SIZE=10000

Steps:

# Step 1: Create large dataset
psql $DATABASE_URL -c "
  INSERT INTO tool_metrics (tool_id, timestamp, duration_ms, success)
  SELECT 'bulk-test', NOW() - INTERVAL '30 days', random()*100, true
  FROM generate_series(1, 100000);
"

# Step 2: Monitor cleanup progress
curl -s -X POST "$GATEWAY_URL/api/admin/maintenance/cleanup?verbose=true" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Step 3: Verify batch behavior (check logs)
# Should see multiple batch deletions, not one large delete
grep "cleanup" /var/log/mcpgateway/app.log | tail -20

Expected Result:

  • Cleanup completes in batches
  • No long database locks
  • All old records eventually deleted
  • Progress reported

Section 3: Maintenance UI

Case UI Element Action Expected
UI-01 Dashboard View Shows stats
UI-02 Rollup button Click Triggers rollup
UI-03 Cleanup button Click Triggers cleanup
UI-04 Status refresh Auto Updates status
UI-01: Maintenance Dashboard

Preconditions:

  • Admin user logged in
  • Gateway running with metrics

Steps:

1. Navigate to http://localhost:8080/admin/#maintenance
2. Verify dashboard displays:
   - Current database size
   - Raw metrics count
   - Hourly rollup count
   - Last rollup timestamp
   - Last cleanup timestamp
   - Retention settings
3. Verify statistics are accurate

Expected Result:

  • All statistics displayed
  • Values match database
  • UI is responsive
UI-02: Manual Rollup via UI

Preconditions:

  • Maintenance UI accessible
  • Pending raw metrics

Steps:

1. Navigate to /admin/#maintenance
2. Note current rollup count
3. Click "Run Rollup" button
4. Observe progress indicator
5. Wait for completion
6. Verify rollup count increased

Expected Result:

  • Progress shown during rollup
  • Success message displayed
  • Rollup count updated
  • No page refresh required

📊 Test Matrix

Test Case Rollup Cleanup UI PostgreSQL SQLite
MR-01
MR-02
MR-03
MR-04
MC-01
MC-02
MC-03
MC-04
UI-01
UI-02
UI-03
UI-04

✅ Success Criteria

  • All 12 test cases pass
  • Hourly rollup aggregates correctly
  • Rollup is idempotent
  • Cleanup respects retention policies
  • Batch cleanup handles large datasets
  • Maintenance UI functional
  • No data loss during operations
  • Performance acceptable under load

🔗 Related Files

  • mcpgateway/services/metrics_rollup_service.py
  • mcpgateway/services/metrics_cleanup_service.py
  • mcpgateway/services/metrics_buffer_service.py
  • mcpgateway/routers/metrics_maintenance.py
  • mcpgateway/admin.py (maintenance UI)

🔗 Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    SHOULDP2: Important but not vital; high-value items that are not crucial for the immediate releasechoreLinting, formatting, dependency hygiene, or project maintenance choresmanual-testingManual testing / test planning issuesreadyValidated, ready-to-work-on itemstestingTesting (unit, e2e, manual, automated, etc)

    Type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions