Context
Long-running (72-hour) stability test to catch resource leaks and performance degradation that only show up over time.
Setup
- 10-15 node cluster (GKE or EKS)
- 200-300 HTTPRoutes across multiple namespaces/gateways
- Steady k6 traffic with request ledger
- Route churn every 5 minutes (cron job adding/removing routes)
- Chaos Mesh: kill a random chaperone pod every 15 minutes
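The route churn job could plan each tick along the lines below. This is a minimal sketch, assuming each tick deletes and recreates the same number of HTTPRoutes so the total stays inside the 200-300 range. The batch size, the `churn-route-N` naming scheme and the `churn_tick` helper are all hypothetical. Applying the plan (for example with kubectl or a Kubernetes client) is left out.

```python
import random


def churn_tick(routes, namespaces, rng, next_id, batch=10):
    """Plan one churn tick (hypothetical helper, run every 5 minutes by the cron job).

    routes:     set of (namespace, name) HTTPRoutes currently deployed
    namespaces: namespaces to spread newly created routes across
    rng:        random.Random instance, seeded for reproducible runs
    Returns (to_delete, to_create, next_id). Deleting and creating the same
    number of routes keeps the total steady while still exercising both paths.
    """
    # Sort before sampling so a seeded rng gives the same plan on every run.
    to_delete = rng.sample(sorted(routes), batch)
    # New routes get fresh, never-reused names, round-robin across namespaces.
    to_create = [
        (namespaces[(next_id + i) % len(namespaces)], f"churn-route-{next_id + i}")
        for i in range(batch)
    ]
    return to_delete, to_create, next_id + batch


if __name__ == "__main__":
    namespaces = [f"ns-{i}" for i in range(5)]
    state = {(f"ns-{i % 5}", f"route-{i}") for i in range(250)}
    rng, next_id = random.Random(42), 0
    for _ in range(72 * 12):  # one tick every 5 minutes for 72 hours
        to_delete, to_create, next_id = churn_tick(state, namespaces, rng, next_id)
        state -= set(to_delete)
        state |= set(to_create)
    print(len(state))  # route count is unchanged by churn
```

Swapping routes rather than growing or shrinking the set keeps the route count constant. Any memory growth therefore cannot be blamed on there simply being more routes at the end of the run.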
Watch for
- Memory growth in chaperone, operator, or ghost VMOD
- Goroutine count trending upward
- File descriptor leaks
- Monotonically increasing Varnish counters that suggest resource leaks
- Request ledger: misroute rate over time, convergence time degradation
Pass criteria
- Zero misroutes outside the bounded convergence window that follows each pod kill
- No monotonic resource growth
- p99 latency stable over the full 72 hours
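The first and last pass criteria could be checked mechanically against the request ledger. This is a minimal sketch, assuming ledger entries look like `(timestamp, expected_backend, actual_backend)`, pod-kill timestamps are exported from Chaos Mesh, and latencies are bucketed into fixed time windows. The window length, the stability ratio and all helper names are hypothetical, not the ledger's actual format.

```python
import bisect
from statistics import median


def misroutes_outside_windows(ledger, kill_times, window_s):
    """Return misrouted ledger entries that fall outside every convergence window.

    ledger:     iterable of (t, expected_backend, actual_backend)
    kill_times: timestamps of chaperone pod kills
    window_s:   allowed convergence window after each kill (hypothetical bound)
    """
    kills = sorted(kill_times)
    violations = []
    for t, expected, actual in ledger:
        if expected == actual:
            continue
        i = bisect.bisect_right(kills, t) - 1  # most recent kill at or before t
        if i >= 0 and t - kills[i] <= window_s:
            continue  # tolerated: still converging after a kill
        violations.append((t, expected, actual))
    return violations


def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def p99_stable(latency_windows, max_ratio):
    """True if every window's p99 stays within max_ratio of the median p99."""
    p99s = [p99(w) for w in latency_windows]
    baseline = median(p99s)
    return all(p <= baseline * max_ratio for p in p99s)
```

The run passes the misroute criterion only if `misroutes_outside_windows` returns an empty list. Using the median window as the p99 baseline keeps one slow warm-up window from masking later drift.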
Depends on: observability stack, k6 ledger, Chaos Mesh scenarios.
Part of pre-beta quality work.