Run 72-hour soak test #57

@perbu

Context

Long-running stability test to catch resource leaks and degradation over time.

Setup

  • 10-15 node cluster (GKE or EKS)
  • 200-300 HTTPRoutes across multiple namespaces/gateways
  • Steady k6 traffic with request ledger
  • Route churn every 5 minutes (cron job adding/removing routes)
  • Chaos Mesh: random chaperone pod kill every 15 minutes
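
A minimal sketch of the two recurring jobs in the setup above: the manifest a churn cron job could apply or delete every 5 minutes, and a Chaos Mesh Schedule for the 15-minute pod kill. The namespace `soak`, gateway `soak-gw`, hostname pattern, backend Service `echo`, and the `app: chaperone` label are all placeholders assumed for illustration; the issue does not specify them.

```python
def churn_route(index: int, namespace: str = "soak", gateway: str = "soak-gw") -> dict:
    """Build a Gateway API HTTPRoute manifest for one churned route.

    Namespace, gateway, hostname and backend names are hypothetical.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"churn-{index}", "namespace": namespace},
        "spec": {
            "parentRefs": [{"name": gateway}],
            "hostnames": [f"churn-{index}.soak.example.com"],
            "rules": [{"backendRefs": [{"name": "echo", "port": 8080}]}],
        },
    }


def chaperone_kill_schedule(namespace: str = "soak") -> dict:
    """Build a Chaos Mesh Schedule that kills one random chaperone pod every 15 minutes."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "Schedule",
        "metadata": {"name": "chaperone-pod-kill", "namespace": namespace},
        "spec": {
            "schedule": "*/15 * * * *",       # cron syntax: every 15 minutes
            "historyLimit": 5,
            "concurrencyPolicy": "Forbid",    # never overlap two kill runs
            "type": "PodChaos",
            "podChaos": {
                "action": "pod-kill",
                "mode": "one",                # pick one matching pod at random
                "selector": {
                    "namespaces": [namespace],
                    "labelSelectors": {"app": "chaperone"},  # assumed pod label
                },
            },
        },
    }
```

Either dict can be serialized with `json.dumps` and piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.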

Watch for

  • Memory growth in chaperone, operator, or ghost VMOD
  • Goroutine count trending upward
  • File descriptor leaks
  • Monotonically increasing Varnish counters that suggest resource leaks
  • Request ledger: misroute rate over time, convergence time degradation
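
One possible way to turn "trending upward" into a number is a least-squares slope over periodic samples of a metric (RSS, goroutine count, open file descriptors). The sketch below assumes samples arrive as `(unix_seconds, value)` pairs; the function names and the growth threshold are illustrative, not part of the project.

```python
def growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (seconds, value) samples, in value units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("all samples share one timestamp")
    return cov / var * 3600.0


def looks_like_leak(samples: list[tuple[float, float]], max_growth_per_hour: float) -> bool:
    """Flag a series whose fitted growth exceeds the chosen (assumed) threshold."""
    return growth_per_hour(samples) > max_growth_per_hour
```

A fitted slope tolerates sawtooth patterns such as garbage-collection cycles better than comparing the first and last sample, but pod restarts reset the series, so each pod lifetime is probably best fitted separately.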

Pass criteria

  • Zero misroutes outside of bounded convergence windows after pod kills
  • No monotonic growth in any watched resource (memory, goroutines, file descriptors, Varnish counters)
  • p99 latency stable over the full 72 hours
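
The first and third criteria can be checked mechanically from the request ledger. This is a sketch under stated assumptions: misroutes and pod kills are available as timestamps in seconds, the convergence window length is a chosen bound, and "stable" is read as "no hourly p99 exceeds the first hour's p99 by more than a tolerance". None of these thresholds come from the issue.

```python
import bisect


def misroutes_outside_windows(misroute_ts, kill_ts, window_s):
    """Return misroute timestamps not within window_s seconds after any pod kill."""
    kills = sorted(kill_ts)
    outside = []
    for ts in misroute_ts:
        i = bisect.bisect_right(kills, ts)  # number of kills at or before ts
        covered = i > 0 and ts - kills[i - 1] <= window_s
        if not covered:
            outside.append(ts)
    return outside


def p99(latencies):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def p99_stable(hourly_p99, tolerance):
    """True if no hourly p99 exceeds the first hour's p99 by more than tolerance."""
    baseline = hourly_p99[0]
    return all(v <= baseline * (1 + tolerance) for v in hourly_p99)
```

The run would pass the misroute criterion when `misroutes_outside_windows(...)` returns an empty list for the chosen window.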

Depends on: observability stack, k6 ledger, Chaos Mesh scenarios.

Part of pre-beta quality work.

Labels: beta (Pre-beta release work), testing (Testing infrastructure and QA)
