broker-router hits ~4% failures at ~4000 concurrent MCP sessions

## Bug: broker-router crashes at ~3600-4000 concurrent MCP sessions

**Describe the bug**

During [performance experiments](https://docs.google.com/document/d/1UB-Cuhu1tm9UPCvTvEEqIxAz6WiTgpvcJnaSVFutgpQ/edit?tab=t.0), the broker-router pod crashes and restarts when the number of concurrent users (MCP sessions) reaches ~4000. The crash happens well within the pod's resource limits (CPU at ~42%, memory at ~27%), so this is not a resource exhaustion issue. After the restart, all active sessions are lost.

**To Reproduce**
The full [repo](https://github.com/arielharush96/locust-mcp/tree/main) is shared, including a dedicated MCP class and all the manifests.

The general flow:

1. Deploy mcp-gateway with a [perf-mock-mcp-server](https://quay.io/repository/rh-ee-aharush/perf-mock-server).
2. Ramp up concurrent MCP sessions at 8 users/sec, up to 8192 MCP sessions.
3. Each user initializes a session, lists tools, then calls tools for the rest of the experiment.
4. At ~3600-4000 concurrent sessions, the broker-router pod crashes.
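
For anyone reproducing this, a rough estimate of how long the ramp takes before the crash window is reached may help. The sketch below assumes an idealized linear ramp at the configured spawn rate, with no failed or dropped users; the helper name `seconds_to_reach` is only illustrative and is not part of the shared repo.

```python
# Assumption: sessions grow linearly at the spawn rate from the steps above,
# with no failed or dropped users. Real runs will deviate from this.
SPAWN_RATE = 8          # users per second
TARGET_SESSIONS = 8192  # ramp target


def seconds_to_reach(sessions: int, spawn_rate: int = SPAWN_RATE) -> float:
    """Seconds a linear ramp needs to reach `sessions` concurrent users."""
    return sessions / spawn_rate


for label, sessions in [
    ("lower crash bound", 3600),
    ("upper crash bound", 4000),
    ("full ramp", TARGET_SESSIONS),
]:
    print(f"{label}: {sessions} sessions after ~{seconds_to_reach(sessions):.0f} s")
```

Under that assumption the crash window opens after roughly 7.5 to 8.5 minutes of ramping, well before the 8192-session target.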
      
**Expected behavior**

The broker-router should handle increasing concurrent sessions without crashing.

**Screenshots**

<img width="2080" height="880" alt="Image" src="https://github.com/user-attachments/assets/2212f9c9-35a6-438f-808d-d0c2ab5ce31e" />

<img width="2377" height="1177" alt="Image" src="https://github.com/user-attachments/assets/bf4b920b-98dc-4213-8e42-7b90b27f41e7" />

**Additional context**

Reproduced twice on an OpenShift cluster (Kubernetes v1.33.6):
- Run 1: crash at 4088 concurrent sessions
- Run 2: crash at 3616 concurrent sessions

**The following logs are from the 2nd run:**

1. Pod metrics (cpu_usage.csv) - 3 crash events with gaps:

Crash 1 (~3600 users):
```
11:51:43  cpu=1496m  mem=267Mi   ← last reading
          ── 39 second gap ──    ← pod gone
11:52:22  cpu=1885m  mem=136Mi   ← restarted (267→136Mi)
```

Crash 2 (~5200 users):
```
11:55:44  cpu=1660m  mem=299Mi   ← last reading
          ── 53 second gap ──    ← pod gone
11:56:37  cpu=1864m  mem=254Mi   ← restarted (299→254Mi)
```

Crash 3 (~7600 users):
```
11:57:03  cpu=1816m  mem=259Mi   ← last reading
          ── 66 second gap ──    ← pod gone
11:58:09  cpu=1936m  mem=331Mi   ← restarted
```
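
The gap lengths quoted above can be re-derived from the timestamps. This is a minimal sketch that uses only the six readings shown in the excerpts; it does not parse cpu_usage.csv, whose column layout is not described here, and `gap_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# (last reading before the gap, first reading after the restart),
# copied from the three crash excerpts above.
CRASH_WINDOWS = [
    ("11:51:43", "11:52:22"),
    ("11:55:44", "11:56:37"),
    ("11:57:03", "11:58:09"),
]


def gap_seconds(last_seen: str, first_back: str) -> int:
    """Seconds between two HH:MM:SS readings taken on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(first_back, fmt) - datetime.strptime(last_seen, fmt)
    return int(delta.total_seconds())


gaps = [gap_seconds(last, back) for last, back in CRASH_WINDOWS]
print(gaps)  # [39, 53, 66]
```

The same helper could be applied to consecutive rows of the full CSV to flag any interval longer than the normal sampling period, once that period is known.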

2. Client-side failures:

```
users=3616  total_fail=3330   fail/s=0       ← first failures appear
users=3640  total_fail=12422  fail/s=404.7   ← avalanche
```

3. Crash log (broker_router_crash.log, lines 164-186):

```
http: superfluous response.WriteHeader call from
  github.com/mark3labs/mcp-go/server.(*StreamableHTTPServer).handlePost.func1.1 (streamable_http.go:419)
ERROR: Failed to write SSE event: http: wrote more than the declared Content-Length
ERROR: Failed to write SSE event: http: wrote more than the declared Content-Length
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6a78b2]

goroutine 488488 [running]:
github.com/mark3labs/mcp-go/server.(*StreamableHTTPServer).handlePost.func1.1.1()
  mcp-go@v0.43.2/server/streamable_http.go:410
github.com/mark3labs/mcp-go/server.(*StreamableHTTPServer).handlePost.func1.1(...)
  mcp-go@v0.43.2/server/streamable_http.go:425
```
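
To make triage of the full log easier, the source locations in the panic trace can be pulled out mechanically. The sketch below runs against the excerpt above only; the regular expression and the `panic_frames` name are assumptions for illustration, and the pattern expects the usual Go layout of an indented `path.go:line` under each function name.

```python
import re

# Panic section copied from the crash log excerpt above.
PANIC_EXCERPT = """\
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6a78b2]

goroutine 488488 [running]:
github.com/mark3labs/mcp-go/server.(*StreamableHTTPServer).handlePost.func1.1.1()
  mcp-go@v0.43.2/server/streamable_http.go:410
github.com/mark3labs/mcp-go/server.(*StreamableHTTPServer).handlePost.func1.1(...)
  mcp-go@v0.43.2/server/streamable_http.go:425
"""

# An indented line that ends a path in ".go" followed by ":<line number>".
FRAME_LOCATION = re.compile(r"^[ \t]+(\S+\.go):(\d+)", re.MULTILINE)


def panic_frames(log_text: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs for each stack frame, innermost first."""
    return [(path, int(line)) for path, line in FRAME_LOCATION.findall(log_text)]


for path, line in panic_frames(PANIC_EXCERPT):
    print(f"{path}:{line}")
```

Both frames in the excerpt point into `mcp-go@v0.43.2/server/streamable_http.go` (lines 410 and 425), so the nil pointer dereference is raised inside the mcp-go library's `handlePost` path rather than in broker-router code.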

All relevant logs:

[broker_router_crash.log](https://github.com/user-attachments/files/25714181/broker_router_crash.log)

[cpu_usage.csv](https://github.com/user-attachments/files/25714192/cpu_usage.csv)

[gateway_failures.csv](https://github.com/user-attachments/files/25714220/gateway_failures.csv)
