
fix(web): prevent browser crash from timer leaks, DOM growth, SSE buffer (#2406)#2441

Merged
henrypark133 merged 8 commits into staging from worktree-fix-2406-web-ui-crash
Apr 14, 2026
Conversation

@henrypark133
Collaborator

Summary

  • Timer leak fix: cleanupConnectionState() clears all connection-scoped timers (including gatewayStatusInterval) on SSE reconnect, tab hide, and page unload. Guard added to startGatewayStatusPolling() to prevent double-start; polling restarts on tab restore.
  • DOM pruning: Cap DOM at 200 message nodes via pruneOldMessages() — called after assistant responses, user messages, and history loads. Streaming messages (data-streaming="true") are preserved.
  • jobEvents LRU: Cap tracked jobs at 50 with LRU eviction (current job excluded).
  • SSE buffer: Increase broadcast buffer from 256 to 1024 (configurable via SSE_BROADCAST_BUFFER env var in GatewayConfig, clamped to 65,536 max, zero rejected).
  • Review fixes (3rd commit): clean gatewayStatusInterval in cleanupConnectionState(), prune after user messages, replace hardcoded 1024 with DEFAULT_BROADCAST_BUFFER in tests, document from_sender() buffer invariant, tighten E2E assertion.
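The timer-leak fix in the first bullet can be sketched as follows. This is a minimal, DOM-free model of the pattern — `gatewayStatusInterval`, `reconnectTimer`, and `pollingActive` are illustrative names standing in for the real app.js state, not the exact identifiers:

```javascript
// Connection-scoped timer state (illustrative names).
let gatewayStatusInterval = null;
let reconnectTimer = null;
let pollingActive = false;

function startGatewayStatusPolling(pollFn, intervalMs) {
  if (pollingActive) return;                // guard: prevent double-start
  pollingActive = true;
  gatewayStatusInterval = setInterval(pollFn, intervalMs);
}

// Called on SSE reconnect, tab hide, and page unload.
function cleanupConnectionState() {
  if (gatewayStatusInterval !== null) {
    clearInterval(gatewayStatusInterval);
    gatewayStatusInterval = null;
  }
  if (reconnectTimer !== null) {
    clearTimeout(reconnectTimer);
    reconnectTimer = null;
  }
  pollingActive = false;                    // allow restart on tab restore
}
```

Because every start path goes through the guard and every exit path goes through the cleanup, a reconnect loop cannot accumulate intervals.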

Closes #2406

Test plan

  • cargo fmt — clean
  • cargo clippy --all --benches --tests --examples --all-features — zero warnings
  • cargo test — all pass
  • pytest tests/e2e/scenarios/test_dom_resource_limits.py — DOM pruning, timer leak, streaming preservation
  • Manual: open web UI, send 200+ messages, verify DOM stays bounded and no "Pages Unresponsive" dialog

🤖 Generated with Claude Code

henrypark133 and others added 3 commits April 13, 2026 22:11
…fer (#2406)

Extended sessions with heavy bot interactions caused Chrome's "Pages
Unresponsive" dialog due to accumulated browser resources that were
never cleaned up.

Fixes:
- Add cleanupConnectionState() to clear leaked setInterval/setTimeout
  timers on SSE reconnect, tab visibility change, and page unload
- Cap DOM at 200 message nodes via pruneOldMessages() with streaming-
  aware pruning (skips data-streaming elements, called at turn
  boundaries and after loadHistory)
- Cap jobEvents Map at 50 entries with LRU eviction (excludes current
  job from eviction scan)
- Increase SSE broadcast buffer from 256 to 1024 (configurable via
  SSE_BROADCAST_BUFFER env var, with zero-guard to prevent panic)

Closes #2406

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fix E2E timer test

Move SSE_BROADCAST_BUFFER env var from direct std::env::var() in sse.rs
to GatewayConfig in config/channels.rs, following the convention that all
gateway env vars flow through structured config. Add MAX_BROADCAST_BUFFER
(65,536) clamp to prevent OOM from misconfiguration.

Fix E2E timer leak test to install setInterval monkey-patch via
page.add_init_script() before navigation so initialization timers are
tracked. Add test_dom_resource_limits.py to E2E CLAUDE.md scenario table.

Add unit test for buffer config parsing, zero-rejection, and clamp.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gs, use constants

- Add gatewayStatusInterval to cleanupConnectionState() so it is cleared
  on reconnect/tab-hide/unload; add guard in startGatewayStatusPolling()
  to prevent double-start; restart polling on tab visibility restore
- Call pruneOldMessages() after addMessage('user', ...) in sendMessage()
  so DOM stays bounded even during rapid user input
- Replace hardcoded broadcast_buffer: 1024 with DEFAULT_BROADCAST_BUFFER
  in all test construction sites (5 occurrences across 4 files)
- Document in from_sender() doc comment why broadcast_buffer is absent
- Tighten E2E timer leak assertion from baseline+1 to baseline

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 05:12
@github-actions github-actions bot added labels Apr 14, 2026: scope: channel/web (Web gateway channel), scope: docs (Documentation), size: L (200-499 changed lines), risk: medium (business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)
Contributor

Copilot AI left a comment


Pull request overview

Addresses Web UI stability issues from #2406 by bounding client-side resource usage (timers, DOM growth) and improving SSE resilience via a configurable broadcast buffer.

Changes:

  • Add connection-state cleanup and DOM pruning in the Web UI to prevent timer leaks and unbounded message-node growth.
  • Make SSE broadcast buffer configurable (with sensible defaults and clamping) and plumb the new config through gateway initialization and tests.
  • Add E2E coverage for DOM pruning, streaming-message preservation, and interval leak regression.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/e2e/scenarios/test_dom_resource_limits.py — New E2E scenarios covering DOM pruning, reconnect interval leak detection, and streaming preservation.
  • tests/e2e/CLAUDE.md — Documents the new E2E scenario file.
  • crates/ironclaw_gateway/static/app.js — Adds cleanupConnectionState(), pruneOldMessages(), jobEvents LRU cap, and polling guard to reduce resource leaks.
  • src/config/channels.rs — Introduces broadcast_buffer in GatewayConfig, env override via SSE_BROADCAST_BUFFER, clamping, and tests.
  • src/channels/web/sse.rs — Adds DEFAULT_BROADCAST_BUFFER and a new SseManager constructor supporting configurable buffer size.
  • src/channels/web/mod.rs — Wires GatewayConfig.broadcast_buffer into SseManager creation.
  • src/tunnel/mod.rs — Updates tunnel tests to provide the new GatewayConfig.broadcast_buffer field.
  • src/channels/web/tests/rebuild_state_preserves_fields.rs — Updates gateway test config to include broadcast_buffer.
  • src/channels/web/tests/no_silent_drop.rs — Updates gateway test config to include broadcast_buffer.


Collaborator Author

@henrypark133 left a comment


Review: Timer/DOM/SSE resource leak fixes (Risk: medium)

Well-structured fix addressing real browser resource exhaustion. The centralized cleanupConnectionState(), DOM cap via pruneOldMessages(), configurable SSE buffer, and job-map eviction are all sound approaches. E2E test coverage is strong — the setInterval monkey-patch via add_init_script is well done.

Positives:

  • Timer cleanup centralized in one function with clear ownership comment explaining why _doneWithoutResponseTimer is excluded
  • startGatewayStatusPolling() idempotency guard prevents the interval accumulation bug
  • SSE buffer config follows existing patterns: env var → clamped config → module factory
  • data-streaming guard in pruning prevents mid-stream breakage

Concerning: pruneOldMessages() in pagination path evicts just-loaded history [Logic]

File: crates/ironclaw_gateway/static/app.js:3037-3044
When the user scrolls up, loadHistory(before) prepends older messages at container.firstChild (line 3037). Then pruneOldMessages() runs (line 3044) and removes elements from the front of the container — exactly the items just prepended. The user scrolls up to see history and it's immediately pruned. The fresh-load path (lines 2978-2991) is fine since it clears the container first.
Suggested fix: Skip pruneOldMessages() when isPaginating is true. Pruning on sendMessage() and assistant-response receipt is sufficient to bound live-session DOM growth; history pagination loads a bounded batch (50 turns) that doesn't need immediate pruning.
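A minimal sketch of the suggested guard — `finishHistoryLoad`, `isPaginating`, and `prune` are stand-ins for the real app.js state and pruneOldMessages(), used here only to make the control flow concrete:

```javascript
// Run at the end of loadHistory(): prune only on fresh loads.
function finishHistoryLoad(isPaginating, prune) {
  // Pagination prepends a bounded batch of older messages; pruning
  // here would evict exactly what was just loaded, so skip it.
  if (isPaginating) return false;
  prune();  // fresh load: the container was cleared first, safe to prune
  return true;
}
```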

Convention notes:

  • src/channels/web/CLAUDE.md:224 says "Broadcast buffer: 256 events" — should be updated to reflect the new 1024 default and SSE_BROADCAST_BUFFER env var
  • MAX_DOM_MESSAGES (line 1859) is defined near its function rather than with other constants at lines 92-93 — minor style inconsistency
  • _loadThreadsTimer is a similar debounce timeout to jobListRefreshTimer but is not cleared in cleanupConnectionState() — benign but asymmetric

- Remove pruneOldMessages() from loadHistory() pagination path to avoid
  immediately evicting just-prepended older messages
- Move MAX_DOM_MESSAGES constant to top-level constants block
- Add _loadThreadsTimer to cleanupConnectionState() for consistency
- Add assert!(broadcast_buffer > 0) to SseManager constructor with
  panic doc (tokio broadcast channel requires capacity > 0)
- Use Set-based interval tracking in E2E test to prevent counter
  underflow from double-clear
- Update CLAUDE.md broadcast buffer docs (256 → 1024, SSE_BROADCAST_BUFFER)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@henrypark133
Collaborator Author

All items from your review addressed in a96c62a and this follow-up push:

  1. pruneOldMessages() in pagination path — Removed from loadHistory(). Pruning on sendMessage() and assistant-response receipt is sufficient.
  2. CLAUDE.md broadcast buffer docs — Updated from '256 events' to 'SSE_BROADCAST_BUFFER env var (default 1024, clamped to 65,536 max)'.
  3. MAX_DOM_MESSAGES placement — Moved to top-level constants block with JOB_EVENTS_CAP etc.
  4. _loadThreadsTimer — Added to cleanupConnectionState() for consistency.
  5. assert!() CI failure — Removed the assert (no-panics-in-production check), replaced with doc comment noting the precondition.

Replace assert!(broadcast_buffer > 0) with a doc comment noting the
precondition. GatewayConfig already rejects 0 at the config layer.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 14, 2026 05:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.



@ilblackdragon
Member

Review

Sound scope and clean Rust side. Request changes on a few correctness issues, and — importantly — the tests here are unit/local only. Please add Playwright coverage under tests/e2e/ so we exercise the real browser behavior (timer cleanup on tab hide / SSE reconnect, DOM cap under sustained streaming, LRU eviction of jobEvents) end-to-end, not just via JSDOM-style unit asserts.

Correctness

  • web/static/app.js:1867 — pruneOldMessages iterates .message, .activity-group, .time-separator. Pruning by flat index can remove the message following a .time-separator while leaving the separator orphaned. Over long sessions this is a (smaller) unbounded leak. Clean up leading orphan separators after the loop.
  • tests/e2e/.../test_dom_resource_limits.py:21 — asserts on .message only, but pruneOldMessages counts a superset. The lower bound >= 100 masks the mismatch. Either assert on the same selector the prune uses, or rename the assertion message.
  • app.js:1071-1080 — the jobEvents LRU scans all entries on every event. Map preserves insertion order; jobEvents.keys().next() is O(1). Simpler and faster.
  • app.js:1871 — the degenerate case where every node is data-streaming="true" silently under-prunes. Minor.
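The first correctness point — prune by flat index, then sweep orphaned separators — can be modeled without a DOM. `nodes` below is an ordered list of `{ kind, streaming }` records standing in for .message / .activity-group / .time-separator elements; the cap and function name mirror the PR but the array model is purely illustrative:

```javascript
const MAX_DOM_MESSAGES = 200;

function pruneOldMessages(nodes, max = MAX_DOM_MESSAGES) {
  let excess = nodes.length - max;
  let i = 0;
  while (i < nodes.length && excess > 0) {
    if (nodes[i].streaming) { i++; continue; }  // never cut mid-stream
    nodes.splice(i, 1);                         // prune from the front
    excess--;
  }
  // Sweep orphaned separators: a time-separator no longer followed by
  // a non-separator node introduces nothing and can be removed.
  // Iterating backwards lets runs of separators cascade away.
  for (let j = nodes.length - 1; j >= 0; j--) {
    const next = nodes[j + 1];
    if (nodes[j].kind === "time-separator" &&
        (next === undefined || next.kind === "time-separator")) {
      nodes.splice(j, 1);
    }
  }
  return nodes;
}
```

Without the sweep, a separator whose trailing message was pruned would linger forever; with it, the container holds only separators that still introduce something.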

v2 coverage

All changes are in the web channel / SSE layer; not engine-specific. No v2-vs-v1 split here. ✅

Tests — please add e2e

  • Playwright: open two tabs, trigger many messages, hide one tab for > polling interval, restore — assert no duplicate timers firing (count requests to /api/gateway/status before/after).
  • Playwright: send > 250 messages in a session, assert DOM .message count stays ≤ 200 AND streaming messages are preserved mid-stream.
  • Playwright: trigger > 50 jobs, assert jobEvents map stays bounded via an instrumentation hook or via memory-pressure proxy.
  • Rust: unit test that SseManager::with_max_connections_and_buffer actually honors the buffer size (receiver count + send-past-capacity sanity).

Rust side

  • channels.rs:51 — document that memory = buffer × receivers × event_size. At 65536 × 100 conn × 200B ≈ 1.3GB worst case under lag; ceiling deserves a comment.
  • sse.rs:76-77 — the "capacity is already baked into tx" comment is under with_max_connections_and_buffer's doc but describes rebuild_state. Move it.

No .unwrap() introduced. parse_optional_env + ConfigError::InvalidValue on zero is the right pattern. 👍
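The memory ceiling and the env-var handling described above can be checked with quick arithmetic. This sketch mirrors the PR's constants (default 1024, clamp at 65,536, zero rejected) but is written in JavaScript purely for illustration; the real logic lives in Rust in config/channels.rs, and avgEventBytes is the reviewer's assumed figure, not a measurement:

```javascript
const DEFAULT_BROADCAST_BUFFER = 1024;
const MAX_BROADCAST_BUFFER = 65536;

// Resolve SSE_BROADCAST_BUFFER the way the config layer is described
// to: fall back when unset, reject zero, clamp to the maximum.
function resolveBroadcastBuffer(raw) {
  if (raw === undefined) return DEFAULT_BROADCAST_BUFFER;
  const n = Number.parseInt(raw, 10);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error("SSE_BROADCAST_BUFFER must be a positive integer");
  }
  return Math.min(n, MAX_BROADCAST_BUFFER);
}

// Worst case under lag: every receiver retains a full buffer.
function worstCaseBytes(buffer, receivers, avgEventBytes) {
  return buffer * receivers * avgEventBytes;
}
```

At the clamp ceiling with 100 connections and ~200-byte events, `worstCaseBytes(65536, 100, 200)` is about 1.3 GB, which is the figure the review asks to be documented next to MAX_BROADCAST_BUFFER.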

Collaborator

@serrrfirat left a comment


Paranoid Security Review — APPROVE

PR: #2441 — Prevent browser crash from timer leaks, DOM growth, SSE buffer
Reviewer context: Automated deep security review

Findings

Net security improvement. No new attack surfaces introduced.

  • No XSS vectors: pruneOldMessages() only calls .remove() on existing DOM nodes. No innerHTML, no dynamic content creation. Existing DOMPurify.sanitize() pipeline unchanged.
  • Timer DoS surface reduced: cleanupConnectionState() clears five connection-scoped timers on every reconnect path. startGatewayStatusPolling() guard prevents double-interval accumulation.
  • All exit paths covered: Cleanup on beforeunload, visibilitychange (hidden), and connectSSE() reconnection.
  • DOM growth bounded: 200-message cap with streaming-node preservation (data-streaming="true" skip).
  • Job events bounded: 50-entry LRU cap. O(50) eviction — negligible.
  • SSE buffer capped: Configurable up to 65,536. Only server operators can set env vars — no external DoS vector.
  • No new .unwrap()/.expect() in production Rust code. Config validation prevents broadcast::channel(0) panic.
  • E2E tests well-structured: Timer leak test uses add_init_script() monkey-patching before page load.

Optional follow-up

Consider calling pruneOldMessages() after loadHistory() to cover the history-scroll-back growth path.

LGTM.

@serrrfirat serrrfirat self-requested a review April 14, 2026 08:10
Collaborator

@serrrfirat left a comment


Let's add Playwright tests here.

…#2406)

Correctness:
- pruneOldMessages: clean up orphaned leading time-separators after pruning
- jobEvents LRU: replace O(n) scan with O(1) Map insertion-order eviction
- Document degenerate all-streaming under-prune case

Playwright e2e tests:
- Tab hide/restore: no duplicate gateway status polling intervals
- DOM cap + streaming: 260 elements prune to ≤200, streaming preserved, no orphan separators
- jobEvents bounded: 60 jobs stay capped at ≤50 via LRU eviction
- Fix assertion selector to match pruneOldMessages superset, tighten lower bound

Rust:
- Unit test: SseManager buffer size parameter actually controls lag behavior
- Document MAX_BROADCAST_BUFFER memory impact (65K×100×200B ≈ 1.3 GB)
- Move "capacity baked into tx" comment from from_sender to rebuild_state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
serrrfirat added a commit that referenced this pull request Apr 14, 2026
)

The O(1) LRU eviction skips the job that just received an event (moved
to end via delete+set), but did not protect the job the user is actively
viewing in the detail panel (currentJobId). If the user views a quiet
job while 50+ other jobs fire events, the viewed job's events would be
evicted and the activity tab would appear empty.

Add a currentJobId guard to the eviction loop and a Playwright e2e test
that verifies the actively-viewed job survives LRU pressure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
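The eviction rule this commit describes can be sketched as follows. `recordJobEvent` and `JOB_EVENTS_CAP` are illustrative names: a Map used as an LRU via delete+set, with both the just-touched job and the job open in the detail panel (`currentJobId`) protected from eviction:

```javascript
const JOB_EVENTS_CAP = 50;

function recordJobEvent(jobEvents, jobId, event, currentJobId) {
  const events = jobEvents.get(jobId) ?? [];
  events.push(event);
  jobEvents.delete(jobId);      // delete+set bumps the job to most-recent
  jobEvents.set(jobId, events);
  while (jobEvents.size > JOB_EVENTS_CAP) {
    let evicted = false;
    // Map iterates in insertion order, so the first unprotected key is
    // the least recently updated: effectively O(1) per event.
    for (const key of jobEvents.keys()) {
      if (key === jobId || key === currentJobId) continue;  // protected
      jobEvents.delete(key);
      evicted = true;
      break;
    }
    if (!evicted) break;  // everything protected: under-evict, don't break the UI
  }
}
```

The `currentJobId` guard is what keeps a quiet-but-viewed job's activity tab populated while 50+ other jobs fire events.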
Copilot AI review requested due to automatic review settings April 14, 2026 10:24
Contributor

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

@github-actions github-actions bot added the size: XL (500+ changed lines) label and removed size: L (200-499 changed lines) Apr 14, 2026
This was referenced Apr 16, 2026

Labels

contributor: core (20+ merged PRs), risk: medium (business logic, config, or moderate-risk modules), scope: channel/web (Web gateway channel), scope: docs (Documentation), size: XL (500+ changed lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[QA] Pages Unresponsive dialog and black screen crashes

4 participants