Add backpressure guard for chat text generations #1670

Open
pandego wants to merge 2 commits into exo-explore:main from pandego:fix/1664-textgen-backpressure
pandego (Contributor) commented Mar 6, 2026

Summary

Add backpressure protection for chat text generations in the master API.

Problem

When too many text generations are already in flight, new requests can overload the master process and degrade responsiveness.

Root cause

There was no explicit in-flight guard for text generation requests at the API entry point.

Fix

  • Add a configurable backpressure guard for in-flight text generation requests.
  • Return HTTP 429 when the in-flight count is at or above the configured limit.
  • Add focused tests for both allow and reject behavior.
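
For readers who want the shape of the guard described above, here is a minimal standalone sketch. It is not the code from this PR: `InFlightGuard` and `TooManyInFlight` are hypothetical names, and the real API presumably tracks in-flight generations in its own state and raises its framework's HTTP error. The sketch only illustrates the "reject at or above the limit" rule and the allow/reject checks.

```python
from dataclasses import dataclass


class TooManyInFlight(Exception):
    """Hypothetical error that an API layer would translate to HTTP 429."""

    status_code = 429


@dataclass
class InFlightGuard:
    """Counts in-flight text generations and rejects new ones at the limit."""

    limit: int
    in_flight: int = 0

    def acquire(self) -> None:
        # Reject when the in-flight count is at or above the configured limit.
        if self.in_flight >= self.limit:
            raise TooManyInFlight(
                f"{self.in_flight} text generations in flight (limit {self.limit})"
            )
        self.in_flight += 1

    def release(self) -> None:
        # Called when a generation finishes or is cancelled; never goes below 0.
        self.in_flight = max(0, self.in_flight - 1)


def try_start(guard: InFlightGuard) -> int:
    """Return the status code a caller would see: 200 if admitted, 429 if not."""
    try:
        guard.acquire()
    except TooManyInFlight as exc:
        return exc.status_code
    return 200
```

With `limit=2`, the first two calls to `try_start` are admitted and the third is rejected until `release()` frees a slot. This sketch is single-threaded; a real server would need to update the counter in a way that is safe for its concurrency model.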

Validation

  • uv run pytest src/exo/master/tests/test_text_generation_backpressure.py src/exo/master/tests/test_cancel_command.py

Closes #1664.

Evanev7 (Member) left a comment

i like the direction - can we pass this through the construction of the API & a cli flag rather than an env var? i'm working towards implementing live configuration and it'd be good to start plumbing these things instead of using env.

pandego (Contributor, Author) commented Mar 6, 2026

Great suggestion - done in acef4c92.

I replaced the env-based limit with explicit API/CLI plumbing:

  • added CLI flag --max-in-flight-text-generations (default 2)
  • wired Args.max_in_flight_text_generations through Node.create into API(...)
  • updated API to use self.max_in_flight_text_generations in _enforce_text_generation_backpressure
  • updated backpressure tests to assert configured limit behavior without env patching
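
The plumbing described above can be sketched roughly as follows. This is an illustration, not the actual exo code: it assumes plain `argparse`, and the `API` dataclass and `build_api` helper here are hypothetical stand-ins for the real `Args`, `Node.create`, and `API(...)`. Only the flag name `--max-in-flight-text-generations` and its default of 2 come from this PR.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class API:
    """Hypothetical stand-in for the master API; it only stores the limit."""

    max_in_flight_text_generations: int


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Flag name and default are taken from the PR description.
    parser.add_argument(
        "--max-in-flight-text-generations",
        type=int,
        default=2,
        help="Reject new text generations with HTTP 429 at or above this count.",
    )
    return parser.parse_args(argv)


def build_api(argv: Optional[Sequence[str]] = None) -> API:
    # The limit is passed explicitly through construction, not read from env.
    args = parse_args(argv)
    return API(max_in_flight_text_generations=args.max_in_flight_text_generations)
```

Passing the value through the constructor keeps the limit visible at the call site and lets tests build an `API` with any limit directly, without patching environment variables.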

Validation run:

  • uv run pytest src/exo/master/tests/test_text_generation_backpressure.py -q (2 passed)

pandego force-pushed the fix/1664-textgen-backpressure branch from acef4c9 to 3b447fe on March 15, 2026 08:34
pandego (Contributor, Author) commented Mar 15, 2026

Refreshed this branch onto current main and force-pushed the updated PR branch.

Validation blocker from this environment:

  • uv run pytest src/exo/master/tests/test_text_generation_backpressure.py src/exo/master/tests/test_cancel_command.py -q
  • blocked while building the mlx dependency because the local toolchain cannot find metal (xcrun: error: unable to find utility "metal")

The PR diff is still the intended focused scope against current main:

  • src/exo/main.py
  • src/exo/master/api.py
  • src/exo/master/tests/test_text_generation_backpressure.py

pandego (Contributor, Author) commented Mar 24, 2026

I rebased this branch onto the current main so the dirty merge state is cleared.

Validation note: I re-ran the focused local check path, but this environment still hits the same mlx Metal Toolchain blocker before the target path can complete, so I could not produce a fresh green local run here.

If you want, I can keep digging on the mlx blocker next, but the branch itself is refreshed and ready for another look.

pandego force-pushed the fix/1664-textgen-backpressure branch from 3b447fe to b4f9cb3 on March 24, 2026 12:11
pandego (Contributor, Author) commented Mar 25, 2026

I rebased this branch onto the current main so the dirty merge state is cleared.

Validation note: I re-ran the focused local check path, but this environment still hits the same mlx Metal Toolchain blocker before the target path can complete, so I could not produce a fresh green local run here.

If you want, I can keep digging on the mlx blocker next, but the branch itself is refreshed and ready for another look.

Development

Successfully merging this pull request may close these issues.

[BUG] Cline sub-agents and parallel tool calling crashes cluster
