Add backpressure guard for chat text generations #1670
pandego wants to merge 2 commits into exo-explore:main
Great suggestion, done. I replaced the env-based limit with explicit API/CLI plumbing.
Validation run:
|
acef4c9 to 3b447fe
|
Refreshed this branch onto current main.
Validation blocker from this environment:
The PR diff is still the intended focused scope against current main.
|
|
I rebased this branch onto the current main.
Validation note: I re-ran the focused local check path, but this environment still hits the same blocker. If you want, I can keep digging on it.
3b447fe to b4f9cb3
|
Summary
Add backpressure protection for chat text generations in the master API.
Problem
When too many text generations are already in flight, new requests can overload the master process and degrade responsiveness.
Root cause
There was no explicit in-flight guard for text generation requests at the API entry point.
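For illustration only, an in-flight guard of the kind described here could look roughly like the sketch below. It is not the code from this PR: the names `InFlightGuard`, `TooManyInFlightError`, `handle_generation`, and `max_in_flight` are hypothetical, and rejecting excess requests with HTTP 429 is an assumption about how the guard might respond. The limit is passed in explicitly rather than read from the environment.

```python
import asyncio
from contextlib import asynccontextmanager


class TooManyInFlightError(Exception):
    """Raised when the in-flight limit is already reached (hypothetical name)."""


class InFlightGuard:
    """Counts in-flight requests and rejects new ones past a fixed limit."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._max = max_in_flight
        self._current = 0

    @property
    def in_flight(self) -> int:
        return self._current

    @asynccontextmanager
    async def slot(self):
        # There is no await between the check and the increment, so on a
        # single event loop no other task can interleave here.
        if self._current >= self._max:
            raise TooManyInFlightError(f"{self._current} generations in flight")
        self._current += 1
        try:
            yield
        finally:
            # Always release the slot, even if the generation fails.
            self._current -= 1


async def handle_generation(guard: InFlightGuard, prompt: str) -> tuple[int, str]:
    """Stand-in for an API entry point; returns (status_code, body)."""
    try:
        async with guard.slot():
            await asyncio.sleep(0.01)  # placeholder for the real generation work
            return 200, f"generated: {prompt}"
    except TooManyInFlightError:
        # Assumption: shed load with 429 instead of queueing the request.
        return 429, "too many text generations in flight"


async def demo() -> list[int]:
    guard = InFlightGuard(max_in_flight=2)
    results = await asyncio.gather(
        *(handle_generation(guard, f"prompt {i}") for i in range(3))
    )
    return [status for status, _ in results]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Rejecting immediately, rather than queueing behind a semaphore, keeps the entry point responsive under load at the cost of pushing retries onto the client. Whether that trade-off matches the PR's actual behavior is not stated in the description.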
Fix
Validation
Closes #1664.