feat: Implement Resilient Quota Handling, Smart Auth Rotation, and Stability Improvements#10
Merged
…ling
## Overview
Comprehensive improvements to stream response handling, message filtering, and authentication rotation management.
## Key Changes
### 🔧 Core Stream Processing Enhancements
- **Timestamp-based filtering**: Implemented wrapper format `{"ts": float, "data": ...}` with stale data filtering
- **Extraneous message prevention**: Added state flag `is_response_finalized` to ignore messages after response completion
- **Robust accumulation logic**: Enhanced dict processing with boundary detection and thinking-to-answer handover protocol
- **Stale data handling**: Filter out queue data from previous requests based on timestamp comparison
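The timestamp wrapper and the finalization flag described above can be sketched as follows. This is a minimal illustration, not the code in `api_utils/stream.py`: the `StreamFilter` class, the `wrap_payload` helper, and the `"done"` key used to detect completion are all assumptions made for the example.

```python
import time
from typing import Any, Optional


def wrap_payload(data: Any, now: Optional[float] = None) -> dict:
    """Wrap a queue payload in the {"ts": float, "data": ...} envelope."""
    return {"ts": time.time() if now is None else now, "data": data}


class StreamFilter:
    """Hypothetical consumer-side filter for wrapped queue payloads."""

    def __init__(self, request_started_at: float) -> None:
        self.request_started_at = request_started_at
        self.is_response_finalized = False

    def accept(self, envelope: dict) -> Optional[Any]:
        """Return the payload, or None if it should be ignored."""
        # Anything arriving after the response completed is extraneous.
        if self.is_response_finalized:
            return None
        # Data stamped before this request started belongs to an older request.
        if envelope["ts"] < self.request_started_at:
            return None
        data = envelope["data"]
        # Assumption: a dict payload with a truthy "done" key ends the response.
        if isinstance(data, dict) and data.get("done"):
            self.is_response_finalized = True
        return data
```

A consumer would create one filter per request and pass every dequeued envelope through `accept`, forwarding only non-`None` results.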
### 🛡️ Authentication Rotation Improvements
- **Emergency profile support**: Added `auth_profiles/emergency/` directory with multiple authentication profiles
- **Rotation failure documentation**: Comprehensive troubleshooting guide for "No available auth profiles found" errors
- **Cooldown management**: Added detailed documentation explaining profile cooldown states and recovery strategies
### 📝 Documentation & Testing
- **TTFB timeout investigation**: Added investigation documentation in `docs/spikes/ttfb_timeout_investigation.md`
- **Test coverage**: Created `tests/test_extraneous_messages.py` for message handling validation
- **Enhanced troubleshooting**: Extended `docs/troubleshooting.md` with authentication rotation failure scenarios
### 🔄 Infrastructure Updates
- **Git ignore updates**: Added `auth_profiles/emergency/` to ignore rules
- **Request processing**: Enhanced timeout parameter passing in stream response calls
- **Proxy server**: Implemented timestamp wrapping for queue payloads
## Technical Improvements
- Prevents duplicate responses and message contamination
- Improves system stability during high-load scenarios
- Provides better error handling and recovery mechanisms
- Enhances observability with detailed logging
## Files Modified
- `api_utils/stream.py`: Core timestamp filtering and message handling logic
- `api_utils/response_generators.py`: Response finalization state management
- `api_utils/request_processor.py`: Enhanced timeout handling
- `stream/proxy_server.py`: Timestamp wrapping implementation
- `docs/troubleshooting.md`: Authentication rotation documentation
- `.gitignore`: Emergency profiles directory exclusion
- `auth_profiles/emergency/`: Emergency authentication profiles (new)
- `docs/spikes/`: TTFB investigation documentation (new)
- `tests/`: Extraneous messages test coverage (new)
## Overview
Comprehensive implementation of graceful rotation with dual-threshold quota system and enhanced message filtering for improved content sequencing and client compatibility.
## Key Changes
### 🎛️ Graceful Rotation System
- **Dual-Threshold Quota Management**:
  - Soft Limit (450k tokens): Sets rotation flag for graceful processing
  - Hard Limit (550k tokens): Immediate kill signal to prevent bans
- **Three-Phase Rotation Logic**:
  - Pre-flight rotation check before request processing
  - Post-request rotation check after stream completion
  - Queue worker rotation monitoring during idle periods
- **NEEDS_ROTATION Flag**: New soft signal flag for non-disruptive rotation
### 📨 Message Filtering & Content Sequencing
- **"The Latch" Mechanism**: Proper ordering of reasoning vs body content
- **State Tracking**: Prevents reasoning output after body content starts
- **Backfill System**: Synthetic content generation when no body produced
- **Enhanced Content Sequencing**: Improved reasoning content handling and delivery
### ⚙️ Configuration & Environment
- **New Environment Variables**:
  - QUOTA_SOFT_LIMIT (default: 450000)
  - QUOTA_HARD_LIMIT (default: 550000)
- **Updated .env.example**: Added quota rotation threshold documentation
- **Gitignore Updates**: Added auth_profiles/locked exclusion
### 🔧 Technical Improvements
- **GlobalState Enhancements**: Added NEEDS_ROTATION flag and dual-limit logic
- **Queue Worker Updates**: Multi-point rotation trigger integration
- **Request Processor**: Pre-flight rotation handling for edge cases
- **Response Generator**: Complete content sequencing overhaul with latch mechanism
## Files Modified
- `api_utils/queue_worker.py`: Graceful rotation integration
- `api_utils/request_processor.py`: Pre-flight rotation checks
- `api_utils/response_generators.py`: Message filtering and sequencing
- `config/global_state.py`: Dual-threshold quota system
- `config/settings.py`: Environment configuration updates
- `.env.example`: Quota threshold documentation
- `.gitignore`: Auth profile lock exclusion
## Benefits
- **Reduced Service Disruption**: Graceful rotation allows current streams to complete
- **Improved Content Quality**: Proper reasoning vs body content ordering
- **Better Client Compatibility**: Enhanced backfill mechanisms prevent client errors
- **Flexible Quota Management**: Dual-threshold system provides safety margins
- **Enhanced Monitoring**: Better logging and state tracking throughout rotation process
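The dual-threshold decision can be illustrated with a short sketch. The environment variable names and defaults come from the description above; the `QuotaAction` enum, the `classify_usage` function, and the choice of `>=` at each boundary are assumptions for illustration, not the logic in `config/global_state.py`.

```python
import os
from enum import Enum

# Names and defaults taken from the PR description.
QUOTA_SOFT_LIMIT = int(os.environ.get("QUOTA_SOFT_LIMIT", "450000"))
QUOTA_HARD_LIMIT = int(os.environ.get("QUOTA_HARD_LIMIT", "550000"))


class QuotaAction(Enum):
    """Hypothetical outcome of a quota check."""
    CONTINUE = "continue"
    NEEDS_ROTATION = "needs_rotation"  # soft signal: finish the stream, then rotate
    KILL = "kill"                      # hard signal: stop immediately


def classify_usage(
    tokens_used: int,
    soft_limit: int = QUOTA_SOFT_LIMIT,
    hard_limit: int = QUOTA_HARD_LIMIT,
) -> QuotaAction:
    """Map a token count to an action (assumes limits are inclusive)."""
    if tokens_used >= hard_limit:
        return QuotaAction.KILL
    if tokens_used >= soft_limit:
        return QuotaAction.NEEDS_ROTATION
    return QuotaAction.CONTINUE
```

The gap between the two limits is what lets an in-flight stream finish: a request that crosses the soft limit mid-stream still has headroom before the hard limit forces termination.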
…rity

This commit introduces several major enhancements to improve the application's stability, resilience, and configurability.

Key changes include:
- **Graceful Authentication Rotation:**
  - Implemented a graceful rotation mechanism to handle quota limits proactively. The system now detects when a quota is nearing its limit and rotates to a new authentication profile without interrupting service.
  - Added a watchdog to monitor for quota exceeded events and trigger immediate rotation.
- **Stream Processing and Message Filtering:**
  - Enhanced stream processing logic to filter extraneous or out-of-order messages, ensuring a cleaner and more reliable data stream.
  - Introduced a "latch" and "backfill" mechanism in the response generator to prevent reasoning content from appearing after the main body has started and to provide synthetic content when a model finishes thinking without output.
  - Improved silence detection and handling of client disconnects during streaming.
- **Configuration and Settings:**
  - Centralized and expanded configuration options in the `config/` directory.
  - Added new settings for feature toggles and timeouts, configurable via environment variables.
- **Bug Fixes and Refinements:**
  - Improved handling of timeouts and client disconnects throughout the request lifecycle.
  - Refined UI interaction logic in the `PageController` for better stability.
  - Addressed various race conditions and edge cases in the queue worker and request processor.
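The "latch" and "backfill" behavior can be sketched as a small generator. This is an illustration under assumptions, not the code in `api_utils/response_generators.py`: the `(kind, text)` chunk shape, the `sequence_chunks` name, and the default backfill text are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical chunk shape: (kind, text), where kind is "reasoning" or "body".
Chunk = Tuple[str, str]


def sequence_chunks(
    chunks: Iterable[Chunk],
    backfill_text: str = "(no answer was produced)",
) -> Iterator[Chunk]:
    """Yield chunks so that no reasoning appears after body content starts."""
    body_started = False  # the latch: once set, it never resets
    for kind, text in chunks:
        if kind == "body":
            body_started = True
            yield kind, text
        elif kind == "reasoning" and not body_started:
            yield kind, text
        # Reasoning that arrives after the latch is set is dropped.
    if not body_started:
        # Backfill: emit synthetic body content so the client gets an answer.
        yield "body", backfill_text
```

For example, the input `reasoning, body, reasoning, body` comes out as `reasoning, body, body`, and an input containing only reasoning gains one synthetic body chunk at the end.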
This commit introduces a more sophisticated authentication and quota management system to improve stability and resource utilization.

Key enhancements include:
- **Model-Specific Quotas**: The system now tracks token usage on a per-model basis. This allows for fine-grained control over API consumption and prevents a single high-traffic model from exhausting the profile for all other models. Cooldowns are now also applied per-model.
- **Auto-Rotation on Startup**: A new optional feature (`AUTO_AUTH_ROTATION_ON_STARTUP`) has been added to automatically select a healthy authentication profile during headless startup, improving automation and reducing manual intervention.
- **Enhanced Stream Stability**:
  - Replaced `time.time()` with `time.monotonic()` for calculating stream delays and timeouts, making them immune to system time changes.
  - Improved silence detection logic to more reliably determine the end of a stream.
  - Added better error handling and fallbacks in the request processor.
- **Robust Page Submission**: The UI interaction logic now includes fallbacks to keyboard-based submission (Enter, Ctrl+Enter) if the primary button click fails, making it more resilient to UI variations.
- **Proxy Server Hardening**: Implemented more robust error handling and connection closing logic in the underlying proxy server to prevent crashes and resource leaks.
- **New Tests**: Added tests to verify the fixes for auth rotation cooldowns and persistence.
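Per-model usage tracking might look roughly like the sketch below. The `ModelQuotaTracker` class and its methods are hypothetical and only illustrate keying usage by `(profile, model)` rather than by profile alone; the actual storage format is not shown in this PR description.

```python
from typing import Dict, Tuple


class ModelQuotaTracker:
    """Hypothetical tracker that counts tokens per (profile, model) pair."""

    def __init__(self, limit_per_model: int) -> None:
        self.limit_per_model = limit_per_model
        self._usage: Dict[Tuple[str, str], int] = {}

    def record(self, profile: str, model: str, tokens: int) -> None:
        """Add tokens to the running total for this profile and model."""
        key = (profile, model)
        self._usage[key] = self._usage.get(key, 0) + tokens

    def usage(self, profile: str, model: str) -> int:
        """Return the tokens recorded so far for this profile and model."""
        return self._usage.get((profile, model), 0)

    def is_exhausted(self, profile: str, model: str) -> bool:
        """True once this model has reached its limit on this profile."""
        return self.usage(profile, model) >= self.limit_per_model
```

Because the key includes the model, heavy use of one model leaves the same profile available for every other model.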
This reverts parts of commit a446129.

The following features have been reverted due to issues:
- Proxy Server Hardening: Reverted changes to error handling and connection closing in the proxy server.
- Enhanced Stream Stability: Reverted changes using time.monotonic, improved silence detection, and error handling fallbacks in the request processor.

These changes were causing instability and have been rolled back to the previous versions from commit bb62e9c.

The features related to model-specific quotas and auto-rotation on startup from the original commit have been retained.
This commit introduces several improvements to enhance the stability and robustness of the streaming and request handling logic.

Key changes include:
- **Stop Button Logic:** Hardened the stop button logic in the queue worker to handle potential UI changes or detachments gracefully.
- **Dynamic Timeouts:** Implemented a more robust dynamic timeout calculation that enforces the configured response completion timeout as a minimum, preventing premature TTFB timeouts on slow models.
- **Graceful Rotation:** Added a `just_rotated` flag to skip chat history cleanup immediately after a credential rotation, as the session is fresh.
- **UI Generation Timeout:** Replaced a hardcoded UI check interval with a value derived from the `UI_GENERATION_WAIT_TIMEOUT_MS` configuration, making it more flexible.
- **Context Handling:** Increased timeouts for UI elements like the "Clear Chat" button and ensured the input area is editable before interaction, improving reliability.
- **Proxy Stability:** Implemented a `_safe_close` method in the proxy server to handle SSL shutdown timeouts and prevent potential connection hangs.
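The "minimum floor" idea behind the dynamic timeout can be shown in a few lines. Only the floor itself is taken from the commit message; the function name and the length-based estimate (`base_ms`, `per_char_ms`) are assumptions chosen to make the example concrete.

```python
def dynamic_timeout_ms(
    prompt_chars: int,
    response_completion_timeout_ms: float,
    base_ms: float = 5_000.0,   # hypothetical fixed overhead
    per_char_ms: float = 1.0,   # hypothetical scaling factor
) -> float:
    """Return a timeout that never drops below the configured minimum."""
    # Hypothetical estimate that grows with prompt size.
    estimate = base_ms + per_char_ms * max(prompt_chars, 0)
    # The configured completion timeout acts as a floor, so short prompts
    # sent to slow models are not cut off prematurely.
    return max(estimate, response_completion_timeout_ms)
```

With a 300-second floor, a 100-character prompt still gets the full 300 seconds, while a very large prompt can exceed it.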
- Fix unused exception variables in logging statements
- Ensure 'READY' signal is sent after server starts listening
Introduces a global recovery state to gracefully handle quota limits and authentication rotation events without abruptly terminating active streams.

Key Changes:
- `GlobalState`: Added `IS_RECOVERING` flag and `RECOVERY_EVENT` to globally signal and manage the recovery state. `start_recovery()` and `finish_recovery()` methods centralize state management. A `LAST_ROTATION_TIMESTAMP` is added to mitigate race conditions.
- `queue_worker`: The worker now enters a holding pattern when `IS_RECOVERING` is true, preventing it from aborting requests prematurely. It also includes more sophisticated checks to avoid killing streams if a rotation just completed.
- `response_generators`: SSE generator now pauses and sends heartbeats to the client when `IS_RECOVERING` is active, keeping the connection alive until recovery is complete. This prevents client-side timeouts.
- `use_stream_response`: Extends timeouts dynamically during recovery and adds checks to wait for recovery to initiate before aborting a request due to quota flags.
- `server`: The `quota_watchdog` now wraps rotation calls with `start_recovery()` and `finish_recovery()` to correctly signal the state change across the application.
- `settings`: Increased the default quota limits to provide a larger operational buffer.
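A minimal sketch of the pause-and-heartbeat behavior follows. It borrows the `start_recovery()` / `finish_recovery()` names from the commit message, but the `RecoveryState` class, the `sse_with_heartbeats` generator, and the heartbeat interval are assumptions; the real `GlobalState` uses class-level flags whose details are not shown here. The heartbeat uses an SSE comment line (a line starting with `:`), which clients ignore.

```python
import asyncio
from typing import AsyncIterator


class RecoveryState:
    """Hypothetical stand-in for the recovery flags on GlobalState.

    Create instances inside a running event loop.
    """

    def __init__(self) -> None:
        self.is_recovering = False
        self._recovered = asyncio.Event()
        self._recovered.set()

    def start_recovery(self) -> None:
        self.is_recovering = True
        self._recovered.clear()

    def finish_recovery(self) -> None:
        self.is_recovering = False
        self._recovered.set()

    async def wait_recovered(self, timeout: float) -> bool:
        """Wait up to `timeout` seconds; return True if recovery finished."""
        try:
            await asyncio.wait_for(self._recovered.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def sse_with_heartbeats(
    source: AsyncIterator[str],
    state: RecoveryState,
    heartbeat_interval: float = 5.0,
) -> AsyncIterator[str]:
    """Forward chunks as SSE events, pausing with heartbeats during recovery."""
    async for chunk in source:
        while state.is_recovering:
            # Each interval that passes without recovery yields a keep-alive.
            if not await state.wait_recovered(heartbeat_interval):
                yield ": heartbeat\n\n"
        yield f"data: {chunk}\n\n"
```

The client keeps receiving bytes while rotation is in progress, so its read timeout does not fire, and the pending chunk is delivered as soon as `finish_recovery()` is called.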
This commit addresses two issues related to authentication profile handling:
1. **Emergency Folder Inclusion**: The startup scanner now correctly identifies and includes profiles in the `auth_profiles/emergency` directory, making them available for rotation and selection.
2. **Strict Headless Startup**: The headless mode startup logic is now more robust and respects user configuration.
* It now checks `saved` and `emergency` folders if the `active` folder is empty, but only if `AUTO_AUTH_ROTATION_ON_STARTUP` is enabled.
* If auto-rotation on startup is disabled, the application will exit as expected instead of attempting to rotate profiles.
This commit introduces a robust mechanism for graceful shutdown, preventing the application from hanging on exit. It also includes several improvements to the authentication and launch processes.

Graceful Shutdown Fix:
- Previously, the application could get stuck during shutdown if it was in the middle of a long-running Playwright `wait` operation within `_initialize_page_logic`.
- A global `threading.Event` (`GlobalState.IS_SHUTTING_DOWN`) is now used to signal a shutdown request.
- The launcher (`launch_camoufox.py`) traps SIGINT/SIGTERM to set this event.
- The page initialization logic in `browser_utils/initialization.py` now uses `asyncio.wait` with `FIRST_COMPLETED` to race the Playwright visibility check against the shutdown event. This ensures that a shutdown signal will immediately interrupt the wait and allow the application to exit cleanly.
- A new test file, `tests/reproduce_shutdown_fix.py`, has been added to simulate this scenario and verify the fix.

Other Improvements:
- **Auth Handling**: Improved logic for finding, selecting, and saving authentication profiles in debug mode.
- **Launcher**: The `launch_camoufox.py` script is now more robust, with better dependency checking, port conflict resolution, and logging.
- **Configuration**: Integrated `python-dotenv` to allow for easier configuration management via a `.env` file.
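The race between a long wait and the shutdown event can be sketched as below. The use of `asyncio.wait` with `FIRST_COMPLETED` and a `threading.Event` follows the commit message; the `wait_or_shutdown` helper, the `ShutdownRequested` exception, and the polling bridge between the thread event and the event loop are assumptions, and the awaitable stands in for the Playwright visibility check.

```python
import asyncio
import threading
from typing import Any, Awaitable


class ShutdownRequested(Exception):
    """Hypothetical exception raised when shutdown wins the race."""


async def _poll_shutdown(event: threading.Event, poll_interval: float) -> None:
    # A threading.Event cannot be awaited directly, so poll it cooperatively.
    while not event.is_set():
        await asyncio.sleep(poll_interval)


async def wait_or_shutdown(
    awaitable: Awaitable[Any],
    shutdown_event: threading.Event,
    poll_interval: float = 0.01,
) -> Any:
    """Return the awaitable's result, or raise if shutdown is signaled first."""
    work = asyncio.ensure_future(awaitable)
    stop = asyncio.ensure_future(_poll_shutdown(shutdown_event, poll_interval))
    done, pending = await asyncio.wait(
        {work, stop}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel whichever task lost the race and let the cancellation settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if work in done:
        return work.result()
    raise ShutdownRequested()
```

A signal handler running outside the event loop only needs to call `shutdown_event.set()`; the pending wait is then cancelled within one polling interval.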
This commit introduces a robust set of fixes to improve the handling of quota-exceeded events, enhance authentication profile rotation, and prevent stream hangs.

Key improvements include:
- **Smart Auth Rotation**: Rotation logic now considers model-specific cooldowns, preventing the selection of profiles that are temporarily blocked for a specific model.
- **Request Holding**: When a quota limit is hit, in-flight requests are now re-queued and held until the auth rotation is complete, preventing dropped connections and failed requests.
- **Zombie Stream Prevention**: A concurrency fix has been added to terminate "zombie" streams from previous requests, ensuring that only the active stream consumer receives data.
- **Stale "DONE" Signal Handling**: The stream handler now correctly identifies and ignores stale "DONE" signals that could previously cause the stream to terminate prematurely after a rotation.
- **Enhanced Testing**: New tests have been added to verify the fixes for quota rotation, smart cooldowns, and zombie stream prevention.
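Cooldown-aware profile selection can be illustrated with a small function. The `pick_profile` name, the `(profile, model)` key, and the "expiry timestamp" representation of a cooldown are assumptions for this sketch; the logic in `browser_utils/auth_rotation.py` may store and compare cooldowns differently.

```python
from typing import Dict, Iterable, Optional, Tuple


def pick_profile(
    profiles: Iterable[str],
    model: str,
    cooldown_until: Dict[Tuple[str, str], float],
    now: float,
) -> Optional[str]:
    """Return the first profile not cooling down for `model`, else None."""
    for profile in profiles:
        # A missing entry means the profile was never blocked for this model.
        if cooldown_until.get((profile, model), 0.0) <= now:
            return profile
    return None
```

A profile blocked for one model stays selectable for another, and a `None` result corresponds to the "No available auth profiles found" case, where the caller has to wait for a cooldown to expire.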
- **New Exception Class**: Added `QuotaExceededRetry` exception for handling quota-related retry scenarios
- **Request Retry Logic**: Implemented `process_request_with_retry` function that handles up to 3 retry attempts when quota walls are encountered, waiting for rotation completion between retries
- **Rotation Coordination**: Added `rotation_complete_event` to GlobalState for coordinating rotation operations across concurrent requests
- **Mid-Stream Protection**: Enhanced response generators to detect quota exceeded events during streaming and raise `QuotaExceededRetry` to trigger retry logic
- **Configurable Timeouts**: Refactored silence detection to use configurable `SILENCE_TIMEOUT_MS` instead of hardcoded 5-second threshold
- **Import Structure**: Updated module imports and exports across `models/__init__.py`, `api_utils/request_processor.py`, and `api_utils/response_generators.py`

This implementation provides robust handling of quota-related failures by:
1. Catching quota exceeded exceptions during request processing
2. Waiting for ongoing rotation operations to complete before retrying
3. Allowing up to 3 retry attempts to handle temporary quota restrictions
4. Preventing mid-stream failures by detecting quota issues during response generation
5. Providing better user experience through automatic retry rather than immediate failure
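The retry loop described above can be sketched as follows. The exception name matches the commit message, but it is redefined locally as a stand-in; `retry_after_rotation` is a hypothetical name, its signature is not that of the real `process_request_with_retry`, and the sketch assumes "up to 3 attempts" means three calls in total and that a rotation timeout of 30 seconds is acceptable.

```python
import asyncio
from typing import Any, Awaitable, Callable


class QuotaExceededRetry(Exception):
    """Local stand-in for the exception added in this PR."""


async def retry_after_rotation(
    handler: Callable[[], Awaitable[Any]],
    rotation_complete_event: asyncio.Event,
    max_attempts: int = 3,          # assumption: three calls in total
    rotation_timeout: float = 30.0,  # assumption: illustrative value
) -> Any:
    """Call `handler`, retrying after rotation when quota is exceeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await handler()
        except QuotaExceededRetry:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Wait for the in-progress rotation before trying again.
            await asyncio.wait_for(
                rotation_complete_event.wait(), rotation_timeout
            )
```

Waiting on the shared event before each retry keeps concurrent requests from hammering a profile that is still being swapped out.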
This commit addresses the authentication cooldown problem by modifying the logic in `browser_utils/auth_rotation.py`. It also introduces a new test file, `tests/test_auth_rotation_cooldown_wait.py`, to verify the fix and prevent future regressions.
- In `queue_worker`, add a connection check before re-queueing a request that failed mid-processing due to a `QuotaExceededError`.
- This prevents the worker from attempting to process requests for clients that have already disconnected, saving resources.
- The change introduces a dependency on `_test_client_connection` from the `request_processor`.

chore: update .gitignore
- Add several generated/temporary file patterns to `.gitignore` to avoid committing them to the repository.
- This includes `config/profile_usage.json`, `config/cooldown_status.json`, `auth_profiles/locked`, and `auth_profilesBackup`.
…nism
- Add `QUOTA_EXCEEDED_EVENT` and error type tracking (`RATE_LIMIT` vs `QUOTA_EXCEEDED`) to `GlobalState` to better distinguish between temporary and permanent exhaustion.
- Implement `process_request_with_retry` wrapper in `request_processor.py` to catch `QuotaExceededRetry`, wait for rotation, and retry the request automatically.
- Update `response_generators.py` to detect global quota events mid-stream, raising `QuotaExceededRetry` to trigger the resiliency loop.
- Add `tests/reproduce_quota_raise.py` to verify `QuotaExceededError` behavior when limits are hit.
Summary
This PR introduces a comprehensive set of improvements focused on system stability, quota management, and authentication rotation. It implements a resilient retry mechanism for quota errors, "smart" auth rotation that respects model-specific cooldowns, and a robust recovery mode. It also includes graceful shutdown handling and fixes for queue disconnects and proxy logging.
Key Changes
🔄 Quota Resiliency & Retry Logic
- feat(quota): Implemented `QuotaExceededRetry` and `process_request_with_retry` to automatically retry requests after rotation when quota limits are hit (up to 3 attempts).
- fix(robust-quota-recovery): Introduced `GlobalState.IS_RECOVERING` and a recovery mode to gracefully handle quota events without dropping active streams.
- … `QuotaExceededRetry` to trigger the resiliency loop.

🔐 Smart Authentication Rotation
- feat(smart-auth-rotation): Logic now respects model-specific cooldowns, preventing selection of profiles blocked for specific models.
- fix(auth-cooldown-fix): Resolved issues with auth cooldown tracking and persistence.
- fix(auth-rotation-logic): Improved startup scanner to correctly identify `emergency` profiles and handle headless startup logic more robustly.

🛡️ System Stability & Lifecycle
- feat(lifecycle): Implemented graceful shutdown using `GlobalState.IS_SHUTTING_DOWN` to prevent the application from hanging on exit (racing Playwright waits).
- fix(queue): Added connection checks in `queue_worker` to handle client disconnects during quota errors, saving resources.
- fix(proxy-logging-readiness): Improved proxy server logging and ensured the 'READY' signal is sent correctly.
- fix(quota-rotation-stream-stability): Fixed "zombie" streams by terminating stale connections and handling stale "DONE" signals during rotation.

Commits
Testing
- … `QuotaExceededRetry` behavior with `tests/reproduce_quota_raise.py`.
- … `tests/test_auth_rotation_cooldown_wait.py`.