Skip to content

fix: undo() peeks without popping, breaking repeated undo and leaking redo stack#71

Merged
ilblackdragon merged 6 commits intonearai:mainfrom
AI-Reviewer-QS:fix/undo-stack-never-pops
Feb 17, 2026
Merged

fix: undo() peeks without popping, breaking repeated undo and leaking redo stack#71
ilblackdragon merged 6 commits intonearai:mainfrom
AI-Reviewer-QS:fix/undo-stack-never-pops

Conversation

@AI-Reviewer-QS
Copy link
Copy Markdown
Contributor

Summary

  • UndoManager::undo() calls self.undo_stack.back() which peeks without removing, so repeated undo always returns the same checkpoint while the redo stack grows unboundedly
  • UndoManager::redo() does not save the current state to the undo stack, so undo-after-redo loses the pre-redo state
  • Fixed undo() to use pop_back() and redo() to accept current state parameters and push to undo stack before popping redo
  • Updated caller in agent_loop.rs to match the new signatures

Test plan

  • test_repeated_undo_advances_through_stack — verifies two undos return different checkpoints and stack shrinks
  • test_undo_redo_cycle_preserves_state — verifies undo→redo→undo cycles correctly
  • test_undo_redo_stack_sizes_consistent — verifies undo_count + redo_count stays constant
  • All existing undo tests still pass
  • cargo clippy --all --all-features clean

… redo stack

undo() used self.undo_stack.back() (peek) instead of pop_back(), so
repeated undo always returned the same checkpoint while pushing to
the redo stack unboundedly.

Additionally, redo() did not save the current state to the undo stack,
breaking the undo/redo cycle.

Changes:
- undo(): change back() to pop_back(), return owned Checkpoint
- redo(): accept current_turn/current_messages params, save current
  state to undo stack before popping from redo stack
- Update process_undo/process_redo callers in agent_loop.rs
- Add tests for repeated undo, undo/redo cycling, stack size invariant
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @AI-Reviewer-QS, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical issues in the undo/redo mechanism, ensuring that these operations behave as expected. The changes prevent state loss, allow for proper traversal through historical states, and maintain the integrity of the undo and redo stacks. The core logic of state management for these features has been refactored and thoroughly tested to provide a robust and reliable experience.

Highlights

  • Undo Functionality Fix: The undo() method in UndoManager was incorrectly peeking at the undo stack instead of popping, leading to repeated undo operations returning the same state and an unbounded growth of the redo stack. This has been corrected to use pop_back().
  • Redo Functionality Enhancement: The redo() method now correctly saves the current state to the undo stack before restoring a previously undone state, preventing loss of the pre-redo state during undo-after-redo cycles. It also accepts current state parameters.
  • API Signature Updates: The signatures of UndoManager::undo() and UndoManager::redo() have been updated to return an owned Checkpoint and accept current state parameters, respectively. Callers in agent_loop.rs have been updated to match these new signatures.
  • Comprehensive Test Coverage: New test cases have been added to validate repeated undo operations, ensure correct undo-redo cycles, and verify consistent stack sizes, alongside updates to existing tests.
Changelog
  • src/agent/agent_loop.rs
    • Updated the undo call to handle the new return type and removed an unnecessary clone.
    • Modified the redo call to pass the current turn number and messages as required by the updated UndoManager::redo signature.
  • src/agent/undo.rs
    • Modified undo method to pop the last checkpoint from the undo stack and return an owned Checkpoint.
    • Updated redo method to accept current state parameters, push the current state to the undo stack, and trim the undo stack if it exceeds its capacity.
    • Updated an existing test case to reflect the new undo and redo method signatures and behavior.
    • Added new test cases to verify repeated undo functionality, correct undo-redo cycles, and consistent stack sizes.
Activity
  • The author has provided a detailed summary of the changes and the rationale behind them.
  • A comprehensive test plan has been outlined and executed, including verification of new and existing undo tests, and cargo clippy checks.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses critical bugs in the undo/redo functionality, ensuring the undo operation correctly pops from the stack and the redo operation properly saves the current state. However, a security audit identified two significant issues: a high-severity deadlock risk in agent_loop.rs caused by inconsistent locking order between Session and UndoManager, and a critical IDOR vulnerability in the maybe_hydrate_thread method that allows unauthorized access to conversation threads. It is strongly recommended to implement ownership checks in maybe_hydrate_thread and standardize the locking order to prevent deadlocks. Additionally, there is a suggestion to refactor a small piece of duplicated code for improved maintainability.

Comment thread src/agent/agent_loop.rs Outdated
Comment thread src/agent/undo.rs Outdated
AI-Reviewer-QS and others added 4 commits February 14, 2026 10:35
Address review feedback:
- Standardize lock order (Session before UndoManager) in process_undo
  and process_redo to match process_user_input and prevent deadlocks
- Extract push_undo() helper to deduplicate push-and-trim logic shared
  by checkpoint() and redo()
@AI-Reviewer-QS
Copy link
Copy Markdown
Contributor Author

@ilblackdragon can you help me do a code review for this PR? Thank you!

@tribendu
Copy link
Copy Markdown

Code Review: Fix undo() peeks without popping

PR Details


Summary

This PR fixes a critical bug in the undo/redo stack implementation where undo() was peeking at the checkpoint without popping it, causing two significant issues:

  1. Repeated undo broken: Multiple undos would always return the same checkpoint instead of walking backward through history
  2. Redo stack leak: The redo stack would grow indefinitely since undo never consumed from the undo stack

The fix properly changes undo() to pop from the stack (instead of peeking), changes the return type from reference to ownership (Option<&Checkpoint>Option<Checkpoint>), and symmetrically updates redo() to push the current state to the undo stack. The PR also includes excellent test coverage for the new behavior and fixes a potential deadlock by reordering lock acquisition.


Pros

Correct core fix: Changing self.undo_stack.back() to self.undo_stack.pop_back() correctly addresses the reported issue. The undo operation now properly consumes stack entries.

Symmetric undo/redo design: Both operations now properly save current state before restoring from their respective stacks, maintaining a consistent state management approach.

Excellent test coverage: Added three comprehensive tests:

  • test_repeated_undo_advances_through_stack - Validates that repeated undos return distinct checkpoints
  • test_undo_redo_cycle_preserves_state - Tests full undo/redo round-trip
  • test_undo_redo_stack_sizes_consistent - Validates invariant that total stack count remains constant

Deadlock prevention: Reordered lock acquisition in both undo() and redo() to lock session first, then undo manager. This matches the pattern in process_user_input, preventing potential deadlocks.

Reduced cloning: Removed unnecessary .clone() calls in checkpoint restoration (checkpoint.messages.clone()checkpoint.messages), improving performance.

Code cleanup: Extracted push_undo() helper method to eliminate duplication in checkpoint() and redo().

Return type ownership: Changing from Option<&Checkpoint> to Option<Checkpoint> eliminates lifetime complexity and allows callers to own the returned checkpoint.


Concerns

⚠️ Potential deadlock risk with existing code: The lock ordering change (session first, then undo manager) is correct for consistency, but this needs to be verified against all other code that interacts with these locks. If there's any other location that locks in the opposite order without this fix, deadlocks could occur.

⚠️ Lifetime considerations with Vec<ChatMessage>: The current_messages: Vec<ChatMessage> parameter is moved into both undo() and redo(). This is fine for the current use case, but developers should be aware that calling code must clone if they need to retain the messages after calling these operations.

⚠️ Missing documentation for state ownership: The public API changes to undo() and redo() should mention that these methods now consume the current message vector. The existing docstring for undo() mentions "The current state should be saved to redo stack before calling" but doesn't explicitly note the move semantics.

⚠️ Potential test timing edge case: The test test_repeated_undo_advances_through_stack undos from turn 3 to turn 2, then from turn 2 to turn 1. However, the checkpoint at turn 2 was never explicitly created (only turns 0, 1, 2 were created via checkpoint()). This works because the first undo saves turn 3 state to redo and returns turn 2. The test is correct but relies on understanding that the first undo saves turn 3 implicitly.

⚠️ Removed pop_undo() method: The PR removes pop_undo() (used only for cleanup at session end). This is a good cleanup, but it should be verified that the removal sites are correctly handled and this method wasn't exposed in any public API.


Suggestions

  1. Update public API documentation: Add explicit notes to undo() and redo() docstrings about move semantics:

    /// Returns: The previous checkpoint, if one exists.
    /// Note: Consumes `current_messages` - callers must clone if needed after calling.
    
  2. Verify lock ordering across the codebase: Run a code search to ensure all locations that lock both Session and UndoManager use the same ordering (session first). Search for patterns like:

    • session.lock().await followed by undo_manager.lock().await
    • Ensure no undo_manager.lock().await followed by session.lock().await sequences exist
  3. Consider adding an integration test: While the unit tests are excellent, consider adding a test that simulates the actual Agent::undo() / Agent::redo() flow to verify the lock ordering and end-to-end behavior work correctly.

  4. Document the stack invariant: Add a comment or doc to UndoManager explaining the invariant relationship between undo and redo stacks:

    // Invariant: undo_stack.len() + redo_stack.len() <= max_checkpoints * 2
    // Each operation pops from one stack and pushes to the other, preserving count
  5. Add benchmark comment: Consider adding a comment noting that the removal of .clone() is a performance optimization, especially for long conversation histories.

  6. Verify backward compatibility: Since this changes the public API of UndoManager::undo() and UndoManager::redo(), ensure any external callers (if this was a public crate) are updated. Since this appears to be internal API, this may not be applicable.


Conclusion

Status:Approve

This is a well-structured PR that correctly fixes a critical bug with proper test coverage. The changes are well-thought-out, with good attention to deadlock prevention and performance optimization. The test additions significantly increase confidence in the fix and prevent regression.

The minor concerns are mostly about documentation and verification rather than actual bugs. I recommend merging once the documentation updates are applied (or as a follow-up).

Address review feedback requesting documentation about the ownership
semantics of undo/redo parameters and the stack size invariant.
@AI-Reviewer-QS
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @tribendu!

Addressed the actionable feedback in the latest push (6530614):

Done:

  • Move-semantics documentation — Added notes to both undo() and redo() doc comments stating they take ownership of current_messages and callers must clone if they need to retain a copy (suggestion Move whatsapp channel source to channels-src/ for consistency #1).
  • Stack invariant documented — Added a struct-level doc comment to UndoManager explaining that undo_count() + redo_count() stays constant across undo/redo cycles (suggestion feat: Sandbox jobs #4).

Already addressed in prior commit (05db445):

  • Lock ordering — Verified and standardized to always lock session first, then undo manager. Grep-searched the codebase to confirm no reverse ordering exists (suggestion feat: adding Web UI #2).
  • Code deduplication — Extracted push_undo() helper as suggested by the Gemini review, used in both checkpoint() and redo().

Out of scope / noted:

Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Approve

This PR correctly identifies and fixes two real bugs in UndoManager:

Bug Fixes (Correct)

  1. undo() peeked instead of popping -- undo_stack.back() returned a reference without removing the entry, so repeated undos always returned the same checkpoint and the redo stack grew unboundedly. Changing to pop_back() and returning an owned Checkpoint is the right fix.

  2. redo() did not save current state to undo stack -- After redoing, the pre-redo state was lost, breaking undo-after-redo cycles. Adding current_turn and current_messages parameters to redo() and pushing them onto the undo stack before popping redo mirrors the symmetric design of undo().

Lock Ordering Fix (Correct)

The PR also fixes a latent deadlock hazard in process_undo and process_redo. Previously:

  • process_undo/process_redo: locked undo_mgr first, then session
  • process_user_input (checkpoint code at line ~816): locked session first, then undo_mgr

Opposite lock ordering on two mutexes is a classic deadlock setup. The PR moves session.lock().await before get_undo_manager + undo_mgr.lock().await in both functions, matching the order used in process_user_input. This is a valuable incidental fix.

Code Quality

  • The push_undo() helper consolidates the push-and-trim logic previously duplicated in checkpoint(), reducing the chance of future divergence. Clean refactor.
  • Return type change from Option<&Checkpoint> to Option<Checkpoint> eliminates the need for .clone() in the caller and the awkward "extract values before consuming the reference" dance. Good simplification.
  • The redo() early return guard (if self.redo_stack.is_empty()) is necessary -- without it, calling redo() on an empty redo stack would still push to the undo stack, silently corrupting state.
  • Doc comments on the struct and both methods clearly describe the invariant (undo_count() + redo_count() stays constant across undo/redo cycles). Helpful for future maintainers.

Test Coverage

The three new tests are well-designed:

  • test_repeated_undo_advances_through_stack -- directly reproduces the original bug (two undos must return different checkpoints)
  • test_undo_redo_cycle_preserves_state -- verifies the full undo->redo->undo round-trip
  • test_undo_redo_stack_sizes_consistent -- verifies the count invariant documented on the struct

Existing tests are correctly updated for the new signatures.

Minor Observations (Non-blocking)

  1. pop_undo() is now dead code -- It has zero callers in the codebase (before or after this PR). Consider removing it or adding #[allow(dead_code)] with a comment explaining its purpose if it's part of a public API contract. Not blocking since clippy CI passed (it may be suppressed by the pub visibility).

  2. Trimming in push_undo during redo() -- When max_checkpoints is small and a user does many redo operations, push_undo will trim the oldest undo entries. This is correct bounded-memory behavior, but it means redo can silently reduce how far back you can undo. The doc comment could mention this, but the current behavior is reasonable.

CI is green (fmt, clippy, tests all pass). The core fix is correct, the lock ordering improvement is a welcome bonus, and test coverage is thorough.

@tribendu
Copy link
Copy Markdown

GitHub PR Review Batch 3 - nearai/ironclaw

PR #71: Fix undo/redo deadlock and checkpoint behavior

Summary

This PR addresses critical concurrency issues in the undo/redo system. The main changes are:

  1. Reordering lock acquisitions in undo_turn() and redo_turn() to lock session before undo manager, matching process_user_input() pattern to prevent deadlocks
  2. Changing undo() to return owned Checkpoint instead of &Checkpoint, fixing a bug where repeated undos returned the same checkpoint reference instead of walking backwards through history
  3. Adding current_turn and current_messages parameters to redo() so it can save the current state before popping from redo stack
  4. Adding comprehensive tests to verify undo/redo stack invariants and state preservation

Pros

  • Critical bug fix: The deadlock prevention is well-reasoned and follows existing patterns in the codebase
  • Ownership fix: Returning owned Checkpoint from undo() eliminates the borrow/snapshot bug that prevented repeated undos
  • Better API design: The new redo() signature is more explicit about state management
  • Excellent test coverage: Added 3 new tests covering repeated undo, undo/redo cycles, and stack size consistency
  • Documentation: Added helpful docstrings explaining the stack behavior and invariants

Concerns

  • Missing error handling: No check if thread.restore_from_messages() succeeds/fails before returning success
  • Clone overhead: The undo() function now clones messages when saving to redo stack, but this is unavoidable with the API change
  • Test gap: No test for the deadlock scenario itself (would require concurrent test setup)
  • Migration note: The change from &Checkpoint to Checkpoint is a breaking API change for any external users of UndoManager

Suggestions

  • Consider returning Result<Checkpoint, ...> from undo() and propagate restoration errors
  • Add integration test with concurrent calls to verify deadlock is truly fixed
  • Document the breaking API change in upgrade notes if this is a public library

PR #66: Update README architecture diagram to use Unicode box drawing

Summary

This PR updates the README.md architecture diagram from ASCII art to Unicode box drawing characters. The change improves readability and visual appeal of the system architecture diagram, using characters like ╭───╮, , instead of +, -, |. The diagram structure and content remain unchanged.

Pros

  • Better aesthetics: Unicode box drawing creates cleaner, more professional diagrams
  • Improved readability: The visual distinction between components is clearer
  • Modern standard: Unicode box drawing is the standard for terminal-based diagrams in modern tools
  • Preserves structure: The logical diagram structure is unchanged

Concerns

  • Terminal compatibility: Some older terminals or Windows default fonts may not render Unicode box drawing correctly, showing replacement characters
  • Copy-paste issues: Unicode characters may cause issues in some markdown viewers or plain text contexts
  • Diff noise: This change makes future diff review harder since the diagram structure is visually the same but characters changed

Suggestions

  • Consider adding a note about terminal compatibility if users report issues
  • For CI environments, ensure the font/encoding is UTF-8 aware

PR #63: Memory Guardian and Cognitive Routines

Summary

This is a large PR (894 lines of new code) adding a comprehensive memory management system with two layers:

Cognitive Routines (Prompt-Level):

  • Pre-game routine (5-step checklist)
  • Checkpoint reminder templates
  • After-action review templates

Memory Guardian (System-Level):

  • Pre-compaction breadcrumbs (writes state snapshot before context loss)
  • Auto-breadcrumbs (every N tool calls, default 10)
  • Checkpoint gate (escalating pressure: gentle → firm → urgent at 12/20/30 calls)
  • Memory search tracking (reminds agent to search after 8 turns)

The PR also adds line number tracking to memory chunks for citation support (V9 migration), checkpoint tracker to Thread struct, and extensive testing (15 new tests).

Pros

  • Comprehensive documentation: Excellent module-level docs explaining the two-layer design rationale
  • Design rationale: Clear explanation of why both layers exist (instructions vs. enforcement)
  • Good defaults: Sensible thresholds (breadcrumb_interval=10, gate_thresholds=[12,20,30])
  • Defense in depth: Escalating pressure vs. hard blocking is a thoughtful UX choice
  • Sanitization functions: sanitize_checkpoint_text() prevents markdown/prompt injection
  • Extensive testing: 15 tests covering the full functionality

Concerns

  • Large PR: 894 lines makes review difficult; could be split into smaller PRs
  • Config not wired up: CognitiveConfig is defined but not loaded from configuration files or used to enable/disable features
  • Session lock pattern: Multiple session lock acquisitions in the agent loop could become a performance bottleneck with high tool call volume
  • Breadcrumb performance: Async I/O in the tool execution path for every Nth tool call adds latency
  • Missing integration tests: No tests verifying the guardian works end-to-end with compaction/reset
  • Tracker not persisted: CheckpointTracker is in-memory only, resets on session restart

Suggestions

  • Split into smaller PRs: (1) cognitive module structure, (2) guardian integration, (3) database migration
  • Wire up CognitiveConfig from config files/CLI flags so features can be toggled
  • Consider making breadcrumb writing non-blocking (spawn without await)
  • Document the performance impact of breadcrumb I/O
  • Add integration test verifying pre-compaction breadcrumb actually writes
  • Add metrics/logging for guardian actions (breadcrumb writes, gate triggers)

PR #62: Add Tinfoil private inference provider

Summary

This PR adds support for Tinfoil as a new LLM backend. The changes include:

  • Add Tinfoil variant to LlmBackend enum with string parsing
  • Add TinfoilConfig struct with api_key and model fields
  • Add create_tinfoil_provider() function that uses Rig's OpenAI client with https://inference.tinfoil.sh/v1 as base URL
  • Explicitly use completions_api() for Tinfoil (they don't support the newer Responses API)
  • Wire up in create_llm_provider() dispatch
  • Update setup wizard to include tinfoil field in config

Pros

  • Clean implementation: Follows existing provider patterns (Ollama, OpenAI-compatible)
  • Proper error handling: Returns LlmError::AuthFailed when tinfoil config is missing
  • Environment variable support: Reads TINFOIL_API_KEY and TINFOIL_MODEL from env with defaults
  • Good documentation: Note about why completions_api() is explicitly used

Concerns

  • Hardcoded base URL: TINFOIL_BASE_URL constant is not configurable via environment variable
  • No provider documentation: No usage examples or caveats in the main docs
  • Secret exposure: Uses expose_secret() on api_key when passing to Rig - this is the correct approach but worth verifying Rig doesn't log it
  • No health check: No verification that the Tinfoil endpoint works on startup

Suggestions

  • Make TINFOIL_BASE_URL configurable via environment variable
  • Add a brief section in the README about supported LLM providers including Tinfoil
  • Consider adding a quick health check or connection test on startup when tinfoil is configured

PR #61: Add PostgreSQL test workflow for CI

Summary

This PR adds a PostgreSQL with pgvector service to the GitHub Actions test workflow. The changes enable running tests against a real PostgreSQL database instead of libSQL-only testing. The service is configured with pgvector/pgvector:pg16 image and health checks.

Pros

  • Better test coverage: Enables testing against production database backend (PostgreSQL + pgvector)
  • Proper service configuration: Health checks, retries, and proper port mapping
  • Simple integration: Just adds the service and env var for DATABASE_URL

Concerns

  • No database fixtures: No test data setup or migrations for the test database
  • No test splitting: All tests still run against both backends; could be slow
  • Missing test matrix: Could benefit from matrix strategy testing libSQL and PostgreSQL in parallel
  • No cleanup between tests: Tests may interfere with each other's state
  • Flaky test risk: No fixture isolation or transaction rollback

Suggestions

  • Add a job matrix to test libSQL and PostgreSQL in parallel
  • Add migration step before running tests: cargo run -- database migrate
  • Consider wrapping tests in transactions with rollback for isolation
  • Add a separate integration test job vs. unit tests

PR #57: Sandbox job monitoring and credential grants persistence

Summary

This is a large PR (~6333 lines touched) introducing several features:

  1. Job Monitor System: New job_monitor.rs module spawns background tasks that watch for events from sandbox jobs (especially Claude Code) and inject assistant messages back into the agent loop via an injection channel in ChannelManager

  2. Credential Grants Persistence: Added credential_grants_json field to sandbox jobs for restoring credentials on job restart

  3. Claude Code OAuth Token Extraction: New logic to read OAuth tokens from macOS Keychain or Linux ~/.claude/.credentials.json for passing to containers via ANTHROPIC_API_KEY env var

  4. Database improvements: Changed LibSqlBackend::connect() to async and added PRAGMA busy_timeout = 5000 and PRAGMA journal_mode=WAL for better concurrency

  5. NEAR AI URL migration: Changed default from https://cloud-api.near.ai to https://private.near.ai

  6. Worker Docker image: Added GitHub CLI installation and improved layer caching

  7. Model command enhancement: /model now lists available models with the active one marked

Pros

  • Job monitoring is useful: Forwarding Claude Code output to the main agent enables better visibility into sub-agent work
  • Good test coverage: Job monitor has 4 unit tests for message forwarding, filtering, and lifecycle
  • Credential persistence: Restoring grants on job restart is a good UX improvement
  • Database concurrency: WAL mode and busy_timeout reduce lock contention in libSQL
  • OAuth extraction: Clever use of existing Claude auth rather than re-implementing auth

Concerns

  • Monolithic PR: Too many unrelated changes in one PR (job monitor, credentials, database, NEAR AI URLs, Docker, model command)
  • Cross-platform fragility: OAuth token extraction depends on macOS Keychain commands and Linux file paths - could break on different systems
  • Secret handling: OAuth token is passed via ANTHROPIC_API_KEY env var - this could be leaked in logs if the container or tool doesn't sanitize
  • Channel coupling: Job monitor couples Claude Code specifically to the agent loop - not generic for any job type
  • Database breaking change: Making connect() async is a breaking change for any code calling it synchronously
  • No rate limiting: Job monitor could spam the agent loop if the sub-agent produces many messages quickly
  • Missing test forOAuth: No tests for the token extraction logic

Suggestions

  • Split into smaller focused PRs: (1) job monitor, (2) credential grants, (3) database changes (4) Docker/URL migrations
  • Add rate limiting to job monitor (e.g., max 1 message/second)
  • Make job monitor generic (protocol-agnostic) instead of Claude-specific
  • Add tests for OAuth token extraction on both platforms (mocking the Keychain/file read)
  • Document the connect() async migration as a breaking change
  • Add warning logs when job monitor lags or drops messages
  • Verify ANTHROPIC_API_KEY doesn't leak in container logs

Summary

Highest Quality PRs

  1. PR fix: undo() peeks without popping, breaking repeated undo and leaking redo stack #71 - Well-scoped bug fix with excellent documentation and testing
  2. PR feat: add Tinfoil private inference provider #62 - Clean implementation following existing patterns

Needs Splitting

Minor Concerns

Recommendations

@ilblackdragon ilblackdragon merged commit e499795 into nearai:main Feb 17, 2026
11 checks passed
jaswinder6991 pushed a commit to jaswinder6991/ironclaw that referenced this pull request Feb 26, 2026
… redo stack (nearai#71)

* fix: undo() peeks without popping, breaking repeated undo and leaking redo stack

undo() used self.undo_stack.back() (peek) instead of pop_back(), so
repeated undo always returned the same checkpoint while pushing to
the redo stack unboundedly.

Additionally, redo() did not save the current state to the undo stack,
breaking the undo/redo cycle.

Changes:
- undo(): change back() to pop_back(), return owned Checkpoint
- redo(): accept current_turn/current_messages params, save current
  state to undo stack before popping from redo stack
- Update process_undo/process_redo callers in agent_loop.rs
- Add tests for repeated undo, undo/redo cycling, stack size invariant

* fix: standardize lock ordering and extract push_undo helper

Address review feedback:
- Standardize lock order (Session before UndoManager) in process_undo
  and process_redo to match process_user_input and prevent deadlocks
- Extract push_undo() helper to deduplicate push-and-trim logic shared
  by checkpoint() and redo()

* docs: add move-semantics notes and stack invariant to UndoManager

Address review feedback requesting documentation about the ownership
semantics of undo/redo parameters and the stack size invariant.

---------

Co-authored-by: Yi LIU <yi@quantstamp.com>
Co-authored-by: firat.sertgoz <f@nuff.tech>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
… redo stack (nearai#71)

* fix: undo() peeks without popping, breaking repeated undo and leaking redo stack

undo() used self.undo_stack.back() (peek) instead of pop_back(), so
repeated undo always returned the same checkpoint while pushing to
the redo stack unboundedly.

Additionally, redo() did not save the current state to the undo stack,
breaking the undo/redo cycle.

Changes:
- undo(): change back() to pop_back(), return owned Checkpoint
- redo(): accept current_turn/current_messages params, save current
  state to undo stack before popping from redo stack
- Update process_undo/process_redo callers in agent_loop.rs
- Add tests for repeated undo, undo/redo cycling, stack size invariant

* fix: standardize lock ordering and extract push_undo helper

Address review feedback:
- Standardize lock order (Session before UndoManager) in process_undo
  and process_redo to match process_user_input and prevent deadlocks
- Extract push_undo() helper to deduplicate push-and-trim logic shared
  by checkpoint() and redo()

* docs: add move-semantics notes and stack invariant to UndoManager

Address review feedback requesting documentation about the ownership
semantics of undo/redo parameters and the stack size invariant.

---------

Co-authored-by: Yi LIU <yi@quantstamp.com>
Co-authored-by: firat.sertgoz <f@nuff.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants