Skip to content

feat(routines): approval context for autonomous job execution#577

Merged
ilblackdragon merged 16 commits intomainfrom
feat/routine-approval-context
Mar 7, 2026
Merged

feat(routines): approval context for autonomous job execution#577
ilblackdragon merged 16 commits intomainfrom
feat/routine-approval-context

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon commented Mar 5, 2026

Summary

  • ApprovalContext — new enum that distinguishes interactive sessions (human approves tools) from autonomous jobs (pre-authorized tool lists). Tools with ApprovalRequirement::Always are blocked in autonomous mode unless explicitly listed in tool_permissions.
  • Routine tool_permissionsroutine_create now accepts a tool_permissions array, and the routine engine builds an ApprovalContext::Autonomous from it before dispatching full_job routines.
  • routine_fire tool — new tool to manually trigger a routine, enabling deterministic E2E testing of the full routine execution stack.
  • HttpInterceptor propagationhttp_interceptor flows from AgentDepsSchedulerWorkerDepsJobContext, enabling mock HTTP responses in routine worker tests.
  • Completed→Completed guard — prevents InvalidTransition errors when a job's execution loop and run() both try to mark a job completed.
  • E2E trace fixture — 2-turn routine_news_digest.json testing routine creation and firing, with the routine worker exercising http (mock) → memory_write → message broadcast.

Fixes

  • Tool message requires authentication in autonomous routine jobs
  • Cannot transition from completed to completed state machine error

Review feedback addressed

  • Extracted ApprovalContext::is_blocked_or_default() helper (dedup approval checks)
  • Extracted parse_tool_permissions() helper (dedup JSON parsing)
  • Fixed test name/docstring mismatch (test_mark_completed_twice_returns_error)
  • Fixed flaky index-based assertion → content-based find()
  • Updated ApprovalContext docs (autonomous-only)
  • Added TODOs for subtask approval propagation and global message tool race

Changes

File Change
src/tools/tool.rs ApprovalContext enum + is_blocked() + is_blocked_or_default() + 5 tests
src/tools/mod.rs Re-export ApprovalContext
src/agent/worker.rs Context-aware approval check, http_interceptor on WorkerDeps, completed guard + 4 tests
src/agent/scheduler.rs dispatch_job_inner() dedup, dispatch_job_with_context(), http_interceptor propagation, approval in execute_tool_task + 3 tests
src/agent/agent_loop.rs Propagate http_interceptor from AgentDeps to Scheduler
src/agent/routine.rs tool_permissions field on FullJob action + parse_tool_permissions() helper
src/agent/routine_engine.rs Build ApprovalContext from permissions, set message tool context
src/tools/builtin/routine.rs RoutineFireTool + parse tool_permissions via shared helper
src/tools/registry.rs Register routine_fire tool
src/tools/schema_validator.rs routine_fire schema entry
tests/fixtures/llm_traces/advanced/routine_news_digest.json 2-turn trace: create routine + fire with http → memory_write → message
tests/e2e_advanced_traces.rs routine_news_digest test with mock HTTP exchanges
tests/support/test_rig.rs with_routines() + with_http_exchanges() builder methods

Test plan

  • cargo check --all-features passes
  • cargo clippy --all --all-features — zero warnings
  • cargo fmt -- --check — clean
  • Unit tests for ApprovalContext::is_blocked() + is_blocked_or_default() (5 tests in tool.rs)
  • Unit tests for worker approval logic (4 tests in worker.rs)
  • Integration tests for scheduler approval context (3 tests in scheduler.rs)
  • Routine action roundtrip test with tool_permissions
  • E2E routine_news_digest passes — exercises full stack: routine_create → routine_fire → http (mock) → memory_write → message broadcast

🤖 Generated with Claude Code

ilblackdragon and others added 2 commits March 5, 2026 09:12
Routines and background jobs were unable to use any tools that required
approval (file ops, shell, message, http), making them effectively
useless. This adds an ApprovalContext system that lets autonomous jobs
pre-authorize tools at dispatch time.

- Add ApprovalContext enum with Autonomous variant that auto-approves
  UnlessAutoApproved tools and optionally pre-authorizes Always tools
- Add tool_permissions field to RoutineAction::FullJob for pre-authorizing
  Always-gated tools (e.g. destructive shell, cross-channel messaging)
- Add Scheduler::dispatch_job_with_context() to thread approval context
  through to workers
- Set message tool default channel/target from routine NotifyConfig
  so routines can send results without cross-channel approval
- Fix Completed→Completed state transition error in worker (plan marks
  job completed, then direct loop or outer run() tries again)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a 3-turn trace fixture and test that exercises:
- Turn 1: routine_create with full_job mode and tool_permissions
- Turn 2: Simulated digest workflow with echo + memory_write
- Turn 3: Verification via memory_search

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 19:08
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool Tool infrastructure scope: tool/builtin Built-in tools size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 5, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's capability to manage tool execution within autonomous jobs, such as scheduled routines. By introducing a granular ApprovalContext, it provides a more secure and controlled environment for automated workflows, allowing UnlessAutoApproved tools to run freely while requiring explicit authorization for more sensitive Always-approval tools. This change improves the reliability of autonomous agents and addresses critical state management issues.

Highlights

  • ApprovalContext Enum: Introduced a new ApprovalContext enum to differentiate between interactive sessions (human approves tools) and autonomous jobs (pre-authorized tool lists). Tools with ApprovalRequirement::Always are now blocked in autonomous mode unless explicitly listed in tool_permissions.
  • Routine Tool Permissions: The routine_create function now accepts a tool_permissions array. The routine engine uses this to build an ApprovalContext::Autonomous before dispatching full_job routines, allowing specific tools to be pre-authorized.
  • Completed to Completed Guard: Implemented a guard to prevent InvalidTransition errors when a job's execution loop and run() both attempt to mark a job as Completed, ensuring state machine integrity.
  • New E2E Trace Fixture: Added a 3-turn routine_news_digest.json end-to-end testing routine that covers routine creation, simulated digest workflow (echo + memory_write), and verification (memory_search).
  • Bug Fixes: Resolved issues where Tool message requires authentication occurred in autonomous routine jobs and Cannot transition from completed to completed state machine errors.
Changelog
  • src/agent/routine.rs
    • Added tool_permissions: Vec<String> field to the RoutineAction::FullJob enum variant.
    • Updated RoutineAction::from_db to parse tool_permissions from JSON configuration.
    • Modified RoutineAction::to_config_json to serialize the new tool_permissions field.
    • Extended the RoutineAction::FullJob test case to include tool_permissions for roundtrip verification.
  • src/agent/routine_engine.rs
    • Imported the ApprovalContext enum for use in routine execution.
    • Modified execute_routine to pass the tool_permissions from RoutineAction::FullJob to execute_full_job.
    • Updated the execute_full_job function signature to accept tool_permissions.
    • Added logic to execute_full_job to set the message tool's default channel and target based on the routine's notify configuration.
    • Constructed an ApprovalContext::autonomous_with_tools using the provided tool_permissions and passed it to dispatch_job_with_context.
  • src/agent/scheduler.rs
    • Imported ApprovalContext for use in job scheduling and tool execution.
    • Introduced dispatch_job_with_context to allow dispatching jobs with a specified ApprovalContext.
    • Created schedule_with_context to internally handle job scheduling with an optional ApprovalContext.
    • Modified execute_tool_task to perform tool approval checks using the approval_context if provided, otherwise falling back to default behavior.
    • Added new unit tests to verify execute_tool_task behavior with and without ApprovalContext for UnlessAutoApproved and Always tools.
  • src/agent/worker.rs
    • Imported ApprovalContext for worker dependencies.
    • Added an approval_context: Option<ApprovalContext> field to the WorkerDeps struct.
    • Implemented a guard in the run() method to prevent errors when a job attempts to transition from Completed to Completed.
    • Updated the execution_loop to check for the Completed state to avoid redundant processing.
    • Modified execute_tool to use the deps.approval_context for determining tool approval status.
    • Added new unit tests for test_mark_completed_twice_does_not_error and test_approval_context_unblocks_unless_auto_approved and test_approval_context_blocks_always_unless_permitted.
  • src/tools/builtin/routine.rs
    • Added tool_permissions as an array parameter to the routine_create tool's JSON schema.
    • Updated the execute method to parse the tool_permissions parameter when creating a full_job routine action.
  • src/tools/mod.rs
    • Re-exported the newly defined ApprovalContext enum.
  • src/tools/tool.rs
    • Defined the ApprovalContext enum, including the Autonomous variant with an allowed_tools set.
    • Implemented autonomous() and autonomous_with_tools() constructors for ApprovalContext.
    • Added the is_blocked() method to ApprovalContext to determine if a tool invocation is blocked based on its requirement and the context's allowed tools.
    • Included unit tests to validate the ApprovalContext's is_blocked logic for various approval requirements and tool permissions.
  • tests/e2e_advanced_traces.rs
    • Added a new asynchronous test function routine_news_digest to validate the end-to-end routine workflow.
    • Adjusted section numbering in comments to accommodate the new test.
  • tests/fixtures/llm_traces/advanced/routine_news_digest.json
    • Added a new JSON trace fixture to simulate a multi-turn routine news digest workflow, including tool calls for routine_create, memory_write, memory_search, and echo.
Activity
  • cargo check --all-features passes.
  • Unit tests for ApprovalContext::is_blocked() (3 tests in tool.rs) are implemented.
  • Unit tests for worker approval logic (4 tests in worker.rs) are implemented.
  • Integration tests for scheduler approval context (3 tests in scheduler.rs) are implemented.
  • Routine action roundtrip test with tool_permissions is implemented.
  • E2E trace compiles with --features libsql.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new ApprovalContext to allow autonomous jobs to execute tools that require approval by pre-authorizing them, which is a great enhancement for routines. However, the current implementation has two significant security flaws: RoutineCreateTool defaults to Never approval, potentially allowing prompt injection to create routines with arbitrary permissions, and the Autonomous context automatically permits all UnlessAutoApproved tools, bypassing tool_permissions restrictions. Additionally, there are opportunities to reduce code duplication for better maintainability, and a potential issue where subtasks might not inherit the approval context, which could impact complex autonomous jobs.

Comment thread src/tools/tool.rs
match self {
Self::Autonomous { allowed_tools } => match requirement {
ApprovalRequirement::Never => false,
ApprovalRequirement::UnlessAutoApproved => false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The ApprovalContext::is_blocked method is implemented such that all tools with ApprovalRequirement::UnlessAutoApproved are automatically permitted in an Autonomous context (used for routines). This means that any tool categorized as UnlessAutoApproved (which often includes sensitive operations like file writes or network requests) will be allowed to run in any routine without explicit authorization in tool_permissions. This significantly increases the risk of autonomous jobs performing sensitive actions without explicit user consent, especially if an attacker can manipulate the routine's prompt via injection.

Suggested change
ApprovalRequirement::UnlessAutoApproved => false,
ApprovalRequirement::UnlessAutoApproved => true, // Block UnlessAutoApproved in autonomous context by default

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by design. Routines run autonomously and need basic tools like file read/write, http, and memory to function. Only Always-approval tools (shell, message) require explicit listing in tool_permissions. Blocking UnlessAutoApproved would break all routine functionality since even memory_write and read_file use that level. The defense-in-depth here is: (1) routines are created by the user with explicit tool_permissions, (2) the routine prompt is user-authored, (3) tool outputs still go through the safety layer.

Comment thread src/agent/scheduler.rs
tools,
context_manager,
safety,
None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When calling execute_tool_task from spawn_subtask, None is passed for approval_context. This means subtasks will not inherit the ApprovalContext from their parent job and will use default approval logic. This could cause tools to be incorrectly blocked in autonomous routines that spawn subtasks. The ApprovalContext should be passed down to subtasks, which may require modifying the Task::ToolExec enum.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO comment in 55d4acb. spawn_subtask is only called by spawn_batch which is not used in any production code path — only in scheduler tests. Will propagate ApprovalContext when subtasks are actually used in routine/autonomous paths.

Comment thread src/agent/scheduler.rs
Comment on lines +143 to +170
pub async fn dispatch_job_with_context(
&self,
user_id: &str,
title: &str,
description: &str,
metadata: Option<serde_json::Value>,
approval_context: ApprovalContext,
) -> Result<Uuid, JobError> {
let job_id = self
.context_manager
.create_job_for_user(user_id, title, description)
.await?;

if let Some(meta) = metadata {
self.context_manager
.update_context(job_id, |ctx| {
ctx.metadata = meta;
})
.await?;
}

if let Some(ref store) = self.store {
let ctx = self.context_manager.get_context(job_id).await?;
store.save_job(&ctx).await.map_err(|e| JobError::Failed {
id: job_id,
reason: format!("failed to persist job: {e}"),
})?;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's significant code duplication between dispatch_job and dispatch_job_with_context. The logic for creating the job, updating metadata, and persisting it is identical in both functions. To improve maintainability, consider extracting this common logic into a private helper function. This aligns with the principle of consolidating related sequences of operations into a single reusable method to improve code consistency and maintainability.

References
  1. Consolidate related sequences of operations, such as creating, persisting, and scheduling a job, into a single reusable method to improve code consistency and maintainability.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 2bc6c0e — extracted dispatch_job_inner private helper.

Comment thread src/agent/worker.rs Outdated
Comment on lines +685 to +688
let blocked = match &deps.approval_context {
Some(ctx) => ctx.is_blocked(tool_name, requirement),
None => requirement.is_required(),
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This approval check logic is duplicated in src/agent/scheduler.rs inside execute_tool_task. To avoid duplication, this logic could be extracted into a static helper method on ApprovalContext itself. For example:

// in ApprovalContext
pub fn is_blocked_or_default(context: &Option<Self>, tool_name: &str, requirement: ApprovalRequirement) -> bool {
    match context {
        Some(ctx) => ctx.is_blocked(tool_name, requirement),
        None => requirement.is_required(),
    }
}

Then both call sites could be simplified to let blocked = ApprovalContext::is_blocked_or_default(&deps.approval_context, tool_name, requirement);.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — extracted ApprovalContext::is_blocked_or_default() static helper, exactly as suggested.

Comment thread src/tools/builtin/routine.rs Outdated
Comment on lines +201 to +209
let tool_permissions = params
.get("tool_permissions")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(String::from))
.collect()
})
.unwrap_or_default();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for parsing tool_permissions from a serde_json::Value is also present in src/agent/routine.rs within RoutineAction::from_db. To avoid code duplication, consider extracting this into a shared helper function.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — extracted parse_tool_permissions() helper in routine.rs, used from both call sites.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

ilblackdragon and others added 2 commits March 5, 2026 11:58
- Add `with_routines()` to TestRigBuilder that passes a RoutineConfig
  to Agent::new, enabling routine tool registration during agent startup
- Add Turn 2 (routine_list) to the trace to verify routine persistence
  in the database after routine_create
- Fix formatting issues flagged by CI (cargo fmt)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ontext

Extract shared logic into private `dispatch_job_inner` to prevent
divergence when dispatch behavior changes in the future.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 20:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/agent/scheduler.rs
Comment on lines 307 to 313
tools,
context_manager,
safety,
None,
tool_parent_id,
&tool_name,
params,
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execute_tool_task is now approval-context aware, but this call site always passes None, which forces the legacy behavior (blocking all non-Never tools). This defeats the new autonomous ApprovalContext in scheduler-driven tool execution paths. Pass through the relevant approval_context (e.g., from the job/worker deps or message payload) instead of hardcoding None.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as Gemini's comment — TODO added in 55d4acb. Not in active code paths currently.

Comment thread src/agent/worker.rs Outdated
Comment on lines +1513 to +1546
/// Regression test: calling mark_completed on an already-Completed job must
/// not surface as an error. The outer `run()` guards should skip the second
/// transition instead of attempting Completed → Completed.
#[tokio::test]
async fn test_mark_completed_twice_does_not_error() {
let worker = make_worker(vec![]).await;

// Transition to InProgress first (required by state machine)
worker
.context_manager()
.update_context(worker.job_id, |ctx| {
ctx.transition_to(JobState::InProgress, None)
})
.await
.unwrap()
.unwrap();

// First mark_completed should succeed
worker.mark_completed().await.unwrap();

// Verify state is Completed
let ctx = worker
.context_manager()
.get_context(worker.job_id)
.await
.unwrap();
assert_eq!(ctx.state, JobState::Completed);

// Second mark_completed should fail (Completed → Completed is invalid)
let result = worker.mark_completed().await;
assert!(
result.is_err(),
"Completed → Completed transition should be rejected by state machine"
);
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s name/docstring claims the second completion “must not surface as an error”, but the assertion explicitly expects an error on the second mark_completed(). Either (a) update the test to exercise the real regression path described (e.g., calling run()/execution loop such that a second completion attempt is safely ignored), or (b) rename/reword the test to reflect that mark_completed() is expected to error when called twice.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — renamed to test_mark_completed_twice_returns_error and updated docstring.

Comment thread src/tools/tool.rs Outdated
Comment on lines +31 to +44
/// Approval context for tool execution in different environments.
///
/// Interactive sessions use session-level auto-approve lists managed by the UI.
/// Autonomous contexts (routines, background jobs) use pre-authorized tool sets
/// configured at creation time.
#[derive(Debug, Clone)]
pub enum ApprovalContext {
/// Autonomous job with no interactive user. `UnlessAutoApproved` tools are
/// pre-approved. `Always` tools are blocked unless listed in `allowed_tools`.
Autonomous {
/// Tool names that are pre-authorized even for `Always` approval.
allowed_tools: std::collections::HashSet<String>,
},
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment describes both interactive and autonomous environments, but the enum currently only models the autonomous case. Either add an explicit interactive variant (if this type is intended to represent both modes), or update the docs to avoid implying interactive-session behavior is represented by ApprovalContext.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — updated doc to clarify it only models autonomous mode.

Comment thread tests/e2e_advanced_traces.rs Outdated
Comment on lines +305 to +315
for tool in &[
"routine_create",
"routine_list",
"memory_write",
"memory_search",
] {
assert!(
started.iter().any(|s| s == *tool),
"{tool} not called: {started:?}"
);
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace fixture’s expects.tools_used includes echo, and turn 3 uses an echo tool call, but this verification list omits it. Add \"echo\" here (or adjust the fixture expectations) so the test consistently validates the intended tool sequence.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 8eafa57 — echo replaced with http tool throughout (fixture, test, and expects).

…test

- Add `routine_fire` tool that calls `RoutineEngine::fire_manual` to
  trigger a routine on demand. Registered alongside the other 5 routine
  tools (now 6 total).

- Rewrite the routine_news_digest E2E trace to exercise the full
  execution stack end-to-end:
  1. routine_create (manual trigger, full_job, tool_permissions: [message])
  2. routine_fire → RoutineEngine → Scheduler::dispatch_job_with_context
     → autonomous Worker consuming TraceLlm steps
  3. Worker calls echo → memory_write → message (broadcast to test channel)
  4. Test verifies the message broadcast arrived, proving ApprovalContext
     correctly allowed the Always-approval message tool

- Register message tools in TestRig so routines can send messages to
  the test channel via channel_manager.broadcast().

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rker http calls

Propagate http_interceptor from AgentDeps → Scheduler → WorkerDeps → JobContext
so that routine workers (and any scheduler-dispatched workers) can use the
ReplayingHttpInterceptor for mock HTTP responses during tests.

Changes:
- Add http_interceptor field to Scheduler and WorkerDeps
- Set job_ctx.http_interceptor in Worker before tool execution
- Add with_http_exchanges() builder method to TestRigBuilder
- Replace echo tool with http tool in routine_news_digest trace
- Test now exercises real http tool with mock response → memory_write → message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 00:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/agent/worker.rs Outdated
Comment on lines +1520 to +1553
/// Regression test: calling mark_completed on an already-Completed job must
/// not surface as an error. The outer `run()` guards should skip the second
/// transition instead of attempting Completed → Completed.
#[tokio::test]
async fn test_mark_completed_twice_does_not_error() {
let worker = make_worker(vec![]).await;

// Transition to InProgress first (required by state machine)
worker
.context_manager()
.update_context(worker.job_id, |ctx| {
ctx.transition_to(JobState::InProgress, None)
})
.await
.unwrap()
.unwrap();

// First mark_completed should succeed
worker.mark_completed().await.unwrap();

// Verify state is Completed
let ctx = worker
.context_manager()
.get_context(worker.job_id)
.await
.unwrap();
assert_eq!(ctx.state, JobState::Completed);

// Second mark_completed should fail (Completed → Completed is invalid)
let result = worker.mark_completed().await;
assert!(
result.is_err(),
"Completed → Completed transition should be rejected by state machine"
);
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test name/docstring says marking a Completed job completed again “does not error”, but the assertions expect an error (and the comment even states the Completed → Completed transition is invalid). Either rename/reword the test to match what it actually verifies, or change it to exercise the real regression (the worker run loop should not attempt a second Completed transition when execution_loop already completed the job).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — renamed test and updated docstring (same as earlier Copilot comment).

Comment on lines +443 to 453
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}

let metadata = serde_json::json!({ "max_iterations": max_iterations });

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the message tool’s default channel/target via ToolRegistry mutates shared global state. Because the scheduler can run multiple jobs concurrently, this can race with other jobs/interactive turns and cause messages to be sent to the wrong channel/target when the message tool is invoked without explicit params (and it also affects requires_approval() decisions). Prefer storing the routine’s notify channel/target in the JobContext (e.g., metadata) and applying it per-job immediately before message tool approval/execute, or require routines to always pass explicit channel/target instead of relying on global defaults.

Suggested change
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}
let metadata = serde_json::json!({ "max_iterations": max_iterations });
// Attach the routine's notify configuration to the job metadata instead of
// mutating global message tool defaults, to avoid cross-job races.
let metadata = serde_json::json!({
"max_iterations": max_iterations,
"notify": {
"channel": routine.notify.channel,
"user": routine.notify.user,
},
});

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern. Added TODO comment in 55d4acb noting the race. Will move notify config into JobContext metadata in a follow-up.

Comment thread tests/e2e_advanced_traces.rs Outdated
Comment on lines +327 to +331
// Find the main conversation reply (from turn 2).
let fire_reply = &responses[1].content.to_lowercase();
assert!(
fire_reply.contains("fired") || fire_reply.contains("running"),
"Turn 2: expected fired/running, got: {fire_reply}"
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test assumes the “fire” conversational reply will always be at responses[1]. Because the routine worker runs asynchronously and broadcasts to the same channel, the broadcast message could arrive before/after the conversational reply depending on scheduling, making the index-based assertion flaky. Consider locating the fire confirmation response by content (similar to message_broadcast) rather than relying on a fixed position.

Suggested change
// Find the main conversation reply (from turn 2).
let fire_reply = &responses[1].content.to_lowercase();
assert!(
fire_reply.contains("fired") || fire_reply.contains("running"),
"Turn 2: expected fired/running, got: {fire_reply}"
// Find the main conversation reply (from turn 2) by content, since
// the routine worker runs asynchronously and may interleave messages.
let fire_reply = responses.iter().find(|r| {
let content = r.content.to_lowercase();
content.contains("fired") || content.contains("running")
});
assert!(
fire_reply.is_some(),
"Turn 2: expected fired/running, got: {:?}",
responses.iter().map(|r| &r.content).collect::<Vec<_>>()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — changed to content-based find() lookup, exactly as suggested.

Comment thread tests/e2e_advanced_traces.rs Outdated
// Exercises the full routine execution stack:
// routine_create → routine_fire → RoutineEngine::fire_manual →
// Scheduler::dispatch_job_with_context → Worker (autonomous) →
// echo + memory_write + message (broadcast to test channel)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says the routine worker executes “echo + memory_write + message”, but the trace fixture/expected tool chain here is http → memory_write → message. Updating the comment to match the actual tools exercised will avoid confusion when debugging this test.

Suggested change
// echo + memory_write + message (broadcast to test channel)
// http + memory_write + message (broadcast to test channel)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 55d4acb — updated to http + memory_write + message.

ilblackdragon and others added 2 commits March 5, 2026 16:55
…-context

# Conflicts:
#	tests/support/test_rig.rs
- Extract `ApprovalContext::is_blocked_or_default()` helper to deduplicate
  approval check logic in worker.rs and scheduler.rs
- Extract `parse_tool_permissions()` helper to deduplicate JSON array
  parsing in routine.rs and builtin/routine.rs
- Fix test name: `test_mark_completed_twice_does_not_error` →
  `test_mark_completed_twice_returns_error` (matches actual behavior)
- Fix ApprovalContext doc comment to clarify it only models autonomous mode
- Fix flaky index-based assertion in routine_news_digest test — now uses
  content-based search instead of fixed position
- Fix stale comment: echo → http in routine test header
- Add TODO for subtask approval context propagation (latent, not in
  active code paths)
- Add TODO for global message tool context race in routine_engine

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 01:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +443 to 455
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
// TODO: This mutates shared global state and can race with concurrent jobs.
// Move notify config into JobContext metadata and apply per-job instead.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}

let metadata = serde_json::json!({ "max_iterations": max_iterations });

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_message_tool_context() mutates the global MessageTool defaults (shared ToolRegistry state). In concurrent routine runs or while an interactive session is active, this can race and cause messages to be sent with the wrong channel/target and can also change message.requires_approval() outcomes across jobs. Prefer making channel/target a per-job/per-call input (e.g., pass explicitly in tool args or derive from JobContext metadata) rather than mutating shared global state.

Suggested change
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
// TODO: This mutates shared global state and can race with concurrent jobs.
// Move notify config into JobContext metadata and apply per-job instead.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}
let metadata = serde_json::json!({ "max_iterations": max_iterations });
// Embed the routine's notify config into the job metadata so that any
// message tools can derive channel/target per job instead of mutating
// shared global state.
let metadata = serde_json::json!({
"max_iterations": max_iterations,
"notify": {
"channel": routine.notify.channel,
"user": routine.notify.user,
},
});

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of the earlier comment on the same line. TODO already added in 55d4acb noting the race. Will move to per-job metadata in a follow-up.

Comment on lines +2 to +6
"model_name": "advanced-routine-news-digest",
"expects": {
"tools_used": ["routine_create", "routine_fire", "http", "memory_write", "message"],
"all_tools_succeeded": true,
"min_responses": 2
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the E2E trace fixture is a 3-turn flow that includes a verification step using memory_search, but this trace has 2 turns and no memory_search tool usage. Please either update the PR description to match the actual fixture/test, or extend the fixture/test to include the documented verification step.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR description to match the actual implementation: 2-turn fixture with http → memory_write → message (no memory_search). The fixture evolved from the original 3-turn design during implementation.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/support/test_rig.rs Outdated
let exchanges = if self.http_exchanges.is_empty() {
http_exchanges
} else {
self.http_exchanges
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestRigBuilder::build(self) conditionally moves self.http_exchanges here, but the function continues to access other self.* fields afterward (e.g. self.enable_routines, and later self.max_tool_iterations). This will fail to compile due to a partial move of self. Consider destructuring self into locals at the top of build (including enable_routines and http_exchanges) or changing the method to take mut self and using std::mem::take(&mut self.http_exchanges) so self remains usable for subsequent fields.

Suggested change
self.http_exchanges
self.http_exchanges.clone()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already resolved in 35847feself is destructured into locals at the top of build().

Comment on lines +598 to +600
let run_id = self.engine.fire_manual(routine.id).await.map_err(|e| {
ToolError::ExecutionFailed(format!("failed to fire routine '{}': {e}", name))
})?;
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routine_fire ultimately calls RoutineEngine::fire_manual, which (unlike cron/event firing paths) does not apply the routine cooldown guardrail. This means a manual trigger can spam runs even when cooldown_secs is set, which could cause unexpected load/cost. If cooldown is meant to be respected for manual fires too, add a cooldown check in fire_manual or enforce it in this tool before firing; otherwise clarify in the tool description/docs that manual fires bypass cooldown.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design — documented in description ('bypassing schedule, trigger type, and cooldown') since c138926.

ilblackdragon and others added 2 commits March 5, 2026 23:49
… fragility

Destructure TestRigBuilder at the top of build() instead of accessing
self.* fields after moving self.http_exchanges. While the prior code
compiled (remaining fields are Copy), it was fragile and would break
if any non-Copy field were added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Manual fires are explicitly user-initiated and intentionally bypass
cooldown checks (which only apply to automated cron/event triggers).
Updated tool description and fire_manual docstring to make this clear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 07:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/tools/builtin/routine.rs:105

  • routine_create now accepts tool_permissions, which are later used to allow ApprovalRequirement::Always tools in autonomous jobs. However, this tool does not override requires_approval(), so it defaults to ApprovalRequirement::Never and can be invoked without any approval in interactive sessions—effectively allowing an LLM to self-grant elevated autonomous tool permissions. Consider requiring at least UnlessAutoApproved, and ideally Always when action_type == "full_job" and tool_permissions is non-empty (or contains high-risk tools like shell/cross-channel message).
                "cooldown_secs": {
                    "type": "integer",
                    "description": "Minimum seconds between fires (default: 300)"
                },
                "tool_permissions": {
                    "type": "array",
                    "items": { "type": "string" },
                    "description": "Tool names pre-authorized for Always-approval tools in full_job mode (e.g. ['shell', 'message']). UnlessAutoApproved tools are automatically permitted in routines."
                }
            },
            "required": ["name", "trigger_type", "prompt"]
        })

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +559 to +568
#[async_trait]
impl Tool for RoutineFireTool {
fn name(&self) -> &str {
"routine_fire"
}

fn description(&self) -> &str {
"Manually trigger a routine to run immediately, bypassing schedule, trigger type, and cooldown."
}

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routine_fire can immediately dispatch a routine run (bypassing schedule/trigger/cooldown per the description), but the tool does not override requires_approval() and therefore defaults to ApprovalRequirement::Never. With full_job routines + tool_permissions, this can become an approval bypass path for Always-gated tools. Add a requires_approval() implementation (at least UnlessAutoApproved, potentially Always) so interactive sessions must explicitly approve firing routines.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added requires_approval() -> UnlessAutoApproved to routine_fire in fc15ff0.

ilblackdragon and others added 2 commits March 6, 2026 00:00
Two fixes for message tool failures in autonomous routine jobs:

1. MessageTool::requires_approval() now returns UnlessAutoApproved when
   the explicit channel param matches the default channel (was Always,
   causing "requires authentication" errors for routine workers).

2. routine_create tool now accepts notify_channel and notify_user params,
   wired into NotifyConfig. Without these, routines had channel: None,
   so set_message_tool_context was never called, causing "No channel
   specified" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-context

Resolved conflicts in message.rs (kept same-channel UnlessAutoApproved
fix over main's Always behavior) and test_rig.rs (combined extra_tools
from main with enable_routines/http_exchanges from this branch, added
trace_llm_ref tracking).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 16:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +608 to +612
let name = require_str(&params, "name")?;

let routine = self
.store
.get_routine_by_name(&ctx.user_id, name)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name is a &str borrowed from params and is then held across .await points, which will fail to compile in an async function due to borrowing a local across suspension. Convert name to an owned String immediately (e.g., let name = require_str(...)? .to_string();) and then pass &name to async calls / use it in error messages and the result JSON.

Suggested change
let name = require_str(&params, "name")?;
let routine = self
.store
.get_routine_by_name(&ctx.user_id, name)
let name = require_str(&params, "name")?.to_string();
let routine = self
.store
.get_routine_by_name(&ctx.user_id, &name)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False positive — params is owned by the function, so the &str borrow from it lives for the function's lifetime. This compiles and runs correctly (verified by CI).

Comment on lines +446 to +462
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
// TODO: This mutates shared global state and can race with concurrent jobs.
// Move notify config into JobContext metadata and apply per-job instead.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}

let metadata = serde_json::json!({ "max_iterations": max_iterations });

// Build approval context: UnlessAutoApproved tools are auto-approved for routines;
// Always tools require explicit listing in tool_permissions.
let approval_context = ApprovalContext::autonomous_with_tools(tool_permissions.iter().cloned());

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updates shared/global message tool defaults, which can race when multiple jobs/routines run concurrently (e.g., one routine can accidentally change another routine’s default channel/target, impacting both routing and approval behavior). Prefer setting per-job message context (e.g., store notify channel/user in JobContext/metadata and have MessageTool resolve defaults from the executing job context rather than global mutable state).

Suggested change
// Set the message tool's default channel/target from the routine's notify config
// so the LLM can send results without triggering cross-channel approval.
// TODO: This mutates shared global state and can race with concurrent jobs.
// Move notify config into JobContext metadata and apply per-job instead.
if let Some(channel) = &routine.notify.channel {
scheduler
.tools()
.set_message_tool_context(Some(channel.clone()), Some(routine.notify.user.clone()))
.await;
}
let metadata = serde_json::json!({ "max_iterations": max_iterations });
// Build approval context: UnlessAutoApproved tools are auto-approved for routines;
// Always tools require explicit listing in tool_permissions.
let approval_context = ApprovalContext::autonomous_with_tools(tool_permissions.iter().cloned());
// Attach notify configuration to the job metadata so the worker/message tool
// can resolve per-job defaults instead of mutating shared global state.
let metadata = serde_json::json!({
"max_iterations": max_iterations,
"notify": {
"channel": routine.notify.channel,
"user": routine.notify.user,
},
});
// Build approval context: UnlessAutoApproved tools are auto-approved for routines;
// Always tools require explicit listing in tool_permissions.
let approval_context = ApprovalContext::autonomous_with_tools(tool_permissions.iter().cloned());
let approval_context = ApprovalContext::autonomous_with_tools(tool_permissions.iter().cloned());

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of earlier comment. TODO already added in 55d4acb. Will move to per-job metadata in a follow-up.

Comment thread src/tools/tool.rs Outdated
}

#[test]
fn test_approval_context_never_always_passes() {
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test name is confusing (“never_always”). Consider renaming to something more specific to the behavior under test (e.g., test_approval_context_never_is_not_blocked) to improve readability and intent.

Suggested change
fn test_approval_context_never_always_passes() {
fn test_approval_context_never_is_not_blocked() {

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to test_approval_context_never_is_not_blocked in fc15ff0.

ilblackdragon and others added 2 commits March 6, 2026 10:43
The message tool only sends to user-owned channels via
ChannelManager::broadcast (TUI, Telegram, Slack, web gateway, etc.).
It cannot reach arbitrary external services, so approval adds friction
with no security benefit. This also eliminates the routine context
errors entirely since approval is never checked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- routine_fire now returns UnlessAutoApproved since firing a routine
  can dispatch a full_job with pre-authorized Always-gated tools
- Rename test_approval_context_never_always_passes to
  test_approval_context_never_is_not_blocked for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 01:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/support/test_rig.rs Outdated
Comment on lines +420 to +425
/// Enable the routines system so `routine_create` and other routine tools
/// are registered and functional.
pub fn with_routines(mut self) -> Self {
self.enable_routines = true;
self
}
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_routines() claims it enables routines so routine tools are registered/functional, but build() currently registers routine tools unconditionally (and uses a RoutineEngine constructed with scheduler: None). This means routine tools may be present even when routines are “disabled”, but full_job execution will fail (no scheduler), and the builder flag doesn’t actually control tool availability. Gate routine tool registration/engine creation behind enable_routines (or rely on Agent::run() to register them when routine_config.enabled).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docstring to clarify that with_routines() enables scheduler wiring, not tool registration. Fixed in 1a38775.

Comment on lines +210 to +214
fn requires_approval(&self, _params: &serde_json::Value) -> ApprovalRequirement {
// Message tool only sends to user-owned channels (TUI, Telegram, Slack, web
// gateway, etc.) via ChannelManager::broadcast. It cannot reach arbitrary
// external services, so no approval is needed.
ApprovalRequirement::Never
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new comment says the message tool “cannot reach arbitrary external services”, but ChannelManager::broadcast can send to external integrations (Telegram/Slack/web gateway/etc.) depending on configured channels. Either reintroduce an approval requirement for higher-risk sends (e.g., cross-channel/explicit channel), or adjust the comment + surrounding security assumptions (e.g., routine tool_permissions docs and routine engine comments) to match the new behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded to 'only delivers to channels the user has configured' in 1a38775.

Comment thread src/tools/builtin/routine.rs Outdated
"tool_permissions": {
"type": "array",
"items": { "type": "string" },
"description": "Tool names pre-authorized for Always-approval tools in full_job mode (e.g. ['shell', 'message']). UnlessAutoApproved tools are automatically permitted in routines."
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool_permissions schema/description still uses message as an example of an Always-approval tool (“cross-channel messaging”), but MessageTool::requires_approval() now returns Never. Update this description/examples to reflect the current approval model (e.g., use shell / HTTP with credentials as examples) so users don’t think message needs to be pre-authorized.

Suggested change
"description": "Tool names pre-authorized for Always-approval tools in full_job mode (e.g. ['shell', 'message']). UnlessAutoApproved tools are automatically permitted in routines."
"description": "Tool names pre-authorized for tools that require explicit approval in full_job mode (e.g. ['shell', 'http_request_with_auth']). Tools that are automatically approved do not need to be listed here."

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — updated example to ['shell'] since message is now Never. Fixed in 1a38775.

Comment on lines +590 to +591
// so this is a meaningful escalation that warrants auto-approval gating.
ApprovalRequirement::UnlessAutoApproved
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routine_fire returns ApprovalRequirement::UnlessAutoApproved, but in autonomous jobs ApprovalContext::Autonomous auto-approves UnlessAutoApproved tools. That means a routine full_job can invoke routine_fire (and spawn more jobs/runs) without being listed in tool_permissions, despite the comment calling it a “meaningful escalation”. Consider changing this to ApprovalRequirement::Always so autonomous contexts block it unless explicitly permitted, and interactive sessions still prompt.

Suggested change
// so this is a meaningful escalation that warrants auto-approval gating.
ApprovalRequirement::UnlessAutoApproved
// so this is a meaningful escalation that warrants explicit approval in all contexts.
ApprovalRequirement::Always

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design — routine_fire is an interactive tool (the user asks the agent to fire a routine). Routines don't fire other routines, so this never runs in autonomous context. In interactive mode, UnlessAutoApproved prompts once then auto-approves.

- Remove 'message' from tool_permissions example (no longer Always)
- Reword message tool approval comment for accuracy
- Clarify with_routines() docstring re: tool registration vs engine wiring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ilblackdragon ilblackdragon merged commit ae89a52 into main Mar 7, 2026
22 checks passed
@ilblackdragon ilblackdragon deleted the feat/routine-approval-context branch March 7, 2026 05:22
This was referenced Mar 7, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…#577)

* feat(routines): add approval context for autonomous job execution

Routines and background jobs were unable to use any tools that required
approval (file ops, shell, message, http), making them effectively
useless. This adds an ApprovalContext system that lets autonomous jobs
pre-authorize tools at dispatch time.

- Add ApprovalContext enum with Autonomous variant that auto-approves
  UnlessAutoApproved tools and optionally pre-authorizes Always tools
- Add tool_permissions field to RoutineAction::FullJob for pre-authorizing
  Always-gated tools (e.g. destructive shell, cross-channel messaging)
- Add Scheduler::dispatch_job_with_context() to thread approval context
  through to workers
- Set message tool default channel/target from routine NotifyConfig
  so routines can send results without cross-channel approval
- Fix Completed→Completed state transition error in worker (plan marks
  job completed, then direct loop or outer run() tries again)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(routines): add E2E trace for routine news digest workflow

Add a 3-turn trace fixture and test that exercises:
- Turn 1: routine_create with full_job mode and tool_permissions
- Turn 2: Simulated digest workflow with echo + memory_write
- Turn 3: Verification via memory_search

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): wire RoutineEngine into test rig for routine_create E2E

- Add `with_routines()` to TestRigBuilder that passes a RoutineConfig
  to Agent::new, enabling routine tool registration during agent startup
- Add Turn 2 (routine_list) to the trace to verify routine persistence
  in the database after routine_create
- Fix formatting issues flagged by CI (cargo fmt)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(scheduler): deduplicate dispatch_job and dispatch_job_with_context

Extract shared logic into private `dispatch_job_inner` to prevent
divergence when dispatch behavior changes in the future.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(routines): add routine_fire tool and real E2E routine execution test

- Add `routine_fire` tool that calls `RoutineEngine::fire_manual` to
  trigger a routine on demand. Registered alongside the other 5 routine
  tools (now 6 total).

- Rewrite the routine_news_digest E2E trace to exercise the full
  execution stack end-to-end:
  1. routine_create (manual trigger, full_job, tool_permissions: [message])
  2. routine_fire → RoutineEngine → Scheduler::dispatch_job_with_context
     → autonomous Worker consuming TraceLlm steps
  3. Worker calls echo → memory_write → message (broadcast to test channel)
  4. Test verifies the message broadcast arrived, proving ApprovalContext
     correctly allowed the Always-approval message tool

- Register message tools in TestRig so routines can send messages to
  the test channel via channel_manager.broadcast().

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(routines): wire HttpInterceptor through scheduler for routine worker http calls

Propagate http_interceptor from AgentDeps → Scheduler → WorkerDeps → JobContext
so that routine workers (and any scheduler-dispatched workers) can use the
ReplayingHttpInterceptor for mock HTTP responses during tests.

Changes:
- Add http_interceptor field to Scheduler and WorkerDeps
- Set job_ctx.http_interceptor in Worker before tool execution
- Add with_http_exchanges() builder method to TestRigBuilder
- Replace echo tool with http tool in routine_news_digest trace
- Test now exercises real http tool with mock response → memory_write → message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments from Copilot on PR nearai#577

- Extract `ApprovalContext::is_blocked_or_default()` helper to deduplicate
  approval check logic in worker.rs and scheduler.rs
- Extract `parse_tool_permissions()` helper to deduplicate JSON array
  parsing in routine.rs and builtin/routine.rs
- Fix test name: `test_mark_completed_twice_does_not_error` →
  `test_mark_completed_twice_returns_error` (matches actual behavior)
- Fix ApprovalContext doc comment to clarify it only models autonomous mode
- Fix flaky index-based assertion in routine_news_digest test — now uses
  content-based search instead of fixed position
- Fix stale comment: echo → http in routine test header
- Add TODO for subtask approval context propagation (latent, not in
  active code paths)
- Add TODO for global message tool context race in routine_engine

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix formatting in is_blocked_or_default test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test_rig): destructure self in build() to avoid partial-move fragility

Destructure TestRigBuilder at the top of build() instead of accessing
self.* fields after moving self.http_exchanges. While the prior code
compiled (remaining fields are Copy), it was fragile and would break
if any non-Copy field were added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: clarify that routine_fire bypasses cooldown

Manual fires are explicitly user-initiated and intentionally bypass
cooldown checks (which only apply to automated cron/event triggers).
Updated tool description and fire_manual docstring to make this clear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(routines): fix message tool approval in routine context

Two fixes for message tool failures in autonomous routine jobs:

1. MessageTool::requires_approval() now returns UnlessAutoApproved when
   the explicit channel param matches the default channel (was Always,
   causing "requires authentication" errors for routine workers).

2. routine_create tool now accepts notify_channel and notify_user params,
   wired into NotifyConfig. Without these, routines had channel: None,
   so set_message_tool_context was never called, causing "No channel
   specified" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(message): remove approval requirement from message tool

The message tool only sends to user-owned channels via
ChannelManager::broadcast (TUI, Telegram, Slack, web gateway, etc.).
It cannot reach arbitrary external services, so approval adds friction
with no security benefit. This also eliminates the routine context
errors entirely since approval is never checked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments — routine_fire approval + test rename

- routine_fire now returns UnlessAutoApproved since firing a routine
  can dispatch a full_job with pre-authorized Always-gated tools
- Rename test_approval_context_never_always_passes to
  test_approval_context_never_is_not_blocked for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review nits — update stale docs and comments

- Remove 'message' from tool_permissions example (no longer Always)
- Reword message tool approval comment for accuracy
- Clarify with_routines() docstring re: tool registration vs engine wiring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…#577)

* feat(routines): add approval context for autonomous job execution

Routines and background jobs were unable to use any tools that required
approval (file ops, shell, message, http), making them effectively
useless. This adds an ApprovalContext system that lets autonomous jobs
pre-authorize tools at dispatch time.

- Add ApprovalContext enum with Autonomous variant that auto-approves
  UnlessAutoApproved tools and optionally pre-authorizes Always tools
- Add tool_permissions field to RoutineAction::FullJob for pre-authorizing
  Always-gated tools (e.g. destructive shell, cross-channel messaging)
- Add Scheduler::dispatch_job_with_context() to thread approval context
  through to workers
- Set message tool default channel/target from routine NotifyConfig
  so routines can send results without cross-channel approval
- Fix Completed→Completed state transition error in worker (plan marks
  job completed, then direct loop or outer run() tries again)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(routines): add E2E trace for routine news digest workflow

Add a 3-turn trace fixture and test that exercises:
- Turn 1: routine_create with full_job mode and tool_permissions
- Turn 2: Simulated digest workflow with echo + memory_write
- Turn 3: Verification via memory_search

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): wire RoutineEngine into test rig for routine_create E2E

- Add `with_routines()` to TestRigBuilder that passes a RoutineConfig
  to Agent::new, enabling routine tool registration during agent startup
- Add Turn 2 (routine_list) to the trace to verify routine persistence
  in the database after routine_create
- Fix formatting issues flagged by CI (cargo fmt)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(scheduler): deduplicate dispatch_job and dispatch_job_with_context

Extract shared logic into private `dispatch_job_inner` to prevent
divergence when dispatch behavior changes in the future.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(routines): add routine_fire tool and real E2E routine execution test

- Add `routine_fire` tool that calls `RoutineEngine::fire_manual` to
  trigger a routine on demand. Registered alongside the other 5 routine
  tools (now 6 total).

- Rewrite the routine_news_digest E2E trace to exercise the full
  execution stack end-to-end:
  1. routine_create (manual trigger, full_job, tool_permissions: [message])
  2. routine_fire → RoutineEngine → Scheduler::dispatch_job_with_context
     → autonomous Worker consuming TraceLlm steps
  3. Worker calls echo → memory_write → message (broadcast to test channel)
  4. Test verifies the message broadcast arrived, proving ApprovalContext
     correctly allowed the Always-approval message tool

- Register message tools in TestRig so routines can send messages to
  the test channel via channel_manager.broadcast().

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(routines): wire HttpInterceptor through scheduler for routine worker http calls

Propagate http_interceptor from AgentDeps → Scheduler → WorkerDeps → JobContext
so that routine workers (and any scheduler-dispatched workers) can use the
ReplayingHttpInterceptor for mock HTTP responses during tests.

Changes:
- Add http_interceptor field to Scheduler and WorkerDeps
- Set job_ctx.http_interceptor in Worker before tool execution
- Add with_http_exchanges() builder method to TestRigBuilder
- Replace echo tool with http tool in routine_news_digest trace
- Test now exercises real http tool with mock response → memory_write → message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments from Copilot on PR nearai#577

- Extract `ApprovalContext::is_blocked_or_default()` helper to deduplicate
  approval check logic in worker.rs and scheduler.rs
- Extract `parse_tool_permissions()` helper to deduplicate JSON array
  parsing in routine.rs and builtin/routine.rs
- Fix test name: `test_mark_completed_twice_does_not_error` →
  `test_mark_completed_twice_returns_error` (matches actual behavior)
- Fix ApprovalContext doc comment to clarify it only models autonomous mode
- Fix flaky index-based assertion in routine_news_digest test — now uses
  content-based search instead of fixed position
- Fix stale comment: echo → http in routine test header
- Add TODO for subtask approval context propagation (latent, not in
  active code paths)
- Add TODO for global message tool context race in routine_engine

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix formatting in is_blocked_or_default test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(test_rig): destructure self in build() to avoid partial-move fragility

Destructure TestRigBuilder at the top of build() instead of accessing
self.* fields after moving self.http_exchanges. While the prior code
compiled (remaining fields are Copy), it was fragile and would break
if any non-Copy field were added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: clarify that routine_fire bypasses cooldown

Manual fires are explicitly user-initiated and intentionally bypass
cooldown checks (which only apply to automated cron/event triggers).
Updated tool description and fire_manual docstring to make this clear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(routines): fix message tool approval in routine context

Two fixes for message tool failures in autonomous routine jobs:

1. MessageTool::requires_approval() now returns UnlessAutoApproved when
   the explicit channel param matches the default channel (was Always,
   causing "requires authentication" errors for routine workers).

2. routine_create tool now accepts notify_channel and notify_user params,
   wired into NotifyConfig. Without these, routines had channel: None,
   so set_message_tool_context was never called, causing "No channel
   specified" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(message): remove approval requirement from message tool

The message tool only sends to user-owned channels via
ChannelManager::broadcast (TUI, Telegram, Slack, web gateway, etc.).
It cannot reach arbitrary external services, so approval adds friction
with no security benefit. This also eliminates the routine context
errors entirely since approval is never checked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review comments — routine_fire approval + test rename

- routine_fire now returns UnlessAutoApproved since firing a routine
  can dispatch a full_job with pre-authorized Always-gated tools
- Rename test_approval_context_never_always_passes to
  test_approval_context_never_is_not_blocked for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review nits — update stale docs and comments

- Remove 'message' from tool_permissions example (no longer Always)
- Reword message tool approval comment for accuracy
- Clarify with_routines() docstring re: tool registration vs engine wiring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools scope: tool Tool infrastructure size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants