Skip to content

fix(engine): surface action errors to LLM with [ACTION FAILED] prefix#2326

Merged
serrrfirat merged 2 commits intostagingfrom
fix/2279-action-error-visibility
Apr 15, 2026
Merged

fix(engine): surface action errors to LLM with [ACTION FAILED] prefix#2326
serrrfirat merged 2 commits intostagingfrom
fix/2279-action-error-visibility

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

Summary

  • Prefix failed action results with [ACTION FAILED] <tool>: in the Python orchestrator so the LLM can't ignore tool failures and falsely claim success
  • Add TODO(#2325) for consecutive action error counting, deferred pending unified progress-tracking design

Context

When tools fail (e.g., "No lease for action"), the orchestrator appended the raw error JSON as a generic ActionResult message — indistinguishable from success. The LLM frequently ignored these errors and told the user "Successfully sent!" despite every tool call failing.

The Rust executor already sets is_error: true on lease and policy failures. This change reads that flag and prefixes the message content so the error is unmissable.

Error counting (failing the thread after repeated action errors) is tracked in #2325 and depends on the progress-tracking design discussed in #2240.

Closes #2279

Test plan

  • cargo test -p ironclaw_engine — 281 tests pass
  • Manual: trigger a "No lease for action" error on staging and verify the bot reports failure instead of claiming success

🤖 Generated with Claude Code

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added scope: ci CI/CD workflows scope: docs Documentation scope: dependencies Dependency updates size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 11, 2026
@serrrfirat serrrfirat requested review from henrypark133 and ilblackdragon and removed request for ilblackdragon April 12, 2026 12:46
@serrrfirat serrrfirat force-pushed the fix/2279-action-error-visibility branch from 256d28f to c37eda5 Compare April 12, 2026 14:37
@github-actions github-actions bot added size: S 10-49 changed lines risk: low Changes to docs, tests, or low-risk modules and removed size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules labels Apr 12, 2026
When tools fail (e.g. "No lease for action"), the orchestrator appended
the raw error JSON as an ActionResult message. The LLM frequently
ignored these errors and claimed success — a trust/hallucination issue.

Prefix failed action results with "[ACTION FAILED] <tool>:" so the LLM
receives an unmissable signal that the tool call did not succeed. The
Rust executor already sets `is_error: true` on lease and policy failures.

Closes #2279

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@serrrfirat serrrfirat force-pushed the fix/2279-action-error-visibility branch from c37eda5 to e0c163a Compare April 12, 2026 15:58
henrypark133
henrypark133 previously approved these changes Apr 14, 2026
Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Surface action errors to LLM with [ACTION FAILED] prefix (Risk: Low)

Clean fix. When is_error is true on an ActionResult, the output is now prefixed with [ACTION FAILED] action_name: so the LLM can clearly distinguish failures from successful outputs in its next turn.

Positives:

  • Python orchestrator change is minimal (2 lines) and targeted
  • Rust loop_engine test failed_action_result_is_explicit_in_next_llm_message verifies the prefix appears in the message content
  • TODO comment for consecutive error tracking is well-scoped with issue reference (#2325)

LGTM.

Resolve conflict in orchestrator/default.py: combine PR's action_name
in error prefix with staging's batch error/success counting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Surface action errors to LLM with [ACTION FAILED] prefix

Clean fix. The Python-side prefixing change is minimal, and the new loop_engine test verifies the next LLM turn now sees an explicit failure marker including the action name.

Residual risk is low: I only spot-checked the engine path and ran cargo test -p ironclaw_engine failed_action_result_is_explicit_in_next_llm_message locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules scope: ci CI/CD workflows scope: dependencies Dependency updates scope: docs Documentation size: S 10-49 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[QA] Bot falsely claims success despite shell/open tool errors ("No lease for action")

2 participants