chore: promote staging to staging-promote/b91bdbdb-23954805921 (2026-04-03 19:14 UTC)#1982

Merged
henrypark133 merged 11 commits into staging-promote/b91bdbdb-23954805921 from staging-promote/994f9607-23958793145
Apr 9, 2026


@ironclaw-ci ironclaw-ci bot commented Apr 3, 2026

Auto-promotion from staging CI

Batch range: a55aff980a4e235590c3af57ded2542512e2f9f6..994f96079dd0660bb1b2bf96ab13bd691f0ef290
Promotion branch: staging-promote/994f9607-23958793145
Base: staging-promote/b91bdbdb-23954805921
Triggered by: Staging CI batch at 2026-04-03 19:14 UTC

Commits in this batch (16):

Current commits in this promotion (1)

Current base: staging-promote/b91bdbdb-23954805921
Current head: staging-promote/994f9607-23958793145
Current range: origin/staging-promote/b91bdbdb-23954805921..origin/staging-promote/994f9607-23958793145

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

* feat(docker): publish ironclaw-worker image alongside ironclaw

Build and push nearaidev/ironclaw-worker from Dockerfile.worker in the
same workflow. Both images share the same version/sha/tag scheme.

This lets ironclaw-dind pull the pre-built worker image for sandbox
baking instead of cloning the repo and building from source.

[skip-regression-check]

* feat(docker): daily scheduled build of :staging from staging branch

* perf(docker): worker image copies binary from ironclaw image instead of rebuilding

* revert Dockerfile.worker changes, keep it building from source
@github-actions github-actions bot added scope: ci CI/CD workflows size: S 10-49 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 3, 2026

claude bot commented Apr 3, 2026

Code review

Found 1 issue:

  1. [HIGH:75] Invalid git ref for workflow_dispatch and pull_request events

When github.event_name is not 'schedule' (e.g., during workflow_dispatch or pull_request), the checkout step sets an explicit empty string for the ref parameter. This differs from omitting the parameter and will cause the action to fail or behave unpredictably. Use github.ref as fallback instead of empty string.

See: https://github.com/anthropics/ironclaw/blob/391ec825cbebd1564a436c5983969ab1364ea458/.github/workflows/docker.yml#L38-L40


claude bot commented Apr 3, 2026

Additional findings from security review

Found 2 additional issues:

  1. [CRITICAL:92] Unsafe Input Injection via inputs.tag in Docker Tags

The inputs.tag parameter is user-supplied via workflow_dispatch and is directly interpolated into Docker image tags without validation. Docker tags have strict format requirements ([a-zA-Z0-9_][a-zA-Z0-9._-]*, max 128 chars). An attacker could pass malicious values like 'latest$(command)' to create invalid tags or cause unexpected behavior.

Recommendation: Validate inputs.tag against Docker tag format before use.

https://github.com/anthropics/ironclaw/blob/391ec825cbebd1564a436c5983969ab1364ea458/.github/workflows/docker.yml#L78-L80

  2. [MEDIUM:78] Unvalidated Checkout Ref

The workflow conditionally checks out the 'staging' branch for scheduled runs without verifying that the branch exists or is safe. Skipping validation of the branch name before checkout can allow untrusted refs to be checked out in a CI/CD context.

Recommendation: Explicitly validate or document the staging branch constraint.
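
The tag-format check recommended in the first finding above could be sketched as follows. This is a minimal, standard-library-only illustration of the grammar quoted in the review (`[a-zA-Z0-9_][a-zA-Z0-9._-]*`, max 128 chars). The function name `is_valid_docker_tag` is hypothetical and is not taken from the repository, and the workflow itself would more likely perform this check in a shell step before interpolating `inputs.tag`.

```rust
/// Hypothetical helper: returns true if `tag` matches the Docker tag grammar
/// quoted in the review: `[a-zA-Z0-9_][a-zA-Z0-9._-]*`, at most 128 characters.
fn is_valid_docker_tag(tag: &str) -> bool {
    // Reject empty and over-long tags up front (valid tags are ASCII, so
    // byte length equals character length for anything that can pass).
    if tag.is_empty() || tag.len() > 128 {
        return false;
    }
    let mut chars = tag.chars();
    // First character: ASCII alphanumeric or underscore only.
    let first_ok = chars
        .next()
        .map_or(false, |c| c.is_ascii_alphanumeric() || c == '_');
    // Remaining characters: additionally allow '.' and '-'.
    first_ok && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}
```

With this check, a value such as `latest$(command)` is rejected because `$`, `(` and `)` are outside the allowed character set.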


claude bot commented Apr 3, 2026

Performance and Production Findings

Found 4 additional issues related to build efficiency and production readiness:

  1. [MEDIUM:70] Dockerfile.worker lacks dependency caching layer

Dockerfile.worker uses direct 'cargo build --release' without cargo-chef optimization, unlike the main Dockerfile. This rebuilds the entire dependency tree on every invocation. Combined with the new daily scheduled builds, this means each build wastes 20-30 minutes of redundant compilation. The main image uses cargo-chef caching but worker doesn't.

Recommendation: Apply cargo-chef layering to Dockerfile.worker matching the main Dockerfile structure.

  2. [MEDIUM:65] Sequential builds prevent parallelization

The two docker/build-push-action steps run sequentially but could execute in parallel since they're independent (different Dockerfiles, separate cache scopes) on ubuntu-24.04 runners. This doubles the effective build wall-clock time.

Recommendation: Use GitHub Actions matrix strategy to build both images simultaneously.

  3. [MEDIUM:55] Inconsistent GHA cache scope declaration

Main image build uses 'cache-from: type=gha' with implicit scope, while worker build uses 'cache-from: type=gha,scope=worker'. This mixed approach increases risk of cache layer collision.

Recommendation: Explicitly set scope for both: 'cache-from: type=gha,scope=ironclaw' and 'cache-from: type=gha,scope=worker'.

  4. [LOW:40] No timeout on scheduled builds

Daily scheduled builds have no explicit timeout. Sequential builds could take 40-65 minutes and may exceed job timeout if there are network issues.

Recommendation: Add 'timeout-minutes: 120' to the workflow.

henrypark133 and others added 10 commits April 3, 2026 17:51
…B-backed pairing, and OwnershipCache (#1898)

* feat(ownership): add OwnerId, Identity, UserRole, can_act_on types

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): private OwnerId field, ResourceScope serde derives, fix doc comment

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* refactor(tenant): replace SystemScope::db() escape hatch with typed workspace_for_user(), fix stale variable names

- Add SystemScope::workspace_for_user() that wraps Workspace::new_with_db
- Remove SystemScope::db() which exposed the raw Arc<dyn Database>
- Update 3 callers (routine_engine.rs x2, heartbeat.rs x1) to use the new method
- Fix stale comment: "admin context" -> "system context" in SystemScope
- Rename `admin` bindings to `system` in agent_loop.rs for clarity

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(tenant): rename stale admin binding to system_store in heartbeat.rs

* refactor(tenant): TenantScope/TenantCtx carry Identity, add with_identity() constructor and bridge new()

- TenantScope: replace `user_id: String` field with `identity: Identity`; add `with_identity()` preferred constructor; keep `new(user_id, db)` as Member-role bridge; add `identity()` accessor; all internal method bodies use `identity.owner_id.as_str()` in place of `&self.user_id`
- TenantCtx: replace `user_id: String` field with `identity: Identity`; update constructor signature; add `identity()` accessor; `user_id()` delegates to `identity.owner_id.as_str()`; cost/rate methods updated accordingly
- agent_loop: split `tenant_ctx(&str)` into bridge + new `tenant_ctx_with_identity(Identity)` which holds the full body; bridge delegates to avoid duplication

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add V16 tool scope, V17 channel_identities, V18 pairing_requests migrations

- PostgreSQL: V16__tool_scope.sql adds scope column to wasm_tools/dynamic_tools
- PostgreSQL: V17__channel_identities.sql creates channel identity resolution table
- PostgreSQL: V18__pairing_requests.sql creates pairing request table replacing file-based store
- libSQL SCHEMA: adds scope column to wasm_tools/dynamic_tools, channel_identities, pairing_requests tables
- libSQL INCREMENTAL_MIGRATIONS: versions 17-19 for existing databases
- IDEMPOTENT_ADD_COLUMN_MIGRATIONS: handles fresh-install/upgrade dual path for scope columns
- Runner updated to check ALL idempotent columns per version before skipping SQL
- Test: test_ownership_model_tables_created verifies all new tables/columns exist after migrations

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(db): use correct RFC3339 timestamp default in libSQL, document version sequence offset

Replace datetime('now') with strftime('%Y-%m-%dT%H:%M:%fZ', 'now') in the
channel_identities and pairing_requests table definitions (both in SCHEMA and
INCREMENTAL_MIGRATIONS) to match the project-standard RFC 3339 timestamp format
with millisecond precision. Also add a comment clarifying that libSQL incremental
migration version numbers are independent from PostgreSQL VN migration numbers.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ownership): bootstrap_ownership(), migrate_default_owner, V19 FK migration, replace hardcoded 'default' user IDs

- Add V19__ownership_fk.sql (programmatic-only, not in auto-migration sweep)
- Add `migrate_default_owner` to Database trait + both PgBackend and LibSqlBackend
- Add `get_or_create_user` default method to UserStore trait
- Add `bootstrap_ownership()` to app.rs, called in init_database() after connect_with_handles
- Replace hardcoded "default" owner_id in cli/config.rs, cli/mcp.rs, cli/mod.rs, orchestrator/mod.rs
- Add TODO(ownership) comments in llm/session.rs and tools/mcp/client.rs for deferred constructors

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): atomic get_or_create_user, transactional migrate_default_owner, V19 FK inline constant, fix remaining 'default' user IDs

- Delete migrations/V19__ownership_fk.sql so refinery no longer auto-applies FK constraints before bootstrap_ownership runs; add OWNERSHIP_FK_SQL constant with TODO for future programmatic application
- Remove racy SELECT+INSERT default in UserStore::get_or_create_user; both PostgreSQL (ON CONFLICT DO NOTHING) and libSQL (INSERT OR IGNORE) now use atomic upserts
- Wrap migrate_default_owner in explicit transactions on both backends for atomicity
- Make bootstrap_ownership failure fatal (propagate error instead of warn-and-continue)
- Fix mcp auth/test --user: change from default_value="default" to Option<String> resolved from configured owner_id
- Replace hardcoded "default" user IDs in channels/wasm/setup.rs with config.owner_id
- Replace "default" sentinel in OrchestratorState test helper with "<unset>" to make the test-only nature explicit

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): remove default user_id from create_job(), change sentinel strings to <unset>

- Gate ContextManager::create_job() behind #[cfg(test)]; production code must
  use create_job_for_user() with an explicit user_id to prevent DB rows with
  user_id = 'default' being silently created on the production write path.
- Change the placeholder user_id in McpClient::new(), new_with_name(), and
  new_with_config() from "default" to "<unset>" so accidental secrets/settings
  lookups surface immediately rather than silently touching the wrong DB partition.
- Same sentinel change for SessionManager::new() and new_async() in session.rs;
  these are overwritten by attach_store() at startup with the real owner_id.
- Update tests that asserted the old "default" sentinel to expect "<unset>", and
  switch test_list_jobs_tool / test_job_status_tool to create_job_for_user("default")
  to keep ownership alignment with JobContext::default().

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(db): add ChannelPairingStore sub-trait with resolve_channel_identity, upsert/approve pairing, PostgreSQL + libSQL implementations

Adds PairingRequestRecord, ChannelPairingStore trait (5 methods), and
generate_pairing_code() to src/db/mod.rs; implements for PgBackend in
postgres.rs and LibSqlBackend in libsql/pairing.rs; wires ChannelPairingStore
into the Database supertrait bound; all 6 libSQL unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(db): atomic libSQL approve_pairing with BEGIN IMMEDIATE, add case-insensitive/expired/double-approve tests

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(ownership): add OwnershipCache for zero-DB-read identity resolution on warm path

Converts src/ownership.rs to src/ownership/ module directory and adds
src/ownership/cache.rs with a write-through in-process cache mapping
(channel, external_id) -> Identity. Wired as Arc<OwnershipCache> on
AppComponents for Task 8 pairing integration. All 7 cache unit tests pass.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
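
The cache-first lookup described in the commit above might look roughly like the sketch below. It is not the code in `src/ownership/cache.rs`: the type name `OwnershipCacheSketch`, the `resolve` method, the `load_from_db` closure, and the simplified `Identity` (a plain `String` owner id) are all assumptions made for illustration. For brevity the sketch treats a poisoned lock as a cache miss.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug, PartialEq)]
enum UserRole {
    Admin,
    Member,
}

// Simplified stand-in for the real Identity type (assumption).
#[derive(Clone, Debug, PartialEq)]
struct Identity {
    owner_id: String,
    role: UserRole,
}

/// In-process map from (channel, external_id) to a resolved Identity.
#[derive(Default)]
struct OwnershipCacheSketch {
    entries: RwLock<HashMap<(String, String), Identity>>,
}

impl OwnershipCacheSketch {
    /// Returns the cached identity if present (no DB read on the warm path);
    /// otherwise calls `load_from_db` once and stores a successful result.
    fn resolve<F>(&self, channel: &str, external_id: &str, load_from_db: F) -> Option<Identity>
    where
        F: FnOnce() -> Option<Identity>,
    {
        let key = (channel.to_string(), external_id.to_string());
        // Warm path: answer from memory.
        if let Some(hit) = self.entries.read().ok().and_then(|m| m.get(&key).cloned()) {
            return Some(hit);
        }
        // Cold path: consult the backing store, then populate the cache.
        let identity = load_from_db()?;
        if let Ok(mut map) = self.entries.write() {
            map.insert(key, identity.clone());
        }
        Some(identity)
    }
}
```

A second `resolve` call for the same `(channel, external_id)` pair returns the stored identity without invoking the loader again.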

* test(e2e): add ownership model E2E tests and extend pairing tests for DB-backed store

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): remove unused asyncio import, add fallback assertion in test_pairing_response_structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(tenant): unit tests for TenantScope::with_identity and AdminScope construction

Adds 5 focused unit tests verifying TenantScope::with_identity stores the
full Identity (owner_id + role), TenantScope::new creates a Member-role
identity, and AdminScope::new returns Some for Admin and None for Member.
Uses LibSqlBackend::new_memory() as the test DB stub.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): recover from RwLock poison instead of expect() in OwnershipCache

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
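
For readers unfamiliar with lock poisoning, the general technique named in this commit can be illustrated with the standard library alone. The sketch below is not the OwnershipCache code; `Cache`, `cached_len`, and `poison_for_demo` are hypothetical names. It shows `PoisonError::into_inner` being used to keep reading after another thread panicked while holding the write lock, where `expect()` would have panicked in turn.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

type Cache = RwLock<HashMap<String, String>>;

/// Reads the map even if the lock is poisoned: `into_inner` on the
/// `PoisonError` hands back the guard instead of panicking.
fn cached_len(cache: &Cache) -> usize {
    let guard = cache.read().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.len()
}

/// Demo-only helper: poisons the lock by panicking in another thread
/// while the write guard is held. The panic is contained by `join`.
fn poison_for_demo(cache: &Arc<Cache>) {
    let cache = Arc::clone(cache);
    let _ = thread::spawn(move || {
        let _guard = cache.write().unwrap();
        panic!("simulated panic while holding the write lock");
    })
    .join();
}
```

Recovering the guard is a trade-off: the data may reflect a half-finished update from the panicking thread, which is usually acceptable for a cache that can be repopulated from the database.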

* test(ownership): integration tests for bootstrap, tenant isolation, and ChannelPairingStore

Adds tests/ownership_integration.rs covering migrate_default_owner idempotency,
TenantScope per-user setting isolation (including Admin role bypass check),
and the full ChannelPairingStore lifecycle (upsert, approve, remove, multi-channel isolation).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate pairing tests and flaky random-code assertion from integration suite

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(pairing): rewrite PairingStore to DB-backed async with OwnershipCache

Replaces the file-based pairing store (~/.ironclaw/*-pairing.json,
*-allowFrom.json) with a DB-backed async implementation that delegates
to ChannelPairingStore and writes through to OwnershipCache on reads.

- PairingStore::new(db, cache) uses the DB; new_noop() for test/no-DB
- resolve_identity() cache-first lookup via OwnershipCache
- approve(code, owner_id) removes channel arg (DB looks up by code)
- All WASM host functions updated: pairing_upsert_request uses block_in_place,
  pairing-is-allowed renamed to pairing-resolve-identity returning Option<String>,
  pairing-read-allow-from deprecated (returns empty list)
- Signal channel receives PairingStore via new(config, db) constructor
- Web gateway pairing handlers read from state.store (DB) directly
- extensions.rs derive_activation_status drops PairingStore dependency;
  derives status from extension.active and owner_binding flag instead
- All test call sites updated to use new_noop()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): add missing pairing_store field to all GatewayState initializers, fix disk-full post-edit compile

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(channels): remove owner_id from IncomingMessage, user_id is the canonical resolved OwnerId

`owner_id` on `IncomingMessage` was always a duplicate of `user_id` —
both fields held the same value at every call site. Remove the field and
`with_owner_id()` builder, update the four WASM-wrapper and HTTP test
assertions to use `user_id`, and drop the redundant struct literal field
in the routine_engine test helper.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(channels): remove stale owner_id param from make_message test helper

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* test(e2e): add browser/Playwright tests for ownership model — auth screen, chat UI, owner login

Adds five Playwright-based browser tests to the ownership model E2E suite
verifying the web UI experience: authenticated owner sees chat input, unauthenticated
browser sees auth screen, owner can send a message and receive a response, settings
tab renders without errors, and basic page structure is correct after login.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* feat(settings): migrate channel credentials from plaintext settings to encrypted secrets store

Moves nearai.session_token from the plaintext DB settings table to the
AES-256-GCM encrypted secrets store (key: nearai_session_token).

- SessionManager gains an `attach_secrets()` method that wires in the
  secrets store; `save_session` writes to it when available and
  `load_session_from_secrets` is called preferentially over settings
- `migrate_session_credential()` runs idempotently on each startup in
  `init_secrets()`, reading the JSON session from settings, writing it
  to secrets, then deleting the plaintext copy
- Wizard's `persist_session_to_db` now writes to secrets first, falling
  back to plaintext settings only when secrets store is unavailable
- Plaintext settings path is preserved as fallback for installs without
  a secrets store (no master key configured)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(settings): settings fallback only when no secrets store, verify decryption before deleting plaintext

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(ownership): ROLLBACK in libSQL migrate_default_owner, shared OwnershipCache across channels, add dynamic_tools to migration, fix doc comment

- libSQL migrate_default_owner: wrap UPDATE loop in async closure + match to emit ROLLBACK on any mid-transaction failure (mirroring approve_pairing pattern)
- Both backends: add dynamic_tools to the migrate_default_owner table list so agent-built tools are migrated on first pairing
- setup_wasm_channels: accept Arc<OwnershipCache> parameter instead of allocating a fresh cache, share the AppComponents cache
- SignalChannel::new: accept Arc<OwnershipCache> parameter and pass it to PairingStore instead of allocating a new cache
- PairingStore: fix module-level and struct-level doc comments to accurately describe lazy cache population after approve()

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(web): use can_act_on for authorization in job/routine handlers instead of raw string comparisons

Replace 12 raw `user_id != user.user_id` / `user_id == user.user_id` string comparisons
in jobs.rs and 4 in routines.rs with calls through the canonical `can_act_on` function
from `crate::ownership`, which is the spec-mandated authorization mechanism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
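
As an illustration of why a single authorization function is preferable to scattered string comparisons, here is a minimal sketch. The type names `OwnerId`, `Identity`, `UserRole`, and the function name `can_act_on` appear in the commit messages, but the signature and the rule below (admins may act on any resource, members only on their own) are assumptions and may differ from `crate::ownership`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UserRole {
    Admin,
    Member,
}

// Newtype with a private field, so callers cannot build one from a raw string by accident.
#[derive(Clone, Debug, PartialEq, Eq)]
struct OwnerId(String);

struct Identity {
    owner_id: OwnerId,
    role: UserRole,
}

/// ASSUMED rule for this sketch: an Admin may act on any resource,
/// a Member only on resources they own.
fn can_act_on(actor: &Identity, resource_owner: &OwnerId) -> bool {
    actor.role == UserRole::Admin || &actor.owner_id == resource_owner
}
```

Routing every handler through one function means a later change to the rule (for example, adding a new role) is made in one place rather than in sixteen comparisons.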

* chore: include remaining modified files in ownership model branch

* fix: add pairing_store field to test GatewayState initializers, update PairingStore API calls in integration tests

Add missing `pairing_store: None` to all GatewayState struct initializers
in test files. Migrate old file-based PairingStore API calls
(PairingStore::new(), PairingStore::with_base_dir()) to the new DB-backed
API (PairingStore::new_noop()). Rewrite pairing_integration.rs to use
LibSqlBackend with the new async DB-backed PairingStore API.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* chore: cargo fmt

* fix(pairing): truly no-op PairingStore noop mode, ensure owner user in CLI, fix signal safety comments

- PairingStore::upsert_request now returns a dummy record in noop mode instead of
  erroring, and approve silently succeeds (matching the doc promise of "writes
  are silently discarded").
- PairingStore::approve now accepts a channel parameter, matching the updated
  DB trait signature and propagated to all call sites (CLI, web server, tests).
- CLI run_pairing_command ensures the owner user row exists before approval to
  satisfy the FK constraint on channel_identities.owner_id.
- Signal channel block_in_place safety comments corrected from "WASM channel
  callbacks" to "Signal channel message processing".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pairing): thread channel through approve_pairing, add created flag, retry on code collision, remove redundant indexes

Addresses PR review comments:
- approve_pairing validates code belongs to the given channel
- PairingRequestRecord.created replaces timing heuristic
- upsert retries on UNIQUE violation (up to 3 attempts)
- redundant indexes removed (UNIQUE creates implicit index)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
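
The bounded retry on code collision mentioned above can be sketched generically. This is an illustration under stated assumptions, not the repository's implementation: `InsertError`, `insert_with_retry`, and the two closures are hypothetical, and the real code talks to PostgreSQL or libSQL rather than to closures.

```rust
#[derive(Debug, PartialEq)]
enum InsertError {
    /// The generated code collided with an existing row (UNIQUE constraint).
    UniqueViolation,
    /// Any other database failure; never retried in this sketch.
    Other(String),
}

const MAX_ATTEMPTS: usize = 3;

/// Generates a fresh code and tries to insert it, retrying only on
/// UNIQUE violations and giving up after MAX_ATTEMPTS tries.
fn insert_with_retry<G, I>(mut generate_code: G, mut insert: I) -> Result<String, InsertError>
where
    G: FnMut() -> String,
    I: FnMut(&str) -> Result<(), InsertError>,
{
    for _ in 0..MAX_ATTEMPTS {
        let code = generate_code();
        match insert(&code) {
            Ok(()) => return Ok(code),
            // Collision: loop around and draw a new code.
            Err(InsertError::UniqueViolation) => continue,
            // Anything else is surfaced immediately.
            Err(other) => return Err(other),
        }
    }
    Err(InsertError::UniqueViolation)
}
```

Capping the attempts keeps a pathological run of collisions from looping forever while still making a single unlucky collision invisible to the caller.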

* fix(ownership): migrate api_tokens, serialize PG approvals, propagate resolved owner_id

Addresses PR review P1/P2 regressions:

- api_tokens included in migrate_default_owner (both backends)
- PostgreSQL approve_pairing uses FOR UPDATE to prevent concurrent approvals
- Signal resolve_sender_identity returns owner_id, set as IncomingMessage.user_id
  with raw phone number preserved as sender_id for reply routing
- Feishu uses resolved owner_id from pairing_resolve_identity in emitted message
- PairingStore noop mode logs warning when pairing admission is impossible

[skip-regression-check]

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-review): sanitize DB errors in pairing handlers, fix doc comments, add TODO for derive_activation_status

- Pairing list/approve handlers no longer leak DB error details to clients
- NotFound errors return user-friendly 'Invalid or expired pairing code' message
- Module doc in pairing/store.rs corrected (remove -> evict, no insert method)
- wit_compat.rs stub comment corrected to match actual Val shape
- TODO added for derive_activation_status has_paired approximation

* fix(pr-review): propagate libSQL query errors in approve_pairing, round-trip validate session credential migration, fix test doc comment

- libSQL approve_pairing: .ok().flatten() replaced with .map_err() to propagate DB errors
- migrate_session_credential: round-trip compares decrypted secret against plaintext before deleting
- ownership_integration.rs: doc comment corrected to match actual test coverage

* fix(pairing): store meta, wrap upserts in transactions, case-insensitive role/channel, log Signal DB errors, use auth role in handlers

- Store meta JSONB/TEXT column in pairing_requests (PG migration V18, libSQL schema + incremental migration 19)
- Wrap upsert_pairing_request in transactions (PG: client.transaction(), libSQL: BEGIN IMMEDIATE/COMMIT/ROLLBACK)
- Case-insensitive role parsing: eq_ignore_ascii_case("admin") in both backends
- Case-insensitive channel matching in approve_pairing: LOWER(channel) = LOWER($2)
- Log DB errors in Signal resolve_sender_identity instead of silently discarding
- Use auth role from UserIdentity in web handlers (jobs.rs, routines.rs) via identity_from_auth helper
- Fix variable shadowing: rename `let channel` to `let req_channel` in libsql approve_pairing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): add auth to pairing list, cache eviction on deactivate, runtime assert in Signal, remove default fallback, warn on noop pairing codes

Addresses zmanian's review:
- #1: pairing_list_handler requires AuthenticatedUser
- #2: OwnershipCache.evict_user() evicts all entries for a user on suspension
- #3: debug_assert! for multi-thread runtime in Signal block_in_place
- #9: Noop PairingStore warns when generating unredeemable codes
- #10: cli/mcp.rs default fallback replaced with <unset>

* fix(pairing): consistent LOWER() channel matching in resolve_channel_identity, fix wizard doc comment, fix E2E test assertion for ActionResponse convention

* fix(pairing): apply LOWER() consistently across all ChannelPairingStore queries (upsert, list_pending, remove)

All channel matching now uses LOWER() in both PostgreSQL and libSQL backends:
- upsert_pairing_request: WHERE LOWER(channel) = LOWER($1)
- list_pending_pairings: WHERE LOWER(channel) = LOWER($1)
- remove_channel_identity: WHERE LOWER(channel) = LOWER($1)

Previously only resolve_channel_identity and approve_pairing used LOWER(),
causing inconsistent matching when channel names differed by case.

* fix(pairing): unify code challenge flow and harden web pairing

* test: harden pairing review follow-ups

* fix: guard wasm pairing callbacks by runtime flavor

* fix(pairing): normalize channel keys and serialize pg upserts

* chore(web): clean up ownership review follow-ups

* Preserve WASM pairing allowlist compatibility

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Fix turn cost footer and per-turn usage accounting

* Avoid panicking on poisoned turn usage mutex

* Tighten SSE usage regression coverage

* Report usage for interrupted turns
…tags (#1952)

* fix(llm): invert reasoning default — unknown models skip <think>/<final> injection

When NEAR AI model="auto" resolves server-side to Qwen 3.5, the system
prompt injected <think>/<final> tags because "auto" didn't match any
known native-thinking pattern. This caused empty responses:

1. Qwen 3.5's native thinking puts reasoning in a `reasoning` field
   (not `reasoning_content`) — silently dropped due to field name mismatch
2. Content contained only <think> tags or <tool_call> XML, which
   clean_response() stripped to empty → "I'm not sure how to respond"

Three fixes:
- Invert the default: new requires_think_final_tags() with empty allowlist
  means unknown/alias models get the safe direct-answer prompt
- Add #[serde(alias = "reasoning")] so vLLM's field name is accepted
- Update active_model from API response.model so capability checks
  use the resolved model name after the first call

Confirmed via direct API testing against NEAR AI staging with
Qwen/Qwen3.5-122B-A10B.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
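
The inverted default described in the first fix can be sketched as an allowlist check. Only the function name `requires_think_final_tags` and the fact that the allowlist is empty come from the commit message; the constant name, the substring matching rule, and the `system_prompt_style` helper are assumptions for illustration.

```rust
/// Model-name patterns that need <think>/<final> tags injected.
/// Empty, per the commit message: no model opts in by default.
const THINK_FINAL_TAG_MODELS: &[&str] = &[];

/// Returns true only for models explicitly listed above, so unknown names
/// and aliases such as "auto" fall through to `false`.
/// ASSUMPTION: case-insensitive substring matching.
fn requires_think_final_tags(model: &str) -> bool {
    let model = model.to_ascii_lowercase();
    THINK_FINAL_TAG_MODELS.iter().any(|p| model.contains(*p))
}

/// Hypothetical caller: picks which system prompt variant to use.
fn system_prompt_style(model: &str) -> &'static str {
    if requires_think_final_tags(model) {
        "think-final-tags"
    } else {
        "direct-answer"
    }
}
```

The design point is that an allowlist fails safe: a model the code has never heard of gets the direct-answer prompt instead of tag injection it may not handle.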

* remove model alias resolution from nearai_chat

auto should stay as the active model name — no reason to overwrite it
with the resolved model since requires_think_final_tags() returns false
for both "auto" and the resolved name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix wording: remove native-thinking assumption from direct-answer prompt

The direct-answer prompt is now the default for all models, not just
native-thinking ones. Remove misleading "handled natively" language.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix bootstrap ownership migration for dynamic tools

* Add bootstrap ownership regression coverage
* feat(embeddings): add bedrock provider

* refactor(embeddings): address review feedback

* fix: CI failures and review feedback for Bedrock embeddings

- Replace ENV_MUTEX.lock() with lock_env() to match staging's test pattern
- Use db_first_or_default() for non-bedrock model resolution (staging API)
- Validate returned embedding dimension matches configured dimension

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
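
The dimension check in the last bullet above amounts to comparing the returned vector length against the configured value. A minimal sketch follows; the function name `validate_embedding_dimension` and the `String` error type are assumptions, not the provider's actual API.

```rust
/// Hypothetical helper: rejects an embedding whose length differs from the
/// configured dimension, so a misconfigured model is caught at the boundary.
fn validate_embedding_dimension(embedding: &[f32], expected: usize) -> Result<(), String> {
    if embedding.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "embedding dimension mismatch: expected {expected}, got {}",
            embedding.len()
        ))
    }
}
```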

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): run safety checks on truncated tool output

Previously, `sanitize_tool_output()` returned immediately after
truncating oversized output, skipping leak detection, policy
enforcement, and injection scanning entirely. This allowed an attacker
to embed malicious payloads in the first N bytes of oversized tool
output and have them delivered unsanitized to the LLM.

Restructure the truncation path so it feeds into the same safety
pipeline as non-truncated content: leak detection, policy checks,
and Aho-Corasick injection scanning all run on the (possibly
truncated) content before it is returned.

Adds regression tests to verify truncated output is still scanned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
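
The control-flow change described above (truncate, then continue into the scan instead of returning early) can be illustrated with a small sketch. Only the name `sanitize_tool_output` comes from the commit message. The size limit, the marker list, and the `Sanitized` struct are hypothetical. The sketch also uses a plain substring search where the commit mentions Aho-Corasick, and it omits leak detection and policy enforcement.

```rust
const MAX_OUTPUT_BYTES: usize = 64; // hypothetical limit for the sketch
const INJECTION_MARKERS: &[&str] = &["ignore previous instructions"]; // hypothetical

#[derive(Debug, PartialEq)]
struct Sanitized {
    content: String,
    truncated: bool,
    flagged: bool,
}

/// Cuts `s` to at most `max_bytes`, backing up to a UTF-8 character boundary.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Stand-in scanner: case-insensitive substring search over the markers.
fn scan_for_injection(content: &str) -> bool {
    let lower = content.to_lowercase();
    INJECTION_MARKERS.iter().any(|m| lower.contains(*m))
}

fn sanitize_tool_output(raw: &str) -> Sanitized {
    let truncated = raw.len() > MAX_OUTPUT_BYTES;
    let content = truncate_on_char_boundary(raw, MAX_OUTPUT_BYTES);
    // Key point: no early return after truncation. The (possibly truncated)
    // content goes through the same scan as short output.
    let flagged = scan_for_injection(content);
    Sanitized {
        content: content.to_string(),
        truncated,
        flagged,
    }
}
```

A payload placed at the start of an oversized output survives truncation, which is why the scan has to run on the truncated content as well.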

* style: apply rustfmt to fix CI formatting check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wui <wui@Wui-Work-2.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
…5957

chore: promote staging to staging-promote/833dd32a-23994503713 (2026-04-05 05:33 UTC)
…3713

chore: promote staging to staging-promote/54519776-23969879338 (2026-04-05 04:48 UTC)
…9338

chore: promote staging to staging-promote/0588dd1b-23968459241 (2026-04-04 02:58 UTC)
…9241

chore: promote staging to staging-promote/994f9607-23958793145 (2026-04-04 01:32 UTC)
@henrypark133 henrypark133 merged commit 3a5295d into staging-promote/b91bdbdb-23954805921 Apr 9, 2026
3 of 5 checks passed
@henrypark133 henrypark133 deleted the staging-promote/994f9607-23958793145 branch April 9, 2026 04:29
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: channel Channel infrastructure scope: channel/cli TUI / CLI channel scope: channel/web Web gateway channel scope: channel/wasm WASM channel runtime scope: tool/builtin Built-in tools scope: tool/mcp MCP client scope: db Database trait / abstraction scope: db/postgres PostgreSQL backend labels Apr 9, 2026
@github-actions github-actions bot added scope: db/libsql libSQL / Turso backend scope: llm LLM integration scope: workspace Persistent memory / workspace scope: orchestrator Container orchestrator scope: worker Container worker scope: extensions Extension management scope: setup Onboarding / setup scope: pairing Pairing mode scope: docs Documentation labels Apr 9, 2026
Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: channel/cli TUI / CLI channel scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: channel Channel infrastructure scope: ci CI/CD workflows scope: db/libsql libSQL / Turso backend scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: docs Documentation scope: extensions Extension management scope: llm LLM integration scope: orchestrator Container orchestrator scope: pairing Pairing mode scope: setup Onboarding / setup scope: tool/builtin Built-in tools scope: tool/mcp MCP client scope: worker Container worker scope: workspace Persistent memory / workspace size: S 10-49 changed lines staging-promotion

4 participants