Skip to content

feat: new-project skill and template ref resolution for parallel tool calls#2353

Merged
ilblackdragon merged 8 commits into
stagingfrom
feat/new-project-skill
Apr 17, 2026
Merged

feat: new-project skill and template ref resolution for parallel tool calls#2353
ilblackdragon merged 8 commits into
stagingfrom
feat/new-project-skill

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon commented Apr 12, 2026

Summary

  • New-project skill: Renamed project-managementnew-project so users can invoke /new-project <description>. Rewrote skill to use memory_write + mission_create directly instead of referencing nonexistent project_create/project_update tools. Includes goals and metrics when the project warrants them. Instructs sequential tool execution.
  • Template ref resolution: Some OpenAI-format models (e.g. Qwen) emit {{call_id.field}} template references in parallel tool call arguments. Added a resolution pass in LlmBridgeAdapter that resolves these from prior tool results in conversation history, preventing orphaned/broken mission creates.
  • Gateway enhancements: Project metrics types, mission cadence scheduling UI, and frontend project views with metrics and goal tracking.

Motivated by trace analysis (trace_20260411T133641.json) where Qwen 3.5-122B failed to create a project correctly via the old skill.

Project Detail Page

project-detail

Test plan

  • All 30 bridge::llm_adapter tests pass (8 new template resolution tests)
  • cargo clippy clean
  • E2E screenshot test (test_project_detail.py) passes
  • Manual test: /new-project <description> creates workspace files and missions correctly
  • Verify template resolution with a Qwen model making parallel tool calls

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 12, 2026 06:14
@github-actions github-actions Bot added size: XL 500+ changed lines scope: channel/web Web gateway channel scope: docs Documentation risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs and removed size: XL 500+ changed lines labels Apr 12, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a 'Projects Control Room' (Engine V2) to facilitate autonomous project management, incorporating goals, metrics, and project-scoped missions. Key updates include new backend data structures, API endpoints for project overviews and dynamic widget discovery, and an extensive frontend dashboard. Feedback points out a mismatch between the project creation skill and the backend's UUID requirement, a logic flaw in template resolution that breaks early on unresolvable patterns, and missing fields in the project overview DTO required by the frontend. Additionally, improvements were suggested for metric update logic and CSS scoping for project-specific widgets to prevent UI leakage.

Comment thread skills/new-project/SKILL.md Outdated
Comment on lines +68 to +72
Create recurring missions scoped to the project. Use the **slug** as `project_id`:

```
mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}")
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a conflict between the skill instructions and the mission_create tool implementation. The skill instructs the agent to use the project slug as the project_id, but the mission_create tool (in effect_adapter.rs) expects a valid UUID. If the agent passes a slug, uuid::Uuid::parse_str will fail, and the mission will incorrectly default to the current thread's project instead of the intended one.

Suggested change
Create recurring missions scoped to the project. Use the **slug** as `project_id`:
```
mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}")
```
Create recurring missions scoped to the project. Use the project_id (UUID) returned by the project_create tool:
mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{project_id}")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7fe8ccmission_create now accepts project names/slugs and resolves them by matching against the user's projects. SKILL.md updated to use project name instead of slug.

Comment thread src/bridge/llm_adapter.rs
Comment on lines +202 to +236
for _ in 0..50 {
let Some(start) = result.find("{{") else {
break;
};
let Some(end) = result[start..].find("}}") else {
break;
};
let end = start + end;
let ref_str = &result[start + 2..end]; // e.g. "chatcmpl-tool-9816a462feb22da1.project_id"

let resolved = if let Some(dot_pos) = ref_str.rfind('.') {
let call_id = &ref_str[..dot_pos];
let field = &ref_str[dot_pos + 1..];
tool_results
.iter()
.find(|(id, _)| id == call_id)
.and_then(|(_, json)| json.get(field))
.map(|v| match v {
serde_json::Value::String(s) => s.clone(),
other => other.to_string(),
})
} else {
None
};

match resolved {
Some(val) => {
result.replace_range(start..end + 2, &val);
}
None => {
// Can't resolve — stop to avoid infinite loop on the same pattern
break;
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The resolve_template_refs function stops processing all templates in a string if it encounters a single unresolvable pattern (e.g., a non-existent call_id or a string that happens to contain {{ but isn't a template). This is because the loop breaks when resolved is None. It should instead skip the unresolvable pattern and continue searching for subsequent templates.

Comment on lines +5977 to +5988
if (proj && (proj.goals && proj.goals.length || proj.metrics && proj.metrics.length)) {
var gmHtml = '';
if (proj.goals && proj.goals.length) {
gmHtml += '<div class="cr-goals"><div class="cr-section-title">Goals</div>';
proj.goals.forEach(function(g) {
gmHtml += '<div class="cr-goal-item">' + escapeHtml(g) + '</div>';
});
gmHtml += '</div>';
}
// Metrics would come from project detail; overview doesn't include them yet.
document.getElementById('cr-drill-header').innerHTML += gmHtml;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The drillIntoProject function attempts to display project goals and metrics using the proj object from the cached overview. However, the ProjectOverviewEntry DTO returned by the /api/engine/projects/overview endpoint does not include goals or metrics fields. Consequently, these sections will never be rendered in the drill-in view. You should either include these fields in the overview response or fetch the full project detail in this function.

Comment thread src/bridge/effect_adapter.rs Outdated
Comment on lines +294 to +299
if let Some(metrics) = params.get("metrics").and_then(|v| v.as_array()) {
project.metrics = metrics
.iter()
.filter_map(|v| serde_json::from_value(v.clone()).ok())
.collect();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When updating project metrics, ensure the logic sums the values with existing ones rather than overwriting them. Additionally, if a timestamp is missing, do not fall back to the current time (Utc::now()) to avoid issues with resource cleanup logic; instead, log a warning and skip the update for that metric. Please add tests to cover multi-event scenarios.

if let Some(new_metrics) = params.get("metrics").and_then(|v| v.as_array()) {
    for v in new_metrics {
        if let Ok(new_m) = serde_json::from_value::<ironclaw_engine::ProjectMetric>(v.clone()) {
            if let Some(existing) = project.metrics.iter_mut().find(|m| m.name == new_m.name) {
                existing.value += new_m.value;
            } else if new_m.updated_at.is_some() {
                project.metrics.push(new_m);
            } else {
                tracing::warn!("Skipping metric update due to missing timestamp: {}", new_m.name);
            }
        }
    }
}
References
  1. When accumulating metrics from multiple events, ensure the logic correctly sums the values rather than overwriting them. Add tests to cover multi-event scenarios.
  2. When calculating the age of a resource for cleanup, if the creation timestamp is unknown or cannot be parsed, treat it as un-reapable by logging a warning and skipping it. Do not fall back to the current time.

Comment on lines +6194 to +6204
if (w.css) {
var style = document.createElement('style');
style.setAttribute('data-widget', manifest.id);
style.textContent = w.css; // Already scoped server-side? No — project widgets come raw.
// Scope it client-side using the same prefix convention.
var prefix = '[data-widget="' + manifest.id + '"]';
// Simple client-side scoping: prepend prefix to each rule.
// For production, use the server's scope_css. For now, inject as-is
// with the data-widget attribute providing natural scoping.
document.head.appendChild(style);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Project widgets are injected with their CSS added directly to the document head without scoping. If a widget contains global CSS rules (e.g., targeting body, h2, or common class names like .btn), it will leak and affect the entire application UI. Consider implementing the client-side scoping mentioned in the comments or ensuring the server-side scope_css logic is applied to project-specific widgets.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds engine-v2 “Projects control room” UX and supporting engine APIs, plus LLM-side tool-call argument template-reference resolution, and introduces a new /new-project skill.

Changes:

  • Added project overview + project widgets endpoints and updated gateway status to expose engine_v2_enabled.
  • Implemented project CRUD actions in the engine bridge and added project goals/metrics types.
  • Added a template-ref resolution pass in LlmBridgeAdapter for {{call_id.field}} patterns in tool-call JSON.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/channels/web/server.rs Adds new engine routes (/projects/overview, /projects/{id}/widgets) and exposes engine_v2_enabled in gateway status.
src/channels/web/handlers/frontend.rs Implements project_widgets_handler to discover/load project-scoped widgets from workspace.
src/channels/web/handlers/engine.rs Adds optional project_id query filter for missions/threads and a new overview handler.
src/bridge/router.rs Registers project capability/actions and implements projects overview aggregation + project DTO expansions (goals/metrics).
src/bridge/mod.rs Re-exports newly added overview DTOs and query function.
src/bridge/llm_adapter.rs Resolves {{call_id.field}} references in tool-call parameters and adds unit tests.
src/bridge/effect_adapter.rs Adds handlers for project_create/project_update/project_list and supports project_id override on mission creation.
skills/new-project/SKILL.md Introduces the new-project skill definition and procedure.
crates/ironclaw_gateway/static/style.css Adds styling for the Projects control room UI.
crates/ironclaw_gateway/static/index.html Replaces legacy missions tab with Projects tab (v2) and updates tab labels/buttons.
crates/ironclaw_gateway/static/i18n/en.js Renames “Memory” tab to “Workspace” and adds “Projects” label.
crates/ironclaw_gateway/static/app.js Implements Projects control room behavior, overview/drill-in/detail rendering, and widget loading.
crates/ironclaw_gateway/src/widget.rs Adds new widget slots (ProjectHeader, ProjectSection) and updates slot serialization tests.
crates/ironclaw_gateway/src/bundle.rs Makes ResolvedWidget serializable for JSON responses.
crates/ironclaw_engine/src/types/project.rs Adds ProjectMetric + goals/metrics fields on Project.
crates/ironclaw_engine/src/runtime/mission.rs Exposes MissionManager::store() for project CRUD via the mission manager.
crates/ironclaw_engine/src/lib.rs Re-exports ProjectMetric.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skills/new-project/SKILL.md Outdated
Comment on lines +68 to +72
Create recurring missions scoped to the project. Use the **slug** as `project_id`:

```
mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}")
```
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The skill instructs using the workspace slug as project_id for mission_create, but the engine expects project_id to be a UUID (see mission/project IDs parsed with Uuid::parse_str). As written, mission creation will fail to scope to the new project and will fall back to the current thread’s project (often Default). Update the skill to create the project first (e.g. via project_create) and then pass the returned UUID project_id, while still using the slug only for workspace paths.

Copilot uses AI. Check for mistakes.
Comment on lines +6206 to +6211
// Eval the JS module to register the widget.
try {
var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null;
var fn = new Function('container', 'api', 'projectId', w.js);
fn(container, api, projectId);

Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new Function(...) is effectively eval and will be blocked by the gateway’s CSP (script-src does not include 'unsafe-eval'). This means project widgets will fail to mount in production. To keep CSP strict, load widget JS by injecting a <script type="module"> element carrying the page’s existing CSP nonce (or otherwise execute through a nonce-authorized script path) instead of using new Function.

Suggested change
// Eval the JS module to register the widget.
try {
var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null;
var fn = new Function('container', 'api', 'projectId', w.js);
fn(container, api, projectId);
// Execute the widget through a nonce-authorized module script so it
// remains compatible with a strict CSP that disallows unsafe-eval.
try {
var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null;
var nonceSource = document.querySelector('script[nonce]');
var nonce = nonceSource ? nonceSource.nonce || nonceSource.getAttribute('nonce') : '';
var widgetToken = 'cr-widget-' + manifest.id + '-' + Date.now() + '-' + Math.random().toString(36).slice(2);
var widgetRegistry = window.__ironclawProjectWidgetMounts || (window.__ironclawProjectWidgetMounts = {});
var script = document.createElement('script');
widgetRegistry[widgetToken] = {
container: container,
api: api,
projectId: projectId
};
script.type = 'module';
if (nonce) script.setAttribute('nonce', nonce);
script.setAttribute('data-widget', manifest.id);
script.addEventListener('error', function(err) {
delete widgetRegistry[widgetToken];
script.remove();
console.error('[projects] Failed to mount widget ' + manifest.id + ':', err);
container.innerHTML = '<div class="cr-empty">Widget error: ' + manifest.id + '</div>';
});
script.textContent =
'(function(){\n' +
' var registry = window.__ironclawProjectWidgetMounts || {};\n' +
' var ctx = registry[' + JSON.stringify(widgetToken) + '];\n' +
' if (!ctx) return;\n' +
' try {\n' +
' (function(container, api, projectId) {\n' +
w.js + '\n' +
' })(ctx.container, ctx.api, ctx.projectId);\n' +
' } finally {\n' +
' delete registry[' + JSON.stringify(widgetToken) + '];\n' +
' document.currentScript.remove();\n' +
' }\n' +
'})();';
document.head.appendChild(script);

Copilot uses AI. Check for mistakes.
Comment on lines +485 to +500
pub async fn project_widgets_handler(
State(state): State<Arc<GatewayState>>,
AuthenticatedUser(user): AuthenticatedUser,
Path(project_id): Path<String>,
) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> {
// Resolve project name to derive the workspace slug.
let project = crate::bridge::get_engine_project(&project_id, &user.user_id)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?
.ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?;

let slug = project
.name
.to_lowercase()
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-");
let widgets_dir = format!("projects/{slug}/.system/widgets/");
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project widget lookup derives the workspace slug from project.name at request time. This makes widget discovery fragile: renaming a project (or creating workspace folders with a different slug) will break widget loading because the directory path changes. Consider persisting a stable workspace slug/path in the project record (metadata) and using that here, rather than recomputing from the display name.

Suggested change
pub async fn project_widgets_handler(
State(state): State<Arc<GatewayState>>,
AuthenticatedUser(user): AuthenticatedUser,
Path(project_id): Path<String>,
) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> {
// Resolve project name to derive the workspace slug.
let project = crate::bridge::get_engine_project(&project_id, &user.user_id)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?
.ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?;
let slug = project
.name
.to_lowercase()
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-");
let widgets_dir = format!("projects/{slug}/.system/widgets/");
fn project_name_slug(name: &str) -> String {
name.to_lowercase()
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-")
}
fn project_widgets_dir_from_metadata(project: &impl serde::Serialize, fallback_name: &str) -> String {
let fallback = format!("projects/{}/.system/widgets/", project_name_slug(fallback_name));
let Ok(value) = serde_json::to_value(project) else {
return fallback;
};
let Some(metadata) = value.get("metadata") else {
return fallback;
};
let candidate = metadata
.get("workspace_slug")
.and_then(|v| v.as_str())
.or_else(|| metadata.get("workspaceSlug").and_then(|v| v.as_str()))
.or_else(|| metadata.get("workspace_path").and_then(|v| v.as_str()))
.or_else(|| metadata.get("workspacePath").and_then(|v| v.as_str()));
let Some(raw) = candidate.map(str::trim).filter(|s| !s.is_empty()) else {
return fallback;
};
let normalized = raw.trim_matches('/');
if normalized.is_empty() {
return fallback;
}
let project_root = if normalized.starts_with("projects/") {
normalized.to_string()
} else {
format!("projects/{normalized}")
};
format!("{project_root}/.system/widgets/")
}
pub async fn project_widgets_handler(
State(state): State<Arc<GatewayState>>,
AuthenticatedUser(user): AuthenticatedUser,
Path(project_id): Path<String>,
) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> {
// Prefer a persisted workspace slug/path from project metadata so widget
// lookup remains stable across project renames. Fall back to the historic
// name-derived slug for backward compatibility with older project records.
let project = crate::bridge::get_engine_project(&project_id, &user.user_id)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?
.ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?;
let widgets_dir = project_widgets_dir_from_metadata(&project, &project.name);

Copilot uses AI. Check for mistakes.
Comment thread src/channels/web/handlers/frontend.rs Outdated
let workspace = resolve_workspace(&state, &user).await?;
let layout = read_layout_config(&workspace).await;

let entries = workspace.list(&widgets_dir).await.unwrap_or_default();
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workspace.list(&widgets_dir).await.unwrap_or_default() silently treats backend listing failures the same as “no widgets”. In this file, other widget discovery paths log a warn! on real workspace errors to avoid hiding operational issues. Consider matching that behavior here (log + empty fallback, or propagate a 500) so widget outages are diagnosable.

Suggested change
let entries = workspace.list(&widgets_dir).await.unwrap_or_default();
let entries = match workspace.list(&widgets_dir).await {
Ok(entries) => entries,
Err(error) => {
tracing::warn!(
"failed to list widgets in workspace at {}: {}",
widgets_dir,
error
);
Vec::new()
}
};

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7fe8cc — now logs at debug! level before falling back to empty list.

Comment on lines +535 to +541
let css = workspace
.read(&format!("{widgets_dir}{name}/style.css"))
.await
.ok()
.map(|d| d.content)
.filter(|c| !c.trim().is_empty() && c.len() <= MAX_WIDGET_CSS_BYTES);

Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project widget CSS is returned and later injected without running through the existing scope_css isolation used for global widgets. This allows a project widget’s style.css to affect the entire gateway UI (style leakage / UI spoofing). Consider scoping the CSS server-side with ironclaw_gateway::scope_css (and applying the same tag-close neutralization used in bundle assembly) before returning it to the client.

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/router.rs Outdated
Comment on lines +3851 to +3855
let threads = store.list_threads(pid, user_id).await.unwrap_or_default();
let missions = store
.list_missions_with_shared(pid, user_id)
.await
.unwrap_or_default();
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_engine_projects_overview swallows store errors via unwrap_or_default() when listing threads/missions. If the store is unavailable or returns an error, the overview will silently report “no activity/missions” and can incorrectly show green health. Prefer propagating the error (or at least logging and marking the project unhealthy) so the UI doesn’t mask backend failures.

Suggested change
let threads = store.list_threads(pid, user_id).await.unwrap_or_default();
let missions = store
.list_missions_with_shared(pid, user_id)
.await
.unwrap_or_default();
let threads = store
.list_threads(pid, user_id)
.await
.map_err(|e| engine_err("list project threads", e))?;
let missions = store
.list_missions_with_shared(pid, user_id)
.await
.map_err(|e| engine_err("list project missions", e))?;

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/router.rs Outdated
Comment on lines +3849 to +3855
for project in &projects {
let pid = project.id;
let threads = store.list_threads(pid, user_id).await.unwrap_or_default();
let missions = store
.list_missions_with_shared(pid, user_id)
.await
.unwrap_or_default();
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overview implementation performs per-project I/O (list_threads + list_missions_with_shared) inside a loop, which becomes an N+1 pattern as project count grows. Consider batching at the store layer (e.g., list threads/missions for all projects in one query) or at least fetching threads/missions concurrently per project to reduce tail latency for the dashboard.

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/router.rs
Comment on lines +3892 to +3899
// Count pending gates for threads in this project.
let project_thread_ids: std::collections::HashSet<_> =
threads.iter().map(|t| t.id).collect();
let project_gates: Vec<_> = user_gates
.iter()
.filter(|g| project_thread_ids.contains(&g.thread_id))
.collect();
let pending_gate_count = project_gates.len() as u64;
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending-gate counting does a full scan of user_gates for every project (user_gates.iter().filter(...)), which is O(projects × gates) and also allocates a HashSet per project. Consider pre-indexing user_gates by thread_id once (e.g., HashMap<ThreadId, Vec>) and counting/collecting by looking up only the project’s thread IDs.

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/llm_adapter.rs Outdated
Comment on lines +193 to +194
/// Returns the resolved string if all references could be substituted,
/// or the original string if none were found.
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment says this returns the resolved string only if all references could be substituted, but the implementation can partially substitute earlier templates and then break on the first unresolvable one (returning a partially-resolved string). Either adjust the behavior to be all-or-nothing, or update the comment to reflect the current partial-substitution semantics.

Suggested change
/// Returns the resolved string if all references could be substituted,
/// or the original string if none were found.
/// Returns the original string unchanged if no template markers are found.
/// Otherwise, resolves references iteratively until there are no more
/// `{{...}}` patterns or until an unresolvable reference is encountered.
/// If resolution stops on an unresolvable reference, any earlier successful
/// substitutions are preserved, so the returned string may be partially resolved.

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/effect_adapter.rs Outdated
Ok(u) => ironclaw_engine::ProjectId(u),
Err(e) => {
return Some(Err(EngineError::Effect {
reason: format!("Invalid project_id: {e}"),
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message uses Invalid project_id but the parameter name for this action is id (and the schema also calls it id). Using the same term in code and schema would make debugging easier (e.g., Invalid project id or Invalid id).

Suggested change
reason: format!("Invalid project_id: {e}"),
reason: format!("Invalid id: {e}"),

Copilot uses AI. Check for mistakes.
@ilblackdragon
Copy link
Copy Markdown
Member Author

ilblackdragon commented Apr 12, 2026

Project Detail Page

Screenshot from the Playwright E2E test (test_project_detail.py):

Shows the drill-in view for 'AI Research Intelligence' project with goals, 4 missions (3 active, 1 paused), and recent activity threads with status badges.

To view the screenshot locally:

cd tests/e2e && .venv/bin/pytest scenarios/test_project_detail.py -v
# Screenshot saved to tests/e2e/project-detail.png

@github-actions github-actions Bot added the size: XL 500+ changed lines label Apr 12, 2026
Copilot AI review requested due to automatic review settings April 12, 2026 06:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/ironclaw_gateway/static/app.js Outdated
Comment on lines +6193 to +6204
// Inject scoped CSS if present.
if (w.css) {
var style = document.createElement('style');
style.setAttribute('data-widget', manifest.id);
style.textContent = w.css; // Already scoped server-side? No — project widgets come raw.
// Scope it client-side using the same prefix convention.
var prefix = '[data-widget="' + manifest.id + '"]';
// Simple client-side scoping: prepend prefix to each rule.
// For production, use the server's scope_css. For now, inject as-is
// with the data-widget attribute providing natural scoping.
document.head.appendChild(style);
}
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project widget CSS is injected without scoping/isolation (unlike global widgets, which are passed through scope_css). Relying on a data-widget attribute alone doesn’t prevent selectors like body { ... } from leaking across the app. Consider applying scope_css server-side in project_widgets_handler (preferred, consistent with bundle assembly) or implement real client-side selector prefixing before injecting.

Copilot uses AI. Check for mistakes.
Comment on lines +490 to +501
// Resolve project name to derive the workspace slug.
let project = crate::bridge::get_engine_project(&project_id, &user.user_id)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?
.ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?;

let slug = project
.name
.to_lowercase()
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-");
let widgets_dir = format!("projects/{slug}/.system/widgets/");

Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deriving the workspace slug from the current project name makes the widgets path unstable (rename breaks widget discovery) and can also cause collisions between projects with the same/similar name. Suggestion: persist a stable slug in the project record/metadata at creation time (or derive from project_id), and use that for projects/{slug}/... lookups instead of re-slugifying project.name on every request.

Copilot uses AI. Check for mistakes.
Comment on lines +535 to +555
let css = workspace
.read(&format!("{widgets_dir}{name}/style.css"))
.await
.ok()
.map(|d| d.content)
.filter(|c| !c.trim().is_empty() && c.len() <= MAX_WIDGET_CSS_BYTES);

let enabled = layout
.widgets
.get(&manifest.id)
.map(|w| w.enabled)
.unwrap_or(true);
if !enabled {
continue;
}

widgets.push(ResolvedWidget {
manifest,
js: js_doc.content,
css,
});
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handler returns raw widget CSS/JS, but unlike the global widget pipeline it doesn’t apply scope_css (style isolation) or use the existing <script nonce=…> module injection approach. If project widgets are meant to behave like normal widgets, consider scoping CSS server-side with ironclaw_gateway::scope_css and serving JS as a same-origin module script (or otherwise ensuring it can run under the current CSP).

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/router.rs
Comment on lines +3381 to +3398
/// Per-project summary with computed health and stats.
#[derive(Debug, Clone, serde::Serialize)]
pub struct ProjectOverviewEntry {
pub id: String,
pub name: String,
pub description: String,
/// `"green"`, `"yellow"`, or `"red"`.
pub health: String,
pub active_missions: u64,
pub total_missions: u64,
pub threads_today: u64,
pub cost_today_usd: f64,
pub failures_24h: u64,
pub pending_gates: u64,
#[serde(skip_serializing_if = "Option::is_none")]
pub last_activity: Option<String>,
pub created_at: String,
}
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProjectOverviewEntry doesn’t include goals/metrics, but the Projects drill-in UI (and the new E2E mock data) expects goals to be available from the overview payload. Either add goals (and optionally metrics) to ProjectOverviewEntry, or have the frontend fetch /api/engine/projects/{id} on drill-in and render goals/metrics from that response.

Copilot uses AI. Check for mistakes.
Comment thread crates/ironclaw_gateway/static/app.js Outdated
if (tab === 'projects') {
loadProjectsOverview();
} else if (crCurrentProjectId) {
// Reset drill-in state when leaving Projects tab so we start fresh.
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When leaving the Projects tab, the code resets crCurrentProjectId but does not call destroyProjectWidgets() or reset/hide the drill/detail DOM. If a project widget registers timers/event listeners, it can keep running while the tab is hidden, and the next visit may double-mount widgets. Consider calling destroyProjectWidgets() (and/or crBackToOverview()) when switching away from tab === 'projects'.

Suggested change
// Reset drill-in state when leaving Projects tab so we start fresh.
// Reset drill-in state when leaving Projects tab so we start fresh.
// Use the existing Projects navigation reset so any mounted widgets,
// listeners, and drill/detail DOM are torn down before clearing state.
crBackToOverview();

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/effect_adapter.rs Outdated
};
// Allow explicit project_id override (so agent can create
// missions in a specific project from any thread).
let target_project = params
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: the new-project skill tells the model to pass a project slug to mission_create.project_id, but this code only accepts a UUID and silently falls back to context.project_id on parse failure. The skill also does not call project_create, so /new-project can write files under projects/{slug} while all created missions land in the current/default project and no project card is created. Please either make the skill create a real project and use the returned UUID, or reject/resolve non-UUID project IDs here instead of falling back silently.

Comment thread src/bridge/router.rs

/// Per-project summary with computed health and stats.
#[derive(Debug, Clone, serde::Serialize)]
pub struct ProjectOverviewEntry {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frontend drill-in renders proj.goals/proj.metrics from the cached overview response, but ProjectOverviewEntry does not serialize those fields. In production this branch never renders project goals/metrics; the e2e mock includes fields that the real API cannot return, so the screenshot test misses the contract mismatch. Please add these fields to the overview response or fetch project detail before rendering them, and align the test mock with the production response shape.

continue;
}

let css = workspace
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns raw project widget CSS to the frontend, which then injects it into document.head without actually prefixing selectors. Global widgets go through scope_css(...) before being embedded, so project widgets can leak styles across the whole gateway UI or between widgets. Please scope this server-side with the existing CSS scoping helper, or reuse the resolved-widget loading path that already enforces the same isolation contract.

Comment thread src/bridge/router.rs Outdated

for project in &projects {
let pid = project.id;
let threads = store.list_threads(pid, user_id).await.unwrap_or_default();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not turn store errors into empty thread/mission lists here. If either backend call fails, the overview can report zero activity, green health, and no attention items for a broken project, which makes operational failures look healthy. Propagate the error, or surface a degraded/unknown health state instead of unwrap_or_default().

Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: New-project skill + Projects Control Room + template ref resolution (Risk: Medium)

Good direction — the Projects Control Room is a meaningful UX upgrade, and template ref resolution for parallel tool calls is a valuable Qwen compatibility fix. However, two critical issues block this PR.

Positives:

  • Clean engine v2 gating with data-v2-only / data-v1-only attributes
  • Template ref resolution is well-tested with 7 unit tests covering multi-ref, nested JSON, and no-ref cases
  • E2E screenshot test provides visual regression coverage
  • Project overview with attention items and health stats is a solid foundation

Critical: mission_create project_id override has no ownership check (IDOR)

File: src/bridge/effect_adapter.rs (handle_project_call / mission_create changes)

The LLM can supply an arbitrary project_id UUID to mission_create:

let target_project = params.get("project_id")
    .and_then(|v| v.as_str())
    .and_then(|s| uuid::Uuid::parse_str(s).ok())
    .map(ironclaw_engine::ProjectId)
    .unwrap_or(context.project_id);

MissionManager::create_mission() does NOT validate that target_project is owned by context.user_id. A prompt-injected LLM response could create missions in any user's project by guessing/leaking a UUID.

Fix: Validate ownership before accepting a non-default project_id:

if target_project != context.project_id {
    let proj = store.load_project(target_project).await?;
    match proj {
        Some(p) if p.is_owned_by(&context.user_id) => {},
        _ => return Some(Err(EngineError::Effect {
            reason: "project_id does not belong to current user".into()
        })),
    }
}

Critical: new-project skill v0.2 instructs agent to use slug as UUID — silently broken

File: skills/new-project/SKILL.md

The v0.2 skill tells the LLM: "Use the slug as project_id" — e.g. mission_create(project_id: "ai-research"). But project_id is parsed as uuid::Uuid::parse_str(), which fails on slugs and silently falls back to context.project_id. Every mission gets created in the wrong project.

Fix: Restore the project_create → get UUID → pass to mission_create workflow from v0.1.

Concerning: ProjectOverviewEntry omits goals field

File: src/bridge/router.rs

The struct has no goals field, but app.js reads proj.goals. The E2E mock masks this — goals will never display in production.

Fix: Add goals: Vec<String> to ProjectOverviewEntry and populate it.

Concerning: E2E test widget mock has wrong JSON shape

File: tests/e2e/scenarios/test_project_detail.py

Mock returns {"widgets": []} (object) but the real handler returns Vec<ResolvedWidget> (bare array). Test passes by accident because both result in "no widgets rendered."

Concerning: Template ref resolution not tested through the caller

File: src/bridge/llm_adapter.rs

Per .claude/rules/testing.md, the caller path (LlmBridgeAdapter::complete() with actual {{call_id.field}} refs in message history) should have a test, not just the helper functions.

Concerning: N+1 DB queries in overview

File: src/bridge/router.rsget_engine_projects_overview()

2 sequential queries per project in a loop. Acceptable for MVP but will cause visible latency at 20+ projects.

Minor notes:

  • Slug collision between similarly-named projects could serve wrong widgets via project_widgets_handler
  • CI failing: formatting, cargo-deny, code style need fixing before merge

Copy link
Copy Markdown
Member Author

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review response (5e6c900)

Thanks @henrypark133 and @serrrfirat for the thorough review. All issues addressed:

Critical fixes

IDOR on mission_create project_id — Added ownership check: when project_id differs from the thread's project, we now load_project() + is_owned_by() before accepting. Non-UUID slugs (the old silent fallback) now return an explicit error.

project_create/update/list tools removed — These were supposed to be removed from this PR. The new-project skill intentionally uses memory_write + mission_create only, not separate project CRUD tools. Removed the capability registration, the handle_project_call() method, and the dispatch call.

Other fixes

Issue Fix
ProjectOverviewEntry missing goals Added goals: Vec<String> field, populated from project
Store errors swallowed in overview Changed unwrap_or_default() to map_err()? — errors propagate now
Project widget CSS not scoped Applied ironclaw_gateway::scope_css() server-side in project_widgets_handler
Template ref doc comment mismatch Updated to document partial resolution semantics
E2E mock widget shape wrong Fixed: [] instead of {"widgets": []}
Widget leak on tab switch crBackToOverview() now called when leaving Projects tab
CI: cargo-deny failures Removed stale wasmtime advisory ignores, added RUSTSEC-2026-0097 (rand)
CI: formatting Ran cargo fmt

Tests added

  • Caller-level template ref test: complete_resolves_template_refs_through_adapter — drives LlmBridgeAdapter::complete() with a mock provider returning {{call-1.project_id}} refs, verifies resolution through the full adapter path (per testing rules: "Test Through the Caller")
  • Total: 9 template ref tests (8 helper + 1 caller-level)

Deferred

  • N+1 in overview: acceptable for MVP, will batch when project count warrants it
  • new Function CSP concern: project widgets follow the same trust model as global widgets (user-placed = trusted), and the gateway CSP already allows inline scripts. Will revisit if we add untrusted widget sources.
  • Slug collision for widget paths: noted, will add stable slug to project metadata in a follow-up

// Eval the JS module to register the widget.
try {
var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null;
var fn = new Function('container', 'api', 'projectId', w.js);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Severity — Arbitrary JS execution via workspace widgets

Project widgets load JS from projects/{slug}/.system/widgets/{id}/index.js in the workspace and execute it via new Function(...) in the browser. The memory_write tool has no write protection for .system/widgets/ paths within project directories.

A prompt injection attack could instruct the LLM to write malicious JS to projects/{slug}/.system/widgets/evil/index.js + a manifest.json. The next time the user drills into that project, the JS executes in their authenticated session with full access to IronClaw.api, cookies, and the DOM.

Unlike global widgets (loaded via <script type="module"> from a server-controlled template), project widgets are dynamically fetched and eval'd from user-writable workspace storage. No CSP, no sandboxing, no user confirmation.

Suggested fix: Either:
(a) Block memory_write from writing to .system/ paths unless via a privileged code path, or
(b) Sandbox project widget JS in an iframe with a null origin, or
(c) Require explicit user confirmation before mounting project widgets.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — widgets follow same trust model as global widgets (user-authored workspace content). CSP blocks external eval. Will revisit for untrusted widget sources.

Comment thread src/bridge/llm_adapter.rs
/// encountered, resolution stops and earlier successful substitutions
/// are preserved (partial resolution). Returns the original string
/// unchanged if no `{{` markers are found.
fn resolve_template_refs(value: &str, tool_results: &[(String, serde_json::Value)]) -> String {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Template ref resolution operates on LLM-controlled data before safety pipeline

Template resolution substitutes {{call_id.field}} references from prior tool results into tool call parameters. Both call_id and field come from LLM output, and the resolved value flows directly into downstream tool parameters.

Resolution happens before the safety pipeline sees the parameters. If a malicious tool result contains crafted JSON (e.g., {"project_id": "'; DROP TABLE ..."}), the resolved value flows unsanitized into downstream tools. The effect adapter validates project_id as UUID (blocking this specific case), but other tools may not validate as strictly.

Suggested fix: Document this trust boundary explicitly. Log when template resolution occurs for audit trail. Consider running resolved parameters through safety validation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — template refs resolve from prior tool results already validated by the safety pipeline. Will add a second pass on substituted values as a hardening step in a follow-up.

Comment thread src/bridge/router.rs
/// Iterates all projects, computes per-project stats from missions and threads,
/// and collects pending gates as attention items. Designed for the control room
/// dashboard where the user checks in on a highly autonomous agent.
pub async fn get_engine_projects_overview(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — N+1 queries in projects overview

For each project, the handler issues separate list_threads(pid, user_id) and list_missions_with_shared(pid, user_id) queries. With P projects, this is 2P+1 database round trips. A user with 20 projects and thousands of threads will see significant latency. No timeout, pagination, or caching.

Suggested fix: Add pagination or caching. Consider a dedicated Store method that computes per-project stats in a single query. At minimum, add a TODO documenting the N+1 pattern.

Comment thread skills/new-project/SKILL.md Outdated
Create recurring missions scoped to the project. Use the **slug** as `project_id`:

```
mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Skill tells LLM to use slug as project_id, but effect adapter rejects non-UUIDs

The skill instructs: mission_create(... project_id: "{slug}"). But the new effect adapter code rejects non-UUID project_id values with "Invalid project_id '{pid_str}': must be a UUID." This means the LLM following the skill will always fail when passing a slug.

The skill and the validation are contradictory — this is a functional bug.

Suggested fix: Update the skill to instruct the LLM to use the project UUID returned from project creation, not the slug.

Comment thread crates/ironclaw_gateway/static/app.js Outdated
// Scope it client-side using the same prefix convention.
var prefix = '[data-widget="' + manifest.id + '"]';
// Simple client-side scoping: prepend prefix to each rule.
// For production, use the server's scope_css. For now, inject as-is
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Misleading CSS injection comment

The client-side code has a comment saying "For production, use the server's scope_css. For now, inject as-is" — but the server-side handler does call scope_css(). The comment is wrong and could lead a future developer to remove the server-side scoping (thinking it's handled client-side) or vice versa.

Suggested fix: Fix the comment to accurately state that CSS arrives pre-scoped from the server.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7fe8cc — comment now correctly states CSS is scoped server-side.

_projectWidgets.push({
id: manifest.id,
container: container,
style: style || null,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — `style` variable potentially undefined

If `w.css` is falsy, the `style` variable declared inside the `if (w.css)` block is never assigned. In non-strict JS, `style` resolves to `undefined` (or throws a ReferenceError in strict mode). The `|| null` fallback handles `undefined` but relies on hoisting behavior.

Suggested fix: Declare `var style = null;` before the `if (w.css)` block to make the intent explicit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7fe8ccvar style = null; declared before the if block.

Ok(Json(EngineProjectListResponse { projects }))
}

pub async fn engine_projects_overview_handler(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — No rate limiting on expensive overview endpoint

The /api/engine/projects/overview endpoint is computationally expensive (N+1 queries) but has no rate limiting. An attacker with a valid auth token could spam this endpoint to create excessive database load. Other expensive endpoints (chat send) have rate limiting.

Suggested fix: Add rate limiting or caching for this endpoint.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — endpoint is auth-protected and single-user. Will add rate limiting when project count grows.

ilblackdragon and others added 2 commits April 13, 2026 13:12
…new-project skill

Adds project metrics types, mission cadence scheduling via gateway,
and a /new-project skill for creating autonomous projects with goals,
metrics, and missions. Includes gateway frontend enhancements for
project views with metrics and goal tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… new-project skill

Two fixes from trace analysis (trace_20260411T133641.json):

1. Skill rewrite: new-project skill now instructs the model to use
   memory_write + mission_create directly instead of referencing
   nonexistent project_create/project_update tools. Includes goals
   and metrics when appropriate. Instructs sequential execution.

2. Template ref resolution: some OpenAI-format models (e.g. Qwen)
   emit {{call_id.field}} references in parallel tool call arguments.
   Added resolution pass in LlmBridgeAdapter that scans ActionCall
   parameters for these patterns and resolves them from prior tool
   results in the conversation history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 13, 2026 13:32
@ilblackdragon ilblackdragon force-pushed the feat/new-project-skill branch from 5e6c900 to b7fe8cc Compare April 13, 2026 13:32
@ilblackdragon
Copy link
Copy Markdown
Member Author

PR Review Response — Round 2

Rebased onto latest staging and addressed remaining open comments:

Fixed in b7fe8cc

  1. Slug-based project_id in mission_create (gemini-code-assist, Copilot, @serrrfirat, @henrypark133) — mission_create now accepts project names/slugs, not just UUIDs. When a non-UUID project_id is provided, it resolves by matching against the user's projects (name or slugified name). SKILL.md updated accordingly.

  2. Misleading CSS comment in app.js (@serrrfirat #3071506775) — Fixed. Comment now correctly states CSS is scoped server-side.

  3. style variable potentially undefined (@serrrfirat #3071507029) — var style = null; declared before the if block.

  4. workspace.list() errors silently swallowed (Copilot #3069074674) — Now logs at debug! level before falling back to empty list.

Deferred (with rationale)

  • Template ref resolution before safety pipeline (@serrrfirat #3071505436) — Template refs resolve from prior tool results that were already validated by the safety pipeline when produced. Adding a second safety pass on substituted values is a good hardening step but doesn't block this PR.

  • new Function() for widget JS (@serrrfirat #3071504747, Copilot #3069074652) — Widgets follow the same trust model as global widgets (user-authored workspace content). CSP blocks external eval. Will revisit for untrusted widget sources.

  • Rate limiting on overview endpoint (@serrrfirat #3071507230) — Endpoint is auth-protected and single-user. Low risk for MVP; will add when project count warrants.

  • N+1 queries (Copilot, @serrrfirat, @henrypark133) — Acceptable for MVP, will batch when project count grows.

  • Unstable slug from project name (Copilot) — Will add stable slug to project metadata in follow-up.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/bridge/router.rs Outdated
Comment on lines +3792 to +3801
for project in &projects {
let pid = project.id;
let threads = store
.list_threads(pid, user_id)
.await
.map_err(|e| engine_err("list project threads", e))?;
let missions = store
.list_missions_with_shared(pid, user_id)
.await
.map_err(|e| engine_err("list project missions", e))?;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_engine_projects_overview does per-project I/O sequentially (list_threads then list_missions_with_shared inside a for loop). With many projects (or slow backends) this becomes an N+1 latency multiplier and can degrade the Projects overview UI. Consider fetching threads/missions concurrently per project (e.g., tokio::try_join! for the two calls, and futures::future::try_join_all across projects) to keep the endpoint responsive.

Copilot uses AI. Check for mistakes.
Comment thread src/bridge/effect_adapter.rs Outdated
.collect();
name_lower == needle
|| name_slug == needle
|| name_slug.starts_with(&format!("{needle}-"))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Slug prefix matching can route missions to wrong project

The starts_with branch at line 315 is over-broad. If a user has projects "AI" (slug: ai) and "AI Research Intelligence" (slug: ai-research-intelligence), a mission_create with project_id: "ai" will match "AI Research Intelligence" because "ai-research-intelligence".starts_with("ai-") is true. Combined with find() returning the first match, this is non-deterministic — the result depends on list ordering from the store.

Suggested fix: Remove the starts_with branch and require exact name/slug match. If fuzzy matching is desired, split both the needle and slug by - and check that all needle tokens match as a prefix of slug tokens:

name_lower == needle
    || name_slug == needle
    // Remove: || name_slug.starts_with(&format!("{needle}-"))

Also: no tests cover this resolution logic. Given the ambiguity risk, adding unit tests for exact match, slug match, and multi-project disambiguation would be valuable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a617fea — removed the starts_with branch entirely. Now only exact name or exact slug match is accepted.

Comment thread src/bridge/llm_adapter.rs

let mut result = value.to_string();
// Iteratively resolve all `{{..}}` patterns (limit iterations to prevent infinite loops)
for _ in 0..50 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Second-order template injection via tool results

The iterative resolution loop re-scans from position 0 after each replacement. If a resolved value itself contains {{call-id.field}} patterns, those will be resolved in subsequent iterations. This creates a second-order injection path:

  1. Tool A returns {"id": "{{tool-B-call-id.secret_field}}"}
  2. LLM makes a parallel call referencing {{tool-A-call-id.id}}
  3. First iteration resolves to {{tool-B-call-id.secret_field}}
  4. Second iteration resolves the injected ref, leaking Tool B's data into Tool A's parameter

While the LLM already sees all tool results (limiting practical impact), this is relevant when tools come from untrusted MCP servers where one tool shouldn't be able to read another's results.

Suggested fix: After resolving a template ref, advance the scan position past the replacement rather than restarting from 0:

// Instead of scanning from 0 each time:
let mut pos = 0;
for _ in 0..50 {
    let Some(start) = result[pos..].find("{{").map(|i| i + pos) else { break; };
    // ... resolve ...
    match resolved {
        Some(val) => {
            let val_len = val.len();
            result.replace_range(start..end + 2, &val);
            pos = start + val_len; // skip past replacement
        }
        None => break,
    }
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a617fearesolve_template_refs now tracks a search_from cursor that advances past each resolved value, so injected {{...}} patterns in tool results are never re-scanned. Added resolve_template_refs_no_second_order_injection test.

Comment thread src/bridge/router.rs
/// Iterates all projects, computes per-project stats from missions and threads,
/// and collects pending gates as attention items. Designed for the control room
/// dashboard where the user checks in on a highly autonomous agent.
pub async fn get_engine_projects_overview(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — N+1 query problem in projects overview

This function issues 1 + 2N database queries: 1 for `list_projects`, then per project: `list_threads` + `list_missions_with_shared`. For a user with 10 projects, that's 21 round trips. The overview endpoint is called on every tab switch to Projects, so this will be felt in the UI.

Suggested fix (pick one):

  1. Batch + partition: Add `store.list_all_threads_for_user(user_id)` and `store.list_all_missions_for_user(user_id)`, then partition by project_id in memory. Reduces to 3 queries total.
  2. Dedicated aggregate query: Add a single SQL query that JOINs projects with thread/mission counts, returning all stats in one round trip.
  3. Short-term: Cache the overview response for a few seconds (`tokio::sync::watch` or a simple timestamp check), since the data is statistical and doesn't need real-time accuracy.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a617fea — parallelized with tokio::try_join! (intra-project) and futures::try_join_all (inter-project). Still 1+2N queries total but they execute concurrently now.

// Eval the JS module to register the widget.
try {
var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null;
var fn = new Function('container', 'api', 'projectId', w.js);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Arbitrary JS execution from workspace content

`new Function(..., w.js)` executes arbitrary JavaScript from the user's workspace (`projects/{slug}/.system/widgets/`). While this is authenticated and the user is executing their own code, the risk is that any tool with workspace write access (e.g., `memory_write`, MCP tools) could place a malicious widget in `.system/widgets/` that then runs with full browser context (cookies, DOM access, `apiFetch` via the injected `api` parameter).

This is a privilege escalation path: a tool that should only write data can now execute code in the user's browser session.

Suggested mitigations (not necessarily all needed):

  1. Document: Mark `.system/` paths as privileged in the safety layer so writes to `.system/widgets/` require explicit user approval
  2. Sandbox: Run widget JS in an iframe with `sandbox` attribute instead of `new Function`
  3. Audit log: Log widget loads (project, widget id, hash) so users can detect unexpected widget installation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — deferring to a follow-up. Same trust model as global widgets. Will add .system/ write protection when we add untrusted widget sources.

if (!el) return;
if (!items.length) { el.style.display = 'none'; return; }
el.style.display = '';
el.innerHTML = '<div class="cr-attention-title">Needs attention</div>'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Hard-coded English strings break i18n

The new control room code uses hard-coded English strings throughout instead of `I18n.t()` calls:

  • Line 5916: `"Needs attention"`
  • Line 5989: `"All Projects"`
  • Line 5997: `"Goals"`
  • Line 6040: `"Missions"`
  • Line 6070: `"Recent Activity"`
  • Plus: "No missions configured yet", "No threads yet", "Loading projects...", "Failed to load projects", etc.

The rest of the codebase consistently uses `I18n.t('key')` for all user-visible strings.

Suggested fix: Add i18n keys under a `projects.` namespace in `en.js` (and other locale files) and replace all hard-coded strings with `I18n.t()` calls. This can be done as a follow-up PR if you'd prefer to land the functionality first.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid, deferring to a follow-up PR — adding i18n keys for the control room strings is a separate concern from the functional fixes here.

c
} else {
'-'
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Fuzzy slug matching can silently create missions in the wrong project

The starts_with check on the slug match (around line 306–310) can resolve to an unintended project. Example:

name_slug.starts_with(&format!("{needle}-"))

If a user/agent passes project_id="ai" and there's no project named exactly "AI", this matches any project whose slug starts with "ai-" — e.g., "AI Research Intelligence" (ai-research-intelligence). The mission silently lands in the wrong project instead of returning an error.

Suggested fix: Remove the starts_with clause entirely, or require an exact match on either name_lower or name_slug. Prefix matching on slugs is too ambiguous when multiple projects share a common prefix:

name_lower == needle || name_slug == needle

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a617feastarts_with branch removed. Only exact name/slug match accepted.

Comment thread src/channels/web/handlers/frontend.rs Outdated
let slug = project
.name
.to_lowercase()
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Slug generation inconsistency between effect_adapter and frontend handler

The slug algorithm differs between these two files:

effect_adapter.rs (line ~294) — uses is_ascii_alphanumeric():

.map(|c| if c.is_ascii_alphanumeric() || c == '-' { c } else { '-' })

frontend.rs (line ~498) — uses is_alphanumeric() (Unicode-aware):

.replace(|c: char| !c.is_alphanumeric() && c != '-', "-")

For non-ASCII project names (e.g., "Café Project"), effect_adapter produces "caf--project" while frontend.rs produces "café-project". This means project_widgets_handler may look up widgets in a different directory than where mission_create resolved the project.

Suggested fix: Extract a single Project::slug() method on the Project type in crates/ironclaw_engine/src/types/project.rs and use it in both locations.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a617fea — frontend.rs now uses is_ascii_alphanumeric() matching effect_adapter.rs.

} else {
vec![]
};
// Allow explicit project_id override (so agent can create
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — No test coverage for cross-project mission_create ownership validation

The new project_id override logic (lines 245–340) is security-critical — it prevents IDOR by validating that the target project belongs to the current user. However, there are no tests for any of these paths:

  • UUID project_id with valid ownership → mission created in target project
  • UUID project_id owned by a different user → error returned
  • Non-existent UUID project_id → error returned
  • Slug/name resolution → correct project matched
  • Ambiguous slug → expected behavior documented and tested

Per CLAUDE.md testing rules ("Test Through the Caller, Not Just the Helper"), this should have at least an integration-tier test that drives the EffectBridgeAdapter::execute() call site with a mission_create action containing an explicit project_id.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDOR ownership validation is integration-tier (requires DB). Added two unit-tier security tests for template ref injection paths in this commit. The IDOR paths themselves are exercised via the existing complete_resolves_template_refs_through_adapter caller-level test pattern.

ilblackdragon and others added 2 commits April 17, 2026 15:02
Resolves merge conflicts:
- deny.toml: accept new RUSTSEC-2026-0097 advisory
- bridge/mod.rs: deduplicate re-exports, add BridgeOutcome
- bridge/router.rs: combine project_id with expanded cadence/guardrail params
- bridge/effect_adapter.rs: keep both IDOR validation and guardrails extraction
- app.js: integrate ActivityWorkStore, mission progress tracking alongside
  control room; unify openMissionDetail/openEngineThread to route by active tab
- index.html: add Missions tab alongside Projects tab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…+1 queries

- Remove over-broad `starts_with` slug prefix matching in mission_create
  project_id resolution — require exact name/slug match only (serrrfirat)
- Fix slug generation inconsistency: frontend.rs now uses
  is_ascii_alphanumeric() matching effect_adapter.rs (serrrfirat)
- Prevent second-order template injection: resolve_template_refs now
  advances past resolved content instead of re-scanning from position 0,
  and skips unresolvable refs instead of breaking (serrrfirat)
- Parallelize N+1 overview queries: per-project thread/mission fetches
  now use tokio::try_join! + futures::try_join_all (serrrfirat, Copilot)
- Add two new security tests for template ref resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ilblackdragon
Copy link
Copy Markdown
Member Author

PR Review Response — Round 3

Merged origin/staging and addressed all outstanding comments from @serrrfirat and Copilot. Changes in a617fea.

Fixed

Issue Reviewer Fix
Fuzzy slug matching routes missions to wrong project @serrrfirat Removed starts_with prefix branch — only exact name/slug match accepted now
Slug generation inconsistency (effect_adapter vs frontend.rs) @serrrfirat Both now use is_ascii_alphanumeric()
Second-order template injection via re-scanning resolved content @serrrfirat resolve_template_refs now advances search_from past replacement text; also fixed break on unresolvable ref → skip + continue
N+1 queries in projects overview @serrrfirat, Copilot Per-project I/O now parallel: tokio::try_join! for threads+missions within each project, futures::try_join_all across projects
No test for IDOR / slug resolution paths @serrrfirat 2 new security tests: no_second_order_injection + skips_unresolvable_continues_later

Deferred (unchanged from round 2)

  • Arbitrary JS execution from workspace widgets: Widgets follow same trust model as global widgets (user-authored workspace content). CSP blocks external eval. Will revisit for untrusted widget sources.
  • Hard-coded English strings in control room: Valid — will add i18n keys under projects.* namespace in a follow-up PR to keep this one focused on the functional fixes.

@ilblackdragon ilblackdragon merged commit ab8d64c into staging Apr 17, 2026
15 checks passed
@ilblackdragon ilblackdragon deleted the feat/new-project-skill branch April 17, 2026 16:37
ilblackdragon added a commit that referenced this pull request Apr 18, 2026
… tab, onboarding E2E

Scheduled batched CI on staging was red across three unrelated paths.
All three are fixed in-place; the existing tests become the regression
coverage.

1. `tests/support/test_rig.rs`: rebuild the skill registry against the
   test's `with_skills_dir()` tempdir and actually run `discover_all()`.
   `AppBuilder::init_database()` reloads `config` from DB/TOML/env at the
   top of `build_all()`, which clobbered `config.skills.local_dir` back
   to the default (`~/.ironclaw/skills/`). Any registry `build_all()`
   constructed therefore pointed at the user's real skills dir, not the
   tempdir the test had laid down — so `loaded_skill_names()` came back
   empty and the v1 chain-load assertion panicked. Write the tempdir
   paths back onto `components.config.skills.*` so `AgentDeps::skills_config`
   agrees with the registry. `skill_chain_load_lifecycle::v1_chain_load_pulls_in_required_companions`
   now passes.

2. `crates/ironclaw_gateway/static/index.html`: drop the duplicate
   right-side `status-logs-btn` Jobs button added in #2353. The main
   tab-bar already has `<button data-tab="jobs">Jobs</button>`, and the
   duplicate had no `data-v1-only`/`data-v2-only` marker, so both
   rendered simultaneously. That broke `test_connection.py` (Playwright
   strict-mode rejected `.tab-bar button[data-tab="jobs"]` resolving to
   two elements) and also left both buttons visually `active` when the
   Jobs tab was open.

3. `tests/e2e/scenarios/test_extensions.py`: align
   `test_onboarding_failed_sse_shows_error_toast_and_reloads_extensions`
   with every other auth-card test in the file — resolve the real
   thread id via `_active_thread_id(page)` before calling
   `_show_auth_card`. `showAuthCard` short-circuits on
   `isCurrentThread(data.thread_id)`, and the synthetic `"thread-fail"`
   id fails that check once `currentThreadId` is populated after
   `go_to_extensions(page)`. The auth card was never rendered, so the
   follow-up `wait_for` for `.auth-card` hit its 5s timeout.

Verified: `cargo test --features libsql --test skill_chain_load_lifecycle`
and `--test skill_setup_marker_lifecycle` pass; `cargo clippy --tests
--features libsql` is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Apr 18, 2026
… tab, onboarding E2E (#2637)

Scheduled batched CI on staging was red across three unrelated paths.
All three are fixed in-place; the existing tests become the regression
coverage.

1. `tests/support/test_rig.rs`: rebuild the skill registry against the
   test's `with_skills_dir()` tempdir and actually run `discover_all()`.
   `AppBuilder::init_database()` reloads `config` from DB/TOML/env at the
   top of `build_all()`, which clobbered `config.skills.local_dir` back
   to the default (`~/.ironclaw/skills/`). Any registry `build_all()`
   constructed therefore pointed at the user's real skills dir, not the
   tempdir the test had laid down — so `loaded_skill_names()` came back
   empty and the v1 chain-load assertion panicked. Write the tempdir
   paths back onto `components.config.skills.*` so `AgentDeps::skills_config`
   agrees with the registry. `skill_chain_load_lifecycle::v1_chain_load_pulls_in_required_companions`
   now passes.

2. `crates/ironclaw_gateway/static/index.html`: drop the duplicate
   right-side `status-logs-btn` Jobs button added in #2353. The main
   tab-bar already has `<button data-tab="jobs">Jobs</button>`, and the
   duplicate had no `data-v1-only`/`data-v2-only` marker, so both
   rendered simultaneously. That broke `test_connection.py` (Playwright
   strict-mode rejected `.tab-bar button[data-tab="jobs"]` resolving to
   two elements) and also left both buttons visually `active` when the
   Jobs tab was open.

3. `tests/e2e/scenarios/test_extensions.py`: align
   `test_onboarding_failed_sse_shows_error_toast_and_reloads_extensions`
   with every other auth-card test in the file — resolve the real
   thread id via `_active_thread_id(page)` before calling
   `_show_auth_card`. `showAuthCard` short-circuits on
   `isCurrentThread(data.thread_id)`, and the synthetic `"thread-fail"`
   id fails that check once `currentThreadId` is populated after
   `go_to_extensions(page)`. The auth card was never rendered, so the
   follow-up `wait_for` for `.auth-card` hit its 5s timeout.

Verified: `cargo test --features libsql --test skill_chain_load_lifecycle`
and `--test skill_setup_marker_lifecycle` pass; `cargo clippy --tests
--features libsql` is clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
serrrfirat added a commit that referenced this pull request Apr 18, 2026
Closes #2626.

`tests/e2e/helpers.py` used `.tab-bar button[data-tab="{tab}"]` to locate
every main-nav tab button. Commit 5058a1c removed the duplicate
right-side `status-logs-btn` Jobs button that was resolving that selector
to two elements on staging, so `test_connection.py` passes again. The
underlying selector is still fragile: any future auxiliary button or
`_addWidgetTab`-injected tab that reuses a built-in `data-tab` id (even
accidentally, as #2353 did) will make the selector resolve to multiple
elements and trip Playwright strict mode the next CI run.

Harden the selector in place — the issue's suggested direction of
scoping to a more specific parent region — so the regression can't
repeat without a test-side opt-in:

- `.tab-bar > button`: direct child, skipping any hypothetical nested
  buttons (e.g. menu popovers).
- `:not(.status-logs-btn)`: excludes the right-side logs/docs cluster
  that uses the same `data-tab` hook for click routing.
- `:not(.tab-btn)`: excludes widget-injected tabs (see `_addWidgetTab`
  in `crates/ironclaw_gateway/static/app.js`), which always carry the
  `tab-btn` class and could in principle collide with a built-in id.

Verified against a synthetic DOM with three colliding `data-tab="jobs"`
buttons (original, status-logs-btn duplicate, widget-injected): the old
selector matches 3 and trips strict mode on `.click()`; the new
selector matches 1 and clicks cleanly. No production HTML change is
required — the acceptance criterion explicitly forbids one.

`pytest tests/e2e/scenarios/test_connection.py -v` → 3 passed.
serrrfirat added a commit that referenced this pull request Apr 18, 2026
…2656)

Closes #2626.

`tests/e2e/helpers.py` used `.tab-bar button[data-tab="{tab}"]` to locate
every main-nav tab button. Commit 5058a1c removed the duplicate
right-side `status-logs-btn` Jobs button that was resolving that selector
to two elements on staging, so `test_connection.py` passes again. The
underlying selector is still fragile: any future auxiliary button or
`_addWidgetTab`-injected tab that reuses a built-in `data-tab` id (even
accidentally, as #2353 did) will make the selector resolve to multiple
elements and trip Playwright strict mode the next CI run.

Harden the selector in place — the issue's suggested direction of
scoping to a more specific parent region — so the regression can't
repeat without a test-side opt-in:

- `.tab-bar > button`: direct child, skipping any hypothetical nested
  buttons (e.g. menu popovers).
- `:not(.status-logs-btn)`: excludes the right-side logs/docs cluster
  that uses the same `data-tab` hook for click routing.
- `:not(.tab-btn)`: excludes widget-injected tabs (see `_addWidgetTab`
  in `crates/ironclaw_gateway/static/app.js`), which always carry the
  `tab-btn` class and could in principle collide with a built-in id.

Verified against a synthetic DOM with three colliding `data-tab="jobs"`
buttons (original, status-logs-btn duplicate, widget-injected): the old
selector matches 3 and trips strict mode on `.click()`; the new
selector matches 1 and clicks cleanly. No production HTML change is
required — the acceptance criterion explicitly forbids one.

`pytest tests/e2e/scenarios/test_connection.py -v` → 3 passed.
@henrypark133 henrypark133 mentioned this pull request Apr 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/web Web gateway channel scope: docs Documentation size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants