feat: new-project skill and template ref resolution for parallel tool calls#2353
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a 'Projects Control Room' (Engine V2) to facilitate autonomous project management, incorporating goals, metrics, and project-scoped missions. Key updates include new backend data structures, API endpoints for project overviews and dynamic widget discovery, and an extensive frontend dashboard. Feedback points out a mismatch between the project creation skill and the backend's UUID requirement, a logic flaw in template resolution that breaks early on unresolvable patterns, and missing fields in the project overview DTO required by the frontend. Additionally, improvements were suggested for metric update logic and CSS scoping for project-specific widgets to prevent UI leakage.
| Create recurring missions scoped to the project. Use the **slug** as `project_id`: | ||
|
|
||
| ``` | ||
| mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}") | ||
| ``` |
There was a problem hiding this comment.
There is a conflict between the skill instructions and the mission_create tool implementation. The skill instructs the agent to use the project slug as the project_id, but the mission_create tool (in effect_adapter.rs) expects a valid UUID. If the agent passes a slug, uuid::Uuid::parse_str will fail, and the mission will incorrectly default to the current thread's project instead of the intended one.
| Create recurring missions scoped to the project. Use the **slug** as `project_id`: | |
| ``` | |
| mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}") | |
| ``` | |
| Create recurring missions scoped to the project. Use the project_id (UUID) returned by the project_create tool: | |
| mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{project_id}") |
There was a problem hiding this comment.
Fixed in b7fe8cc — mission_create now accepts project names/slugs and resolves them by matching against the user's projects. SKILL.md updated to use project name instead of slug.
| for _ in 0..50 { | ||
| let Some(start) = result.find("{{") else { | ||
| break; | ||
| }; | ||
| let Some(end) = result[start..].find("}}") else { | ||
| break; | ||
| }; | ||
| let end = start + end; | ||
| let ref_str = &result[start + 2..end]; // e.g. "chatcmpl-tool-9816a462feb22da1.project_id" | ||
|
|
||
| let resolved = if let Some(dot_pos) = ref_str.rfind('.') { | ||
| let call_id = &ref_str[..dot_pos]; | ||
| let field = &ref_str[dot_pos + 1..]; | ||
| tool_results | ||
| .iter() | ||
| .find(|(id, _)| id == call_id) | ||
| .and_then(|(_, json)| json.get(field)) | ||
| .map(|v| match v { | ||
| serde_json::Value::String(s) => s.clone(), | ||
| other => other.to_string(), | ||
| }) | ||
| } else { | ||
| None | ||
| }; | ||
|
|
||
| match resolved { | ||
| Some(val) => { | ||
| result.replace_range(start..end + 2, &val); | ||
| } | ||
| None => { | ||
| // Can't resolve — stop to avoid infinite loop on the same pattern | ||
| break; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The resolve_template_refs function stops processing all templates in a string if it encounters a single unresolvable pattern (e.g., a non-existent call_id or a string that happens to contain {{ but isn't a template). This is because the loop breaks when resolved is None. It should instead skip the unresolvable pattern and continue searching for subsequent templates.
| if (proj && (proj.goals && proj.goals.length || proj.metrics && proj.metrics.length)) { | ||
| var gmHtml = ''; | ||
| if (proj.goals && proj.goals.length) { | ||
| gmHtml += '<div class="cr-goals"><div class="cr-section-title">Goals</div>'; | ||
| proj.goals.forEach(function(g) { | ||
| gmHtml += '<div class="cr-goal-item">' + escapeHtml(g) + '</div>'; | ||
| }); | ||
| gmHtml += '</div>'; | ||
| } | ||
| // Metrics would come from project detail; overview doesn't include them yet. | ||
| document.getElementById('cr-drill-header').innerHTML += gmHtml; | ||
| } |
There was a problem hiding this comment.
The drillIntoProject function attempts to display project goals and metrics using the proj object from the cached overview. However, the ProjectOverviewEntry DTO returned by the /api/engine/projects/overview endpoint does not include goals or metrics fields. Consequently, these sections will never be rendered in the drill-in view. You should either include these fields in the overview response or fetch the full project detail in this function.
| if let Some(metrics) = params.get("metrics").and_then(|v| v.as_array()) { | ||
| project.metrics = metrics | ||
| .iter() | ||
| .filter_map(|v| serde_json::from_value(v.clone()).ok()) | ||
| .collect(); | ||
| } |
There was a problem hiding this comment.
When updating project metrics, ensure the logic sums the values with existing ones rather than overwriting them. Additionally, if a timestamp is missing, do not fall back to the current time (Utc::now()) to avoid issues with resource cleanup logic; instead, log a warning and skip the update for that metric. Please add tests to cover multi-event scenarios.
if let Some(new_metrics) = params.get("metrics").and_then(|v| v.as_array()) {
for v in new_metrics {
if let Ok(new_m) = serde_json::from_value::<ironclaw_engine::ProjectMetric>(v.clone()) {
if let Some(existing) = project.metrics.iter_mut().find(|m| m.name == new_m.name) {
existing.value += new_m.value;
} else if new_m.updated_at.is_some() {
project.metrics.push(new_m);
} else {
tracing::warn!("Skipping metric update due to missing timestamp: {}", new_m.name);
}
}
}
}References
- When accumulating metrics from multiple events, ensure the logic correctly sums the values rather than overwriting them. Add tests to cover multi-event scenarios.
- When calculating the age of a resource for cleanup, if the creation timestamp is unknown or cannot be parsed, treat it as un-reapable by logging a warning and skipping it. Do not fall back to the current time.
| if (w.css) { | ||
| var style = document.createElement('style'); | ||
| style.setAttribute('data-widget', manifest.id); | ||
| style.textContent = w.css; // Already scoped server-side? No — project widgets come raw. | ||
| // Scope it client-side using the same prefix convention. | ||
| var prefix = '[data-widget="' + manifest.id + '"]'; | ||
| // Simple client-side scoping: prepend prefix to each rule. | ||
| // For production, use the server's scope_css. For now, inject as-is | ||
| // with the data-widget attribute providing natural scoping. | ||
| document.head.appendChild(style); | ||
| } |
There was a problem hiding this comment.
Project widgets are injected with their CSS added directly to the document head without scoping. If a widget contains global CSS rules (e.g., targeting body, h2, or common class names like .btn), it will leak and affect the entire application UI. Consider implementing the client-side scoping mentioned in the comments or ensuring the server-side scope_css logic is applied to project-specific widgets.
There was a problem hiding this comment.
Pull request overview
Adds engine-v2 “Projects control room” UX and supporting engine APIs, plus LLM-side tool-call argument template-reference resolution, and introduces a new /new-project skill.
Changes:
- Added project overview + project widgets endpoints and updated gateway status to expose
engine_v2_enabled. - Implemented project CRUD actions in the engine bridge and added project goals/metrics types.
- Added a template-ref resolution pass in
LlmBridgeAdapterfor{{call_id.field}}patterns in tool-call JSON.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/channels/web/server.rs | Adds new engine routes (/projects/overview, /projects/{id}/widgets) and exposes engine_v2_enabled in gateway status. |
| src/channels/web/handlers/frontend.rs | Implements project_widgets_handler to discover/load project-scoped widgets from workspace. |
| src/channels/web/handlers/engine.rs | Adds optional project_id query filter for missions/threads and a new overview handler. |
| src/bridge/router.rs | Registers project capability/actions and implements projects overview aggregation + project DTO expansions (goals/metrics). |
| src/bridge/mod.rs | Re-exports newly added overview DTOs and query function. |
| src/bridge/llm_adapter.rs | Resolves {{call_id.field}} references in tool-call parameters and adds unit tests. |
| src/bridge/effect_adapter.rs | Adds handlers for project_create/project_update/project_list and supports project_id override on mission creation. |
| skills/new-project/SKILL.md | Introduces the new-project skill definition and procedure. |
| crates/ironclaw_gateway/static/style.css | Adds styling for the Projects control room UI. |
| crates/ironclaw_gateway/static/index.html | Replaces legacy missions tab with Projects tab (v2) and updates tab labels/buttons. |
| crates/ironclaw_gateway/static/i18n/en.js | Renames “Memory” tab to “Workspace” and adds “Projects” label. |
| crates/ironclaw_gateway/static/app.js | Implements Projects control room behavior, overview/drill-in/detail rendering, and widget loading. |
| crates/ironclaw_gateway/src/widget.rs | Adds new widget slots (ProjectHeader, ProjectSection) and updates slot serialization tests. |
| crates/ironclaw_gateway/src/bundle.rs | Makes ResolvedWidget serializable for JSON responses. |
| crates/ironclaw_engine/src/types/project.rs | Adds ProjectMetric + goals/metrics fields on Project. |
| crates/ironclaw_engine/src/runtime/mission.rs | Exposes MissionManager::store() for project CRUD via the mission manager. |
| crates/ironclaw_engine/src/lib.rs | Re-exports ProjectMetric. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Create recurring missions scoped to the project. Use the **slug** as `project_id`: | ||
|
|
||
| ``` | ||
| mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}") | ||
| ``` |
There was a problem hiding this comment.
The skill instructs using the workspace slug as project_id for mission_create, but the engine expects project_id to be a UUID (see mission/project IDs parsed with Uuid::parse_str). As written, mission creation will fail to scope to the new project and will fall back to the current thread’s project (often Default). Update the skill to create the project first (e.g. via project_create) and then pass the returned UUID project_id, while still using the slug only for workspace paths.
| // Eval the JS module to register the widget. | ||
| try { | ||
| var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null; | ||
| var fn = new Function('container', 'api', 'projectId', w.js); | ||
| fn(container, api, projectId); | ||
|
|
There was a problem hiding this comment.
new Function(...) is effectively eval and will be blocked by the gateway’s CSP (script-src does not include 'unsafe-eval'). This means project widgets will fail to mount in production. To keep CSP strict, load widget JS by injecting a <script type="module"> element carrying the page’s existing CSP nonce (or otherwise execute through a nonce-authorized script path) instead of using new Function.
| // Eval the JS module to register the widget. | |
| try { | |
| var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null; | |
| var fn = new Function('container', 'api', 'projectId', w.js); | |
| fn(container, api, projectId); | |
| // Execute the widget through a nonce-authorized module script so it | |
| // remains compatible with a strict CSP that disallows unsafe-eval. | |
| try { | |
| var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null; | |
| var nonceSource = document.querySelector('script[nonce]'); | |
| var nonce = nonceSource ? nonceSource.nonce || nonceSource.getAttribute('nonce') : ''; | |
| var widgetToken = 'cr-widget-' + manifest.id + '-' + Date.now() + '-' + Math.random().toString(36).slice(2); | |
| var widgetRegistry = window.__ironclawProjectWidgetMounts || (window.__ironclawProjectWidgetMounts = {}); | |
| var script = document.createElement('script'); | |
| widgetRegistry[widgetToken] = { | |
| container: container, | |
| api: api, | |
| projectId: projectId | |
| }; | |
| script.type = 'module'; | |
| if (nonce) script.setAttribute('nonce', nonce); | |
| script.setAttribute('data-widget', manifest.id); | |
| script.addEventListener('error', function(err) { | |
| delete widgetRegistry[widgetToken]; | |
| script.remove(); | |
| console.error('[projects] Failed to mount widget ' + manifest.id + ':', err); | |
| container.innerHTML = '<div class="cr-empty">Widget error: ' + manifest.id + '</div>'; | |
| }); | |
| script.textContent = | |
| '(function(){\n' + | |
| ' var registry = window.__ironclawProjectWidgetMounts || {};\n' + | |
| ' var ctx = registry[' + JSON.stringify(widgetToken) + '];\n' + | |
| ' if (!ctx) return;\n' + | |
| ' try {\n' + | |
| ' (function(container, api, projectId) {\n' + | |
| w.js + '\n' + | |
| ' })(ctx.container, ctx.api, ctx.projectId);\n' + | |
| ' } finally {\n' + | |
| ' delete registry[' + JSON.stringify(widgetToken) + '];\n' + | |
| ' document.currentScript.remove();\n' + | |
| ' }\n' + | |
| '})();'; | |
| document.head.appendChild(script); |
| pub async fn project_widgets_handler( | ||
| State(state): State<Arc<GatewayState>>, | ||
| AuthenticatedUser(user): AuthenticatedUser, | ||
| Path(project_id): Path<String>, | ||
| ) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> { | ||
| // Resolve project name to derive the workspace slug. | ||
| let project = crate::bridge::get_engine_project(&project_id, &user.user_id) | ||
| .await | ||
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))? | ||
| .ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?; | ||
|
|
||
| let slug = project | ||
| .name | ||
| .to_lowercase() | ||
| .replace(|c: char| !c.is_alphanumeric() && c != '-', "-"); | ||
| let widgets_dir = format!("projects/{slug}/.system/widgets/"); |
There was a problem hiding this comment.
Project widget lookup derives the workspace slug from project.name at request time. This makes widget discovery fragile: renaming a project (or creating workspace folders with a different slug) will break widget loading because the directory path changes. Consider persisting a stable workspace slug/path in the project record (metadata) and using that here, rather than recomputing from the display name.
| pub async fn project_widgets_handler( | |
| State(state): State<Arc<GatewayState>>, | |
| AuthenticatedUser(user): AuthenticatedUser, | |
| Path(project_id): Path<String>, | |
| ) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> { | |
| // Resolve project name to derive the workspace slug. | |
| let project = crate::bridge::get_engine_project(&project_id, &user.user_id) | |
| .await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))? | |
| .ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?; | |
| let slug = project | |
| .name | |
| .to_lowercase() | |
| .replace(|c: char| !c.is_alphanumeric() && c != '-', "-"); | |
| let widgets_dir = format!("projects/{slug}/.system/widgets/"); | |
| fn project_name_slug(name: &str) -> String { | |
| name.to_lowercase() | |
| .replace(|c: char| !c.is_alphanumeric() && c != '-', "-") | |
| } | |
| fn project_widgets_dir_from_metadata(project: &impl serde::Serialize, fallback_name: &str) -> String { | |
| let fallback = format!("projects/{}/.system/widgets/", project_name_slug(fallback_name)); | |
| let Ok(value) = serde_json::to_value(project) else { | |
| return fallback; | |
| }; | |
| let Some(metadata) = value.get("metadata") else { | |
| return fallback; | |
| }; | |
| let candidate = metadata | |
| .get("workspace_slug") | |
| .and_then(|v| v.as_str()) | |
| .or_else(|| metadata.get("workspaceSlug").and_then(|v| v.as_str())) | |
| .or_else(|| metadata.get("workspace_path").and_then(|v| v.as_str())) | |
| .or_else(|| metadata.get("workspacePath").and_then(|v| v.as_str())); | |
| let Some(raw) = candidate.map(str::trim).filter(|s| !s.is_empty()) else { | |
| return fallback; | |
| }; | |
| let normalized = raw.trim_matches('/'); | |
| if normalized.is_empty() { | |
| return fallback; | |
| } | |
| let project_root = if normalized.starts_with("projects/") { | |
| normalized.to_string() | |
| } else { | |
| format!("projects/{normalized}") | |
| }; | |
| format!("{project_root}/.system/widgets/") | |
| } | |
| pub async fn project_widgets_handler( | |
| State(state): State<Arc<GatewayState>>, | |
| AuthenticatedUser(user): AuthenticatedUser, | |
| Path(project_id): Path<String>, | |
| ) -> Result<Json<Vec<ResolvedWidget>>, (StatusCode, String)> { | |
| // Prefer a persisted workspace slug/path from project metadata so widget | |
| // lookup remains stable across project renames. Fall back to the historic | |
| // name-derived slug for backward compatibility with older project records. | |
| let project = crate::bridge::get_engine_project(&project_id, &user.user_id) | |
| .await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))? | |
| .ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?; | |
| let widgets_dir = project_widgets_dir_from_metadata(&project, &project.name); |
| let workspace = resolve_workspace(&state, &user).await?; | ||
| let layout = read_layout_config(&workspace).await; | ||
|
|
||
| let entries = workspace.list(&widgets_dir).await.unwrap_or_default(); |
There was a problem hiding this comment.
workspace.list(&widgets_dir).await.unwrap_or_default() silently treats backend listing failures the same as “no widgets”. In this file, other widget discovery paths log a warn! on real workspace errors to avoid hiding operational issues. Consider matching that behavior here (log + empty fallback, or propagate a 500) so widget outages are diagnosable.
| let entries = workspace.list(&widgets_dir).await.unwrap_or_default(); | |
| let entries = match workspace.list(&widgets_dir).await { | |
| Ok(entries) => entries, | |
| Err(error) => { | |
| tracing::warn!( | |
| "failed to list widgets in workspace at {}: {}", | |
| widgets_dir, | |
| error | |
| ); | |
| Vec::new() | |
| } | |
| }; |
There was a problem hiding this comment.
Fixed in b7fe8cc — now logs at debug! level before falling back to empty list.
| let css = workspace | ||
| .read(&format!("{widgets_dir}{name}/style.css")) | ||
| .await | ||
| .ok() | ||
| .map(|d| d.content) | ||
| .filter(|c| !c.trim().is_empty() && c.len() <= MAX_WIDGET_CSS_BYTES); | ||
|
|
There was a problem hiding this comment.
Project widget CSS is returned and later injected without running through the existing scope_css isolation used for global widgets. This allows a project widget’s style.css to affect the entire gateway UI (style leakage / UI spoofing). Consider scoping the CSS server-side with ironclaw_gateway::scope_css (and applying the same tag-close neutralization used in bundle assembly) before returning it to the client.
| let threads = store.list_threads(pid, user_id).await.unwrap_or_default(); | ||
| let missions = store | ||
| .list_missions_with_shared(pid, user_id) | ||
| .await | ||
| .unwrap_or_default(); |
There was a problem hiding this comment.
get_engine_projects_overview swallows store errors via unwrap_or_default() when listing threads/missions. If the store is unavailable or returns an error, the overview will silently report “no activity/missions” and can incorrectly show green health. Prefer propagating the error (or at least logging and marking the project unhealthy) so the UI doesn’t mask backend failures.
| let threads = store.list_threads(pid, user_id).await.unwrap_or_default(); | |
| let missions = store | |
| .list_missions_with_shared(pid, user_id) | |
| .await | |
| .unwrap_or_default(); | |
| let threads = store | |
| .list_threads(pid, user_id) | |
| .await | |
| .map_err(|e| engine_err("list project threads", e))?; | |
| let missions = store | |
| .list_missions_with_shared(pid, user_id) | |
| .await | |
| .map_err(|e| engine_err("list project missions", e))?; |
| for project in &projects { | ||
| let pid = project.id; | ||
| let threads = store.list_threads(pid, user_id).await.unwrap_or_default(); | ||
| let missions = store | ||
| .list_missions_with_shared(pid, user_id) | ||
| .await | ||
| .unwrap_or_default(); |
There was a problem hiding this comment.
This overview implementation performs per-project I/O (list_threads + list_missions_with_shared) inside a loop, which becomes an N+1 pattern as project count grows. Consider batching at the store layer (e.g., list threads/missions for all projects in one query) or at least fetching threads/missions concurrently per project to reduce tail latency for the dashboard.
| // Count pending gates for threads in this project. | ||
| let project_thread_ids: std::collections::HashSet<_> = | ||
| threads.iter().map(|t| t.id).collect(); | ||
| let project_gates: Vec<_> = user_gates | ||
| .iter() | ||
| .filter(|g| project_thread_ids.contains(&g.thread_id)) | ||
| .collect(); | ||
| let pending_gate_count = project_gates.len() as u64; |
There was a problem hiding this comment.
Pending-gate counting does a full scan of user_gates for every project (user_gates.iter().filter(...)), which is O(projects × gates) and also allocates a HashSet per project. Consider pre-indexing user_gates by thread_id once (e.g., HashMap<ThreadId, Vec>) and counting/collecting by looking up only the project’s thread IDs.
| /// Returns the resolved string if all references could be substituted, | ||
| /// or the original string if none were found. |
There was a problem hiding this comment.
The doc comment says this returns the resolved string only if all references could be substituted, but the implementation can partially substitute earlier templates and then break on the first unresolvable one (returning a partially-resolved string). Either adjust the behavior to be all-or-nothing, or update the comment to reflect the current partial-substitution semantics.
| /// Returns the resolved string if all references could be substituted, | |
| /// or the original string if none were found. | |
| /// Returns the original string unchanged if no template markers are found. | |
| /// Otherwise, resolves references iteratively until there are no more | |
| /// `{{...}}` patterns or until an unresolvable reference is encountered. | |
| /// If resolution stops on an unresolvable reference, any earlier successful | |
| /// substitutions are preserved, so the returned string may be partially resolved. |
| Ok(u) => ironclaw_engine::ProjectId(u), | ||
| Err(e) => { | ||
| return Some(Err(EngineError::Effect { | ||
| reason: format!("Invalid project_id: {e}"), |
There was a problem hiding this comment.
The error message uses Invalid project_id but the parameter name for this action is id (and the schema also calls it id). Using the same term in code and schema would make debugging easier (e.g., Invalid project id or Invalid id).
| reason: format!("Invalid project_id: {e}"), | |
| reason: format!("Invalid id: {e}"), |
Project Detail PageScreenshot from the Playwright E2E test (
To view the screenshot locally: cd tests/e2e && .venv/bin/pytest scenarios/test_project_detail.py -v
# Screenshot saved to tests/e2e/project-detail.png |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Inject scoped CSS if present. | ||
| if (w.css) { | ||
| var style = document.createElement('style'); | ||
| style.setAttribute('data-widget', manifest.id); | ||
| style.textContent = w.css; // Already scoped server-side? No — project widgets come raw. | ||
| // Scope it client-side using the same prefix convention. | ||
| var prefix = '[data-widget="' + manifest.id + '"]'; | ||
| // Simple client-side scoping: prepend prefix to each rule. | ||
| // For production, use the server's scope_css. For now, inject as-is | ||
| // with the data-widget attribute providing natural scoping. | ||
| document.head.appendChild(style); | ||
| } |
There was a problem hiding this comment.
Project widget CSS is injected without scoping/isolation (unlike global widgets, which are passed through scope_css). Relying on a data-widget attribute alone doesn’t prevent selectors like body { ... } from leaking across the app. Consider applying scope_css server-side in project_widgets_handler (preferred, consistent with bundle assembly) or implement real client-side selector prefixing before injecting.
| // Resolve project name to derive the workspace slug. | ||
| let project = crate::bridge::get_engine_project(&project_id, &user.user_id) | ||
| .await | ||
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))? | ||
| .ok_or((StatusCode::NOT_FOUND, "Project not found".to_string()))?; | ||
|
|
||
| let slug = project | ||
| .name | ||
| .to_lowercase() | ||
| .replace(|c: char| !c.is_alphanumeric() && c != '-', "-"); | ||
| let widgets_dir = format!("projects/{slug}/.system/widgets/"); | ||
|
|
There was a problem hiding this comment.
Deriving the workspace slug from the current project name makes the widgets path unstable (rename breaks widget discovery) and can also cause collisions between projects with the same/similar name. Suggestion: persist a stable slug in the project record/metadata at creation time (or derive from project_id), and use that for projects/{slug}/... lookups instead of re-slugifying project.name on every request.
| let css = workspace | ||
| .read(&format!("{widgets_dir}{name}/style.css")) | ||
| .await | ||
| .ok() | ||
| .map(|d| d.content) | ||
| .filter(|c| !c.trim().is_empty() && c.len() <= MAX_WIDGET_CSS_BYTES); | ||
|
|
||
| let enabled = layout | ||
| .widgets | ||
| .get(&manifest.id) | ||
| .map(|w| w.enabled) | ||
| .unwrap_or(true); | ||
| if !enabled { | ||
| continue; | ||
| } | ||
|
|
||
| widgets.push(ResolvedWidget { | ||
| manifest, | ||
| js: js_doc.content, | ||
| css, | ||
| }); |
There was a problem hiding this comment.
This handler returns raw widget CSS/JS, but unlike the global widget pipeline it doesn’t apply scope_css (style isolation) or use the existing <script nonce=…> module injection approach. If project widgets are meant to behave like normal widgets, consider scoping CSS server-side with ironclaw_gateway::scope_css and serving JS as a same-origin module script (or otherwise ensuring it can run under the current CSP).
| /// Per-project summary with computed health and stats. | ||
| #[derive(Debug, Clone, serde::Serialize)] | ||
| pub struct ProjectOverviewEntry { | ||
| pub id: String, | ||
| pub name: String, | ||
| pub description: String, | ||
| /// `"green"`, `"yellow"`, or `"red"`. | ||
| pub health: String, | ||
| pub active_missions: u64, | ||
| pub total_missions: u64, | ||
| pub threads_today: u64, | ||
| pub cost_today_usd: f64, | ||
| pub failures_24h: u64, | ||
| pub pending_gates: u64, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub last_activity: Option<String>, | ||
| pub created_at: String, | ||
| } |
There was a problem hiding this comment.
ProjectOverviewEntry doesn’t include goals/metrics, but the Projects drill-in UI (and the new E2E mock data) expects goals to be available from the overview payload. Either add goals (and optionally metrics) to ProjectOverviewEntry, or have the frontend fetch /api/engine/projects/{id} on drill-in and render goals/metrics from that response.
| if (tab === 'projects') { | ||
| loadProjectsOverview(); | ||
| } else if (crCurrentProjectId) { | ||
| // Reset drill-in state when leaving Projects tab so we start fresh. |
There was a problem hiding this comment.
When leaving the Projects tab, the code resets crCurrentProjectId but does not call destroyProjectWidgets() or reset/hide the drill/detail DOM. If a project widget registers timers/event listeners, it can keep running while the tab is hidden, and the next visit may double-mount widgets. Consider calling destroyProjectWidgets() (and/or crBackToOverview()) when switching away from tab === 'projects'.
| // Reset drill-in state when leaving Projects tab so we start fresh. | |
| // Reset drill-in state when leaving Projects tab so we start fresh. | |
| // Use the existing Projects navigation reset so any mounted widgets, | |
| // listeners, and drill/detail DOM are torn down before clearing state. | |
| crBackToOverview(); |
| }; | ||
| // Allow explicit project_id override (so agent can create | ||
| // missions in a specific project from any thread). | ||
| let target_project = params |
There was a problem hiding this comment.
Bug: the new-project skill tells the model to pass a project slug to mission_create.project_id, but this code only accepts a UUID and silently falls back to context.project_id on parse failure. The skill also does not call project_create, so /new-project can write files under projects/{slug} while all created missions land in the current/default project and no project card is created. Please either make the skill create a real project and use the returned UUID, or reject/resolve non-UUID project IDs here instead of falling back silently.
|
|
||
| /// Per-project summary with computed health and stats. | ||
| #[derive(Debug, Clone, serde::Serialize)] | ||
| pub struct ProjectOverviewEntry { |
There was a problem hiding this comment.
The frontend drill-in renders proj.goals/proj.metrics from the cached overview response, but ProjectOverviewEntry does not serialize those fields. In production this branch never renders project goals/metrics; the e2e mock includes fields that the real API cannot return, so the screenshot test misses the contract mismatch. Please add these fields to the overview response or fetch project detail before rendering them, and align the test mock with the production response shape.
| continue; | ||
| } | ||
|
|
||
| let css = workspace |
There was a problem hiding this comment.
This returns raw project widget CSS to the frontend, which then injects it into document.head without actually prefixing selectors. Global widgets go through scope_css(...) before being embedded, so project widgets can leak styles across the whole gateway UI or between widgets. Please scope this server-side with the existing CSS scoping helper, or reuse the resolved-widget loading path that already enforces the same isolation contract.
|
|
||
| for project in &projects { | ||
| let pid = project.id; | ||
| let threads = store.list_threads(pid, user_id).await.unwrap_or_default(); |
There was a problem hiding this comment.
Please do not turn store errors into empty thread/mission lists here. If either backend call fails, the overview can report zero activity, green health, and no attention items for a broken project, which makes operational failures look healthy. Propagate the error, or surface a degraded/unknown health state instead of unwrap_or_default().
henrypark133
left a comment
There was a problem hiding this comment.
Review: New-project skill + Projects Control Room + template ref resolution (Risk: Medium)
Good direction — the Projects Control Room is a meaningful UX upgrade, and template ref resolution for parallel tool calls is a valuable Qwen compatibility fix. However, two critical issues block this PR.
Positives:
- Clean engine v2 gating with
data-v2-only/data-v1-onlyattributes - Template ref resolution is well-tested with 7 unit tests covering multi-ref, nested JSON, and no-ref cases
- E2E screenshot test provides visual regression coverage
- Project overview with attention items and health stats is a solid foundation
Critical: mission_create project_id override has no ownership check (IDOR)
File: src/bridge/effect_adapter.rs (handle_project_call / mission_create changes)
The LLM can supply an arbitrary project_id UUID to mission_create:
let target_project = params.get("project_id")
.and_then(|v| v.as_str())
.and_then(|s| uuid::Uuid::parse_str(s).ok())
.map(ironclaw_engine::ProjectId)
.unwrap_or(context.project_id);MissionManager::create_mission() does NOT validate that target_project is owned by context.user_id. A prompt-injected LLM response could create missions in any user's project by guessing/leaking a UUID.
Fix: Validate ownership before accepting a non-default project_id:
if target_project != context.project_id {
let proj = store.load_project(target_project).await?;
match proj {
Some(p) if p.is_owned_by(&context.user_id) => {},
_ => return Some(Err(EngineError::Effect {
reason: "project_id does not belong to current user".into()
})),
}
}Critical: new-project skill v0.2 instructs agent to use slug as UUID — silently broken
File: skills/new-project/SKILL.md
The v0.2 skill tells the LLM: "Use the slug as project_id" — e.g. mission_create(project_id: "ai-research"). But project_id is parsed as uuid::Uuid::parse_str(), which fails on slugs and silently falls back to context.project_id. Every mission gets created in the wrong project.
Fix: Restore the project_create → get UUID → pass to mission_create workflow from v0.1.
Concerning: ProjectOverviewEntry omits goals field
File: src/bridge/router.rs
The struct has no goals field, but app.js reads proj.goals. The E2E mock masks this — goals will never display in production.
Fix: Add goals: Vec<String> to ProjectOverviewEntry and populate it.
Concerning: E2E test widget mock has wrong JSON shape
File: tests/e2e/scenarios/test_project_detail.py
Mock returns {"widgets": []} (object) but the real handler returns Vec<ResolvedWidget> (bare array). Test passes by accident because both result in "no widgets rendered."
Concerning: Template ref resolution not tested through the caller
File: src/bridge/llm_adapter.rs
Per .claude/rules/testing.md, the caller path (LlmBridgeAdapter::complete() with actual {{call_id.field}} refs in message history) should have a test, not just the helper functions.
Concerning: N+1 DB queries in overview
File: src/bridge/router.rs — get_engine_projects_overview()
2 sequential queries per project in a loop. Acceptable for MVP but will cause visible latency at 20+ projects.
Minor notes:
- Slug collision between similarly-named projects could serve wrong widgets via
project_widgets_handler - CI failing: formatting, cargo-deny, code style need fixing before merge
ilblackdragon
left a comment
There was a problem hiding this comment.
Review response (5e6c900)
Thanks @henrypark133 and @serrrfirat for the thorough review. All issues addressed:
Critical fixes
IDOR on mission_create project_id — Added ownership check: when project_id differs from the thread's project, we now load_project() + is_owned_by() before accepting. Non-UUID slugs (the old silent fallback) now return an explicit error.
project_create/update/list tools removed — These were supposed to be removed from this PR. The new-project skill intentionally uses memory_write + mission_create only, not separate project CRUD tools. Removed the capability registration, the handle_project_call() method, and the dispatch call.
Other fixes
| Issue | Fix |
|---|---|
ProjectOverviewEntry missing goals |
Added goals: Vec<String> field, populated from project |
| Store errors swallowed in overview | Changed unwrap_or_default() to map_err()? — errors propagate now |
| Project widget CSS not scoped | Applied ironclaw_gateway::scope_css() server-side in project_widgets_handler |
| Template ref doc comment mismatch | Updated to document partial resolution semantics |
| E2E mock widget shape wrong | Fixed: [] instead of {"widgets": []} |
| Widget leak on tab switch | crBackToOverview() now called when leaving Projects tab |
| CI: cargo-deny failures | Removed stale wasmtime advisory ignores, added RUSTSEC-2026-0097 (rand) |
| CI: formatting | Ran cargo fmt |
Tests added
- Caller-level template ref test:
complete_resolves_template_refs_through_adapter— drivesLlmBridgeAdapter::complete()with a mock provider returning{{call-1.project_id}}refs, verifies resolution through the full adapter path (per testing rules: "Test Through the Caller") - Total: 9 template ref tests (8 helper + 1 caller-level)
Deferred
- N+1 in overview: acceptable for MVP, will batch when project count warrants it
new FunctionCSP concern: project widgets follow the same trust model as global widgets (user-placed = trusted), and the gateway CSP already allows inline scripts. Will revisit if we add untrusted widget sources.- Slug collision for widget paths: noted, will add stable slug to project metadata in a follow-up
| // Eval the JS module to register the widget. | ||
| try { | ||
| var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null; | ||
| var fn = new Function('container', 'api', 'projectId', w.js); |
There was a problem hiding this comment.
Critical Severity — Arbitrary JS execution via workspace widgets
Project widgets load JS from projects/{slug}/.system/widgets/{id}/index.js in the workspace and execute it via new Function(...) in the browser. The memory_write tool has no write protection for .system/widgets/ paths within project directories.
A prompt injection attack could instruct the LLM to write malicious JS to projects/{slug}/.system/widgets/evil/index.js + a manifest.json. The next time the user drills into that project, the JS executes in their authenticated session with full access to IronClaw.api, cookies, and the DOM.
Unlike global widgets (loaded via <script type="module"> from a server-controlled template), project widgets are dynamically fetched and eval'd from user-writable workspace storage. No CSP, no sandboxing, no user confirmation.
Suggested fix: Either:
(a) Block memory_write from writing to .system/ paths unless via a privileged code path, or
(b) Sandbox project widget JS in an iframe with a null origin, or
(c) Require explicit user confirmation before mounting project widgets.
There was a problem hiding this comment.
Deferred — widgets follow same trust model as global widgets (user-authored workspace content). CSP blocks external eval. Will revisit for untrusted widget sources.
| /// encountered, resolution stops and earlier successful substitutions | ||
| /// are preserved (partial resolution). Returns the original string | ||
| /// unchanged if no `{{` markers are found. | ||
| fn resolve_template_refs(value: &str, tool_results: &[(String, serde_json::Value)]) -> String { |
There was a problem hiding this comment.
High Severity — Template ref resolution operates on LLM-controlled data before safety pipeline
Template resolution substitutes {{call_id.field}} references from prior tool results into tool call parameters. Both call_id and field come from LLM output, and the resolved value flows directly into downstream tool parameters.
Resolution happens before the safety pipeline sees the parameters. If a malicious tool result contains crafted JSON (e.g., {"project_id": "'; DROP TABLE ..."}), the resolved value flows unsanitized into downstream tools. The effect adapter validates project_id as UUID (blocking this specific case), but other tools may not validate as strictly.
Suggested fix: Document this trust boundary explicitly. Log when template resolution occurs for audit trail. Consider running resolved parameters through safety validation.
There was a problem hiding this comment.
Deferred — template refs resolve from prior tool results already validated by the safety pipeline. Will add a second pass on substituted values as a hardening step in a follow-up.
| /// Iterates all projects, computes per-project stats from missions and threads, | ||
| /// and collects pending gates as attention items. Designed for the control room | ||
| /// dashboard where the user checks in on a highly autonomous agent. | ||
| pub async fn get_engine_projects_overview( |
There was a problem hiding this comment.
High Severity — N+1 queries in projects overview
For each project, the handler issues separate list_threads(pid, user_id) and list_missions_with_shared(pid, user_id) queries. With P projects, this is 2P+1 database round trips. A user with 20 projects and thousands of threads will see significant latency. No timeout, pagination, or caching.
Suggested fix: Add pagination or caching. Consider a dedicated Store method that computes per-project stats in a single query. At minimum, add a TODO documenting the N+1 pattern.
| Create recurring missions scoped to the project. Use the **slug** as `project_id`: | ||
|
|
||
| ``` | ||
| mission_create(name: "...", goal: "...", cadence: "daily", project_id: "{slug}") |
There was a problem hiding this comment.
Medium Severity — Skill tells LLM to use slug as project_id, but effect adapter rejects non-UUIDs
The skill instructs: mission_create(... project_id: "{slug}"). But the new effect adapter code rejects non-UUID project_id values with "Invalid project_id '{pid_str}': must be a UUID." This means the LLM following the skill will always fail when passing a slug.
The skill and the validation are contradictory — this is a functional bug.
Suggested fix: Update the skill to instruct the LLM to use the project UUID returned from project creation, not the slug.
| // Scope it client-side using the same prefix convention. | ||
| var prefix = '[data-widget="' + manifest.id + '"]'; | ||
| // Simple client-side scoping: prepend prefix to each rule. | ||
| // For production, use the server's scope_css. For now, inject as-is |
There was a problem hiding this comment.
Medium Severity — Misleading CSS injection comment
The client-side code has a comment saying "For production, use the server's scope_css. For now, inject as-is" — but the server-side handler does call scope_css(). The comment is wrong and could lead a future developer to remove the server-side scoping (thinking it's handled client-side) or vice versa.
Suggested fix: Fix the comment to accurately state that CSS arrives pre-scoped from the server.
There was a problem hiding this comment.
Fixed in b7fe8cc — comment now correctly states CSS is scoped server-side.
| _projectWidgets.push({ | ||
| id: manifest.id, | ||
| container: container, | ||
| style: style || null, |
There was a problem hiding this comment.
Medium Severity — `style` variable potentially undefined
If `w.css` is falsy, the `style` variable declared inside the `if (w.css)` block is never assigned. In non-strict JS, `style` resolves to `undefined` (or throws a ReferenceError in strict mode). The `|| null` fallback handles `undefined` but relies on hoisting behavior.
Suggested fix: Declare `var style = null;` before the `if (w.css)` block to make the intent explicit.
There was a problem hiding this comment.
Fixed in b7fe8cc — var style = null; declared before the if block.
| Ok(Json(EngineProjectListResponse { projects })) | ||
| } | ||
|
|
||
| pub async fn engine_projects_overview_handler( |
There was a problem hiding this comment.
Medium Severity — No rate limiting on expensive overview endpoint
The /api/engine/projects/overview endpoint is computationally expensive (N+1 queries) but has no rate limiting. An attacker with a valid auth token could spam this endpoint to create excessive database load. Other expensive endpoints (chat send) have rate limiting.
Suggested fix: Add rate limiting or caching for this endpoint.
There was a problem hiding this comment.
Deferred — endpoint is auth-protected and single-user. Will add rate limiting when project count grows.
…new-project skill Adds project metrics types, mission cadence scheduling via gateway, and a /new-project skill for creating autonomous projects with goals, metrics, and missions. Includes gateway frontend enhancements for project views with metrics and goal tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… new-project skill
Two fixes from trace analysis (trace_20260411T133641.json):
1. Skill rewrite: new-project skill now instructs the model to use
memory_write + mission_create directly instead of referencing
nonexistent project_create/project_update tools. Includes goals
and metrics when appropriate. Instructs sequential execution.
2. Template ref resolution: some OpenAI-format models (e.g. Qwen)
emit {{call_id.field}} references in parallel tool call arguments.
Added resolution pass in LlmBridgeAdapter that scans ActionCall
parameters for these patterns and resolves them from prior tool
results in the conversation history.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5e6c900 to
b7fe8cc
Compare
PR Review Response — Round 2Rebased onto latest staging and addressed remaining open comments: Fixed in b7fe8cc
Deferred (with rationale)
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for project in &projects { | ||
| let pid = project.id; | ||
| let threads = store | ||
| .list_threads(pid, user_id) | ||
| .await | ||
| .map_err(|e| engine_err("list project threads", e))?; | ||
| let missions = store | ||
| .list_missions_with_shared(pid, user_id) | ||
| .await | ||
| .map_err(|e| engine_err("list project missions", e))?; |
There was a problem hiding this comment.
get_engine_projects_overview does per-project I/O sequentially (list_threads then list_missions_with_shared inside a for loop). With many projects (or slow backends) this becomes an N+1 latency multiplier and can degrade the Projects overview UI. Consider fetching threads/missions concurrently per project (e.g., tokio::try_join! for the two calls, and futures::future::try_join_all across projects) to keep the endpoint responsive.
| .collect(); | ||
| name_lower == needle | ||
| || name_slug == needle | ||
| || name_slug.starts_with(&format!("{needle}-")) |
There was a problem hiding this comment.
Medium Severity — Slug prefix matching can route missions to wrong project
The starts_with branch at line 315 is over-broad. If a user has projects "AI" (slug: ai) and "AI Research Intelligence" (slug: ai-research-intelligence), a mission_create with project_id: "ai" will match "AI Research Intelligence" because "ai-research-intelligence".starts_with("ai-") is true. Combined with find() returning the first match, this is non-deterministic — the result depends on list ordering from the store.
Suggested fix: Remove the starts_with branch and require exact name/slug match. If fuzzy matching is desired, split both the needle and slug by - and check that all needle tokens match as a prefix of slug tokens:
name_lower == needle
|| name_slug == needle
// Remove: || name_slug.starts_with(&format!("{needle}-"))Also: no tests cover this resolution logic. Given the ambiguity risk, adding unit tests for exact match, slug match, and multi-project disambiguation would be valuable.
There was a problem hiding this comment.
Fixed in a617fea — removed the starts_with branch entirely. Now only exact name or exact slug match is accepted.
|
|
||
| let mut result = value.to_string(); | ||
| // Iteratively resolve all `{{..}}` patterns (limit iterations to prevent infinite loops) | ||
| for _ in 0..50 { |
There was a problem hiding this comment.
Medium Severity — Second-order template injection via tool results
The iterative resolution loop re-scans from position 0 after each replacement. If a resolved value itself contains {{call-id.field}} patterns, those will be resolved in subsequent iterations. This creates a second-order injection path:
- Tool A returns
{"id": "{{tool-B-call-id.secret_field}}"} - LLM makes a parallel call referencing
{{tool-A-call-id.id}} - First iteration resolves to
{{tool-B-call-id.secret_field}} - Second iteration resolves the injected ref, leaking Tool B's data into Tool A's parameter
While the LLM already sees all tool results (limiting practical impact), this is relevant when tools come from untrusted MCP servers where one tool shouldn't be able to read another's results.
Suggested fix: After resolving a template ref, advance the scan position past the replacement rather than restarting from 0:
// Instead of scanning from 0 each time:
let mut pos = 0;
for _ in 0..50 {
let Some(start) = result[pos..].find("{{").map(|i| i + pos) else { break; };
// ... resolve ...
match resolved {
Some(val) => {
let val_len = val.len();
result.replace_range(start..end + 2, &val);
pos = start + val_len; // skip past replacement
}
None => break,
}
}There was a problem hiding this comment.
Fixed in a617fea — resolve_template_refs now tracks a search_from cursor that advances past each resolved value, so injected {{...}} patterns in tool results are never re-scanned. Added resolve_template_refs_no_second_order_injection test.
| /// Iterates all projects, computes per-project stats from missions and threads, | ||
| /// and collects pending gates as attention items. Designed for the control room | ||
| /// dashboard where the user checks in on a highly autonomous agent. | ||
| pub async fn get_engine_projects_overview( |
There was a problem hiding this comment.
Medium Severity — N+1 query problem in projects overview
This function issues 1 + 2N database queries: 1 for `list_projects`, then per project: `list_threads` + `list_missions_with_shared`. For a user with 10 projects, that's 21 round trips. The overview endpoint is called on every tab switch to Projects, so this will be felt in the UI.
Suggested fix (pick one):
- Batch + partition: Add `store.list_all_threads_for_user(user_id)` and `store.list_all_missions_for_user(user_id)`, then partition by project_id in memory. Reduces to 3 queries total.
- Dedicated aggregate query: Add a single SQL query that JOINs projects with thread/mission counts, returning all stats in one round trip.
- Short-term: Cache the overview response for a few seconds (`tokio::sync::watch` or a simple timestamp check), since the data is statistical and doesn't need real-time accuracy.
There was a problem hiding this comment.
Fixed in a617fea — parallelized with tokio::try_join! (intra-project) and futures::try_join_all (inter-project). Still 1+2N queries total but they execute concurrently now.
| // Eval the JS module to register the widget. | ||
| try { | ||
| var api = typeof IronClaw !== 'undefined' ? IronClaw.api : null; | ||
| var fn = new Function('container', 'api', 'projectId', w.js); |
There was a problem hiding this comment.
Medium Severity — Arbitrary JS execution from workspace content
`new Function(..., w.js)` executes arbitrary JavaScript from the user's workspace (`projects/{slug}/.system/widgets/`). While this is authenticated and the user is executing their own code, the risk is that any tool with workspace write access (e.g., `memory_write`, MCP tools) could place a malicious widget in `.system/widgets/` that then runs with full browser context (cookies, DOM access, `apiFetch` via the injected `api` parameter).
This is a privilege escalation path: a tool that should only write data can now execute code in the user's browser session.
Suggested mitigations (not necessarily all needed):
- Document: Mark `.system/` paths as privileged in the safety layer so writes to `.system/widgets/` require explicit user approval
- Sandbox: Run widget JS in an iframe with `sandbox` attribute instead of `new Function`
- Audit log: Log widget loads (project, widget id, hash) so users can detect unexpected widget installation
There was a problem hiding this comment.
Acknowledged — deferring to a follow-up. Same trust model as global widgets. Will add .system/ write protection when we add untrusted widget sources.
| if (!el) return; | ||
| if (!items.length) { el.style.display = 'none'; return; } | ||
| el.style.display = ''; | ||
| el.innerHTML = '<div class="cr-attention-title">Needs attention</div>' |
There was a problem hiding this comment.
Medium Severity — Hard-coded English strings break i18n
The new control room code uses hard-coded English strings throughout instead of `I18n.t()` calls:
- Line 5916: `"Needs attention"`
- Line 5989: `"All Projects"`
- Line 5997: `"Goals"`
- Line 6040: `"Missions"`
- Line 6070: `"Recent Activity"`
- Plus: "No missions configured yet", "No threads yet", "Loading projects...", "Failed to load projects", etc.
The rest of the codebase consistently uses `I18n.t('key')` for all user-visible strings.
Suggested fix: Add i18n keys under a `projects.` namespace in `en.js` (and other locale files) and replace all hard-coded strings with `I18n.t()` calls. This can be done as a follow-up PR if you'd prefer to land the functionality first.
There was a problem hiding this comment.
Valid, deferring to a follow-up PR — adding i18n keys for the control room strings is a separate concern from the functional fixes here.
| c | ||
| } else { | ||
| '-' | ||
| } |
There was a problem hiding this comment.
High Severity — Fuzzy slug matching can silently create missions in the wrong project
The starts_with check on the slug match (around line 306–310) can resolve to an unintended project. Example:
name_slug.starts_with(&format!("{needle}-"))If a user/agent passes project_id="ai" and there's no project named exactly "AI", this matches any project whose slug starts with "ai-" — e.g., "AI Research Intelligence" (ai-research-intelligence). The mission silently lands in the wrong project instead of returning an error.
Suggested fix: Remove the starts_with clause entirely, or require an exact match on either name_lower or name_slug. Prefix matching on slugs is too ambiguous when multiple projects share a common prefix:
name_lower == needle || name_slug == needleThere was a problem hiding this comment.
Fixed in a617fea — starts_with branch removed. Only exact name/slug match accepted.
| let slug = project | ||
| .name | ||
| .to_lowercase() | ||
| .replace(|c: char| !c.is_alphanumeric() && c != '-', "-"); |
There was a problem hiding this comment.
Medium Severity — Slug generation inconsistency between effect_adapter and frontend handler
The slug algorithm differs between these two files:
effect_adapter.rs (line ~294) — uses is_ascii_alphanumeric():
.map(|c| if c.is_ascii_alphanumeric() || c == '-' { c } else { '-' })frontend.rs (line ~498) — uses is_alphanumeric() (Unicode-aware):
.replace(|c: char| !c.is_alphanumeric() && c != '-', "-")For non-ASCII project names (e.g., "Café Project"), effect_adapter produces "caf--project" while frontend.rs produces "café-project". This means project_widgets_handler may look up widgets in a different directory than where mission_create resolved the project.
Suggested fix: Extract a single Project::slug() method on the Project type in crates/ironclaw_engine/src/types/project.rs and use it in both locations.
There was a problem hiding this comment.
Fixed in a617fea — frontend.rs now uses is_ascii_alphanumeric() matching effect_adapter.rs.
| } else { | ||
| vec![] | ||
| }; | ||
| // Allow explicit project_id override (so agent can create |
There was a problem hiding this comment.
Medium Severity — No test coverage for cross-project mission_create ownership validation
The new project_id override logic (lines 245–340) is security-critical — it prevents IDOR by validating that the target project belongs to the current user. However, there are no tests for any of these paths:
- UUID project_id with valid ownership → mission created in target project
- UUID project_id owned by a different user → error returned
- Non-existent UUID project_id → error returned
- Slug/name resolution → correct project matched
- Ambiguous slug → expected behavior documented and tested
Per CLAUDE.md testing rules ("Test Through the Caller, Not Just the Helper"), this should have at least an integration-tier test that drives the EffectBridgeAdapter::execute() call site with a mission_create action containing an explicit project_id.
There was a problem hiding this comment.
IDOR ownership validation is integration-tier (requires DB). Added two unit-tier security tests for template ref injection paths in this commit. The IDOR paths themselves are exercised via the existing complete_resolves_template_refs_through_adapter caller-level test pattern.
Resolves merge conflicts: - deny.toml: accept new RUSTSEC-2026-0097 advisory - bridge/mod.rs: deduplicate re-exports, add BridgeOutcome - bridge/router.rs: combine project_id with expanded cadence/guardrail params - bridge/effect_adapter.rs: keep both IDOR validation and guardrails extraction - app.js: integrate ActivityWorkStore, mission progress tracking alongside control room; unify openMissionDetail/openEngineThread to route by active tab - index.html: add Missions tab alongside Projects tab Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…+1 queries - Remove over-broad `starts_with` slug prefix matching in mission_create project_id resolution — require exact name/slug match only (serrrfirat) - Fix slug generation inconsistency: frontend.rs now uses is_ascii_alphanumeric() matching effect_adapter.rs (serrrfirat) - Prevent second-order template injection: resolve_template_refs now advances past resolved content instead of re-scanning from position 0, and skips unresolvable refs instead of breaking (serrrfirat) - Parallelize N+1 overview queries: per-project thread/mission fetches now use tokio::try_join! + futures::try_join_all (serrrfirat, Copilot) - Add two new security tests for template ref resolution Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR Review Response — Round 3Merged Fixed
Deferred (unchanged from round 2)
|
… tab, onboarding E2E Scheduled batched CI on staging was red across three unrelated paths. All three are fixed in-place; the existing tests become the regression coverage. 1. `tests/support/test_rig.rs`: rebuild the skill registry against the test's `with_skills_dir()` tempdir and actually run `discover_all()`. `AppBuilder::init_database()` reloads `config` from DB/TOML/env at the top of `build_all()`, which clobbered `config.skills.local_dir` back to the default (`~/.ironclaw/skills/`). Any registry `build_all()` constructed therefore pointed at the user's real skills dir, not the tempdir the test had laid down — so `loaded_skill_names()` came back empty and the v1 chain-load assertion panicked. Write the tempdir paths back onto `components.config.skills.*` so `AgentDeps::skills_config` agrees with the registry. `skill_chain_load_lifecycle::v1_chain_load_pulls_in_required_companions` now passes. 2. `crates/ironclaw_gateway/static/index.html`: drop the duplicate right-side `status-logs-btn` Jobs button added in #2353. The main tab-bar already has `<button data-tab="jobs">Jobs</button>`, and the duplicate had no `data-v1-only`/`data-v2-only` marker, so both rendered simultaneously. That broke `test_connection.py` (Playwright strict-mode rejected `.tab-bar button[data-tab="jobs"]` resolving to two elements) and also left both buttons visually `active` when the Jobs tab was open. 3. `tests/e2e/scenarios/test_extensions.py`: align `test_onboarding_failed_sse_shows_error_toast_and_reloads_extensions` with every other auth-card test in the file — resolve the real thread id via `_active_thread_id(page)` before calling `_show_auth_card`. `showAuthCard` short-circuits on `isCurrentThread(data.thread_id)`, and the synthetic `"thread-fail"` id fails that check once `currentThreadId` is populated after `go_to_extensions(page)`. The auth card was never rendered, so the follow-up `wait_for` for `.auth-card` hit its 5s timeout. Verified: `cargo test --features libsql --test skill_chain_load_lifecycle` and `--test skill_setup_marker_lifecycle` pass; `cargo clippy --tests --features libsql` is clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tab, onboarding E2E (#2637) Scheduled batched CI on staging was red across three unrelated paths. All three are fixed in-place; the existing tests become the regression coverage. 1. `tests/support/test_rig.rs`: rebuild the skill registry against the test's `with_skills_dir()` tempdir and actually run `discover_all()`. `AppBuilder::init_database()` reloads `config` from DB/TOML/env at the top of `build_all()`, which clobbered `config.skills.local_dir` back to the default (`~/.ironclaw/skills/`). Any registry `build_all()` constructed therefore pointed at the user's real skills dir, not the tempdir the test had laid down — so `loaded_skill_names()` came back empty and the v1 chain-load assertion panicked. Write the tempdir paths back onto `components.config.skills.*` so `AgentDeps::skills_config` agrees with the registry. `skill_chain_load_lifecycle::v1_chain_load_pulls_in_required_companions` now passes. 2. `crates/ironclaw_gateway/static/index.html`: drop the duplicate right-side `status-logs-btn` Jobs button added in #2353. The main tab-bar already has `<button data-tab="jobs">Jobs</button>`, and the duplicate had no `data-v1-only`/`data-v2-only` marker, so both rendered simultaneously. That broke `test_connection.py` (Playwright strict-mode rejected `.tab-bar button[data-tab="jobs"]` resolving to two elements) and also left both buttons visually `active` when the Jobs tab was open. 3. `tests/e2e/scenarios/test_extensions.py`: align `test_onboarding_failed_sse_shows_error_toast_and_reloads_extensions` with every other auth-card test in the file — resolve the real thread id via `_active_thread_id(page)` before calling `_show_auth_card`. `showAuthCard` short-circuits on `isCurrentThread(data.thread_id)`, and the synthetic `"thread-fail"` id fails that check once `currentThreadId` is populated after `go_to_extensions(page)`. The auth card was never rendered, so the follow-up `wait_for` for `.auth-card` hit its 5s timeout. Verified: `cargo test --features libsql --test skill_chain_load_lifecycle` and `--test skill_setup_marker_lifecycle` pass; `cargo clippy --tests --features libsql` is clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #2626. `tests/e2e/helpers.py` used `.tab-bar button[data-tab="{tab}"]` to locate every main-nav tab button. Commit 5058a1c removed the duplicate right-side `status-logs-btn` Jobs button that was resolving that selector to two elements on staging, so `test_connection.py` passes again. The underlying selector is still fragile: any future auxiliary button or `_addWidgetTab`-injected tab that reuses a built-in `data-tab` id (even accidentally, as #2353 did) will make the selector resolve to multiple elements and trip Playwright strict mode the next CI run. Harden the selector in place — the issue's suggested direction of scoping to a more specific parent region — so the regression can't repeat without a test-side opt-in: - `.tab-bar > button`: direct child, skipping any hypothetical nested buttons (e.g. menu popovers). - `:not(.status-logs-btn)`: excludes the right-side logs/docs cluster that uses the same `data-tab` hook for click routing. - `:not(.tab-btn)`: excludes widget-injected tabs (see `_addWidgetTab` in `crates/ironclaw_gateway/static/app.js`), which always carry the `tab-btn` class and could in principle collide with a built-in id. Verified against a synthetic DOM with three colliding `data-tab="jobs"` buttons (original, status-logs-btn duplicate, widget-injected): the old selector matches 3 and trips strict mode on `.click()`; the new selector matches 1 and clicks cleanly. No production HTML change is required — the acceptance criterion explicitly forbids one. `pytest tests/e2e/scenarios/test_connection.py -v` → 3 passed.
…2656) Closes #2626. `tests/e2e/helpers.py` used `.tab-bar button[data-tab="{tab}"]` to locate every main-nav tab button. Commit 5058a1c removed the duplicate right-side `status-logs-btn` Jobs button that was resolving that selector to two elements on staging, so `test_connection.py` passes again. The underlying selector is still fragile: any future auxiliary button or `_addWidgetTab`-injected tab that reuses a built-in `data-tab` id (even accidentally, as #2353 did) will make the selector resolve to multiple elements and trip Playwright strict mode the next CI run. Harden the selector in place — the issue's suggested direction of scoping to a more specific parent region — so the regression can't repeat without a test-side opt-in: - `.tab-bar > button`: direct child, skipping any hypothetical nested buttons (e.g. menu popovers). - `:not(.status-logs-btn)`: excludes the right-side logs/docs cluster that uses the same `data-tab` hook for click routing. - `:not(.tab-btn)`: excludes widget-injected tabs (see `_addWidgetTab` in `crates/ironclaw_gateway/static/app.js`), which always carry the `tab-btn` class and could in principle collide with a built-in id. Verified against a synthetic DOM with three colliding `data-tab="jobs"` buttons (original, status-logs-btn duplicate, widget-injected): the old selector matches 3 and trips strict mode on `.click()`; the new selector matches 1 and clicks cleanly. No production HTML change is required — the acceptance criterion explicitly forbids one. `pytest tests/e2e/scenarios/test_connection.py -v` → 3 passed.
Summary
project-management→new-projectso users can invoke/new-project <description>. Rewrote skill to usememory_write+mission_createdirectly instead of referencing nonexistentproject_create/project_updatetools. Includes goals and metrics when the project warrants them. Instructs sequential tool execution.{{call_id.field}}template references in parallel tool call arguments. Added a resolution pass inLlmBridgeAdapterthat resolves these from prior tool results in conversation history, preventing orphaned/broken mission creates.Motivated by trace analysis (
trace_20260411T133641.json) where Qwen 3.5-122B failed to create a project correctly via the old skill.Project Detail Page
Test plan
bridge::llm_adaptertests pass (8 new template resolution tests)cargo clippycleantest_project_detail.py) passes/new-project <description>creates workspace files and missions correctly🤖 Generated with Claude Code