feat: Claude Agent SDK Design Assistant with editable prompts#47
feat: Claude Agent SDK Design Assistant with editable prompts#47
Conversation
Comprehensive design document for refactoring Clara's Design Assistant from Pydantic AI to Claude Agent SDK with AG-UI integration. Key aspects covered: - Architecture comparison (current vs proposed) - Claude Agent SDK concepts (query, subagents, custom tools, hooks) - AG-UI integration for dynamic UI components - Session state management - Implementation plan (5 phases) - File structure for new architecture - Open questions for discussion 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add claude-agent-sdk dependency for agent orchestration - Create DesignAssistantSession using ClaudeSDKClient with subagents - Define subagents: domain-expert, rubric-designer, agent-configurator - Add prompts for architect and subagent roles - Create API endpoints for design sessions with SSE streaming - Register design sessions router in main.py with cleanup on shutdown This replaces the Pydantic AI approach with Claude Agent SDK for the Design Assistant, enabling: - Native subagent delegation via AgentDefinition - Pre/Post tool hooks for observability - Async streaming responses via SSE 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Create tools.py with 6 custom MCP tools: - mcp__clara__project: Set project context - mcp__clara__entity: Add entity types to extract - mcp__clara__agent: Configure interview agents - mcp__clara__ask: Present interactive options to user - mcp__clara__phase: Transition design phases - mcp__clara__preview: Get current blueprint state - Update design_assistant.py to register MCP tools - Update architect prompt with tool usage guidelines - In-memory state storage (database persistence to follow) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Fix tool return format to use {"content": [{"type": "text", "text": "..."}]}
- Fix hook callback signatures to match (input_data, tool_use_id, context)
- End-to-end tested with real Anthropic API - tools work correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements the complete 3-phase design assistant with orchestrator pattern:
- Phase 1: Goal Understanding - discovers user project through conversation
- Phase 2: Agent Configuration - configures specialist interview agent
- Phase 3: Blueprint Design - creates interview blueprint with entities
Backend changes:
- Add phase-specific prompt templates with {{placeholder}} hydration
- Add MCP tools: hydrate_phase2, hydrate_phase3, get_prompt with session_id
- Add DesignSessionPrompt model for persisting hydrated prompts
- Update architect prompt with phase transition protocol
- Remove deprecated agent_configurator, domain_expert, rubric_designer prompts
Frontend changes:
- Add DesignAssistantPage with chat interface and blueprint sidebar
- Add components: ChatMessage, ChatInput, OptionCards, AgentConfiguredCard
- Add useDesignSession hook for SSE streaming and state management
- Add design-sessions API client
- Add routing and navigation from project detail
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…tions Implements database persistence for design sessions so users can resume conversations where they left off. Sessions now store conversation history, phase state, blueprint state, goal summary, and agent capabilities. - Add DesignSession database model with JSON columns for flexible state storage - Update API to create-or-resume pattern: returns existing active session if one exists - Add restore_session method to DesignAssistantManager for in-memory state restoration - Update frontend hook to detect resumed sessions and restore UI state - Persist assistant responses and state after SSE streaming completes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Replaces the simpler phase 2 prompt with a more sophisticated 4-stage approach that guides the LLM through explicit thinking stages: - Stage 1: Initial thinking and scratchpad work - Stage 2: Structured analysis with explicit requirement mapping - Stage 3: Draft and validate JSON in thinking block - Stage 4: Final polished JSON output in <json_output> tags This should produce more thorough and well-reasoned agent configurations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…back loop Replaces the phase 3 prompt with an improved version featuring: - Structured analysis process with explicit <analysis> tags in thinking - Systematic goal clarity assessment across 5 dimensions - One clarifying question at a time (vs 3-5 at once) - Draft blueprint with explicit feedback request before finalizing - Clear output formats with XML tags for consistent parsing - Iterative design pattern: draft → feedback → iterate → finalize This encourages more thorough reasoning and collaborative blueprint design. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Renames and rewrites the orchestrator prompt with clearer architecture: - Clarifies this is a single agent with phase-specific prompts (not subagents) - Adds visual flow diagram showing data flow between phases - Documents each phase's purpose, mode (interactive/automatic/iterative), and outputs - Explains why Phase 2 exists (specialist persona primes Phase 3) - Adds session persistence documentation - Lists key principles for the agent's behavior Updates code references in design_assistant.py and docs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Phase 3 was jumping straight to building the complete blueprint without user input. Now uses a 3-step approach with mandatory confirmations: Step 1: Propose entity types → wait for confirmation Step 2: Propose agent configuration → wait for confirmation Step 3: Build complete blueprint with all details Key changes: - Explicit "Wait for user confirmation before proceeding" at each step - Example interaction flow showing the back-and-forth - Clarifying questions encouraged even when goal seems clear - Tools only used AFTER user confirms each section This balances Phase 1's extensive discovery with Phase 3's design work. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… 3 flow - Add PromptEditor component for reviewing/editing generated system prompts - Add prompt_editor MCP tool to display prompts for user editing - Update Phase 3 template with stepped blueprint design flow - Add DebugPanel component for development visibility - Improve OptionCards with multi-select support and clearer UX - Update design-session types with PromptEditorUIComponent - Wire up onPromptSave callback to send edited prompts back to backend Blueprint persistence already works via DesignSession.blueprint_state column. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Test comment |
Pull Request Review: Claude Agent SDK Design AssistantOverviewThis is a significant architectural change migrating from Pydantic AI to Claude Agent SDK for the Design Assistant. The implementation is well-structured with a 3-phase conversation flow and introduces interactive prompt editing capabilities. Overall, this is solid work with clear separation of concerns. StrengthsArchitecture & Design
Code Quality
|
Issues & Concerns1. Security: In-Memory State Storage (High Priority)Location: src/backend/clara/agents/tools.py:52-72 Issue: Session state is stored in a global in-memory dictionary. This has several problems:
Recommendation: Add session ownership validation and consider Redis for production 2. Missing Input Validation (Medium Priority)Location: src/backend/clara/agents/tools.py:285-360 The MCP tools accept user input without validation:
Recommendation: Add validation using Pydantic Field constraints 3. Error Handling Gaps (Medium Priority)Location: src/backend/clara/api/design_sessions.py:268-320 The SSE event generator catches exceptions but doesn't properly clean up or notify users when database persistence fails. Recommendation: Add proper error recovery and user notification 4. Template Placeholder Injection Risk (Low-Medium Priority)Location: src/backend/clara/agents/tools.py:37-48 If user input contains prompt injection patterns, they'll be directly inserted into agent prompts without sanitization. Recommendation: Add sanitization or use a safer templating engine 5. Testing Coverage (High Priority)Issue: No test files for the new Design Assistant functionality are included in this PR. Expected but missing:
Recommendation: Add comprehensive tests covering MCP tool execution, session restoration, phase transitions, error handling, and SSE streaming |
Best Practice Recommendations
Code Quality Metrics
Action Items Before MergeMust Fix (Blocking)
Should Fix (High Value)
|
Specific File Commentssrc/backend/clara/agents/design_assistant.py
src/backend/clara/agents/tools.py
src/backend/clara/api/design_sessions.py
src/frontend/src/components/design-assistant/PromptEditor.tsx
SummaryThis is quality work that significantly advances the Design Assistant functionality. The architecture is sound and the code is generally well-written. However, there are important security and reliability concerns that should be addressed before merging to production:
Once these are addressed, this will be an excellent addition to the codebase. Recommendation: Request changes for items 1-4 above, then approve. Review completed following CLAUDE.md conventions and security best practices. |
- Add InputSanitizer with truncation, Unicode normalization, and template
injection prevention (escapes {{ and }} markers)
- Add TTL-based session state cleanup to prevent memory leaks (60min default)
- Add input validation on MCP tools (project, entity, agent tools)
- Improve SSE streaming error handling with persistence failure notifications
- Add comprehensive unit tests (24 tests covering session state, template
hydration, sanitization, and template loading)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Code Review: Claude Agent SDK Design AssistantThis is a significant and well-architected implementation! The code demonstrates strong engineering practices and thoughtful design. Below is my detailed feedback: 🎯 Overall AssessmentStrengths:
Areas for Improvement:
🔒 Security ConcernsHIGH PRIORITY: Template Injection Risk# tools.py:208-210
value = value.replace("{{", "{ {").replace("}}", "} }")Issue: This sanitization is insufficient for preventing template injection. An attacker could craft input like:
Recommendation: # Use allowlist approach instead
ALLOWED_PLACEHOLDERS = {"goal", "role", "capabilities", "expertise_areas", "interaction_style"}
def sanitize_template_value(cls, value: str | None) -> str:
if not value:
return ""
# Remove ALL template markers, not just specific ones
# Only allow them in predefined safe contexts
value = re.sub(r'\{\{[^}]*\}\}', '', value)
return valueMEDIUM: Missing Input Validation# design_sessions.py:56
class SendMessageRequest(BaseModel):
message: str # No length limit, could cause DoSRecommendation: Add validation: from pydantic import Field, validator
class SendMessageRequest(BaseModel):
message: str = Field(..., max_length=10000)
@validator('message')
def validate_message(cls, v):
if InputSanitizer.detect_injection_attempt(v):
logger.warning("Potential injection attempt detected")
return InputSanitizer.sanitize_message(v)LOW: Prompt Injection Detection Could Be BypassedThe Recommendation: Consider using a more robust detection library or adding character normalization. 🐛 Potential Bugs1. Race Condition in Session State# design_assistant.py:411
self._sync_state_from_tools()Issue: The state sync happens asynchronously while tool calls may still be executing. If a tool modifies state during the sync, you could get inconsistent state. Recommendation: Use a lock: def __init__(self, session_id: str, project_id: str):
self._state_lock = asyncio.Lock()
async def _sync_state_from_tools(self) -> None:
async with self._state_lock:
tool_state = get_session_state(self.session_id)
# ... rest of sync logic2. Missing Database Commit# design_sessions.py:220-245 (stream endpoint)Issue: The stream endpoint saves messages to the database but I don't see an explicit commit. With autocommit=False, this could lose data. Recommendation: Add explicit commit or ensure transaction is committed: finally:
await db.commit()3. Memory Leak in Event Queue# design_assistant.py:100
self._response_queue: asyncio.Queue[AGUIEvent] = asyncio.Queue()Issue: If events are emitted faster than consumed (e.g., network slow), the queue could grow unbounded. Recommendation: self._response_queue: asyncio.Queue[AGUIEvent] = asyncio.Queue(maxsize=100)4. Frontend: Unbounded Array Growth// useDesignSession.ts:101
setDebugEvents((prev) => [...prev, event]);Issue: Debug events accumulate indefinitely in memory. Long sessions could cause browser to slow down. Recommendation: Limit size: setDebugEvents((prev) => {
const updated = [...prev, event];
// Keep only last 100 events
return updated.slice(-100);
});⚡ Performance Considerations1. Session State Cleanup Not Triggered# tools.py:92-110
def cleanup_stale_sessions() -> int:Issue: This function is defined but never called. Stale sessions will accumulate in memory indefinitely. Recommendation: Add a background task: # main.py
from fastapi import BackgroundTasks
import asyncio
async def periodic_cleanup():
while True:
await asyncio.sleep(300) # Every 5 minutes
cleanup_stale_sessions()
@app.on_event("startup")
async def startup():
asyncio.create_task(periodic_cleanup())2. N+1 Database Query Pattern# design_sessions.py:164-193Issue: Getting session by project requires full table scan without index. Recommendation: Add database index: # In the migration/model definition
Index('idx_design_session_project_status',
DesignSession.project_id,
DesignSession.status)3. Template Loading on Every Request# tools.py:33-40
def load_template(phase: str) -> str:
template_path = PROMPTS_DIR / template_file
with open(template_path, encoding="utf-8") as f:
return f.read()Issue: Templates are read from disk on every hydration. This is slow. Recommendation: Cache templates: from functools import lru_cache
@lru_cache(maxsize=10)
def load_template(phase: str) -> str:
# ... existing code📝 Code QualityGood Practices Observed:✅ Comprehensive input sanitization Suggestions:1. Add Type Safety to Tool State: # tools.py:64
_session_state: dict[str, dict[str, Any]] = {}Consider creating a TypedDict: from typing import TypedDict
class SessionState(TypedDict, total=False):
project: dict | None
entities: list[dict]
agents: list[dict]
phase: str
agent_capabilities: dict | None
goal_summary: dict | None
hydrated_prompts: dict[str, dict]
_created_at: datetime
_last_activity: datetime
_on_phase_change: Callable[[str], None]2. Frontend: Add Error Boundaries The React components should have error boundaries to prevent crashes: // Add to DesignAssistantPage.tsx
class DesignAssistantErrorBoundary extends React.Component {
// ... error boundary implementation
}3. Add Retry Logic for Stream Interruptions // useDesignSession.ts
const sendMessage = async (message: string) => {
let retries = 3;
while (retries > 0) {
try {
await streamMessage(message);
break;
} catch (err) {
retries--;
if (retries === 0) throw err;
await sleep(1000 * (4 - retries));
}
}
};🧪 Test CoverageStrong:
Missing:
Recommendation: Add integration test: # tests/integration/test_design_flow.py
@pytest.mark.asyncio
async def test_full_design_flow():
"""Test complete flow from goal understanding to blueprint."""
session = DesignAssistantSession("test-id", "proj-123")
await session.start()
# Phase 1: Goal understanding
events = []
async for event in session.send_message("I need to interview users for M&A due diligence"):
events.append(event)
assert session.state.phase == DesignPhase.AGENT_CONFIGURATION
# ... continue through phases📚 DocumentationGood:
Could Improve:
🎨 Architecture FeedbackExcellent Decisions:
Considerations:1. Consider Separating Concerns: 2. Tool State vs Session State: class DesignSessionState:
"""Single source of truth for session state."""
def __init__(self, session_id: str):
self.session_id = session_id
self.tool_state = get_session_state(session_id)
@property
def phase(self):
return self.tool_state["phase"]✅ Recommendations SummaryMust Fix Before Merge:
Should Fix:
Nice to Have:
🎉 ConclusionThis is a well-engineered PR that demonstrates strong understanding of both the Claude Agent SDK and production-ready software development. The security-first approach is commendable, and the architecture is clean and maintainable. The issues identified are mostly minor refinements. Once the critical security and performance items are addressed, this will be ready to merge. Overall Score: 8.5/10 ⭐ Great work! |
Summary
Changes
Test plan
🤖 Generated with Claude Code