feat: Claude Agent SDK Design Assistant with editable prompts #47

Merged
mathaix merged 12 commits into main from
feature/claude-agent-sdk-refactor
Dec 22, 2025

Conversation

@mathaix
Owner

@mathaix mathaix commented Dec 21, 2025

Summary

  • Implements Design Assistant using Claude Agent SDK for 3-phase blueprint creation
  • Adds editable PromptEditor component allowing users to review/edit generated system prompts
  • Includes MCP tools for blueprint manipulation (ask, entity, agent, project, preview, prompt_editor)
  • Adds DebugPanel for development visibility into agent tool calls and state transitions
  • Blueprint state persists to database via DesignSession.blueprint_state column

Changes

  • Backend: prompt_editor tool, Phase 3 stepped flow, interview_orchestrator prompt
  • Frontend: PromptEditor, DebugPanel, ChatMessage with onPromptSave callback
  • Types: PromptEditorUIComponent, updated AG-UI event handling

Test plan

  • Start design session from project page
  • Complete Phase 1 (goal understanding)
  • Complete Phase 2 (agent configuration)
  • Complete Phase 3 (blueprint design) with prompt editing
  • Verify blueprint persists across session resume

🤖 Generated with Claude Code

Mathew and others added 11 commits December 21, 2025 04:47
Comprehensive design document for refactoring Clara's Design Assistant
from Pydantic AI to Claude Agent SDK with AG-UI integration.

Key aspects covered:
- Architecture comparison (current vs proposed)
- Claude Agent SDK concepts (query, subagents, custom tools, hooks)
- AG-UI integration for dynamic UI components
- Session state management
- Implementation plan (5 phases)
- File structure for new architecture
- Open questions for discussion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add claude-agent-sdk dependency for agent orchestration
- Create DesignAssistantSession using ClaudeSDKClient with subagents
- Define subagents: domain-expert, rubric-designer, agent-configurator
- Add prompts for architect and subagent roles
- Create API endpoints for design sessions with SSE streaming
- Register design sessions router in main.py with cleanup on shutdown

This replaces the Pydantic AI approach with Claude Agent SDK
for the Design Assistant, enabling:
- Native subagent delegation via AgentDefinition
- Pre/Post tool hooks for observability
- Async streaming responses via SSE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Create tools.py with 6 custom MCP tools:
  - mcp__clara__project: Set project context
  - mcp__clara__entity: Add entity types to extract
  - mcp__clara__agent: Configure interview agents
  - mcp__clara__ask: Present interactive options to user
  - mcp__clara__phase: Transition design phases
  - mcp__clara__preview: Get current blueprint state

- Update design_assistant.py to register MCP tools
- Update architect prompt with tool usage guidelines
- In-memory state storage (database persistence to follow)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix tool return format to use {"content": [{"type": "text", "text": "..."}]}
- Fix hook callback signatures to match (input_data, tool_use_id, context)
- End-to-end tested with real Anthropic API - tools work correctly
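
For reference, the two shapes fixed in this commit can be sketched in plain Python. This is a minimal illustration, not the code in tools.py: the helper names `text_result` and `log_tool_use` are hypothetical, as is the `tool_name` key read from the hook payload. Only the content-block shape and the `(input_data, tool_use_id, context)` argument order come from the commit message.

```python
import asyncio
from typing import Any, Optional


def text_result(text: str) -> dict[str, Any]:
    """Wrap plain text in the content-block shape quoted in the commit message."""
    return {"content": [{"type": "text", "text": text}]}


async def log_tool_use(
    input_data: dict[str, Any],
    tool_use_id: Optional[str],
    context: Any,
) -> dict[str, Any]:
    """Hypothetical hook taking arguments in (input_data, tool_use_id, context) order."""
    # The "tool_name" key is an assumption about the hook payload.
    name = input_data.get("tool_name", "<unknown>")
    print(f"tool call {tool_use_id}: {name}")
    # In this sketch an empty dict means "no changes requested".
    return {}
```

A tool handler would end with something like `return text_result("Project context saved")`, so every tool returns the same envelope.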

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements the complete 3-phase design assistant with orchestrator pattern:
- Phase 1: Goal Understanding - discovers user project through conversation
- Phase 2: Agent Configuration - configures specialist interview agent
- Phase 3: Blueprint Design - creates interview blueprint with entities

Backend changes:
- Add phase-specific prompt templates with {{placeholder}} hydration
- Add MCP tools: hydrate_phase2, hydrate_phase3, get_prompt with session_id
- Add DesignSessionPrompt model for persisting hydrated prompts
- Update architect prompt with phase transition protocol
- Remove deprecated agent_configurator, domain_expert, rubric_designer prompts
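
The `{{placeholder}}` hydration mentioned above could look roughly like the sketch below. The function name `hydrate` and the fail-on-missing-value behaviour are assumptions for illustration; the actual implementation in this PR may differ.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def hydrate(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} in the template with values[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Failing loudly avoids shipping a prompt with a raw placeholder.
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    # re.sub makes a single pass, so text inserted from `values`
    # is never re-scanned for further placeholders.
    return PLACEHOLDER.sub(substitute, template)
```

Because substitution is a single pass, a value that itself contains `{{role}}` is inserted literally rather than expanded a second time.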

Frontend changes:
- Add DesignAssistantPage with chat interface and blueprint sidebar
- Add components: ChatMessage, ChatInput, OptionCards, AgentConfiguredCard
- Add useDesignSession hook for SSE streaming and state management
- Add design-sessions API client
- Add routing and navigation from project detail

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…tions

Implements database persistence for design sessions so users can resume
conversations where they left off. Sessions now store conversation history,
phase state, blueprint state, goal summary, and agent capabilities.

- Add DesignSession database model with JSON columns for flexible state storage
- Update API to create-or-resume pattern: returns existing active session if one exists
- Add restore_session method to DesignAssistantManager for in-memory state restoration
- Update frontend hook to detect resumed sessions and restore UI state
- Persist assistant responses and state after SSE streaming completes
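
A minimal sketch of the create-or-resume pattern, using an in-memory dict as a stand-in for the DesignSession table. The record fields and the name `create_or_resume` are illustrative assumptions, not the API added in this commit.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DesignSessionRecord:
    session_id: str
    project_id: str
    status: str = "active"
    messages: list = field(default_factory=list)


# Stand-in for the database table, keyed by session_id.
_sessions: dict[str, DesignSessionRecord] = {}


def create_or_resume(project_id: str) -> tuple[DesignSessionRecord, bool]:
    """Return (session, resumed); reuse the project's active session if one exists."""
    for record in _sessions.values():
        if record.project_id == project_id and record.status == "active":
            return record, True
    record = DesignSessionRecord(session_id=str(uuid.uuid4()), project_id=project_id)
    _sessions[record.session_id] = record
    return record, False
```

The `resumed` flag is what would let the frontend hook decide whether to restore UI state or start from an empty chat.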

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replaces the simpler phase 2 prompt with a more sophisticated 4-stage
approach that guides the LLM through explicit thinking stages:

- Stage 1: Initial thinking and scratchpad work
- Stage 2: Structured analysis with explicit requirement mapping
- Stage 3: Draft and validate JSON in thinking block
- Stage 4: Final polished JSON output in <json_output> tags

This should produce more thorough and well-reasoned agent configurations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…back loop

Replaces the phase 3 prompt with an improved version featuring:

- Structured analysis process with explicit <analysis> tags in thinking
- Systematic goal clarity assessment across 5 dimensions
- One clarifying question at a time (vs 3-5 at once)
- Draft blueprint with explicit feedback request before finalizing
- Clear output formats with XML tags for consistent parsing
- Iterative design pattern: draft → feedback → iterate → finalize

This encourages more thorough reasoning and collaborative blueprint design.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Renames and rewrites the orchestrator prompt with clearer architecture:

- Clarifies this is a single agent with phase-specific prompts (not subagents)
- Adds visual flow diagram showing data flow between phases
- Documents each phase's purpose, mode (interactive/automatic/iterative), and outputs
- Explains why Phase 2 exists (specialist persona primes Phase 3)
- Adds session persistence documentation
- Lists key principles for the agent's behavior

Updates code references in design_assistant.py and docs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Phase 3 was jumping straight to building the complete blueprint without
user input. Now uses a 3-step approach with mandatory confirmations:

Step 1: Propose entity types → wait for confirmation
Step 2: Propose agent configuration → wait for confirmation
Step 3: Build complete blueprint with all details

Key changes:
- Explicit "Wait for user confirmation before proceeding" at each step
- Example interaction flow showing the back-and-forth
- Clarifying questions encouraged even when goal seems clear
- Tools only used AFTER user confirms each section

This balances Phase 1's extensive discovery with Phase 3's design work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… 3 flow

- Add PromptEditor component for reviewing/editing generated system prompts
- Add prompt_editor MCP tool to display prompts for user editing
- Update Phase 3 template with stepped blueprint design flow
- Add DebugPanel component for development visibility
- Improve OptionCards with multi-select support and clearer UX
- Update design-session types with PromptEditorUIComponent
- Wire up onPromptSave callback to send edited prompts back to backend

Blueprint persistence already works via DesignSession.blueprint_state column.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude

claude bot commented Dec 21, 2025

Test comment

@claude

claude bot commented Dec 21, 2025

Pull Request Review: Claude Agent SDK Design Assistant

Overview

This is a significant architectural change migrating from Pydantic AI to Claude Agent SDK for the Design Assistant. The implementation is well-structured with a 3-phase conversation flow and introduces interactive prompt editing capabilities. Overall, this is solid work with clear separation of concerns.


Strengths

Architecture & Design

  1. Clean MCP Tool Architecture: The custom MCP tools in tools.py are well-organized with clear schemas and proper session-scoped state management
  2. Phase-Based Flow: The 3-phase approach provides good structure
  3. State Synchronization: Good pattern for syncing between tool state and session state
  4. Session Persistence: Proper handling of session restoration from database with context injection
  5. Separation of Concerns: Backend agent logic cleanly separated from frontend UI components

Code Quality

  • Type safety with Pydantic schemas for MCP tools
  • Comprehensive logging throughout
  • Clear docstrings explaining purpose
  • Good use of async/await patterns

@claude

claude bot commented Dec 21, 2025

Issues & Concerns

1. Security: In-Memory State Storage (High Priority)

Location: src/backend/clara/agents/tools.py:52-72

Issue: Session state is stored in a global in-memory dictionary. This has several problems:

  • Memory Leak: State is never cleaned up except when clear_session_state() is explicitly called
  • Race Conditions: No locking mechanism for concurrent access
  • Scaling: Won't work in multi-process deployments
  • Session Hijacking Risk: No validation that session_id belongs to the requesting user

Recommendation: Add session ownership validation and consider Redis for production
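
One way to address the first, second, and fourth bullets without Redis is sketched below, assuming a single-process deployment. `SessionStore`, `owner_id`, and the one-hour default TTL are hypothetical choices for illustration. The multi-process scaling concern is not solved by this approach.

```python
import threading
import time
from typing import Any, Optional


class SessionStore:
    """In-memory session state with an owner check, a lock, and TTL eviction."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._entries: dict[str, dict[str, Any]] = {}

    def create(self, session_id: str, owner_id: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        with self._lock:
            self._entries[session_id] = {"owner": owner_id, "state": {}, "last_activity": now}

    def get_state(self, session_id: str, owner_id: str, now: Optional[float] = None) -> dict[str, Any]:
        now = time.monotonic() if now is None else now
        with self._lock:
            entry = self._entries.get(session_id)
            if entry is None:
                raise KeyError(session_id)
            if entry["owner"] != owner_id:
                # Reject requests for a session that belongs to another user.
                raise PermissionError(f"session {session_id!r} is not owned by {owner_id!r}")
            entry["last_activity"] = now
            return entry["state"]

    def cleanup(self, now: Optional[float] = None) -> int:
        """Drop sessions idle for longer than the TTL; return how many were removed."""
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [sid for sid, e in self._entries.items() if now - e["last_activity"] > self._ttl]
            for sid in stale:
                del self._entries[sid]
            return len(stale)
```

The `now` parameter exists only to make the eviction logic testable without sleeping; production callers would leave it unset.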

2. Missing Input Validation (Medium Priority)

Location: src/backend/clara/agents/tools.py:285-360

The MCP tools accept user input without validation:

  • No length limits on strings (prompt injection risk)
  • No sanitization of entity/agent names
  • Array inputs not bounded (could cause DoS with large arrays)

Recommendation: Add validation using Pydantic Field constraints

3. Error Handling Gaps (Medium Priority)

Location: src/backend/clara/api/design_sessions.py:268-320

The SSE event generator catches exceptions but doesn't properly clean up or notify users when database persistence fails.

Recommendation: Add proper error recovery and user notification

4. Template Placeholder Injection Risk (Low-Medium Priority)

Location: src/backend/clara/agents/tools.py:37-48

If user input contains prompt injection patterns, they'll be directly inserted into agent prompts without sanitization.

Recommendation: Add sanitization or use a safer templating engine

5. Testing Coverage (High Priority)

Issue: No test files for the new Design Assistant functionality are included in this PR.

Expected but missing:

  • tests/unit/test_design_assistant.py
  • tests/unit/test_clara_tools.py
  • tests/integration/test_design_sessions_api.py

Recommendation: Add comprehensive tests covering MCP tool execution, session restoration, phase transitions, error handling, and SSE streaming

@claude

claude bot commented Dec 21, 2025

Best Practice Recommendations

  1. Add Type Hints to MCP Tools - Use Pydantic models instead of dict[str, Any]
  2. Implement Circuit Breaker - Add timeout/retry limits for Claude API calls
  3. Add Observability - Per CLAUDE.md, Logfire integration is part of the stack but not used here
  4. Document API Contract - The SSE events should be documented
  5. Frontend: Debounce Prompt Editor - Could benefit from autosave

Code Quality Metrics

| Aspect | Score | Notes |
|---|---|---|
| Architecture | 8/10 | Clean separation, good use of MCP |
| Security | 5/10 | Missing validation, in-memory state risks |
| Error Handling | 6/10 | Basic coverage, missing recovery |
| Testing | 2/10 | No tests included |
| Documentation | 7/10 | Good inline docs, missing API docs |
| Performance | 7/10 | Reasonable, some optimization opportunities |

Action Items Before Merge

Must Fix (Blocking)

  1. Add comprehensive test coverage
  2. Implement input validation on all MCP tools
  3. Add session ownership validation
  4. Handle database persistence failures properly

Should Fix (High Value)

  1. Replace in-memory state with Redis or add proper cleanup
  2. Add sanitization to template hydration
  3. Improve error recovery in SSE streaming
  4. Document SSE event contract

@claude

claude bot commented Dec 21, 2025

Specific File Comments

src/backend/clara/agents/design_assistant.py

  • Line 157-196: Excellent restoration context building!
  • Line 223-236: Phase 3 prompt has hardcoded instructions - consider moving to template file
  • Line 447: query() is async but not awaited? Check if this is correct per SDK docs

src/backend/clara/agents/tools.py

  • Lines 278-773: Consider splitting this into multiple files (one per tool category)
  • Line 676-718: Auto-hydration fallback is clever but adds complexity - ensure it's well-tested

src/backend/clara/api/design_sessions.py

  • Line 266: Should db.commit() before streaming starts to ensure message is persisted?
  • Line 290: Opening new DB session here - why not pass the existing one?

src/frontend/src/components/design-assistant/PromptEditor.tsx

  • Clean implementation! Consider adding keyboard shortcuts (Cmd+S to save)
  • Line 24-28: Once submitted, can't edit again - is this intentional? Might want to allow re-editing

Summary

This is quality work that significantly advances the Design Assistant functionality. The architecture is sound and the code is generally well-written. However, there are important security and reliability concerns that should be addressed before merging to production:

  1. Add test coverage (critical gap)
  2. Implement proper input validation (security)
  3. Fix in-memory state management (scaling/security)
  4. Improve error handling (reliability)

Once these are addressed, this will be an excellent addition to the codebase.

Recommendation: Request changes for items 1-4 above, then approve.


Review completed following CLAUDE.md conventions and security best practices.

- Add InputSanitizer with truncation, Unicode normalization, and template
  injection prevention (escapes {{ and }} markers)
- Add TTL-based session state cleanup to prevent memory leaks (60min default)
- Add input validation on MCP tools (project, entity, agent tools)
- Improve SSE streaming error handling with persistence failure notifications
- Add comprehensive unit tests (24 tests covering session state, template
  hydration, sanitization, and template loading)
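
The sanitization steps listed in the first bullet (Unicode normalization, truncation, escaping template markers) might be combined as in the sketch below. It is a free function rather than the actual InputSanitizer class, and the 2000-character limit is an assumed value. The escape loop is a small variation on a single `replace` call, which would leave a `{{` behind for inputs such as `{{{`.

```python
import unicodedata
from typing import Optional


def sanitize_template_value(value: Optional[str], max_length: int = 2000) -> str:
    """Normalize, truncate, and escape template markers in user-supplied text."""
    if not value:
        return ""
    # Normalize first so look-alike characters (e.g. full-width braces)
    # become plain ASCII before the marker check runs.
    value = unicodedata.normalize("NFKC", value)
    # max_length is an assumed limit, not the value used in the PR.
    value = value[:max_length]
    # Repeat until no marker remains: one pass turns "{{{" into "{ {{".
    while "{{" in value:
        value = value.replace("{{", "{ {")
    while "}}" in value:
        value = value.replace("}}", "} }")
    return value
```

Normalizing before escaping matters because NFKC maps full-width `｛` and `｝` to ASCII braces, which would otherwise slip past the marker check.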

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude

claude bot commented Dec 22, 2025

Code Review: Claude Agent SDK Design Assistant

This is a significant and well-architected implementation! The code demonstrates strong engineering practices and thoughtful design. Below is my detailed feedback:


🎯 Overall Assessment

Strengths:

  • Clean architecture with proper separation of concerns
  • Excellent security practices with comprehensive input sanitization
  • Good observability through debug panel and logging
  • Well-structured 3-phase workflow with subagents
  • Proper state management and session restoration
  • Good test coverage for critical paths

Areas for Improvement:

  • Some potential performance issues
  • Missing error handling in a few places
  • Security concerns with template hydration
  • Documentation could be enhanced

🔒 Security Concerns

HIGH PRIORITY: Template Injection Risk

# tools.py:208-210
value = value.replace("{{", "{ {").replace("}}", "} }")

Issue: This sanitization is insufficient for preventing template injection. An attacker could craft input like:

  • { {malicious_var} } (already has spaces)
  • Nested placeholders that could execute after sanitization

Recommendation:

# Use allowlist approach instead
import re

# Only these names may appear as placeholders in the templates themselves
ALLOWED_PLACEHOLDERS = {"goal", "role", "capabilities", "expertise_areas", "interaction_style"}

def sanitize_template_value(value: str | None) -> str:
    if not value:
        return ""
    # Remove ALL template markers from user-supplied values;
    # placeholders are only allowed in predefined safe contexts
    value = re.sub(r'\{\{[^}]*\}\}', '', value)
    return value

MEDIUM: Missing Input Validation

# design_sessions.py:56
class SendMessageRequest(BaseModel):
    message: str  # No length limit, could cause DoS

Recommendation: Add validation:

# `validator` is the Pydantic v1 decorator; on Pydantic v2 use `field_validator`
from pydantic import BaseModel, Field, validator

class SendMessageRequest(BaseModel):
    message: str = Field(..., max_length=10000)
    
    @validator('message')
    def validate_message(cls, v):
        if InputSanitizer.detect_injection_attempt(v):
            logger.warning("Potential injection attempt detected")
        return InputSanitizer.sanitize_message(v)

LOW: Prompt Injection Detection Could Be Bypassed

The INJECTION_PATTERNS in security/__init__.py:24-33 can be easily bypassed with Unicode tricks, ROT13, or other encoding methods.

Recommendation: Consider using a more robust detection library or adding character normalization.


🐛 Potential Bugs

1. Race Condition in Session State

# design_assistant.py:411
self._sync_state_from_tools()

Issue: The state sync happens asynchronously while tool calls may still be executing. If a tool modifies state during the sync, you could get inconsistent state.

Recommendation: Use a lock:

import asyncio

def __init__(self, session_id: str, project_id: str):
    self._state_lock = asyncio.Lock()

# Now a coroutine: call sites must use `await self._sync_state_from_tools()`
async def _sync_state_from_tools(self) -> None:
    async with self._state_lock:
        tool_state = get_session_state(self.session_id)
        # ... rest of sync logic

2. Missing Database Commit

# design_sessions.py:220-245 (stream endpoint)

Issue: The stream endpoint saves messages to the database but I don't see an explicit commit. With autocommit=False, this could lose data.

Recommendation: Add explicit commit or ensure transaction is committed:

finally:
    await db.commit()

3. Memory Leak in Event Queue

# design_assistant.py:100
self._response_queue: asyncio.Queue[AGUIEvent] = asyncio.Queue()

Issue: If events are emitted faster than consumed (e.g., network slow), the queue could grow unbounded.

Recommendation:

self._response_queue: asyncio.Queue[AGUIEvent] = asyncio.Queue(maxsize=100)

4. Frontend: Unbounded Array Growth

// useDesignSession.ts:101
setDebugEvents((prev) => [...prev, event]);

Issue: Debug events accumulate indefinitely in memory. Long sessions could cause browser to slow down.

Recommendation: Limit size:

setDebugEvents((prev) => {
  const updated = [...prev, event];
  // Keep only last 100 events
  return updated.slice(-100);
});

Performance Considerations

1. Session State Cleanup Not Triggered

# tools.py:92-110
def cleanup_stale_sessions() -> int:

Issue: This function is defined but never called. Stale sessions will accumulate in memory indefinitely.

Recommendation: Add a background task:

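# main.py
import asyncio

# Import path assumed from src/backend/clara/agents/tools.py
from clara.agents.tools import cleanup_stale_sessions

_cleanup_task: asyncio.Task | None = None

async def periodic_cleanup():
    while True:
        await asyncio.sleep(300)  # Every 5 minutes
        cleanup_stale_sessions()

@app.on_event("startup")
async def startup():
    global _cleanup_task
    # Keep a reference so the task is not garbage-collected mid-run
    _cleanup_task = asyncio.create_task(periodic_cleanup())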

2. Missing Database Index for Session Lookup

# design_sessions.py:164-193

Issue: Looking up the active session by project requires a full table scan because there is no index on the filtered columns.

Recommendation: Add database index:

# In the migration/model definition
from sqlalchemy import Index

Index('idx_design_session_project_status',
      DesignSession.project_id,
      DesignSession.status)

3. Template Loading on Every Request

# tools.py:33-40
def load_template(phase: str) -> str:
    template_path = PROMPTS_DIR / template_file
    with open(template_path, encoding="utf-8") as f:
        return f.read()

Issue: Templates are read from disk on every hydration. This is slow.

Recommendation: Cache templates:

from functools import lru_cache

@lru_cache(maxsize=10)
def load_template(phase: str) -> str:
    # ... existing code

📝 Code Quality

Good Practices Observed:

✅ Comprehensive input sanitization
✅ Type hints throughout
✅ Proper logging
✅ Good separation of concerns
✅ Session restoration logic
✅ Unit tests for critical paths

Suggestions:

1. Add Type Safety to Tool State:

# tools.py:64
_session_state: dict[str, dict[str, Any]] = {}

Consider creating a TypedDict:

from collections.abc import Callable
from datetime import datetime
from typing import TypedDict

class SessionState(TypedDict, total=False):
    project: dict | None
    entities: list[dict]
    agents: list[dict]
    phase: str
    agent_capabilities: dict | None
    goal_summary: dict | None
    hydrated_prompts: dict[str, dict]
    _created_at: datetime
    _last_activity: datetime
    _on_phase_change: Callable[[str], None]

2. Frontend: Add Error Boundaries

The React components should have error boundaries to prevent crashes:

// Add to DesignAssistantPage.tsx
class DesignAssistantErrorBoundary extends React.Component {
  // ... error boundary implementation
}

3. Add Retry Logic for Stream Interruptions

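// useDesignSession.ts
// `sleep` is not built in, so define a small helper
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const sendMessage = async (message: string) => {
  let retries = 3;
  while (retries > 0) {
    try {
      await streamMessage(message);
      break;
    } catch (err) {
      retries--;
      if (retries === 0) throw err;
      // Linear backoff: 2s after the first failure, 3s after the second
      await sleep(1000 * (4 - retries));
    }
  }
};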

🧪 Test Coverage

Strong:

  • Unit tests for session state management ✅
  • Template hydration tests ✅
  • Input sanitization tests ✅

Missing:

  • Integration tests for the full design flow
  • Tests for session restoration
  • Tests for error conditions (network failures, tool errors)
  • Frontend component tests

Recommendation: Add integration test:

# tests/integration/test_design_flow.py
@pytest.mark.asyncio
async def test_full_design_flow():
    """Test complete flow from goal understanding to blueprint."""
    session = DesignAssistantSession("test-id", "proj-123")
    await session.start()
    
    # Phase 1: Goal understanding
    events = []
    async for event in session.send_message("I need to interview users for M&A due diligence"):
        events.append(event)
    
    assert session.state.phase == DesignPhase.AGENT_CONFIGURATION
    # ... continue through phases

📚 Documentation

Good:

  • Comprehensive design docs in docs/
  • Inline comments explaining complex logic
  • Docstrings on most functions

Could Improve:

  • Add API documentation (OpenAPI/Swagger descriptions)
  • Add architecture diagram showing agent communication flow
  • Document the state machine transitions between phases
  • Add troubleshooting guide for common issues
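
As a starting point for documenting the phase state machine, the transitions could be written down as a table in code. This is a sketch under assumptions: `DesignPhase.AGENT_CONFIGURATION` appears elsewhere in this review, but the other member names, the string values, and the forward-only rule are guesses based on the 3-phase description in the PR.

```python
from enum import Enum


class DesignPhase(Enum):
    GOAL_UNDERSTANDING = "goal_understanding"    # assumed name for Phase 1
    AGENT_CONFIGURATION = "agent_configuration"  # Phase 2
    BLUEPRINT_DESIGN = "blueprint_design"        # assumed name for Phase 3


# Assumed forward-only flow: Phase 1 -> Phase 2 -> Phase 3.
ALLOWED_TRANSITIONS: dict[DesignPhase, set[DesignPhase]] = {
    DesignPhase.GOAL_UNDERSTANDING: {DesignPhase.AGENT_CONFIGURATION},
    DesignPhase.AGENT_CONFIGURATION: {DesignPhase.BLUEPRINT_DESIGN},
    DesignPhase.BLUEPRINT_DESIGN: set(),
}


def transition(current: DesignPhase, target: DesignPhase) -> DesignPhase:
    """Return the new phase, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Keeping the table in one place gives the phase-transition tool a single check to call and gives the docs a single source to render a diagram from.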

🎨 Architecture Feedback

Excellent Decisions:

  1. Subagent Pattern: Using Task tool with phase-specific subagents is elegant and maintainable
  2. MCP Tool Binding: Binding tools to session ID is a clean way to manage state
  3. Session Restoration: Context prepending approach is smart for resuming conversations
  4. Hook-based Events: Using pre/post tool hooks for UI events is clean separation

Considerations:

1. Consider Separating Concerns:
Currently design_assistant.py handles both orchestration AND state management. Consider:

agents/
  ├── design_assistant.py  # Orchestration only
  ├── session_manager.py   # Session lifecycle
  └── state_manager.py     # State sync logic

2. Tool State vs Session State:
You have two sources of truth (tool state in tools.py, session state in design_assistant.py). This creates sync complexity. Consider unifying:

class DesignSessionState:
    """Single source of truth for session state."""
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.tool_state = get_session_state(session_id)
    
    @property
    def phase(self):
        return self.tool_state["phase"]

Recommendations Summary

Must Fix Before Merge:

  1. ⚠️ Strengthen template injection sanitization
  2. ⚠️ Add database commit in stream endpoint
  3. ⚠️ Add input validation to SendMessageRequest
  4. ⚠️ Call cleanup_stale_sessions() periodically
  5. ⚠️ Limit debug event array size in frontend

Should Fix:

  1. Add lock for state synchronization
  2. Add bounded queue for response events
  3. Cache template loading
  4. Add database index for project_id + status
  5. Add error boundaries to React components

Nice to Have:

  1. Integration tests for full flow
  2. TypedDict for session state
  3. Retry logic for stream interruptions
  4. Architecture diagrams
  5. API documentation

🎉 Conclusion

This is a well-engineered PR that demonstrates strong understanding of both the Claude Agent SDK and production-ready software development. The security-first approach is commendable, and the architecture is clean and maintainable.

The issues identified are mostly minor refinements. Once the critical security and performance items are addressed, this will be ready to merge.

Overall Score: 8.5/10

Great work!

@mathaix mathaix merged commit ca1b7e7 into main Dec 22, 2025
1 check passed