
Adaptive Architect: Simulation Testing (Phase 2) #15

@mathaix


User Story

As a manager
I want to test my agents before deploying them to real interviewees
So that I can verify agent behavior and catch issues early

Priority: Phase 2

Simulation testing is deferred to Phase 2 in order to focus on the core architecture first.

Acceptance Criteria

Manual Role-Play Testing

  • The manager can chat with any designed agent while role-playing as an interviewee
  • Real-time conversation in a test sandbox
  • See how the agent asks questions, responds, and extracts data
  • Observe exit conditions as they are evaluated
  • Test different interviewee behaviors (cooperative, vague, reluctant)
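A minimal sketch of a role-play sandbox session, assuming a designed agent can be called as a function from the transcript so far to its next message plus any extracted fields, and that exit conditions are predicates over the extracted data. `RolePlaySession`, `AgentFn`, `ExitCondition`, and `stub_agent` are hypothetical names, not part of the existing Interview Agent code. The session records every exit-condition evaluation per turn so the manager can watch them change.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical agent interface: given the transcript so far, return the
# agent's next message and any rubric fields it extracted from the last reply.
AgentFn = Callable[[list[tuple[str, str]]], tuple[str, dict[str, str]]]
# Hypothetical exit condition: a predicate over the data extracted so far.
ExitCondition = Callable[[dict[str, str]], bool]


@dataclass
class RolePlaySession:
    agent: AgentFn
    exit_conditions: dict[str, ExitCondition]
    transcript: list[tuple[str, str]] = field(default_factory=list)
    extracted: dict[str, str] = field(default_factory=dict)
    # One entry per turn recording which exit conditions held after that turn.
    exit_log: list[dict[str, bool]] = field(default_factory=list)

    def send(self, manager_message: str) -> str:
        """The manager speaks as the interviewee; returns the agent's reply."""
        self.transcript.append(("interviewee", manager_message))
        reply, fields = self.agent(self.transcript)
        self.extracted.update(fields)
        self.transcript.append(("agent", reply))
        # Re-evaluate every exit condition so the manager can watch them change.
        self.exit_log.append(
            {name: cond(self.extracted) for name, cond in self.exit_conditions.items()}
        )
        return reply

    @property
    def finished(self) -> bool:
        # Assumption: the interview may end as soon as any exit condition holds.
        return bool(self.exit_log) and any(self.exit_log[-1].values())


# Usage with a stub agent standing in for a designed Interview Agent.
def stub_agent(transcript):
    last_reply = transcript[-1][1]
    if "invoice" in last_reply.lower():
        return "Thanks. Who approves invoices?", {"process": "invoicing"}
    return "Could you describe your main process?", {}


session = RolePlaySession(stub_agent, {"process_known": lambda d: "process" in d})
session.send("Not sure.")            # vague interviewee
session.send("I handle invoices.")   # cooperative interviewee
print(session.exit_log)  # [{'process_known': False}, {'process_known': True}]
```

Keeping the exit log per turn, rather than only the final state, is what lets the sandbox show when and why an exit condition flipped.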

Automated Simulation Scenarios (Phase 3)

  • System generates synthetic interviewee personas
  • Personas are derived from the domain and rubric:
    • Cooperative Expert: Full knowledge, eager to share
    • Vague Responder: Brief/unclear answers, needs probing
    • Reluctant Participant: Hesitant, needs rapport building
    • Time-Pressed: Wants to finish quickly
  • Run automated simulations against the agent
  • Compare extracted data against ground truth
  • Calculate coverage metrics
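One possible structure for automated scenarios is sketched below, assuming each scenario bundles a persona, the rubric fields, and the ground-truth values, and that the interviewee LLM sits behind a plain callable. A deterministic stub replaces the LLM so the loop runs offline. `Persona`, `Scenario`, `run_simulation`, `stub_interviewee`, `make_stub_agent`, the `max_turns` cap, and the "stop when every rubric field is captured" rule are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

Transcript = list[tuple[str, str]]  # (speaker, text) pairs


class Persona(Enum):
    COOPERATIVE_EXPERT = "Full knowledge, eager to share"
    VAGUE_RESPONDER = "Brief/unclear answers, needs probing"
    RELUCTANT_PARTICIPANT = "Hesitant, needs rapport building"
    TIME_PRESSED = "Wants to finish quickly"


@dataclass
class Scenario:
    persona: Persona
    rubric_fields: list[str]
    ground_truth: dict[str, str]  # values a perfect interview would extract


# Hypothetical interfaces; in practice both sides would be LLM-backed.
AgentFn = Callable[[Transcript], tuple[str, dict[str, str]]]
IntervieweeFn = Callable[[Scenario, str], str]


def run_simulation(scenario: Scenario, agent: AgentFn, interviewee: IntervieweeFn,
                   max_turns: int = 10) -> tuple[Transcript, dict[str, str]]:
    transcript: Transcript = []
    extracted: dict[str, str] = {}
    for _ in range(max_turns):
        question, fields = agent(transcript)
        extracted.update(fields)
        transcript.append(("agent", question))
        # Assumed exit rule: stop once every rubric field has been captured.
        if all(f in extracted for f in scenario.rubric_fields):
            break
        transcript.append(("interviewee", interviewee(scenario, question)))
    return transcript, extracted


def stub_interviewee(scenario: Scenario, question: str) -> str:
    """Deterministic stand-in for the LLM playing the persona."""
    if scenario.persona is Persona.VAGUE_RESPONDER:
        return "Hard to say."
    for name, value in scenario.ground_truth.items():
        if name in question:
            return f"{name}={value}"
    return "I'm not sure what you mean."


def make_stub_agent(fields: list[str]) -> AgentFn:
    """Stub agent: asks about the first missing field, parses 'name=value' answers."""
    def agent(transcript: Transcript) -> tuple[str, dict[str, str]]:
        known: dict[str, str] = {}
        for speaker, text in transcript:
            if speaker == "interviewee" and "=" in text:
                key, value = text.split("=", 1)
                known[key] = value
        missing = [f for f in fields if f not in known]
        if missing:
            return f"Can you tell me about {missing[0]}?", known
        return "Thank you, that covers everything.", known
    return agent


scenario = Scenario(Persona.COOPERATIVE_EXPERT, ["approver", "tool"],
                    {"approver": "CFO", "tool": "SAP"})
transcript, extracted = run_simulation(
    scenario, make_stub_agent(scenario.rubric_fields), stub_interviewee)
print(extracted)  # {'approver': 'CFO', 'tool': 'SAP'}
```

Because the interviewee is only a callable, swapping the stub for an LLM prompted with the persona description leaves `run_simulation` unchanged.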

Simulation Results

  • Show conversation transcript
  • Show extracted entities and rubric field coverage
  • Highlight what was captured vs. missed
  • Flag potential issues (e.g., agent stuck in loop, poor coverage)
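The captured-vs-missed breakdown and the issue flags listed above could be computed from a finished run roughly as follows. This sketch assumes values match ground truth when they are equal after normalising case and whitespace, treats the agent repeating an identical question `repeat_threshold` times in a row as a likely loop, and uses a 50% coverage floor; these rules and the names `SimulationReport` and `evaluate` are placeholders, not criteria taken from TESTING-EVALUATION-FRAMEWORK.md.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationReport:
    captured: list[str]    # fields extracted with the correct value
    missed: list[str]      # fields never extracted
    incorrect: list[str]   # fields extracted with a value that differs from ground truth
    coverage: float        # fraction of rubric fields captured correctly
    issues: list[str] = field(default_factory=list)


def _norm(value: str) -> str:
    # Case- and whitespace-insensitive comparison (an assumed matching rule).
    return " ".join(value.lower().split())


def evaluate(transcript: list[tuple[str, str]], extracted: dict[str, str],
             ground_truth: dict[str, str], rubric_fields: list[str],
             repeat_threshold: int = 3, min_coverage: float = 0.5) -> SimulationReport:
    captured, missed, incorrect = [], [], []
    for name in rubric_fields:
        if name not in extracted:
            missed.append(name)
        elif name in ground_truth and _norm(extracted[name]) == _norm(ground_truth[name]):
            captured.append(name)
        else:
            incorrect.append(name)
    coverage = len(captured) / len(rubric_fields) if rubric_fields else 1.0

    issues: list[str] = []
    # Loop detection: the agent asks the identical question several turns in a row.
    agent_turns = [text for speaker, text in transcript if speaker == "agent"]
    run = 1
    for prev, cur in zip(agent_turns, agent_turns[1:]):
        run = run + 1 if cur == prev else 1
        if run >= repeat_threshold:
            issues.append(f"Agent may be stuck in a loop: asked {cur!r} {run} times in a row")
            break
    if coverage < min_coverage:
        issues.append(f"Poor coverage: {coverage:.0%} of rubric fields captured")
    return SimulationReport(captured, missed, incorrect, coverage, issues)


question = "Who approves invoices?"
transcript = [("agent", question), ("interviewee", "Hard to say."),
              ("agent", question), ("interviewee", "Hard to say."),
              ("agent", question)]
report = evaluate(transcript, {"tool": "sap "}, {"approver": "CFO", "tool": "SAP"},
                  ["approver", "tool"])
print(report.captured, report.missed, report.coverage)  # ['tool'] ['approver'] 0.5
```

Separating "incorrect" from "missed" keeps a wrong extraction from being reported as simple non-coverage, which is useful when highlighting results for the manager.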

API Requirements

  • POST /api/v1/design-sessions/{id}/simulate - Run automated simulation
  • WebSocket /api/v1/design-sessions/{id}/roleplay/{agent_id} - Manual role-play
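The issue does not specify a message format for the role-play WebSocket. One possible shape, sketched with the standard library only, is a small JSON envelope with a `type` discriminator and an object `payload`. The message types (`interviewee_message`, `agent_message`, `exit_evaluation`, `error`), the `RolePlayMessage` class, the path helpers, and the example IDs are hypothetical; only the two URL paths come from the list above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

# Hypothetical message types for the role-play socket.
MESSAGE_TYPES = {"interviewee_message", "agent_message", "exit_evaluation", "error"}


def simulate_path(session_id: str) -> str:
    return f"/api/v1/design-sessions/{session_id}/simulate"


def roleplay_path(session_id: str, agent_id: str) -> str:
    return f"/api/v1/design-sessions/{session_id}/roleplay/{agent_id}"


@dataclass
class RolePlayMessage:
    type: str
    payload: dict[str, Any]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "RolePlayMessage":
        data = json.loads(raw)
        # Reject anything outside the agreed envelope before it reaches the agent.
        if not isinstance(data, dict):
            raise ValueError("message must be a JSON object")
        if data.get("type") not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {data.get('type')!r}")
        if not isinstance(data.get("payload"), dict):
            raise ValueError("payload must be a JSON object")
        return cls(type=data["type"], payload=data["payload"])


msg = RolePlayMessage("interviewee_message", {"text": "I handle invoices."})
wire = msg.to_json()
print(wire)  # {"type": "interviewee_message", "payload": {"text": "I handle invoices."}}
print(roleplay_path("ds-1", "agent-7"))  # /api/v1/design-sessions/ds-1/roleplay/agent-7
```

A typed envelope like this would let the sandbox stream exit-condition evaluations alongside agent replies over the same socket.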

Technical Notes

  • Manual role-play uses the same Interview Agent infrastructure
  • Automated simulations use an LLM to play the interviewee role
  • Ground truth is stored in the simulation scenario
  • See TESTING-EVALUATION-FRAMEWORK.md for evaluation criteria

Definition of Done

  • Manual role-play working
  • Simulation results displayed
  • Coverage metrics calculated
  • Code reviewed and merged

🤖 Generated with Claude Code
