Outcome
Deliver a reliable and understandable large-document council experience with clear live progress, robust failure handling, and materially lower runtime/token cost.
Scope
- Keep council watch alive until terminal completion.
- Expose explicit run stages (queued/chunking/analysis/critique/synthesis/completed/failed).
- Isolate per-agent failures so one failure does not abort the whole run.
- Reduce large-doc cost and duration via map-reduce style context pipeline.
- Add SLOs/metrics and rollout guardrails.
Acceptance Criteria
- Users always see progress or terminal state; no ambiguous idle stop behavior.
- Council runs persist run metadata and failure reason codes.
- Full-run failure rate reduced with per-agent fault isolation.
- Cost/runtime targets defined and measured in production telemetry.
- Rollout has feature flags and rollback plan.
Outcome
Deliver a reliable and understandable large-document council experience with clear live progress, robust failure handling, and materially lower runtime/token cost.
Scope
Acceptance Criteria