[Epic] Large Document Council Reliability + Cost Optimization

## Outcome
Deliver a reliable and understandable large-document council experience with clear live progress, robust failure handling, and materially lower runtime/token cost.

## Scope
- Keep the council watch alive until the run reaches a terminal state (completed or failed).
- Expose explicit run stages (queued/chunking/analysis/critique/synthesis/completed/failed).
- Isolate per-agent failures so one failure does not abort the whole run.
- Reduce large-doc cost and duration via a map-reduce-style context pipeline.
- Add SLOs/metrics and rollout guardrails.
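
To make the stage and fault-isolation items concrete, here is a minimal sketch, not the actual implementation. The stage names come from the list above; `RunStage`, `TERMINAL_STAGES`, `run_agents_isolated`, `final_stage`, and the "fail only if every agent fails" rule are assumptions for illustration.

```python
import asyncio
from enum import Enum


class RunStage(str, Enum):
    QUEUED = "queued"
    CHUNKING = "chunking"
    ANALYSIS = "analysis"
    CRITIQUE = "critique"
    SYNTHESIS = "synthesis"
    COMPLETED = "completed"
    FAILED = "failed"


# A watch would stay open until the run reaches one of these stages.
TERMINAL_STAGES = {RunStage.COMPLETED, RunStage.FAILED}


async def run_agents_isolated(agents, chunk_summaries):
    """Run every agent concurrently; collect failures instead of aborting.

    `agents` maps a (hypothetical) agent name to an async callable that
    takes the list of chunk summaries produced by the map step.
    """
    names = list(agents)
    outcomes = await asyncio.gather(
        *(agents[name](chunk_summaries) for name in names),
        return_exceptions=True,  # one agent's exception does not cancel the rest
    )
    results, failures = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failures[name] = repr(outcome)
        else:
            results[name] = outcome
    return results, failures


def final_stage(results):
    # Assumption: the run is marked failed only when no agent produced output.
    return RunStage.COMPLETED if results else RunStage.FAILED
```

With this shape, a run where one of several agents times out still reaches `completed`, and the per-agent failures can be surfaced alongside the partial result.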

## Acceptance Criteria
- Users always see either live progress or a terminal state; the watch never stops silently while a run is idle.
- Council runs persist run metadata and failure reason codes.
- Full-run failure rate reduced with per-agent fault isolation.
- Cost/runtime targets defined and measured in production telemetry.
- Rollout has feature flags and a rollback plan.
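
As one possible shape for the persisted run metadata, the sketch below stores a reason code and basic cost/runtime fields on each run. Every name here (`RunRecord`, `FailureReason`, the individual codes and fields) is hypothetical; the real schema and code list still need to be defined.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class FailureReason(str, Enum):
    # Hypothetical reason codes; the actual set is not specified in this epic.
    AGENT_TIMEOUT = "agent_timeout"
    CONTEXT_OVERFLOW = "context_overflow"
    PROVIDER_ERROR = "provider_error"


@dataclass
class RunRecord:
    run_id: str
    stage: str = "queued"
    failure_reason: Optional[FailureReason] = None
    agent_failures: dict = field(default_factory=dict)  # agent name -> reason code
    total_tokens: int = 0            # input for the cost target
    duration_seconds: float = 0.0    # input for the runtime target

    def to_json(self) -> str:
        # str-based enums serialize as their plain string values.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping token and duration fields on the same record as the reason code would let the cost/runtime targets and the failure rate be computed from one telemetry source.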


[Epic] Large Document Council Reliability + Cost Optimization #558
