feat(ui): show live and average tokens per second in shell UI #1868

cdhcrs wants to merge 2 commits into MoonshotAI:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fad57ec910
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```diff
 def cleanup(self, is_interrupt: bool) -> None:
     """Cleanup the live view on step end or interruption."""
-    self.flush_content()
+    self.flush_content(show_summary=not is_interrupt)
```
Emit summaries only at true turn boundaries
cleanup() now forwards show_summary=not is_interrupt, but cleanup() is also used by the StepBegin and SteerInput paths, not just when a turn finishes. In turns that continue across multiple steps (for example after steer injection), this can print a Composed ... tok/s summary before the turn actually ends, which violates the stated behavior and adds mid-turn noise. Summary emission should be restricted to explicit turn-boundary handling.
Useful? React with 👍 / 👎.
```python
seconds = max(1, int(elapsed))
if tokens_int > 0:
    rate = int(tokens_int / seconds)
    return rate if rate > 0 else None
```
Use fractional elapsed time for TPS calculation
The TPS calculation floors elapsed seconds via int(elapsed), which makes rates materially inaccurate once elapsed time is between integer boundaries (e.g. 100 tokens over 1.9s is reported as 100 tok/s instead of about 52). Since this function powers the new average throughput summary, users can be shown inflated numbers; compute with the real elapsed float (clamped away from zero) and round only for display.
💡 Codex Review
Reviewed commit: 89823b65c3
```diff
 match msg:
     case TurnBegin():
-        self.flush_content()
+        self.flush_content(show_summary=True)
```
Emit end-of-turn TPS summary in non-interactive live view
This summary trigger is wired to TurnBegin, but in _LiveView a run starts with no active content block, so flush_content(show_summary=True) is effectively a no-op there. I checked _LiveView.visualize_loop and its shutdown path still ends with cleanup(is_interrupt=False), which calls flush_content() without summaries; in non-interactive shell usage (visualize(..., prompt_session=None), including replay), the new average TPS line is therefore never shown at turn end.
Force-pushed from 89823b6 to 3607d3e
💡 Codex Review
Reviewed commit: 3607d3e489
```python
block = _ContentBlock(is_think=False)
block._start_time = time.monotonic() - 2.0
block._token_count = 100.0
assert block._average_tps() == 50
```
Stabilize average TPS assertion against monotonic drift
This new assertion is effectively brittle on high-resolution clocks: block._start_time = time.monotonic() - 2.0 followed by _average_tps() always measures slightly more than 2.0s, so int(100 / elapsed) commonly evaluates to 49 instead of 50. That makes the test fail (or become platform-dependent) even when the implementation is correct; use a frozen/monkeypatched clock or assert a range/tolerance.
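A frozen-clock version of that assertion could look like the sketch below. `FakeBlock` is a hypothetical stand-in for the PR's `_ContentBlock` with a simplified `_average_tps`; the point is the pattern of patching `time.monotonic` so elapsed time is exactly 2.0 s and the expected rate is deterministic:

```python
import time
from unittest import mock

class FakeBlock:
    """Hypothetical stand-in for the PR's _ContentBlock (internals simplified)."""
    def __init__(self) -> None:
        self._start_time = 0.0
        self._token_count = 0.0

    def _average_tps(self):
        elapsed = max(time.monotonic() - self._start_time, 1e-6)
        rate = int(self._token_count / elapsed)
        return rate if rate > 0 else None

def test_average_tps_frozen_clock():
    # Freeze the clock so elapsed is exactly 2.0s, with no monotonic drift.
    with mock.patch("time.monotonic", return_value=102.0):
        block = FakeBlock()
        block._start_time = 100.0   # exactly 2.0s before the frozen "now"
        block._token_count = 100.0
        assert block._average_tps() == 50
```

Alternatively, a tolerance assertion such as `assert 49 <= block._average_tps() <= 50` avoids patching at the cost of a looser check.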
```diff
 match msg:
     case TurnBegin():
-        self.flush_content()
+        self.flush_content(show_summary=True)
```
Emit TPS summary on base live-view shutdown
The summary trigger was moved to TurnBegin, but the base _LiveView shutdown path still calls cleanup(is_interrupt=False) (which flushes without summaries), so the final turn in non-interactive runs never prints the new average TPS line. Fresh evidence: replay_recent_history feeds visualize() event streams that do not include TurnBegin events for each replayed turn (replay.py builds turns by consuming TurnBegin as a delimiter), so this path reliably skips summaries.
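One way to cover the shutdown path is to let `cleanup()` know when it is the end of the run, so a clean shutdown still flushes with a summary. A minimal sketch — `LiveViewSketch` and the `end_of_run` flag are hypothetical, and the real `_LiveView` flush internals are reduced to a list for illustration:

```python
class LiveViewSketch:
    """Hypothetical stand-in for the base live view's shutdown path."""

    def __init__(self) -> None:
        self.printed: list[str] = []

    def flush_content(self, show_summary: bool = False) -> None:
        if show_summary:
            self.printed.append("Composed ... tok/s")

    def cleanup(self, is_interrupt: bool, *, end_of_run: bool = False) -> None:
        # A clean end-of-run shutdown still emits the final turn's summary,
        # so replayed streams without a trailing TurnBegin are covered.
        self.flush_content(show_summary=end_of_run and not is_interrupt)

view = LiveViewSketch()
view.cleanup(is_interrupt=False, end_of_run=True)  # summary emitted
view.cleanup(is_interrupt=True, end_of_run=True)   # interrupted: no summary
```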
Force-pushed from 4f70eb2 to 3077a78
Force-pushed from 3077a78 to f65f7b0
💡 Codex Review
Reviewed commit: f65f7b09b8
```python
summary = block.compose_summary()
if summary is not None:
    console.print(summary)
```
Aggregate TPS across the full turn before summarizing
The summary path now emits metrics from only self._current_content_block, so in turns that produce multiple composing blocks (for example text before/after a tool call, think↔compose transitions, or steer-triggered cleanup()), earlier block stats are flushed and dropped before the end-of-turn print. As a result, the new Composed ... tok/s line can report only the last segment rather than the whole turn, which makes the advertised average throughput misleading in common multi-step agent runs.
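If whole-turn aggregation were wanted, one way to avoid the idle-time distortion is to accumulate only active streaming seconds across segments. A sketch of that idea — `TurnStats` and its methods are hypothetical, not the PR's code:

```python
class TurnStats:
    """Accumulate tokens and active streaming seconds across a turn's segments (sketch)."""

    def __init__(self) -> None:
        self.tokens = 0.0
        self.active_seconds = 0.0  # streaming time only; idle gaps excluded

    def add_segment(self, tokens: float, seconds: float) -> None:
        """Record one flushed composing block's stats before it is dropped."""
        self.tokens += tokens
        self.active_seconds += seconds

    def average_tps(self):
        if self.tokens <= 0 or self.active_seconds <= 0:
            return None
        return round(self.tokens / self.active_seconds)

stats = TurnStats()
stats.add_segment(120.0, 2.0)  # e.g. text before a tool call
stats.add_segment(80.0, 2.0)   # e.g. text after the tool call
```

Because tool-call and thinking pauses never enter `active_seconds`, the turn-level average would still reflect actual streaming speed rather than wall-clock time.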
Thanks for the suggestion. This is intentional per-segment behavior. The Composed ... tok/s summary reflects the throughput of a single continuous composing stream, which matches the live spinner's real-time rate. Aggregating across tool calls, thinking pauses, or steer injections would include idle time and produce a misleadingly low number that doesn't represent the actual streaming speed users experience.
Description
Show live tokens per second during streaming and display the average TPS at turn end in the shell UI.
Checklist
- Run make gen-changelog to update the changelog.
- Run make gen-docs to update the user documentation.