fix(cli): prevent UTF-8 panic in MCP tool description truncation (fixes #1947)#2008
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors string truncation logic into a dedicated truncate_description function in src/cli/mcp.rs and updates src/config/channels.rs to use character-safe slicing for error messages. These changes aim to prevent panics when handling multi-byte UTF-8 characters. Comprehensive unit tests were added to verify the truncation logic across various character sets. The review feedback suggests an alternative manual implementation for finding character boundaries to ensure robustness.
| /// Truncate a description to at most 57 display chars, appending "..." if needed. | ||
| /// Uses char-safe boundary to avoid panicking on multi-byte UTF-8. | ||
| fn truncate_description(s: &str) -> String { | ||
| if s.len() <= 60 { | ||
| return s.to_string(); | ||
| } | ||
| let end = crate::util::floor_char_boundary(s, 57); | ||
| format!("{}...", &s[..end]) | ||
| } |
There was a problem hiding this comment.
The implementation should use character-aware truncation to avoid panics. In Rust, s.len() returns byte length, and slicing at an arbitrary index can fail if it's not a character boundary. When truncating, walk backwards from the desired length until a valid character boundary is found using is_char_boundary. Also, ensure tool outputs are truncated for previews to prevent excessive resource usage. Magic numbers are kept inline to maintain readability as per repository guidelines.
| /// Truncate a description to at most 57 display chars, appending "..." if needed. | |
| /// Uses char-safe boundary to avoid panicking on multi-byte UTF-8. | |
| fn truncate_description(s: &str) -> String { | |
| if s.len() <= 60 { | |
| return s.to_string(); | |
| } | |
| let end = crate::util::floor_char_boundary(s, 57); | |
| format!("{}...", &s[..end]) | |
| } | |
| fn truncate_description(s: &str) -> String { | |
| if s.len() <= 60 { | |
| return s.to_string(); | |
| } | |
| let mut end = 57; | |
| while end > 0 && !s.is_char_boundary(end) { | |
| end -= 1; | |
| } | |
| format!("{}...", &s[..end]) | |
| } |
References
- When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using is_char_boundary to prevent panics.
- Always truncate tool output for previews or status updates to a reasonable maximum length.
- Avoid extracting magic numbers into constants if the values are self-documenting in context and creating many constants would reduce readability.
9bd5195 to
cad1f87
Compare
|
Hi @henrypark133 @serrrfirat — small bug fix for UTF-8 panic in |
|
@henrypark133 @serrrfirat @ilblackdragon Ready for review — fixes UTF-8 panic in MCP tool description truncation (#1947). Tests pass, fmt/clippy clean. |
cad1f87 to
7e1132c
Compare
|
@henrypark133 @serrrfirat @ilblackdragon Rebased onto latest staging. Small bug fix — same UTF-8 panic pattern as merged #1679. |
henrypark133
left a comment
There was a problem hiding this comment.
Review: UTF-8 panic fix (Risk: Low)
Clean, focused bug fix. LGTM.
Positives:
- Correctly uses the existing
floor_char_boundaryutility to prevent byte-index panics - Both fix sites addressed:
mcp.rs(inline slicing) andchannels.rs(scope truncation) - Excellent test coverage: ASCII, CJK, emoji, and mixed-boundary edge cases
- Same pattern as the merged #1679 fix
Convention notes:
- Doc comment on
truncate_descriptionsays "57 display chars" but the implementation operates on byte length (s.len() <= 60). For multi-byte text the threshold is applied differently than documented. Consider changing the comment to "57 bytes" for accuracy. Non-blocking.
nearai#1947) `&tool.description[..57]` panics when byte 57 is inside a multi-byte character (CJK = 3 bytes, emoji = 4 bytes). Replace with `floor_char_boundary()` which walks back to the nearest valid boundary. Also fixes the same pattern in `config/channels.rs` where `&scope[..32]` could panic on non-ASCII OAuth scopes. Adds 4 regression tests: ASCII, CJK, emoji, and mixed-boundary truncation. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>
7e1132c to
1416388
Compare
Summary
&tool.description[..57]byte-index slice panic inironclaw mcp testwhen tool descriptions contain multi-byte UTF-8 characters (CJK, emoji)truncate_description()helper using existingfloor_char_boundary()fromutil.rsconfig/channels.rswhere&scope[..32]could panic on non-ASCII OAuth scopesChanges
src/cli/mcp.rs&tool.description[..57]withtruncate_description()that usesfloor_char_boundary()src/config/channels.rs&scope[..32]with&scope[..floor_char_boundary(scope, 32)]Test plan
cargo fmtcleancargo clippyzero warnings[..pattern — no other hardcoded byte-index slices on user-facing strings remain