Skip to content

fix(cli): prevent UTF-8 panic in MCP tool description truncation (fixes #1947)#2008

Merged
ilblackdragon merged 1 commit intonearai:stagingfrom
willamhou:fix/mcp-utf8-truncation-panic
Apr 19, 2026
Merged

fix(cli): prevent UTF-8 panic in MCP tool description truncation (fixes #1947)#2008
ilblackdragon merged 1 commit intonearai:stagingfrom
willamhou:fix/mcp-utf8-truncation-panic

Conversation

@willamhou
Copy link
Copy Markdown
Contributor

Summary

  • Fix &tool.description[..57] byte-index slice panic in ironclaw mcp test when tool descriptions contain multi-byte UTF-8 characters (CJK, emoji)
  • Extract truncate_description() helper using existing floor_char_boundary() from util.rs
  • Also fix same pattern in config/channels.rs where &scope[..32] could panic on non-ASCII OAuth scopes

Changes

src/cli/mcp.rs

  • Replace inline &tool.description[..57] with truncate_description() that uses floor_char_boundary()
  • Add 4 regression tests: ASCII, CJK, emoji, mixed-boundary truncation

src/config/channels.rs

  • Replace &scope[..32] with &scope[..floor_char_boundary(scope, 32)]

Test plan

  • All 11 mcp cli tests pass
  • cargo fmt clean
  • cargo clippy zero warnings
  • Grep for [.. pattern — no other hardcoded byte-index slices on user-facing strings remain
  • CI (needs maintainer workflow approval for fork PR)

@github-actions github-actions Bot added scope: channel/cli TUI / CLI channel size: M 50-199 changed lines risk: low Changes to docs, tests, or low-risk modules contributor: regular 2-5 merged PRs labels Apr 4, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors string truncation logic into a dedicated truncate_description function in src/cli/mcp.rs and updates src/config/channels.rs to use character-safe slicing for error messages. These changes aim to prevent panics when handling multi-byte UTF-8 characters. Comprehensive unit tests were added to verify the truncation logic across various character sets. The review feedback suggests an alternative manual implementation for finding character boundaries to ensure robustness.

Comment thread src/cli/mcp.rs
Comment on lines +639 to +647
/// Truncate a description to at most 57 display chars, appending "..." if needed.
/// Uses char-safe boundary to avoid panicking on multi-byte UTF-8.
fn truncate_description(s: &str) -> String {
if s.len() <= 60 {
return s.to_string();
}
let end = crate::util::floor_char_boundary(s, 57);
format!("{}...", &s[..end])
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The implementation should use character-aware truncation to avoid panics. In Rust, s.len() returns byte length, and slicing at an arbitrary index can fail if it's not a character boundary. When truncating, walk backwards from the desired length until a valid character boundary is found using is_char_boundary. Also, ensure tool outputs are truncated for previews to prevent excessive resource usage. Magic numbers are kept inline to maintain readability as per repository guidelines.

Suggested change
/// Truncate a description to at most 57 display chars, appending "..." if needed.
/// Uses char-safe boundary to avoid panicking on multi-byte UTF-8.
fn truncate_description(s: &str) -> String {
if s.len() <= 60 {
return s.to_string();
}
let end = crate::util::floor_char_boundary(s, 57);
format!("{}...", &s[..end])
}
fn truncate_description(s: &str) -> String {
if s.len() <= 60 {
return s.to_string();
}
let mut end = 57;
while end > 0 && !s.is_char_boundary(end) {
end -= 1;
}
format!("{}...", &s[..end])
}
References
  1. When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using is_char_boundary to prevent panics.
  2. Always truncate tool output for previews or status updates to a reasonable maximum length.
  3. Avoid extracting magic numbers into constants if the values are self-documenting in context and creating many constants would reduce readability.

@willamhou willamhou force-pushed the fix/mcp-utf8-truncation-panic branch 2 times, most recently from 9bd5195 to cad1f87 Compare April 11, 2026 03:28
@willamhou
Copy link
Copy Markdown
Contributor Author

Hi @henrypark133 @serrrfirat — small bug fix for UTF-8 panic in ironclaw mcp test (same pattern as merged #1679). Rebased onto latest staging. Would appreciate a review. Thanks!

@willamhou
Copy link
Copy Markdown
Contributor Author

@henrypark133 @serrrfirat @ilblackdragon Ready for review — fixes UTF-8 panic in MCP tool description truncation (#1947). Tests pass, fmt/clippy clean.

@willamhou willamhou force-pushed the fix/mcp-utf8-truncation-panic branch from cad1f87 to 7e1132c Compare April 15, 2026 06:23
@willamhou
Copy link
Copy Markdown
Contributor Author

@henrypark133 @serrrfirat @ilblackdragon Rebased onto latest staging. Small bug fix — same UTF-8 panic pattern as merged #1679.

Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: UTF-8 panic fix (Risk: Low)

Clean, focused bug fix. LGTM.

Positives:

  • Correctly uses the existing floor_char_boundary utility to prevent byte-index panics
  • Both fix sites addressed: mcp.rs (inline slicing) and channels.rs (scope truncation)
  • Excellent test coverage: ASCII, CJK, emoji, and mixed-boundary edge cases
  • Same pattern as the merged #1679 fix

Convention notes:

  • Doc comment on truncate_description says "57 display chars" but the implementation operates on byte length (s.len() <= 60). For multi-byte text the threshold is applied differently than documented. Consider changing the comment to "57 bytes" for accuracy. Non-blocking.

nearai#1947)

`&tool.description[..57]` panics when byte 57 is inside a multi-byte
character (CJK = 3 bytes, emoji = 4 bytes). Replace with
`floor_char_boundary()` which walks back to the nearest valid boundary.

Also fixes the same pattern in `config/channels.rs` where
`&scope[..32]` could panic on non-ASCII OAuth scopes.

Adds 4 regression tests: ASCII, CJK, emoji, and mixed-boundary
truncation.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
@willamhou willamhou force-pushed the fix/mcp-utf8-truncation-panic branch from 7e1132c to 1416388 Compare April 18, 2026 09:31
@ilblackdragon ilblackdragon merged commit 90a6dad into nearai:staging Apr 19, 2026
15 checks passed
@henrypark133 henrypark133 mentioned this pull request Apr 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: regular 2-5 merged PRs risk: low Changes to docs, tests, or low-risk modules scope: channel/cli TUI / CLI channel size: M 50-199 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants