fix(llm): prevent UTF-8 panic in line_bounds() (fixes #1669) by willamhou · Pull Request #1679 · nearai/ironclaw

willamhou · 2026-03-26T12:43:28Z

Summary

Fix potential panic in line_bounds() (src/llm/reasoning.rs) when pos lands inside a multi-byte UTF-8 character
is_recoverable_tool_call_segment() calls line_bounds(text, end.saturating_sub(1)) which can step into the middle of a multi-byte char (e.g., emoji 🔥 is 4 bytes — subtracting 1 from the byte after it lands at byte 3 of the char)
Fix: clamp pos to text.len() and walk backward to the nearest is_char_boundary() before slicing
5 regression tests covering: mid-char positions, emoji before newline, at-length, and out-of-bounds

Also fixes pre-existing clippy unnecessary_sort_by warnings in web gateway handlers (zero-warning policy).

Test plan

cargo test -- tests::test_line_bounds — 5 new tests pass
cargo clippy --all --benches --tests --examples --all-features — zero warnings
cargo fmt --check — clean

Closes #1669

🤖 Generated with Claude Code

gemini-code-assist · 2026-03-26T12:43:49Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial fix to enhance the robustness of text processing, specifically addressing a potential panic in the line_bounds() function when dealing with multi-byte UTF-8 characters. By ensuring that string slicing always occurs at valid character boundaries, the change prevents crashes in scenarios involving complex text, such as emojis. The update also includes minor code quality improvements by resolving Clippy warnings, contributing to a cleaner and more stable codebase.

Highlights

UTF-8 Safety Fix: Prevented a potential panic in the line_bounds() function within src/llm/reasoning.rs when the input position falls inside a multi-byte UTF-8 character.
Robust Position Handling: Implemented logic to clamp the input position to text.len() and walk backward to the nearest character boundary, ensuring safe string slicing.
Comprehensive Testing: Added five new regression tests specifically for line_bounds() to cover various edge cases, including mid-character positions, emojis before newlines, and out-of-bounds positions.
Code Quality Improvement: Addressed and resolved pre-existing Clippy unnecessary_sort_by warnings in web gateway handlers (src/channels/web/handlers/chat.rs and src/channels/web/server.rs).

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request refactors thread sorting logic in chat_threads_handler to use sort_by_key with std::cmp::Reverse for improved idiomatic Rust. More significantly, it enhances the line_bounds function to be UTF-8 safe by correctly handling positions within multi-byte characters and clamping out-of-bounds inputs, preventing potential panics. Comprehensive unit tests were added to validate these changes. The review suggests strengthening two of these new tests, test_line_bounds_mid_multibyte_char and test_line_bounds_emoji_before_newline, by adding explicit assertions for the expected return values to confirm correct boundary calculation, rather than just verifying non-panicking behavior.

gemini-code-assist · 2026-03-26T12:45:56Z

+        let _ = line_bounds(text, 1); // would panic before fix
+        let _ = line_bounds(text, 2);
+        let _ = line_bounds(text, 3);


This test is great for ensuring the function doesn't panic. To make it more robust, consider also asserting the expected return value. This would verify that line_bounds not only avoids panicking but also correctly snaps to the valid character boundary as intended.

Suggested change

let _ = line_bounds(text, 1); // would panic before fix

let _ = line_bounds(text, 2);

let _ = line_bounds(text, 3);

assert_eq!(line_bounds(text, 1), (0, 4));

assert_eq!(line_bounds(text, 2), (0, 4));

assert_eq!(line_bounds(text, 3), (0, 4));

References

When handling UTF-8 strings, especially when dealing with truncation or boundary calculations, it's crucial to use character-aware methods and ensure that operations snap to valid character boundaries to prevent panics and ensure correctness.

gemini-code-assist · 2026-03-26T12:45:56Z

+        let text = "Result: 🔥\n<tool_call>";
+        let newline_pos = text.find('\n').unwrap();
+        // saturating_sub(1) lands inside '🔥' (byte 11 → 10, but char ends at 12)
+        let _ = line_bounds(text, newline_pos.saturating_sub(1));


Similar to the test_line_bounds_mid_multibyte_char test, this test would be stronger if it asserted the expected output. This ensures the function is not just avoiding a panic but is also correctly calculating the line bounds when the position is inside a multi-byte character.

Suggested change

let _ = line_bounds(text, newline_pos.saturating_sub(1));

assert_eq!(line_bounds(text, newline_pos.saturating_sub(1)), (0, 12));

References

When handling UTF-8 strings, especially when dealing with truncation or boundary calculations, it's crucial to use character-aware methods and ensure that operations snap to valid character boundaries to prevent panics and ensure correctness.

serrrfirat

Clean, focused fix. The backward walk to is_char_boundary is the standard Rust idiom for this class of bug, and the 5 regression tests cover the key scenarios well (mid-char positions, emoji before newline triggering the actual saturating_sub(1) path, pos beyond length, baseline ASCII).

The clippy sort_by → sort_by_key(Reverse(...)) change is correct (DateTime<Utc> implements Ord + Copy).

Re: gemini-bot comments — both are false positives. The tests at lines 2336 and 2349 already assert exact expected return values (0, 4) and (0, 12). The bot's "suggestions" are identical to the existing code.

One cosmetic nit (non-blocking): the comment on line 1388 says "at most 3 bytes for UTF-8" — this refers to the walk-back distance (up to 3 steps for a 4-byte char), not the max char width. Could be slightly clearer, but it's technically correct.

LGTM ✅

serrrfirat

Clean, focused fix. The backward walk to is_char_boundary is the standard Rust idiom for this class of bug, and the 5 regression tests cover the key scenarios well (mid-char positions, emoji before newline triggering the actual saturating_sub(1) path, pos beyond length, baseline ASCII).

The clippy sort_by to sort_by_key(Reverse(...)) change is correct (DateTime<Utc> implements Ord + Copy).

Re: gemini-bot comments — both are false positives. The tests at lines 2336 and 2349 already assert exact expected return values (0, 4) and (0, 12). The bot's "suggestions" are identical to the existing code.

One cosmetic nit (non-blocking): the comment on line 1388 says "at most 3 bytes for UTF-8" — this refers to the walk-back distance (up to 3 steps for a 4-byte char), not the max char width. Could be slightly clearer, but it's technically correct.

LGTM

zmanian

Review: fix(llm): prevent UTF-8 panic in line_bounds()

Correct fix, well-tested. The is_char_boundary() walk-back pattern follows project conventions.

Findings

Low (nit): The inline walk-back loop duplicates crate::util::floor_char_boundary(). Could simplify to:

let safe = crate::util::floor_char_boundary(text, pos);

Not blocking -- the inline version is correct and the comment is helpful.

Medium (out of scope): src/llm/nearai_chat.rs:454 has the same unsafe pattern:

&response_text[..response_text.len().min(300)]

This operates on HTTP response text from an external API. Should use floor_char_boundary. Worth a follow-up issue.

Verified

Core fix correctly prevents UTF-8 panic via is_char_boundary() walk-back
5 regression tests covering ASCII, boundary at len, mid-multibyte (positions 1/2/3 of 4-byte emoji), emoji-before-newline (exact crash scenario from #1669), and pos-beyond-len
Clippy sort_by -> sort_by_key(Reverse(...)) fixes are correct

Approve. Ship it.

`line_bounds()` used `text[..pos]` slicing which panics when `pos` lands inside a multi-byte UTF-8 character. This happens when `end.saturating_sub(1)` in `is_recoverable_tool_call_segment()` steps back into a multi-byte char like emoji. Fix: clamp `pos` to `text.len()` and walk backward to the nearest char boundary before slicing. Add 5 regression tests covering mid-char positions, emoji boundaries, and out-of-bounds pos. Also fix pre-existing clippy `unnecessary_sort_by` warnings in web gateway handlers. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>

Address Gemini review: strengthen regression tests to verify correct return values (not just absence of panic) when pos lands mid-char. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>

willamhou · 2026-03-27T06:45:51Z

Hi @zmanian, I've rebased this branch onto the latest staging. Could you please approve the workflow run so CI can execute? Thanks!

…earai#1679) * fix(llm): prevent UTF-8 panic in line_bounds() (fixes nearai#1669) `line_bounds()` used `text[..pos]` slicing which panics when `pos` lands inside a multi-byte UTF-8 character. This happens when `end.saturating_sub(1)` in `is_recoverable_tool_call_segment()` steps back into a multi-byte char like emoji. Fix: clamp `pos` to `text.len()` and walk backward to the nearest char boundary before slicing. Add 5 regression tests covering mid-char positions, emoji boundaries, and out-of-bounds pos. Also fix pre-existing clippy `unnecessary_sort_by` warnings in web gateway handlers. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> * test: assert expected values in line_bounds UTF-8 tests Address Gemini review: strengthen regression tests to verify correct return values (not just absence of panic) when pos lands mid-char. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> --------- Co-authored-by: willamhou <willamhou@ceresman.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Happy <yesreply@happy.engineering>

github-actions Bot added scope: channel/web Web gateway channel scope: llm LLM integration size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: new First-time contributor labels Mar 26, 2026

gemini-code-assist Bot reviewed Mar 26, 2026

View reviewed changes

serrrfirat approved these changes Mar 26, 2026

View reviewed changes

zmanian approved these changes Mar 26, 2026

View reviewed changes

github-actions Bot mentioned this pull request Mar 27, 2026

🦞 OpenClaw 生态日报 2026-03-27 gsscsd/big_model_radar#102

Open

zmanian mentioned this pull request Mar 27, 2026

fix(llm): prevent UTF-8 panic in is_recoverable_tool_call_segment #1695

Closed

4 tasks

willamhou and others added 2 commits March 27, 2026 14:17

willamhou force-pushed the fix/utf8-panic-1669 branch from 1d888be to bb6be89 Compare March 27, 2026 06:17

ilblackdragon approved these changes Mar 27, 2026

View reviewed changes

ilblackdragon merged commit 7234700 into nearai:staging Mar 27, 2026
14 checks passed

ironclaw-ci Bot mentioned this pull request Mar 27, 2026

chore: promote staging to main (2026-03-27 07:25 UTC) #1698

Merged

ironclaw-ci Bot mentioned this pull request Mar 30, 2026

chore(ironclaw): release v0.24.0 #1745

Merged

willamhou mentioned this pull request Apr 11, 2026

fix(cli): prevent UTF-8 panic in MCP tool description truncation (fixes #1947) #2008

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(llm): prevent UTF-8 panic in line_bounds() (fixes #1669)#1679

fix(llm): prevent UTF-8 panic in line_bounds() (fixes #1669)#1679
ilblackdragon merged 2 commits intonearai:stagingfrom
willamhou:fix/utf8-panic-1669

willamhou commented Mar 26, 2026

Uh oh!

gemini-code-assist Bot commented Mar 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 26, 2026

Uh oh!

gemini-code-assist Bot Mar 26, 2026

Uh oh!

serrrfirat left a comment

Uh oh!

serrrfirat left a comment

Uh oh!

zmanian left a comment

Uh oh!

willamhou commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	let _ = line_bounds(text, newline_pos.saturating_sub(1));
	assert_eq!(line_bounds(text, newline_pos.saturating_sub(1)), (0, 12));

Conversation

willamhou commented Mar 26, 2026

Summary

Test plan

Uh oh!

gemini-code-assist Bot commented Mar 26, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Uh oh!

zmanian left a comment

Choose a reason for hiding this comment

Review: fix(llm): prevent UTF-8 panic in line_bounds()

Findings

Verified

Uh oh!

willamhou commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants