Fix skill-creator UTF-8 panic on multi-byte characters #362

Open
Mr-Neutr0n wants to merge 2 commits into anthropics:main from Mr-Neutr0n:fix/skill-creator-utf8-panic
@Mr-Neutr0n

Summary

  • Replace character-based length checks with UTF-8 byte-length validation in quick_validate.py to prevent Rust panics when the CLI processes multi-byte characters
  • Add _utf8_byte_len() and _truncate_utf8_safe() helpers for byte-aware string operations
  • Switch name (64), description (1024), and compatibility (500) field validation from character counts to UTF-8 byte counts

Problem

When a skill description contains multi-byte UTF-8 characters (such as Chinese text), Python's len() counts code points rather than bytes. A 350-character Chinese description occupies 1050 bytes in UTF-8 (3 bytes per character), yet the previous validation accepted it because 350 < 1024. When the downstream Rust CLI then sliced the string at a byte offset, the cut could land in the middle of a multi-byte character, causing a panic.
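The mismatch is easy to reproduce in isolation. The snippet below is a minimal sketch, not code from this PR: the sample string is made up, and the failed decode is only a Python analogue of the Rust-side panic, not the CLI's actual code path.

```python
# Hypothetical sample: 350 copies of a CJK character (3 bytes each in UTF-8).
description = "中" * 350

char_count = len(description)                  # counts code points
byte_count = len(description.encode("utf-8"))  # counts UTF-8 bytes

print(char_count)  # 350
print(byte_count)  # 1050

# A character-based check passes, a byte-based check does not.
passes_char_check = char_count <= 1024
passes_byte_check = byte_count <= 1024

# Cutting the encoded bytes at offset 1024 lands inside a character
# (1024 is not a multiple of 3), so strict decoding fails.
try:
    description.encode("utf-8")[:1024].decode("utf-8")
    cut_is_clean = True
except UnicodeDecodeError:
    cut_is_clean = False

print(passes_char_check, passes_byte_check, cut_is_clean)
```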

Test plan

  • Existing ASCII-only skills still pass validation
  • Chinese descriptions under 1024 bytes pass validation
  • Chinese descriptions over 1024 bytes are correctly rejected
  • _truncate_utf8_safe() correctly avoids splitting multi-byte characters

Fixes #263

Add a pre-parse check in quick_validate.py that scans raw frontmatter
text for unquoted description and compatibility values containing
special YAML characters (: # { } [ ]). These characters cause
yaml.safe_load() to silently misparse values into unexpected types
(e.g., dicts instead of strings), making skills fail to load with
no clear error message.

The check runs before yaml.safe_load() and provides an actionable
error message telling the user to wrap their value in quotes.
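A pre-parse check of this kind could look roughly like the sketch below. The function name, regular expression, and error wording are assumptions for illustration and are not taken from quick_validate.py. It also flags any occurrence of the listed characters, which is stricter than YAML itself requires.

```python
import re

# Characters named in the commit message; treating every occurrence as
# suspicious is a conservative assumption of this sketch.
SPECIAL_CHARS = set(":#{}[]")

# Matches top-level "description:" or "compatibility:" lines.
_KEY_LINE = re.compile(r"^(description|compatibility):[ \t]*(.*)$")


def find_unquoted_special_values(frontmatter):
    """Return error messages for unquoted values containing special YAML characters."""
    errors = []
    for line in frontmatter.splitlines():
        match = _KEY_LINE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        # Skip empty values, quoted values, and block scalars (| or >).
        if not value or value[0] in "\"'|>":
            continue
        found = sorted(SPECIAL_CHARS.intersection(value))
        if found:
            errors.append(
                f"'{key}' contains special YAML characters "
                f"({' '.join(found)}); wrap the value in quotes."
            )
    return errors


raw = "name: pdf-helper\ndescription: Use when: parsing PDF files\n"
for message in find_unquoted_special_values(raw):
    print(message)
```

Running the scan on raw text before yaml.safe_load() means the user sees the quoting hint instead of a value that has been parsed into a dict.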

Fixes anthropics#338

Replace character-based length checks with UTF-8 byte-length validation
in quick_validate.py. The previous code used Python's len() which counts
characters, allowing strings that fit within character limits but exceed
byte limits to pass validation. When the downstream Rust CLI attempted to
truncate these strings at byte boundaries, it could slice in the middle
of multi-byte UTF-8 characters (e.g., Chinese full-stop U+3002), causing
a panic: "byte index 2 is not a char boundary".

Changes:
- Add _utf8_byte_len() helper for byte-aware length checking
- Add _truncate_utf8_safe() helper that respects character boundaries
- Switch name (64), description (1024), and compatibility (500) field
  validation from character counts to UTF-8 byte counts

Fixes anthropics#263

@Mr-Neutr0n (Author)

Friendly bump! Let me know if there's anything I should update or improve to help move this forward.

Development

Successfully merging this pull request may close these issues.

[Bug] skill-creator crashes with UTF-8 boundary panic when processing Chinese text
