Fix skill-creator UTF-8 panic on multi-byte characters #362
Open
Mr-Neutr0n wants to merge 2 commits into anthropics:main from
Add a pre-parse check in quick_validate.py that scans the raw frontmatter
text for unquoted description and compatibility values containing
special YAML characters (: # { } [ ]). These characters can cause
yaml.safe_load() to silently misparse values into unexpected types
(e.g., dicts instead of strings), making skills fail to load with
no clear error message.
The check runs before yaml.safe_load() and provides an actionable
error message telling the user to wrap their value in quotes.
Fixes anthropics#338
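A minimal sketch of such a pre-parse check follows. It assumes the frontmatter text has already been extracted from between the `---` delimiters, and it only inspects top-level `key: value` lines. The function name `find_unquoted_special_values`, the regex, and the message wording are hypothetical, not the code in this PR. Like the commit message, it flags any of the listed characters. That is stricter than YAML itself, which parses a plain value such as `https://example.com` as a string without trouble.

```python
import re

# Characters the commit message lists as risky in unquoted YAML values.
SPECIAL_YAML_CHARS = set(":#{}[]")

# Keys the commit message says are checked.
CHECKED_KEYS = ("description", "compatibility")

# A top-level "key: value" line; group 1 is the key, group 2 the raw value.
_TOP_LEVEL_LINE = re.compile(r"^([A-Za-z_][\w-]*):[ \t]*(.*)$")


def find_unquoted_special_values(frontmatter_text: str) -> list[str]:
    """Return one actionable message per risky unquoted value.

    Meant to run on the raw frontmatter text *before* yaml.safe_load().
    """
    errors = []
    for line in frontmatter_text.splitlines():
        match = _TOP_LEVEL_LINE.match(line)
        if match is None:
            continue  # blank, indented, or non key/value line
        key, value = match.group(1), match.group(2).strip()
        if key not in CHECKED_KEYS or not value:
            continue
        # Quoted scalars and block scalars (| or >) are left alone.
        if value[0] in "\"'|>":
            continue
        found = sorted(SPECIAL_YAML_CHARS.intersection(value))
        if found:
            errors.append(
                f"{key!r} contains special YAML characters ({', '.join(found)}); "
                "wrap the value in quotes"
            )
    return errors


sample = (
    "name: my-skill\n"
    "description: Use when: the user asks for a report\n"
    'compatibility: "claude: any"\n'
)
for message in find_unquoted_special_values(sample):
    print(message)
```

Scanning the raw text this way catches the problem before PyYAML has a chance to turn the value into a dict or list. That lets the error name the offending key directly.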
Replace character-based length checks with UTF-8 byte-length validation in quick_validate.py. The previous code used Python's len(), which counts characters, so strings that fit within the character limits but exceeded the byte limits passed validation. When the downstream Rust CLI later truncated these strings at fixed byte offsets, it could slice through the middle of a multi-byte UTF-8 character (e.g., the Chinese full stop U+3002), causing a panic: "byte index 2 is not a char boundary".

Changes:
- Add _utf8_byte_len() helper for byte-aware length checking
- Add _truncate_utf8_safe() helper that respects character boundaries
- Switch name (64), description (1024), and compatibility (500) field validation from character counts to UTF-8 byte counts

Fixes anthropics#263
Author
Friendly bump! Let me know if there's anything I should update or improve to help move this forward.
This was referenced Feb 26, 2026
Summary
Problem
When a skill description contains multi-byte UTF-8 characters (such as Chinese text), Python's len() counts characters rather than bytes. For example, a description of 350 Chinese characters occupies 1050 bytes in UTF-8, because most Chinese characters take 3 bytes each. The previous validation nevertheless accepted it, because 350 < 1024. When the downstream Rust CLI then truncated the string at a byte offset, it sliced through the middle of a multi-byte character and panicked.
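The arithmetic, and the reason a byte-level cut fails, can be reproduced in a few lines of Python. Python has no direct counterpart to Rust's panicking `&s[..n]` string slice. As the closest analog, this sketch strictly decodes a truncated byte string. The sample strings are illustrative.

```python
text = "中" * 350  # 350 Chinese characters, 3 UTF-8 bytes each

print(len(text))                  # 350  -> what the old len() check saw
print(len(text.encode("utf-8")))  # 1050 -> what a byte-oriented consumer sees

# Python analog of the Rust panic: cutting the bytes inside a character
# leaves an incomplete sequence that strict UTF-8 decoding rejects.
cut = "。".encode("utf-8")[:2]  # U+3002 is 3 bytes: e3 80 82; keep only 2
try:
    cut.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not a character boundary:", exc.reason)
```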
Test plan
Fixes #263