Regenerate character tables with Unicode 17.0 data #37
Merged
adams85 merged 7 commits into adams85:master on Apr 8, 2026
Regenerated Tokenizer.Helpers.Generated.cs using CharMaskGenerator, with UnicodeInformation built from the feature/unicode-17.0 branch of hexawyz/NetUnicodeInfo (Unicode 17.0 data). This updates the BMP lookup masks and astral plane range arrays to include characters added in Unicode 15.1, 16.0, and 17.0. Previously, the tables were generated from UnicodeInformation v2.7.1, which only included Unicode 15.0 data, causing 48 Jint test262 identifier tests to fail for Unicode 15.1/16.0/17.0 characters. With this update, all 535 identifier tests pass (verified via the Jint test262 suite).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
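The split between BMP lookup masks and astral plane range arrays can be illustrated with a small two-tier set. This is a hedged Python sketch, not the generated C# code: the class name TwoTierSet, the one-bit-per-code-point mask, and the sorted (start, end) range list searched with bisect are assumptions chosen for clarity, and the actual layout in Tokenizer.Helpers.Generated.cs may differ.

```python
from bisect import bisect_right


class TwoTierSet:
    """Hypothetical code point set: a bitmask for the BMP, sorted ranges for astral planes."""

    def __init__(self, ranges):
        # ranges: inclusive (start, end) code point pairs, assumed non-overlapping
        self.bmp_mask = bytearray(0x10000 // 8)  # one bit per BMP code point (8 KiB)
        astral = []
        for start, end in sorted(ranges):
            # Set bits for the BMP part of the range (empty if start > 0xFFFF).
            for cp in range(start, min(end, 0xFFFF) + 1):
                self.bmp_mask[cp >> 3] |= 1 << (cp & 7)
            # Keep the astral part (if any) as a range; straddling ranges are split here.
            if end > 0xFFFF:
                astral.append((max(start, 0x10000), end))
        self.astral_starts = [s for s, _ in astral]
        self.astral_ends = [e for _, e in astral]

    def __contains__(self, cp):
        if cp <= 0xFFFF:
            # Common case: a single bit test.
            return bool(self.bmp_mask[cp >> 3] & (1 << (cp & 7)))
        # Astral case: find the last range whose start is <= cp, then check its end.
        i = bisect_right(self.astral_starts, cp) - 1
        return i >= 0 and cp <= self.astral_ends[i]


if __name__ == "__main__":
    letters = TwoTierSet([(0x41, 0x5A), (0x61, 0x7A), (0x10400, 0x1044F)])
    print(0x41 in letters, 0x5B in letters, 0x10400 in letters, 0x10450 in letters)
    # True False True False
```

The design trade-off this sketch assumes: the fixed-size mask answers the hot BMP case in constant time, while astral identifier characters are sparse enough that a logarithmic search over a compact range list is cheap.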
Update AcornIdentifier patterns (BMP regex, astral arrays) to Unicode 17.0 from DerivedCoreProperties.txt so that IsIdentifierCharMatchesAcornImpl passes against the regenerated character tables. Skip the LookupWorks test until UnicodeInformation v2.8.0 (Unicode 17.0) is published to NuGet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
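DerivedCoreProperties.txt lists each derived property (ID_Start, ID_Continue, and others) as "code point(s) ; property # comment" lines, where a range is written as START..END in hex. As a hedged illustration of that input format, here is a minimal Python reader for one property's ranges; parse_property_ranges is a hypothetical helper and is not the tooling used in this PR.

```python
def parse_property_ranges(lines, prop):
    """Collect inclusive (start, end) ranges for one property from UCD-style lines."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments and blank lines
        if not data:
            continue
        cps, name = (field.strip() for field in data.split(";", 1))
        if name != prop:
            continue
        # "0041..005A" is a range; "00AA" is a single code point.
        start, _, end = cps.partition("..")
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


if __name__ == "__main__":
    sample = [
        "# Derived Property: ID_Start",
        "0041..005A    ; ID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
        "00AA          ; ID_Start # Lo       FEMININE ORDINAL INDICATOR",
        "0030..0039    ; ID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE",
    ]
    print(parse_property_ranges(sample, "ID_Start"))
    # [(65, 90), (170, 170)]
```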
Regenerated the AcornIdentifier patterns using acornjs bin/generate-identifier-regex.js with @unicode/unicode-17.0.0 instead of a custom Python script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
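The astral arrays in acorn-derived patterns are not plain range lists. As far as I can tell from acorn's source, the generator emits them as alternating offsets (the gap from the end of the previous run to the start of the next run, then the run's span), and isInAstralSet walks them linearly starting at U+10000. The Python below is a hedged re-creation of that scheme for illustration only; encode_astral and in_astral_set are hypothetical names, and acorn's actual code may differ in detail.

```python
def encode_astral(code_points):
    """Encode astral code points (all >= 0x10000) as alternating (gap, span) deltas."""
    out, at, i = [], 0x10000, 0
    cps = sorted(set(code_points))
    while i < len(cps):
        start = end = cps[i]
        # Extend the run while the next code point is consecutive.
        while i + 1 < len(cps) and cps[i + 1] == end + 1:
            i += 1
            end += 1
        out += [start - at, end - start]  # gap from previous run end, then inclusive span
        at = end
        i += 1
    return out


def in_astral_set(code, deltas):
    """Linear scan over (gap, span) pairs, in the style of acorn's isInAstralSet."""
    pos = 0x10000
    for i in range(0, len(deltas), 2):
        pos += deltas[i]  # start of the next run
        if pos > code:
            return False
        pos += deltas[i + 1]  # inclusive end of the run
        if pos >= code:
            return True
    return False


if __name__ == "__main__":
    deltas = encode_astral([0x10000, 0x10001, 0x10002, 0x10010])
    print(deltas)  # [0, 2, 14, 0]
    print(in_astral_set(0x10001, deltas), in_astral_set(0x10005, deltas))  # True False
```

Small offsets keep the generated source compact; the cost is a linear scan, which acceptable for the rare astral identifier characters.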
Owner

Wow, the last piece of the puzzle! 🎉 Thank you! I'll get to this as soon as #36 is finished. (It's taking shape nicely, BTW; some additional testing is all that's left.)
Owner

This one empties the test262 whitelist again and paves the way to ES2026 compatibility. 🎉 Thank you, @lahma, for your great help making this possible.
Summary
- Regenerates Tokenizer.Helpers.Generated.cs (BMP lookup masks + astral plane range arrays) with Unicode 17.0 ID_Start/ID_Continue data
- Updates the AcornIdentifier.cs test reference patterns to Unicode 17.0 using the canonical acornjs bin/generate-identifier-regex.js with @unicode/unicode-17.0.0

Together with #36, closes #24.
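For context on how the ID_Start/ID_Continue data is consumed: ECMAScript defines IdentifierStartChar as UnicodeIDStart, $ or _, and IdentifierPartChar as UnicodeIDContinue, $, ZWNJ (U+200C) or ZWJ (U+200D). A minimal Python sketch of that rule follows; the helper names are hypothetical, Unicode escape sequences in identifiers are ignored, and tiny ASCII-only sets stand in for the generated Unicode 17.0 tables.

```python
ZWNJ, ZWJ = 0x200C, 0x200D


def is_identifier_start(cp, id_start):
    # IdentifierStartChar :: UnicodeIDStart | $ | _
    return cp in id_start or cp in (ord("$"), ord("_"))


def is_identifier_part(cp, id_continue):
    # IdentifierPartChar :: UnicodeIDContinue | $ | <ZWNJ> | <ZWJ>
    return cp in id_continue or cp in (ord("$"), ZWNJ, ZWJ)


def is_identifier_name(text, id_start, id_continue):
    """Check an escape-free string against the IdentifierName grammar."""
    cps = [ord(ch) for ch in text]
    return bool(cps) and is_identifier_start(cps[0], id_start) and all(
        is_identifier_part(cp, id_continue) for cp in cps[1:]
    )


if __name__ == "__main__":
    # Toy stand-ins for ID_Start / ID_Continue (real tables cover all of Unicode).
    toy_start = set(range(ord("A"), ord("Z") + 1)) | set(range(ord("a"), ord("z") + 1))
    toy_continue = toy_start | set(range(ord("0"), ord("9") + 1)) | {ord("_")}
    print(is_identifier_name("_tmp1", toy_start, toy_continue))  # True
    print(is_identifier_name("1tmp", toy_start, toy_continue))   # False
```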
How it was generated
Tokenizer tables:
- The CharMaskGenerator.GenerateMasks test was run against the hexawyz/NetUnicodeInfo feature/unicode-17.0 branch (UnicodeInformation v2.8.0, an active PR with Unicode 17.0 data) via a temporary project reference.
- LookupWorks verified correctness across all code points (U+0000–U+10FFFF).

AcornIdentifier patterns:

- Generated using acornjs bin/generate-identifier-regex.js with the @unicode/unicode-17.0.0 devDependency.
- LookupWorks is skipped until UnicodeInformation v2.8.0 is published to NuGet.

Verification
- IsIdentifierCharMatchesAcornImpl test: passes (all code points match between AcornIdentifier and Tokenizer)

Test plan
- IsIdentifierCharMatchesAcornImpl passes on net10.0
- LookupWorks passes locally with the UnicodeInformation v2.8.0 project reference (skipped in CI until the package is published)
- Identifiers tests: 535/535 pass

🤖 Generated with Claude Code
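The LookupWorks and IsIdentifierCharMatchesAcornImpl checks both amount to comparing two implementations of the same code point predicate over the entire code space. A hedged Python sketch of that idea: find_mismatches is a hypothetical helper, not the project's test code, and it stops after reporting the first few differing code points.

```python
def find_mismatches(pred_a, pred_b, limit=10):
    """Compare two code point predicates over every code point U+0000..U+10FFFF."""
    mismatches = []
    for cp in range(0x110000):
        if pred_a(cp) != pred_b(cp):
            mismatches.append(cp)
            if len(mismatches) >= limit:  # a handful is enough to diagnose a table bug
                break
    return mismatches


if __name__ == "__main__":
    table_based = lambda cp: 0x41 <= cp <= 0x5A
    reference = lambda cp: cp < 0x80 and chr(cp).isupper()
    off_by_one = lambda cp: 0x41 <= cp < 0x5A
    print(find_mismatches(table_based, reference))                    # []
    print([hex(cp) for cp in find_mismatches(table_based, off_by_one)])  # ['0x5a']
```

An exhaustive scan like this is what catches off-by-one range boundaries and characters missing from one data source, which spot checks on a few sample characters would miss.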