Regenerate character tables with Unicode 17.0 data #37

Merged
adams85 merged 7 commits into adams85:master from lahma:unicode-17-tables
Apr 8, 2026

Collaborator

@lahma lahma commented Apr 6, 2026

Summary

  • Regenerated Tokenizer.Helpers.Generated.cs (BMP lookup masks + astral plane range arrays) with Unicode 17.0 ID_Start/ID_Continue data
  • Updated AcornIdentifier.cs test reference patterns to Unicode 17.0 using canonical acornjs bin/generate-identifier-regex.js with @unicode/unicode-17.0.0
  • This adds identifier support for characters introduced in Unicode 15.1, 16.0, and 17.0
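The split between BMP lookup masks and astral plane range arrays can be illustrated with a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the flag names, the one-byte-per-code-point layout, and the toy data (ASCII letters and digits plus two sample astral ranges) are hypothetical and are not the contents or layout of Tokenizer.Helpers.Generated.cs.

```typescript
// Hypothetical flags; the real generated tables may use a different layout.
const FLAG_ID_START = 1;
const FLAG_ID_CONTINUE = 2;

// Toy BMP table: one byte of flags per code point in U+0000..U+FFFF.
const bmpMasks = new Uint8Array(0x10000);
for (let cp = 0x41; cp <= 0x5a; cp++) bmpMasks[cp] = FLAG_ID_START | FLAG_ID_CONTINUE; // A-Z
for (let cp = 0x61; cp <= 0x7a; cp++) bmpMasks[cp] = FLAG_ID_START | FLAG_ID_CONTINUE; // a-z
for (let cp = 0x30; cp <= 0x39; cp++) bmpMasks[cp] = FLAG_ID_CONTINUE; // 0-9

// Toy astral table: sorted, non-overlapping inclusive [start, end] pairs, flattened.
const astralStartRanges: number[] = [0x10000, 0x1000b, 0x1000d, 0x10026];

// Binary search over the flattened range pairs.
function inRanges(cp: number, ranges: number[]): boolean {
  let lo = 0;
  let hi = ranges.length / 2 - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (cp < ranges[2 * mid]) hi = mid - 1;
    else if (cp > ranges[2 * mid + 1]) lo = mid + 1;
    else return true;
  }
  return false;
}

// BMP code points use the direct mask lookup; astral code points use the range search.
function isIdentifierStart(cp: number): boolean {
  return cp < 0x10000
    ? (bmpMasks[cp] & FLAG_ID_START) !== 0
    : inRanges(cp, astralStartRanges);
}
```

With this shape, picking up a new Unicode version only means regenerating the data; the lookup code itself stays unchanged.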

Together with #36, closes #24.

How it was generated

Tokenizer tables: The CharMaskGenerator.GenerateMasks test was run against the hexawyz/NetUnicodeInfo feature/unicode-17.0 branch (UnicodeInformation v2.8.0, an active PR with Unicode 17.0 data) via a temporary project reference. LookupWorks then verified the generated tables across all code points (U+0000–U+10FFFF).
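The kind of exhaustive check described here can be sketched as a loop over every code point that compares a generated lookup against a reference. This is only an illustration in TypeScript: `findMismatches`, `reference`, and `staleTable` are hypothetical names with toy predicates, not the actual LookupWorks test or the UnicodeInformation API.

```typescript
type Predicate = (codePoint: number) => boolean;

// Walks U+0000..U+10FFFF and collects up to `limit` code points
// on which the two predicates disagree.
function findMismatches(candidate: Predicate, reference: Predicate, limit = 10): number[] {
  const mismatches: number[] = [];
  for (let cp = 0; cp <= 0x10ffff && mismatches.length < limit; cp++) {
    if (candidate(cp) !== reference(cp)) mismatches.push(cp);
  }
  return mismatches;
}

// Toy reference: ASCII letters only.
const reference: Predicate = (cp) =>
  (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);

// Toy "stale" table that lacks the lowercase letters, standing in for
// tables generated from an older Unicode data set.
const staleTable: Predicate = (cp) => cp >= 0x41 && cp <= 0x5a;

const upToDate = findMismatches(reference, reference); // no disagreements
const outdated = findMismatches(staleTable, reference, 3); // first three missing code points
```

Capping the number of reported mismatches keeps the output readable when a whole block of newly added characters is missing.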

AcornIdentifier patterns: Generated using acornjs bin/generate-identifier-regex.js with @unicode/unicode-17.0.0 devDependency.
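For context, acorn keeps its astral identifier data as a flat array of deltas rather than absolute ranges. The sketch below is modeled on that approach as far as I understand it; treat the exact encoding as an assumption and check it against the acorn sources. `encodeAstral` is a hypothetical helper, and the sample ranges are illustrative rather than the generated Unicode 17.0 data.

```typescript
// Encodes sorted inclusive [from, to] astral ranges as pairs of
// (gap since the previous range end, range length), starting at 0x10000.
function encodeAstral(ranges: Array<[number, number]>): number[] {
  const out: number[] = [];
  let at = 0x10000;
  for (const [from, to] of ranges) {
    out.push(from - at, to - from);
    at = to;
  }
  return out;
}

// Walks the delta pairs, stopping as soon as the position passes `code`.
function isInAstralSet(code: number, set: number[]): boolean {
  let pos = 0x10000;
  for (let i = 0; i < set.length; i += 2) {
    pos += set[i];
    if (pos > code) return false; // code lies in the gap before this range
    pos += set[i + 1];
    if (pos >= code) return true; // code lies inside this range
  }
  return false;
}

const encoded = encodeAstral([
  [0x10000, 0x1000b],
  [0x1000d, 0x10026],
]); // [0, 11, 2, 25]
```

The delta form keeps the numbers small, which makes the emitted array literals compact, at the cost of a linear scan instead of a binary search.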

LookupWorks is skipped in CI until UnicodeInformation v2.8.0 is published to NuGet.

Verification

  • Acornima IsIdentifierCharMatchesAcornImpl test: passes (all code points match between AcornIdentifier and Tokenizer)
  • Jint test262 identifier tests: 535 passed, 0 failed (48 tests for Unicode 15.1/16.0/17.0 characters that previously failed now pass, 0 regressions)

Test plan

  • IsIdentifierCharMatchesAcornImpl passes on net10.0
  • LookupWorks passes locally with UnicodeInformation v2.8.0 project reference (skipped in CI until package published)
  • Jint test262 Identifiers tests: 535/535 pass

🤖 Generated with Claude Code

lahma and others added 3 commits April 6, 2026 03:35
Regenerated Tokenizer.Helpers.Generated.cs using CharMaskGenerator with
UnicodeInformation built from hexawyz/NetUnicodeInfo feature/unicode-17.0
branch (Unicode 17.0 data). This updates BMP lookup masks and astral
plane range arrays to include characters added in Unicode 15.1, 16.0,
and 17.0.

Previously, the tables were generated from UnicodeInformation v2.7.1
which only included Unicode 15.0 data, causing 48 Jint test262
identifier tests to fail for Unicode 15.1/16.0/17.0 characters.
With this update all 535 identifier tests pass (verified via Jint
test262 suite).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update AcornIdentifier patterns (BMP regex, astral arrays) to Unicode
17.0 from DerivedCoreProperties.txt so IsIdentifierCharMatchesAcornImpl
passes against the regenerated character tables.

Skip LookupWorks test until UnicodeInformation v2.8.0 (Unicode 17.0)
is published to NuGet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regenerated using acornjs bin/generate-identifier-regex.js with
@unicode/unicode-17.0.0 instead of custom Python script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner

adams85 commented Apr 6, 2026

Wow, the last piece of the puzzle! 🎉 Thank you!

I'll get to this as soon as #36 is finished. (It's taking shape nicely BTW, some additional testing is all that's left.)

Owner

adams85 commented Apr 8, 2026

This one empties the test262 whitelist again and paves the way to ES2026 compatibility. 🎉

Thank you, @lahma, for your great help making this possible.

@adams85 adams85 closed this Apr 8, 2026
@adams85 adams85 reopened this Apr 8, 2026
@adams85 adams85 merged commit 65d1b21 into adams85:master Apr 8, 2026
3 checks passed
Development

Successfully merging this pull request may close these issues.

Upgrade to Unicode 17

2 participants