
@robobun
Collaborator

@robobun robobun commented Feb 7, 2026

Summary

  • Fixes regex printing of backslash-escaped non-ASCII characters
  • Tracks whether previous character was a backslash to avoid doubling backslashes when generating unicode escape sequences

Problem

When the regex printer encountered a non-ASCII character preceded by a backslash, it incorrectly added a second backslash before the generated unicode escape sequence, producing \\uXXXX instead of \uXXXX.

For example:

const R = /[\⁄]/;  // backslash + U+2044 (fraction slash)

This was printed as /[\\u2044]/, which changes the regex semantics:

  • Expected: matches the fraction slash character (⁄)
  • Actual (bug): matches any one of the characters \, u, 2, 0, or 4
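
The difference between the two printed forms can be checked directly. The sketch below builds both patterns with the RegExp constructor so that it does not depend on how any printer handles a regex literal. The variable names are illustrative and do not come from the PR.

```typescript
// Pattern text `[\u2044]`: a class containing only U+2044 (fraction slash).
const correct = new RegExp("[\\u2044]");

// Pattern text `[\\u2044]`: a class containing `\`, `u`, `2`, `0` and `4`.
const doubled = new RegExp("[\\\\u2044]");

const fractionSlash = "\u2044";

console.log(correct.test(fractionSlash)); // true
console.log(doubled.test(fractionSlash)); // false
console.log(doubled.test("u"));           // true
console.log(doubled.test("\\"));          // true
```

The doubled form still parses as a valid regex, which is presumably why the bug showed up as a failed match rather than a syntax error.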

This caused the regex from the issue to fail matching fractions:

const R = /[½¼¾]|([¹²³]+|[]+|[0-9]+)([\/\⁄])([¹²³]+|[]+|[0-9]+)/;
'³⁄₅₂ cup of stuff'.match(R);
// Expected: ["³⁄₅₂", "³", "⁄", "₅₂"]
// Actual: null

Solution

Track whether the previous character was a backslash when iterating through the regex literal. When generating unicode escape sequences for non-ASCII characters, omit the leading backslash if the previous character was already a backslash (which was escaping the non-ASCII character).
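
The approach can be sketched in TypeScript as follows. This is a hypothetical illustration, not the code from the PR: the actual change is in src/js_printer.zig, and the function name `escapeNonAsciiRegexSource` is invented here. The sketch also assumes that an already-escaped backslash (`\\`) should not count as escaping the character after it, which the PR description does not spell out.

```typescript
// Hypothetical sketch of the idea; the real implementation is in Zig and may differ.
function escapeNonAsciiRegexSource(source: string): string {
  let out = "";
  let prevWasBackslash = false;

  for (let i = 0; i < source.length; i++) {
    const unit = source.charCodeAt(i); // UTF-16 code unit

    if (unit > 0x7f) {
      const hex = ("0000" + unit.toString(16)).slice(-4);
      // If a backslash was just emitted, reuse it as the start of the escape.
      out += (prevWasBackslash ? "u" : "\\u") + hex;
      prevWasBackslash = false;
      continue;
    }

    out += source[i];
    // Assumption: `\\` is an escaped backslash, so the second one resets the flag.
    prevWasBackslash = unit === 0x5c && !prevWasBackslash;
  }

  return out;
}

// Input pattern text `[\⁄]` (written with string escapes to keep this file ASCII).
console.log(escapeNonAsciiRegexSource("[\\\u2044]")); // [\u2044]
// Input pattern text `[⁄]`, with no preceding backslash.
console.log(escapeNonAsciiRegexSource("[\u2044]"));   // [\u2044]
```

Because the loop walks UTF-16 code units, a character outside the BMP becomes two escapes, and only the first can reuse a preceding backslash.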

Test plan

  • Added regression test in test/regression/issue/26785.test.ts
  • Test verifies the regex source is correctly printed
  • Test verifies the regex matches the expected characters

Closes #26785

🤖 Generated with Claude Code

When the regex printer encounters a non-ASCII character that was
preceded by a backslash escape, it was incorrectly adding another
backslash before the unicode escape sequence, resulting in `\\uXXXX`
instead of `\uXXXX`.

For example, `/[\⁄]/` (backslash + U+2044 fraction slash) was being
printed as `/[\\u2044]/`, which changes the regex semantics from
matching the fraction slash to matching `\`, `u`, `2`, `0`, `4`, `4`
as separate characters.

This fix tracks whether the previous character was a backslash and
omits the leading backslash when generating unicode escape sequences
for escaped non-ASCII characters.

Closes #26785

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added the claude label Feb 7, 2026
@robobun
Collaborator Author

robobun commented Feb 7, 2026

Updated 10:49 PM PT - Feb 6th, 2026

❌ Your commit 6757c4d1 has 1 failure in Build #36670.


🧪   To try this PR locally:

bunx bun-pr 26786

That installs a local version of the PR into your bun-26786 executable, so you can run:

bun-26786 --bun

@coderabbitai
Contributor

coderabbitai bot commented Feb 7, 2026

Walkthrough

Adds tracking of whether the previous character was a backslash during bun_platform non-ASCII escaping, and uses that to choose between emitting full Unicode escapes (\uXXXX) or partial escapes (uXXXX) in template literals and RegExp literals. Also adds regression tests for the behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Non-ASCII Character Escaping Logic: src/js_printer.zig | Adds prev_was_backslash state initialization and reset in the bun_platform non-ASCII escaping routine. Updates emission paths to choose between \\uXXXX and uXXXX for BMP code points and between paired \\uXXXX/uXXXX sequences for surrogate pairs. Mirrors logic in RegExp literal printing to avoid doubled backslashes. |
| Regression Test Suite: test/regression/issue/26785.test.ts | Adds three Bun-based regression tests verifying printing and matching of backslash-escaped and non-escaped non-ASCII characters in regex literals (including fraction slash cases). Tests run via Bun.spawn and assert source, match contents, index, and exit code. |

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main fix: avoiding double backslash in regex with escaped non-ASCII characters, which is the primary change in this PR. |
| Description check | ✅ Passed | The description provides clear sections for problem, solution, and test plan, though it doesn't strictly follow the template structure with 'What does this PR do?' and 'How did you verify?' headings. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #26785 by fixing the regex printer to correctly handle backslash-escaped non-ASCII characters, preventing double backslashes and enabling proper regex matching. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the backslash-unicode escape sequence issue in both the JS printer and adding regression tests, with no unrelated modifications detected. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@test/regression/issue/26785.test.ts`:
- Around line 11-33: Replace the tempDir + file I/O pattern used in the test
(the using dir = tempDir("issue-26785", { "test.js": ... }) and subsequent
Bun.spawn with cwd) with a single-file spawn using bunExe() -e and inline
source: stop creating "test.js" on disk, pass the script string as the argument
to bun via [bunExe(), "-e", "<script source>"] in the Bun.spawn call, remove
cwd/file-related setup and teardown, and apply the same replacement to the other
two test cases referenced (the blocks around lines 47–62 and 78–97) so each test
uses bunExe() with -e and no tempDir.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

RegEx unicode mismatch?
