Skip to content

Preserve non-ASCII in tagged template literals and regex source#28214

Open
robobun wants to merge 12 commits intomainfrom
farm/2c1f0120/fix-template-literal-unicode
Open

Preserve non-ASCII in tagged template literals and regex source#28214
robobun wants to merge 12 commits intomainfrom
farm/2c1f0120/fix-template-literal-unicode

Conversation

@robobun
Copy link
Collaborator

@robobun robobun commented Mar 18, 2026

Reproduction

const text = String.raw`a中`;
for (const char of text) { console.log(char); }
// Bun (before): a \ u 4 E 2 D
// Node / Bun (after): a 中

console.log(String.raw`Redémarrage`);
// Bun (before): Red\u00E9marrage
// Node / Bun (after): Redémarrage

console.log(/æ/.source);
// Bun (before): \u00E6\u2122
// Node / Bun (after): æ™

Root Cause

In src/js_printer.zig, the printRawTemplateLiteral function escapes non-ASCII characters to \uXXXX sequences when ascii_only is true (which it is for all bun platform output). But raw template literals must preserve the exact source bytes, since they are exposed to JavaScript via String.raw and the .raw property of tagged template arguments. Escaping to \u4E2D changes the runtime string value from one character to six.

The same issue exists in printRegExpLiteral, where non-ASCII characters in regex source are escaped even though the .source property is visible to JavaScript.

Fix

  • Replace the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field
  • When printing raw template content or regex source containing non-ASCII, set prefers_ascii = false so bytes are preserved verbatim
  • Switch cloneLatin1cloneUTF8 for transpiler output since it may now contain non-ASCII bytes
  • Bump the runtime transpiler cache version to invalidate stale Latin-1 cached entries

Based on the approach from #15047 by @pfg.

Verification

  • USE_SYSTEM_BUN=1 bun test test/regression/issue/18115.test.ts → 6 failures (bug exists)
  • bun bd test test/regression/issue/18115.test.ts → 6 passes (fix works)
  • Previously skipped test in test/regression/issue/14976/14976.test.ts now passes
  • Shell unicode escape test in bunshell.test.ts updated with correct expectation

Fixes #18115, fixes #8745, fixes #15492, fixes #16763, fixes #8207

The transpiler's printRawTemplateLiteral was escaping non-ASCII
characters to \uXXXX sequences, but raw template literals must
preserve the exact source bytes since they are exposed to JavaScript
via String.raw and the .raw property. Same issue in printRegExpLiteral
for regex .source property.

Replace the comptime ascii_only parameter on NewPrinter with a
runtime prefers_ascii field. When printing raw template content or
regex source containing non-ASCII, disable ASCII-only mode so bytes
are preserved verbatim. Switch cloneLatin1 to cloneUTF8 for transpiler
output since it may now contain non-ASCII, and bump the transpiler
cache version.

Closes #18115
Closes #8745
Closes #15492
Closes #16763
Closes #8207
@robobun
Copy link
Collaborator Author

robobun commented Mar 18, 2026

Updated 1:51 PM PT - Mar 18th, 2026

@robobun, your commit b9c8d67 has 1 failures in Build #40038 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28214

That installs a local version of the PR into your bun-28214 executable, so you can run:

bun-28214 --bun

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Tightens watcher-only path to require ASCII output; switches cloning from Latin‑1 to UTF‑8 across transpilation and cache paths (cache version bumped, UTF‑8 hash checks added); adjusts JS printing to emit raw bytes for template/RegExp literals; adds/enables tests covering non‑ASCII handling.

Changes

Cohort / File(s) Summary
Module & Transpiler Encoding
src/bun.js/AsyncModule.zig, src/bun.js/ModuleLoader.zig, src/bun.js/RuntimeTranspilerStore.zig, src/bun.js/RuntimeTranspilerCache.zig
Watcher path now requires ASCII-only output to use the ref-counted watcher flow; otherwise falls back to UTF‑8 handling. Replaced cloneLatin1 with cloneUTF8 for source/output cloning. Bumped runtime transpiler cache version (18→19), added UTF‑8 post-read hash verification and adjusted logging to report UTF‑8 lengths.
JS Printer (literals & regexps)
src/js_printer.zig
Removed non-ASCII escape/transformation paths for RawTemplateLiteral and RegExp printing; printing now emits bytes verbatim so raw/template/RegExp literals preserve their runtime values. Related ASCII-only translation branches removed.
Tests — Additions & Updates
test/regression/issue/18115.test.ts, test/regression/issue/14976/14976.test.ts, test/js/bun/shell/bunshell.test.ts
Added comprehensive regression tests for String.raw, template and RegExp non‑ASCII handling and build/runtime flows; enabled a previously skipped unicode/raw test; updated bunshell test expectation to assert literal non‑ASCII output instead of Unicode-escape sequences.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: preserving non-ASCII characters in tagged template literals and regex source, which directly aligns with the changeset's core objective.
Description check ✅ Passed The pull request description provides clear reproduction cases, root cause analysis, fix approach, and verification steps matching the template structure with 'What does this PR do?' and 'How did you verify your code works?' sections.
Linked Issues check ✅ Passed All code changes directly address the linked issues: #18115, #8745, #15492, #16763, and #8207 requiring non-ASCII preservation in template literals and regex source, implemented via js_printer.zig, ModuleLoader.zig, and transpiler cache modifications.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing non-ASCII preservation in template literals and regex source: js_printer.zig modifications, transpiler output encoding switches, cache version bump, and related test updates directly support the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/RuntimeTranspilerCache.zig`:
- Around line 690-698: Entry.load()'s .utf8 branch currently accepts cached
output without verifying metadata.output_hash, allowing same-length corruption
to pass; update the .utf8 load path to compute the output hash for the restored
UTF‑8 bytes (using the same hash function/format used by the .latin1 and .utf16
branches) and compare it against metadata.output_hash, rejecting the entry on
mismatch (same behavior/cleanup path as used in the .latin1/.utf16 branches) so
that the new entries created where output_code = bun.String.cloneUTF8(...) are
validated before being returned.

In `@src/js_printer.zig`:
- Around line 1968-1974: The function printRawTemplateLiteral currently flips
the printer-global flag p.prefers_ascii for all subsequent output; change this
to a temporary/local override: save the original p.prefers_ascii, set
p.prefers_ascii = false only while emitting the raw template bytes with
p.print(bytes), then restore the original value immediately after; apply the
same pattern to the RegExp emission code referenced around the other occurrence
(the RegExp emission function) so neither printRawTemplateLiteral nor the RegExp
emitter mutate the global prefers_ascii state permanently.

In `@test/regression/issue/18115.test.ts`:
- Around line 6-16: All Bun.spawn invocations (creating variable proc) should
use the resource-safe pattern "await using proc = Bun.spawn(...)" instead of
"const proc = Bun.spawn(...)" so processes are disposed automatically; after
awaiting outputs replace the existing Promise.all usage with reads that capture
stdout and stderr (e.g., new Response(proc.stdout).text(), new
Response(proc.stderr).text()) and assert that stderr is empty on success paths
to catch warnings, and continue to await proc.exited for exitCode; additionally,
for the build step that consumes both streams, make sure to fully drain both
stdout and stderr before asserting the exit code (i.e., read both streams to
completion then check exitCode) and keep references to stdout/stderr variables
when making the assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4a167012-b7d5-42a7-b0b5-7f9407a082ea

📥 Commits

Reviewing files that changed from the base of the PR and between 76f4f38 and 7c060a4.

📒 Files selected for processing (8)
  • src/bun.js/AsyncModule.zig
  • src/bun.js/ModuleLoader.zig
  • src/bun.js/RuntimeTranspilerCache.zig
  • src/bun.js/RuntimeTranspilerStore.zig
  • src/js_printer.zig
  • test/js/bun/shell/bunshell.test.ts
  • test/regression/issue/14976/14976.test.ts
  • test/regression/issue/18115.test.ts

robobun and others added 2 commits March 18, 2026 07:54
…dation

- Save and restore prefers_ascii in printRawTemplateLiteral and
  printRegExpLiteral so the flag only affects the specific literal
  being printed, not all subsequent output
- Skip the Latin-1 ref-counted watcher path when output contains
  non-ASCII bytes, falling through to the cloneUTF8 path
- Add hash validation to the UTF-8 cache load path to match the
  Latin-1 and UTF-16 branches
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/js_printer.zig (1)

5893-5909: ⚠️ Potential issue | 🔴 Critical

Fix prefers_ascii wiring for ASCII format variants.

The change at lines 5908 and 6121 binds prefers_ascii to is_bun_platform, but this breaks the intended behavior of Format.esm_ascii and Format.cjs_ascii. These formats should always force ASCII-only output (escaping non-ASCII characters), regardless of target platform.

Current issue: In src/transpiler.zig line 849–856, the .esm_ascii case passes is_bun as the is_bun_platform parameter to printAst(), which then becomes prefers_ascii. On non-Bun targets, this results in prefers_ascii=false, causing non-ASCII characters to be emitted when they should always be escaped.

Expected: Format.esm_ascii should map to prefers_ascii=true and Format.esm to prefers_ascii=false, independent of platform. Similarly for cjs vs. cjs_ascii. The format information must be preserved through the call chain, not lost.

Also note: No corresponding handling exists for Format.cjs_ascii in the transpiler.zig format switch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/js_printer.zig` around lines 5893 - 5909, The printer is being given
is_bun_platform as the prefers_ascii flag (see PrinterType.init call passing
is_bun_platform), which loses the Format.*_ascii semantics; compute a dedicated
prefers_ascii boolean from the selected Format (e.g. true for Format.esm_ascii
and Format.cjs_ascii, false for Format.esm and Format.cjs) and pass that boolean
into PrinterType.init instead of is_bun_platform; update the call in
printAst()/transpiler code paths (reference the printAst callsite and the Format
switch in transpiler.zig) to set prefers_ascii correctly and add the missing
handling for Format.cjs_ascii so ASCII-forcing formats always set prefers_ascii
= true independent of platform.
♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)

12-18: ⚠️ Potential issue | 🟡 Minor

Assert stderr on successful runs.

Each process already drains stderr, but the value is discarded. Keep it and assert expect(stderr).toBe("") so warnings/regressions on the success path fail this suite.

Also applies to: 28-34, 44-50, 60-66, 77-83, 98-103, 112-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The tests
currently discard stderr in the Promise.all destructuring (const [stdout, ,
exitCode] = ...) which hides warnings; change the destructuring to capture
stderr (const [stdout, stderr, exitCode] = await Promise.all([...])) and add an
assertion expect(stderr).toBe("") immediately after the stdout assertion in each
test block (e.g., the block containing the Promise.all and
expect(stdout).toBe("a\n中\n")). Apply the same change to the other similar
blocks mentioned (the ones around lines 28–34, 44–50, 60–66, 77–83, 98–103,
112–118) so all runs verify stderr is empty on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Line 2: Add a new regression test in test/regression/issue/18115.test.ts that
creates a temp directory and a >50 KB temp file, then invokes the binary twice
with bunEnv overridden to set only BUN_RUNTIME_TRANSPILER_CACHE_PATH to that
temp dir (use bunExe and tempDir fixtures), asserting the first run writes the
cache and the second run restores from it; this ensures the
RuntimeTranspilerCache path in src/bun.js/RuntimeTranspilerCache.zig
(MINIMUM_CACHE_SIZE = 50 KB) is exercised. Also duplicate the same cache-enabled
case for the lines noted around 87-89 so both variants run with the cache
override. Ensure you only override BUN_RUNTIME_TRANSPILER_CACHE_PATH in the
environment and keep other bunEnv behavior unchanged.

---

Outside diff comments:
In `@src/js_printer.zig`:
- Around line 5893-5909: The printer is being given is_bun_platform as the
prefers_ascii flag (see PrinterType.init call passing is_bun_platform), which
loses the Format.*_ascii semantics; compute a dedicated prefers_ascii boolean
from the selected Format (e.g. true for Format.esm_ascii and Format.cjs_ascii,
false for Format.esm and Format.cjs) and pass that boolean into PrinterType.init
instead of is_bun_platform; update the call in printAst()/transpiler code paths
(reference the printAst callsite and the Format switch in transpiler.zig) to set
prefers_ascii correctly and add the missing handling for Format.cjs_ascii so
ASCII-forcing formats always set prefers_ascii = true independent of platform.

---

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The tests currently discard stderr in the Promise.all
destructuring (const [stdout, , exitCode] = ...) which hides warnings; change
the destructuring to capture stderr (const [stdout, stderr, exitCode] = await
Promise.all([...])) and add an assertion expect(stderr).toBe("") immediately
after the stdout assertion in each test block (e.g., the block containing the
Promise.all and expect(stdout).toBe("a\n中\n")). Apply the same change to the
other similar blocks mentioned (the ones around lines 28–34, 44–50, 60–66,
77–83, 98–103, 112–118) so all runs verify stderr is empty on success.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f04d29fc-00fd-46f3-88c4-4740c764a464

📥 Commits

Reviewing files that changed from the base of the PR and between 7c060a4 and 098b663.

📒 Files selected for processing (5)
  • src/bun.js/AsyncModule.zig
  • src/bun.js/ModuleLoader.zig
  • src/bun.js/RuntimeTranspilerCache.zig
  • src/js_printer.zig
  • test/regression/issue/18115.test.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)

12-18: ⚠️ Potential issue | 🟡 Minor

Assert stderr is empty on success paths.

These tests read stderr but never assert it, so warnings/regressions can slip through while exit code remains 0. Please assert stderr === "" in each success path (including the build step).

Suggested patch pattern
-const [stdout, , exitCode] = await Promise.all([
+const [stdout, stderr, exitCode] = await Promise.all([
   new Response(proc.stdout).text(),
   new Response(proc.stderr).text(),
   proc.exited,
 ]);
 expect(stdout).toBe("a\n中\n");
+expect(stderr).toBe("");
 expect(exitCode).toBe(0);
-const [, , buildExitCode] = await Promise.all([
+const [, buildStderr, buildExitCode] = await Promise.all([
   new Response(buildProc.stdout).text(),
   new Response(buildProc.stderr).text(),
   buildProc.exited,
 ]);
+expect(buildStderr).toBe("");
 expect(buildExitCode).toBe(0);

As per coding guidelines, "Always check exit codes and test error scenarios when spawning processes in tests."

Also applies to: 28-35, 44-51, 60-67, 77-84, 101-108, 116-123, 137-143, 151-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The test collects
stdout/stderr/exitCode from the spawned process but never asserts stderr is
empty; update the assertions where the variables are collected (the
destructuring of new Response(proc.stdout).text(), new
Response(proc.stderr).text(), proc.exited) to also assert
expect(stderr).toBe("") alongside expect(stdout).toBe(...) and
expect(exitCode).toBe(0), and apply the same added assertion to all other
success-path blocks in this file (the similar blocks at the ranges called out in
the review) so every successful spawn verifies stderr === "".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 88-90: The test's padding currently uses String.repeat on the
"index.ts" source to exceed MINIMUM_CACHE_SIZE; replace that repeat usage with a
Buffer-based allocation per guideline: use Buffer.alloc with the same byte count
and a space fill, then call toString(), so the "index.ts" value uses
Buffer.alloc(..., ' ').toString() for padding instead of " ".repeat(64 * 1024);
update the single occurrence in the test file regression/issue/18115.test.ts
where the "index.ts" entry is constructed.

---

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The test collects stdout/stderr/exitCode from the spawned
process but never asserts stderr is empty; update the assertions where the
variables are collected (the destructuring of new Response(proc.stdout).text(),
new Response(proc.stderr).text(), proc.exited) to also assert
expect(stderr).toBe("") alongside expect(stdout).toBe(...) and
expect(exitCode).toBe(0), and apply the same added assertion to all other
success-path blocks in this file (the similar blocks at the ranges called out in
the review) so every successful spawn verifies stderr === "".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a10f862-a2f2-47b8-9839-da3448ba3698

📥 Commits

Reviewing files that changed from the base of the PR and between 098b663 and 1a7b895.

📒 Files selected for processing (1)
  • test/regression/issue/18115.test.ts

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three bugs from my previous review have been fixed in the latest commits (save/restore prefers_ascii, watcher ASCII gate, UTF-8 hash validation). However, this PR refactors a comptime parameter to a runtime field across the entire JS printer and changes the encoding model for module loading -- a human should verify the encoding transition from Latin-1 to UTF-8 doesn't have subtle downstream effects.

Extended reasoning...

Overview

This PR fixes non-ASCII character preservation in tagged template literals (String.raw) and regex .source by replacing the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field. It touches 5 core source files (js_printer.zig, ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerCache.zig, RuntimeTranspilerStore.zig) and adds/updates 3 test files. The encoding model changes from cloneLatin1 to cloneUTF8 across all transpiler output paths, and the cache version is bumped to 19.

Security risks

No direct security risks identified. The changes are about character encoding correctness, not auth, crypto, or permissions. The cache hash validation was missing for UTF-8 but has been added.

Level of scrutiny

This deserves high scrutiny. The comptime-to-runtime refactor of ascii_only to prefers_ascii is a fundamental change to the JS printer that affects every code path that prints identifiers, strings, templates, and regex. The Latin-1 to UTF-8 encoding switch in module loading affects how JSC interprets transpiled source code. Three bugs were found during review (watcher path mojibake, permanent prefers_ascii flip, missing cache hash validation), all now fixed, but this pattern suggests the change has significant surface area for subtle issues.

Other factors

  • CI is showing build-zig failures across multiple platforms on the latest commit, though these appear to point at scripts/build/ci.ts rather than the changed files.
  • The PR is AI-generated (claude label) and has gone through multiple fix iterations.
  • The approach is based on PR #15047, which provides some design precedent.
  • Good regression test coverage was added, including a >50KB cache test that exercises the RuntimeTranspilerCache UTF-8 path.
  • All inline review comments (from both CodeRabbit and my previous review) have been addressed with code fixes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 1-158: The test uses tempDir (which returns an AsyncDisposable)
but declares its disposables with plain using; change the tempDir disposables to
use await using to ensure proper async disposal. Replace the three declarations
that use tempDir—"using dir = tempDir(\"string-raw-rtc\", ...)", "using cacheDir
= tempDir(\"string-raw-rtc-cache\", ...)", and the later "using dir =
tempDir(\"string-raw-unicode\", ...)"—with await using, keeping the same
variable names (dir, cacheDir) and surrounding logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d6b3508-ec65-4e69-8755-2ec912f6f2d4

📥 Commits

Reviewing files that changed from the base of the PR and between 1a7b895 and 5046bdf.

📒 Files selected for processing (1)
  • test/regression/issue/18115.test.ts

Copy link
Member

@alii alii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior fix is correct, but the comptime→runtime conversion in js_printer.zig is unnecessary and can be reverted.

The save/restore pattern is dead code. In both printRawTemplateLiteral and the regex block:

const saved = p.prefers_ascii;
defer p.prefers_ascii = saved;
if (p.prefers_ascii and !strings.isAllASCII(bytes)) p.prefers_ascii = false;
p.print(bytes);

p.print() forwards to p.writer.print()BufferWriter.writeAllbuffer.append() — no access to prefers_ascii anywhere in the chain. The old escaping happened inline inside these functions (the deleted CodepointIterator loops); p.print() was always a raw byte write. Deleting the escaping loop and calling p.print(bytes) is the entire fix; toggling prefers_ascii around it has zero effect.

(Note: pfg's #15047 set prefers_ascii = false permanently, which at least affects subsequent printIdentifier calls. The save/restore added here scopes it to the one line that ignores it — strictly less effect.)

Because the runtime flag is never usefully mutated, the comptime→runtime conversion is pure complexity. All four PrinterType.init() callers pass compile-time constants. The field could go back to being the comptime ascii_only parameter.

Suggested minimal fix:

  • printRawTemplateLiteral: body becomes just p.print(bytes); (keep the comment explaining why we don't escape)
  • Regex print: delete the is_bun_platform escaping block, leave p.print(e.value);
  • Revert ascii_only back to a comptime parameter on NewPrinter — undo the propagation through canPrintWithoutEscape, writePreQuotedString, printIdentifier, printIdentifierUTF16, canPrintIdentifierUTF16
  • Keep everything else: cloneLatin1cloneUTF8, watcher isAllASCII guard, cache version bump, cache UTF-8 hash check, tests. These are all correct and necessary.

Nit: tests use new Response(proc.stdout).text()proc.stdout.text() is the simpler form.

robobun and others added 2 commits March 18, 2026 16:39
Per review: the comptime ascii_only parameter doesn't need to become a
runtime field. The fix is just removing the escaping loops from
printRawTemplateLiteral and printRegExpLiteral — p.print(bytes) was
always a raw byte write that never checked ascii_only.

Also use proc.stdout.text() instead of new Response(proc.stdout).text()
in tests per nit.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)

12-16: ⚠️ Potential issue | 🟡 Minor

Assert stderr is empty on success-path spawns.

At Line 12, Line 28, Line 44, Line 60, Line 77, Line 101, Line 116, Line 137, and Line 151, stderr is read but discarded. That weakens regression signal because warnings/errors on stderr can slip through with exit code 0.

Suggested patch pattern
- const [stdout, , exitCode] = await Promise.all([
+ const [stdout, stderr, exitCode] = await Promise.all([
    proc.stdout.text(),
    proc.stderr.text(),
    proc.exited,
  ]);
  expect(stdout).toBe("a\n中\n");
+ expect(stderr).toBe("");
  expect(exitCode).toBe(0);
- const [, , buildExitCode] = await Promise.all([
+ const [, buildStderr, buildExitCode] = await Promise.all([
    buildProc.stdout.text(),
    buildProc.stderr.text(),
    buildProc.exited,
  ]);
+ expect(buildStderr).toBe("");
  expect(buildExitCode).toBe(0);

Based on learnings: "In oven-sh/bun regression tests, for files under test/regression/issue/*.test.ts, prefer explicit per-test setup and per-test assertions (including repeated tempDir, Bun.spawn, bunExe/bunEnv, and checks of stdout, stderr, and exitCode) rather than extracting shared helpers."

Also applies to: 28-32, 44-48, 60-64, 77-81, 101-105, 116-120, 137-141, 151-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 16, The current
Promise.all destructuring discards stderr (using proc.stderr.text()) which lets
warnings slip through; change each instance where you await [stdout, , exitCode]
from Promise.all to capture stderr (e.g., [stdout, stderr, exitCode] = await
Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited])) and add an
explicit assertion that stderr is empty on a success exit (for example
expect(exitCode).toBe(0); expect(stderr).toBe('') or equivalent), keeping the
existing stdout assertions; apply this change for the proc usage sites
(proc.stdout.text(), proc.stderr.text(), proc.exited) in the test so successful
spawns fail if anything was written to stderr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-16: The current Promise.all destructuring discards stderr
(using proc.stderr.text()) which lets warnings slip through; change each
instance where you await [stdout, , exitCode] from Promise.all to capture stderr
(e.g., [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(),
proc.stderr.text(), proc.exited])) and add an explicit assertion that stderr is
empty on a success exit (for example expect(exitCode).toBe(0);
expect(stderr).toBe('') or equivalent), keeping the existing stdout assertions;
apply this change for the proc usage sites (proc.stdout.text(),
proc.stderr.text(), proc.exited) in the test so successful spawns fail if
anything was written to stderr.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6fff0fb9-6f61-4c31-aa57-620abb0de7c6

📥 Commits

Reviewing files that changed from the base of the PR and between 5046bdf and 546b4d1.

📒 Files selected for processing (2)
  • src/js_printer.zig
  • test/regression/issue/18115.test.ts

- DevServer.zig: change cloneLatin1 to cloneUTF8 on the no-source-map
  HMR path (line 2564), matching the with-source-map path
- AsyncModule.zig: move addFile watcher registration outside the ASCII
  guard so files are always registered for hot reload regardless of
  whether output contains non-ASCII bytes
- StandaloneModuleGraph.zig: use .utf8 encoding for JS output in
  bun build --compile, since bundler output may contain raw UTF-8
- OutputFile.zig: fall back to cloneUTF8 when output contains
  non-ASCII, since createExternal with isLatin1=true can't handle
  multi-byte UTF-8 sequences
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All bugs found during review have been addressed, but this PR touches encoding-sensitive paths across 11 files (module loader, watcher, HMR, compile, caching) and required 7+ fix commits during review — it would benefit from a human look to confirm no edge cases remain.

Extended reasoning...

Overview

This PR fixes non-ASCII character handling in tagged template literals (String.raw) and regex .source by removing the \uXXXX escape logic from printRawTemplateLiteral and printRegExpLiteral in js_printer.zig. Since output can now contain raw UTF-8 multi-byte sequences, all downstream consumers were updated from cloneLatin1 to cloneUTF8: ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerStore.zig, RuntimeTranspilerCache.zig, DevServer.zig, StandaloneModuleGraph.zig, and OutputFile.zig. The watcher paths use an isAllASCII guard to preserve the ref-counted Latin-1 fast path when possible.

Security risks

No direct security concerns — this is an encoding correctness fix. The change ensures runtime string values match source intent, which is arguably more correct from a security perspective (no silent data transformation).

Level of scrutiny

High scrutiny warranted. This PR modifies encoding assumptions across the entire module loading stack. During review, 7 distinct bugs were found and fixed (3 from my reviews, 3 from CodeRabbit, 1 from a human reviewer), spanning: missing UTF-8 hash validation in the cache, watcher paths skipping addFile registration, permanent prefers_ascii flag mutation, DevServer HMR using wrong encoding, and StandaloneModuleGraph/OutputFile hardcoding Latin-1. The volume of issues found suggests the initial implementation was incomplete, and while all were addressed, the pattern raises concern about untested edge cases in less common code paths.

Other factors

The approach was revised mid-review (commit 546b4d1 reverted the comptime-to-runtime prefers_ascii conversion in favor of simply always emitting raw bytes for template literals and regex). This is a cleaner design. Test coverage is comprehensive — runtime, build, cache (with >50KB file to exceed MINIMUM_CACHE_SIZE), and the previously-skipped 14976 test is now enabled. The cache version was bumped to 19. However, the claude label indicates this was AI-authored, and the number of bugs found during review suggests a human should verify the final state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

2 participants