Preserve non-ASCII in tagged template literals and regex source by robobun · Pull Request #28214 · oven-sh/bun

robobun · 2026-03-18T06:43:42Z

Reproduction

const text = String.raw`a中`;
for (const char of text) { console.log(char); }
// Bun (before): a \ u 4 E 2 D
// Node / Bun (after): a 中

console.log(String.raw`Redémarrage`);
// Bun (before): Red\u00E9marrage
// Node / Bun (after): Redémarrage

console.log(/æ™/.source);
// Bun (before): \u00E6\u2122
// Node / Bun (after): æ™

Root Cause

In src/js_printer.zig, the printRawTemplateLiteral function escapes non-ASCII characters to \uXXXX sequences when ascii_only is true (which it is for all bun platform output). But raw template literals must preserve the exact source bytes, since they are exposed to JavaScript via String.raw and the .raw property of tagged template arguments. Escaping 中 to \u4E2D changes the runtime string value from one character to six.

The same issue exists in printRegExpLiteral, where non-ASCII characters in regex source are escaped even though the .source property is visible to JavaScript.

Fix

Replace the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field
When printing raw template content or regex source containing non-ASCII, set prefers_ascii = false so bytes are preserved verbatim
Switch cloneLatin1 → cloneUTF8 for transpiler output since it may now contain non-ASCII bytes
Bump the runtime transpiler cache version to invalidate stale Latin-1 cached entries

Based on the approach from #15047 by @pfg.

Verification

USE_SYSTEM_BUN=1 bun test test/regression/issue/18115.test.ts → 6 failures (bug exists)
bun bd test test/regression/issue/18115.test.ts → 6 passes (fix works)
Previously skipped test in test/regression/issue/14976/14976.test.ts now passes
Shell unicode escape test in bunshell.test.ts updated with correct expectation

Fixes #18115, fixes #8745, fixes #15492, fixes #16763, fixes #8207

The transpiler's printRawTemplateLiteral was escaping non-ASCII characters to \uXXXX sequences, but raw template literals must preserve the exact source bytes since they are exposed to JavaScript via String.raw and the .raw property. Same issue in printRegExpLiteral for regex .source property. Replace the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field. When printing raw template content or regex source containing non-ASCII, disable ASCII-only mode so bytes are preserved verbatim. Switch cloneLatin1 to cloneUTF8 for transpiler output since it may now contain non-ASCII, and bump the transpiler cache version. Closes #18115 Closes #8745 Closes #15492 Closes #16763 Closes #8207

robobun · 2026-03-18T06:43:52Z

^{Updated 1:51 PM PT - Mar 18th, 2026}

❌ @robobun, your commit b9c8d67 has 1 failures in Build #40038 (All Failures):

scripts/build/ci.ts - build failed on 🐧 aarch64-musl - build-cpp

🧪 To try this PR locally:

bunx bun-pr 28214

That installs a local version of the PR into your bun-28214 executable, so you can run:

bun-28214 --bun

coderabbitai · 2026-03-18T06:47:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Tightens watcher-only path to require ASCII output; switches cloning from Latin‑1 to UTF‑8 across transpilation and cache paths (cache version bumped, UTF‑8 hash checks added); adjusts JS printing to emit raw bytes for template/RegExp literals; adds/enables tests covering non‑ASCII handling.

Changes

Cohort / File(s)	Summary
Module & Transpiler Encoding `src/bun.js/AsyncModule.zig`, `src/bun.js/ModuleLoader.zig`, `src/bun.js/RuntimeTranspilerStore.zig`, `src/bun.js/RuntimeTranspilerCache.zig`	Watcher path now requires ASCII-only output to use the ref-counted watcher flow; otherwise falls back to UTF‑8 handling. Replaced `cloneLatin1` with `cloneUTF8` for source/output cloning. Bumped runtime transpiler cache version (18→19), added UTF‑8 post-read hash verification and adjusted logging to report UTF‑8 lengths.
JS Printer (literals & regexps) `src/js_printer.zig`	Removed non-ASCII escape/transformation paths for RawTemplateLiteral and RegExp printing; printing now emits bytes verbatim so raw/template/RegExp literals preserve their runtime values. Related ASCII-only translation branches removed.
Tests — Additions & Updates `test/regression/issue/18115.test.ts`, `test/regression/issue/14976/14976.test.ts`, `test/js/bun/shell/bunshell.test.ts`	Added comprehensive regression tests for String.raw, template and RegExp non‑ASCII handling and build/runtime flows; enabled a previously skipped unicode/raw test; updated bunshell test expectation to assert literal non‑ASCII output instead of Unicode-escape sequences.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: preserving non-ASCII characters in tagged template literals and regex source, which directly aligns with the changeset's core objective.
Description check	✅ Passed	The pull request description provides clear reproduction cases, root cause analysis, fix approach, and verification steps matching the template structure with 'What does this PR do?' and 'How did you verify your code works?' sections.
Linked Issues check	✅ Passed	All code changes directly address the linked issues: `#18115`, `#8745`, `#15492`, `#16763`, and `#8207` requiring non-ASCII preservation in template literals and regex source, implemented via js_printer.zig, ModuleLoader.zig, and transpiler cache modifications.
Out of Scope Changes check	✅ Passed	All changes are scoped to fixing non-ASCII preservation in template literals and regex source: js_printer.zig modifications, transpiler output encoding switches, cache version bump, and related test updates directly support the stated objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/RuntimeTranspilerCache.zig`:
- Around line 690-698: Entry.load()'s .utf8 branch currently accepts cached
output without verifying metadata.output_hash, allowing same-length corruption
to pass; update the .utf8 load path to compute the output hash for the restored
UTF‑8 bytes (using the same hash function/format used by the .latin1 and .utf16
branches) and compare it against metadata.output_hash, rejecting the entry on
mismatch (same behavior/cleanup path as used in the .latin1/.utf16 branches) so
that the new entries created where output_code = bun.String.cloneUTF8(...) are
validated before being returned.

In `@src/js_printer.zig`:
- Around line 1968-1974: The function printRawTemplateLiteral currently flips
the printer-global flag p.prefers_ascii for all subsequent output; change this
to a temporary/local override: save the original p.prefers_ascii, set
p.prefers_ascii = false only while emitting the raw template bytes with
p.print(bytes), then restore the original value immediately after; apply the
same pattern to the RegExp emission code referenced around the other occurrence
(the RegExp emission function) so neither printRawTemplateLiteral nor the RegExp
emitter mutate the global prefers_ascii state permanently.

In `@test/regression/issue/18115.test.ts`:
- Around line 6-16: All Bun.spawn invocations (creating variable proc) should
use the resource-safe pattern "await using proc = Bun.spawn(...)" instead of
"const proc = Bun.spawn(...)" so processes are disposed automatically; after
awaiting outputs replace the existing Promise.all usage with reads that capture
stdout and stderr (e.g., new Response(proc.stdout).text(), new
Response(proc.stderr).text()) and assert that stderr is empty on success paths
to catch warnings, and continue to await proc.exited for exitCode; additionally,
for the build step that consumes both streams, make sure to fully drain both
stdout and stderr before asserting the exit code (i.e., read both streams to
completion then check exitCode) and keep references to stdout/stderr variables
when making the assertions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4a167012-b7d5-42a7-b0b5-7f9407a082ea

📥 Commits

Reviewing files that changed from the base of the PR and between 76f4f38 and 7c060a4.

📒 Files selected for processing (8)

src/bun.js/AsyncModule.zig
src/bun.js/ModuleLoader.zig
src/bun.js/RuntimeTranspilerCache.zig
src/bun.js/RuntimeTranspilerStore.zig
src/js_printer.zig
test/js/bun/shell/bunshell.test.ts
test/regression/issue/14976/14976.test.ts
test/regression/issue/18115.test.ts

src/bun.js/RuntimeTranspilerCache.zig

src/js_printer.zig

test/regression/issue/18115.test.ts

src/bun.js/ModuleLoader.zig

src/js_printer.zig

src/bun.js/RuntimeTranspilerCache.zig

…dation - Save and restore prefers_ascii in printRawTemplateLiteral and printRegExpLiteral so the flag only affects the specific literal being printed, not all subsequent output - Skip the Latin-1 ref-counted watcher path when output contains non-ASCII bytes, falling through to the cloneUTF8 path - Add hash validation to the UTF-8 cache load path to match the Latin-1 and UTF-16 branches

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/js_printer.zig (1)
5893-5909: ⚠️ Potential issue | 🔴 Critical

Fix prefers_ascii wiring for ASCII format variants.

The change at lines 5908 and 6121 binds prefers_ascii to is_bun_platform, but this breaks the intended behavior of Format.esm_ascii and Format.cjs_ascii. These formats should always force ASCII-only output (escaping non-ASCII characters), regardless of target platform.

Current issue: In src/transpiler.zig line 849–856, the .esm_ascii case passes is_bun as the is_bun_platform parameter to printAst(), which then becomes prefers_ascii. On non-Bun targets, this results in prefers_ascii=false, causing non-ASCII characters to be emitted when they should always be escaped.

Expected: Format.esm_ascii should map to prefers_ascii=true and Format.esm to prefers_ascii=false, independent of platform. Similarly for cjs vs. cjs_ascii. The format information must be preserved through the call chain, not lost.

Also note: No corresponding handling exists for Format.cjs_ascii in the transpiler.zig format switch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/js_printer.zig` around lines 5893 - 5909, The printer is being given
is_bun_platform as the prefers_ascii flag (see PrinterType.init call passing
is_bun_platform), which loses the Format.*_ascii semantics; compute a dedicated
prefers_ascii boolean from the selected Format (e.g. true for Format.esm_ascii
and Format.cjs_ascii, false for Format.esm and Format.cjs) and pass that boolean
into PrinterType.init instead of is_bun_platform; update the call in
printAst()/transpiler code paths (reference the printAst callsite and the Format
switch in transpiler.zig) to set prefers_ascii correctly and add the missing
handling for Format.cjs_ascii so ASCII-forcing formats always set prefers_ascii
= true independent of platform.

♻️ Duplicate comments (1)

test/regression/issue/18115.test.ts (1)
12-18: ⚠️ Potential issue | 🟡 Minor

Assert stderr on successful runs.

Each process already drains stderr, but the value is discarded. Keep it and assert expect(stderr).toBe("") so warnings/regressions on the success path fail this suite.

Also applies to: 28-34, 44-50, 60-66, 77-83, 98-103, 112-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The tests
currently discard stderr in the Promise.all destructuring (const [stdout, ,
exitCode] = ...) which hides warnings; change the destructuring to capture
stderr (const [stdout, stderr, exitCode] = await Promise.all([...])) and add an
assertion expect(stderr).toBe("") immediately after the stdout assertion in each
test block (e.g., the block containing the Promise.all and
expect(stdout).toBe("a\n中\n")). Apply the same change to the other similar
blocks mentioned (the ones around lines 28–34, 44–50, 60–66, 77–83, 98–103,
112–118) so all runs verify stderr is empty on success.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Line 2: Add a new regression test in test/regression/issue/18115.test.ts that
creates a temp directory and a >50 KB temp file, then invokes the binary twice
with bunEnv overridden to set only BUN_RUNTIME_TRANSPILER_CACHE_PATH to that
temp dir (use bunExe and tempDir fixtures), asserting the first run writes the
cache and the second run restores from it; this ensures the
RuntimeTranspilerCache path in src/bun.js/RuntimeTranspilerCache.zig
(MINIMUM_CACHE_SIZE = 50 KB) is exercised. Also duplicate the same cache-enabled
case for the lines noted around 87-89 so both variants run with the cache
override. Ensure you only override BUN_RUNTIME_TRANSPILER_CACHE_PATH in the
environment and keep other bunEnv behavior unchanged.

---

Outside diff comments:
In `@src/js_printer.zig`:
- Around line 5893-5909: The printer is being given is_bun_platform as the
prefers_ascii flag (see PrinterType.init call passing is_bun_platform), which
loses the Format.*_ascii semantics; compute a dedicated prefers_ascii boolean
from the selected Format (e.g. true for Format.esm_ascii and Format.cjs_ascii,
false for Format.esm and Format.cjs) and pass that boolean into PrinterType.init
instead of is_bun_platform; update the call in printAst()/transpiler code paths
(reference the printAst callsite and the Format switch in transpiler.zig) to set
prefers_ascii correctly and add the missing handling for Format.cjs_ascii so
ASCII-forcing formats always set prefers_ascii = true independent of platform.

---

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The tests currently discard stderr in the Promise.all
destructuring (const [stdout, , exitCode] = ...) which hides warnings; change
the destructuring to capture stderr (const [stdout, stderr, exitCode] = await
Promise.all([...])) and add an assertion expect(stderr).toBe("") immediately
after the stdout assertion in each test block (e.g., the block containing the
Promise.all and expect(stdout).toBe("a\n中\n")). Apply the same change to the
other similar blocks mentioned (the ones around lines 28–34, 44–50, 60–66,
77–83, 98–103, 112–118) so all runs verify stderr is empty on success.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f04d29fc-00fd-46f3-88c4-4740c764a464

📥 Commits

Reviewing files that changed from the base of the PR and between 7c060a4 and 098b663.

📒 Files selected for processing (5)

src/bun.js/AsyncModule.zig
src/bun.js/ModuleLoader.zig
src/bun.js/RuntimeTranspilerCache.zig
src/js_printer.zig
test/regression/issue/18115.test.ts

test/regression/issue/18115.test.ts

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/regression/issue/18115.test.ts (1)

12-18: ⚠️ Potential issue | 🟡 Minor

Assert stderr is empty on success paths.

These tests read stderr but never assert it, so warnings/regressions can slip through while exit code remains 0. Please assert stderr === "" in each success path (including the build step).

Suggested patch pattern

-const [stdout, , exitCode] = await Promise.all([
+const [stdout, stderr, exitCode] = await Promise.all([
   new Response(proc.stdout).text(),
   new Response(proc.stderr).text(),
   proc.exited,
 ]);
 expect(stdout).toBe("a\n中\n");
+expect(stderr).toBe("");
 expect(exitCode).toBe(0);

-const [, , buildExitCode] = await Promise.all([
+const [, buildStderr, buildExitCode] = await Promise.all([
   new Response(buildProc.stdout).text(),
   new Response(buildProc.stderr).text(),
   buildProc.exited,
 ]);
+expect(buildStderr).toBe("");
 expect(buildExitCode).toBe(0);

As per coding guidelines, "Always check exit codes and test error scenarios when spawning processes in tests."

Also applies to: 28-35, 44-51, 60-67, 77-84, 101-108, 116-123, 137-143, 151-158

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The test collects
stdout/stderr/exitCode from the spawned process but never asserts stderr is
empty; update the assertions where the variables are collected (the
destructuring of new Response(proc.stdout).text(), new
Response(proc.stderr).text(), proc.exited) to also assert
expect(stderr).toBe("") alongside expect(stdout).toBe(...) and
expect(exitCode).toBe(0), and apply the same added assertion to all other
success-path blocks in this file (the similar blocks at the ranges called out in
the review) so every successful spawn verifies stderr === "".

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 88-90: The test's padding currently uses String.repeat on the
"index.ts" source to exceed MINIMUM_CACHE_SIZE; replace that repeat usage with a
Buffer-based allocation per guideline: use Buffer.alloc with the same byte count
and a space fill, then call toString(), so the "index.ts" value uses
Buffer.alloc(..., ' ').toString() for padding instead of " ".repeat(64 * 1024);
update the single occurrence in the test file regression/issue/18115.test.ts
where the "index.ts" entry is constructed.

---

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The test collects stdout/stderr/exitCode from the spawned
process but never asserts stderr is empty; update the assertions where the
variables are collected (the destructuring of new Response(proc.stdout).text(),
new Response(proc.stderr).text(), proc.exited) to also assert
expect(stderr).toBe("") alongside expect(stdout).toBe(...) and
expect(exitCode).toBe(0), and apply the same added assertion to all other
success-path blocks in this file (the similar blocks at the ranges called out in
the review) so every successful spawn verifies stderr === "".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a10f862-a2f2-47b8-9839-da3448ba3698

📥 Commits

Reviewing files that changed from the base of the PR and between 098b663 and 1a7b895.

📒 Files selected for processing (1)

test/regression/issue/18115.test.ts

test/regression/issue/18115.test.ts

claude

All three bugs from my previous review have been fixed in the latest commits (save/restore prefers_ascii, watcher ASCII gate, UTF-8 hash validation). However, this PR refactors a comptime parameter to a runtime field across the entire JS printer and changes the encoding model for module loading -- a human should verify the encoding transition from Latin-1 to UTF-8 doesn't have subtle downstream effects.

Extended reasoning...

Overview

This PR fixes non-ASCII character preservation in tagged template literals (String.raw) and regex .source by replacing the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field. It touches 5 core source files (js_printer.zig, ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerCache.zig, RuntimeTranspilerStore.zig) and adds/updates 3 test files. The encoding model changes from cloneLatin1 to cloneUTF8 across all transpiler output paths, and the cache version is bumped to 19.

Security risks

No direct security risks identified. The changes are about character encoding correctness, not auth, crypto, or permissions. The cache hash validation was missing for UTF-8 but has been added.

Level of scrutiny

This deserves high scrutiny. The comptime-to-runtime refactor of ascii_only to prefers_ascii is a fundamental change to the JS printer that affects every code path that prints identifiers, strings, templates, and regex. The Latin-1 to UTF-8 encoding switch in module loading affects how JSC interprets transpiled source code. Three bugs were found during review (watcher path mojibake, permanent prefers_ascii flip, missing cache hash validation), all now fixed, but this pattern suggests the change has significant surface area for subtle issues.

Other factors

CI is showing build-zig failures across multiple platforms on the latest commit, though these appear to point at scripts/build/ci.ts rather than the changed files.
The PR is AI-generated (claude label) and has gone through multiple fix iterations.
The approach is based on PR #15047, which provides some design precedent.
Good regression test coverage was added, including a >50KB cache test that exercises the RuntimeTranspilerCache UTF-8 path.
All inline review comments (from both CodeRabbit and my previous review) have been addressed with code fixes.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 1-158: The test uses tempDir (which returns an AsyncDisposable)
but declares its disposables with plain using; change the tempDir disposables to
use await using to ensure proper async disposal. Replace the three declarations
that use tempDir—"using dir = tempDir(\"string-raw-rtc\", ...)", "using cacheDir
= tempDir(\"string-raw-rtc-cache\", ...)", and the later "using dir =
tempDir(\"string-raw-unicode\", ...)"—with await using, keeping the same
variable names (dir, cacheDir) and surrounding logic unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d6b3508-ec65-4e69-8755-2ec912f6f2d4

📥 Commits

Reviewing files that changed from the base of the PR and between 1a7b895 and 5046bdf.

📒 Files selected for processing (1)

test/regression/issue/18115.test.ts

test/regression/issue/18115.test.ts

alii

The behavior fix is correct, but the comptime→runtime conversion in js_printer.zig is unnecessary and can be reverted.

The save/restore pattern is dead code. In both printRawTemplateLiteral and the regex block:

const saved = p.prefers_ascii;
defer p.prefers_ascii = saved;
if (p.prefers_ascii and !strings.isAllASCII(bytes)) p.prefers_ascii = false;
p.print(bytes);

p.print() forwards to p.writer.print() → BufferWriter.writeAll → buffer.append() — no access to prefers_ascii anywhere in the chain. The old escaping happened inline inside these functions (the deleted CodepointIterator loops); p.print() was always a raw byte write. Deleting the escaping loop and calling p.print(bytes) is the entire fix; toggling prefers_ascii around it has zero effect.

(Note: pfg's #15047 set prefers_ascii = false permanently, which at least affects subsequent printIdentifier calls. The save/restore added here scopes it to the one line that ignores it — strictly less effect.)

Because the runtime flag is never usefully mutated, the comptime→runtime conversion is pure complexity. All four PrinterType.init() callers pass compile-time constants. The field could go back to being the comptime ascii_only parameter.

Suggested minimal fix:

printRawTemplateLiteral: body becomes just p.print(bytes); (keep the comment explaining why we don't escape)
Regex print: delete the is_bun_platform escaping block, leave p.print(e.value);
Revert ascii_only back to a comptime parameter on NewPrinter — undo the propagation through canPrintWithoutEscape, writePreQuotedString, printIdentifier, printIdentifierUTF16, canPrintIdentifierUTF16
Keep everything else: cloneLatin1→cloneUTF8, watcher isAllASCII guard, cache version bump, cache UTF-8 hash check, tests. These are all correct and necessary.

Nit: tests use new Response(proc.stdout).text() — proc.stdout.text() is the simpler form.

Per review: the comptime ascii_only parameter doesn't need to become a runtime field. The fix is just removing the escaping loops from printRawTemplateLiteral and printRegExpLiteral — p.print(bytes) was always a raw byte write that never checked ascii_only. Also use proc.stdout.text() instead of new Response(proc.stdout).text() in tests per nit.

coderabbitai

♻️ Duplicate comments (1)

test/regression/issue/18115.test.ts (1)
12-16: ⚠️ Potential issue | 🟡 Minor

Assert stderr is empty on success-path spawns.

At Line 12, Line 28, Line 44, Line 60, Line 77, Line 101, Line 116, Line 137, and Line 151, stderr is read but discarded. That weakens regression signal because warnings/errors on stderr can slip through with exit code 0.
Suggested patch pattern
- const [stdout, , exitCode] = await Promise.all([
+ const [stdout, stderr, exitCode] = await Promise.all([
    proc.stdout.text(),
    proc.stderr.text(),
    proc.exited,
  ]);
  expect(stdout).toBe("a\n中\n");
+ expect(stderr).toBe("");
  expect(exitCode).toBe(0);
- const [, , buildExitCode] = await Promise.all([
+ const [, buildStderr, buildExitCode] = await Promise.all([
    buildProc.stdout.text(),
    buildProc.stderr.text(),
    buildProc.exited,
  ]);
+ expect(buildStderr).toBe("");
  expect(buildExitCode).toBe(0);
Based on learnings: "In oven-sh/bun regression tests, for files under test/regression/issue/*.test.ts, prefer explicit per-test setup and per-test assertions (including repeated tempDir, Bun.spawn, bunExe/bunEnv, and checks of stdout, stderr, and exitCode) rather than extracting shared helpers."

Also applies to: 28-32, 44-48, 60-64, 77-81, 101-105, 116-120, 137-141, 151-155
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/18115.test.ts` around lines 12 - 16, The current
Promise.all destructuring discards stderr (using proc.stderr.text()) which lets
warnings slip through; change each instance where you await [stdout, , exitCode]
from Promise.all to capture stderr (e.g., [stdout, stderr, exitCode] = await
Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited])) and add an
explicit assertion that stderr is empty on a success exit (for example
expect(exitCode).toBe(0); expect(stderr).toBe('') or equivalent), keeping the
existing stdout assertions; apply this change for the proc usage sites
(proc.stdout.text(), proc.stderr.text(), proc.exited) in the test so successful
spawns fail if anything was written to stderr.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-16: The current Promise.all destructuring discards stderr
(using proc.stderr.text()) which lets warnings slip through; change each
instance where you await [stdout, , exitCode] from Promise.all to capture stderr
(e.g., [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(),
proc.stderr.text(), proc.exited])) and add an explicit assertion that stderr is
empty on a success exit (for example expect(exitCode).toBe(0);
expect(stderr).toBe('') or equivalent), keeping the existing stdout assertions;
apply this change for the proc usage sites (proc.stdout.text(),
proc.stderr.text(), proc.exited) in the test so successful spawns fail if
anything was written to stderr.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6fff0fb9-6f61-4c31-aa57-620abb0de7c6

📥 Commits

Reviewing files that changed from the base of the PR and between 5046bdf and 546b4d1.

📒 Files selected for processing (2)

src/js_printer.zig
test/regression/issue/18115.test.ts

src/bun.js/AsyncModule.zig

src/js_printer.zig

- DevServer.zig: change cloneLatin1 to cloneUTF8 on the no-source-map HMR path (line 2564), matching the with-source-map path - AsyncModule.zig: move addFile watcher registration outside the ASCII guard so files are always registered for hot reload regardless of whether output contains non-ASCII bytes

src/js_printer.zig

- StandaloneModuleGraph.zig: use .utf8 encoding for JS output in bun build --compile, since bundler output may contain raw UTF-8 - OutputFile.zig: fall back to cloneUTF8 when output contains non-ASCII, since createExternal with isLatin1=true can't handle multi-byte UTF-8 sequences

claude

All bugs found during review have been addressed, but this PR touches encoding-sensitive paths across 11 files (module loader, watcher, HMR, compile, caching) and required 7+ fix commits during review — it would benefit from a human look to confirm no edge cases remain.

Extended reasoning...

Overview

This PR fixes non-ASCII character handling in tagged template literals (String.raw) and regex .source by removing the \uXXXX escape logic from printRawTemplateLiteral and printRegExpLiteral in js_printer.zig. Since output can now contain raw UTF-8 multi-byte sequences, all downstream consumers were updated from cloneLatin1 to cloneUTF8: ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerStore.zig, RuntimeTranspilerCache.zig, DevServer.zig, StandaloneModuleGraph.zig, and OutputFile.zig. The watcher paths use an isAllASCII guard to preserve the ref-counted Latin-1 fast path when possible.

Security risks

No direct security concerns — this is an encoding correctness fix. The change ensures runtime string values match source intent, which is arguably more correct from a security perspective (no silent data transformation).

Level of scrutiny

High scrutiny warranted. This PR modifies encoding assumptions across the entire module loading stack. During review, 7 distinct bugs were found and fixed (3 from my reviews, 3 from CodeRabbit, 1 from a human reviewer), spanning: missing UTF-8 hash validation in the cache, watcher paths skipping addFile registration, permanent prefers_ascii flag mutation, DevServer HMR using wrong encoding, and StandaloneModuleGraph/OutputFile hardcoding Latin-1. The volume of issues found suggests the initial implementation was incomplete, and while all were addressed, the pattern raises concern about untested edge cases in less common code paths.

Other factors

The approach was revised mid-review (commit 546b4d1 reverted the comptime-to-runtime prefers_ascii conversion in favor of simply always emitting raw bytes for template literals and regex). This is a cleaner design. Test coverage is comprehensive — runtime, build, cache (with >50KB file to exceed MINIMUM_CACHE_SIZE), and the previously-skipped 14976 test is now enabled. The cache version was bumped to 19. However, the claude label indicates this was AI-authored, and the number of bugs found during review suggests a human should verify the final state.

robobun mentioned this pull request Mar 18, 2026

String.raw Iterator encoding error #18115

Open

github-actions bot added the claude label Mar 18, 2026

[autofix.ci] apply automated fixes

7c060a4

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

src/bun.js/RuntimeTranspilerCache.zig Show resolved Hide resolved

src/js_printer.zig Show resolved Hide resolved

test/regression/issue/18115.test.ts Outdated Show resolved Hide resolved

claude bot reviewed Mar 18, 2026

View reviewed changes

src/bun.js/ModuleLoader.zig Show resolved Hide resolved

src/js_printer.zig Show resolved Hide resolved

claude bot reviewed Mar 18, 2026

View reviewed changes

src/bun.js/RuntimeTranspilerCache.zig Show resolved Hide resolved

robobun and others added 2 commits March 18, 2026 07:54

[autofix.ci] apply automated fixes

098b663

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

test/regression/issue/18115.test.ts Show resolved Hide resolved

robobun and others added 2 commits March 18, 2026 08:18

Add runtime transpiler cache regression test for non-ASCII

4e3001a

[autofix.ci] apply automated fixes

1a7b895

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

test/regression/issue/18115.test.ts Show resolved Hide resolved

claude bot reviewed Mar 18, 2026

View reviewed changes

alii and others added 2 commits March 18, 2026 08:12

Merge branch 'main' into farm/2c1f0120/fix-template-literal-unicode

e6e7575

Use Buffer.alloc for test padding

5046bdf

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

test/regression/issue/18115.test.ts Show resolved Hide resolved

alii reviewed Mar 18, 2026

View reviewed changes

robobun and others added 2 commits March 18, 2026 16:39

[autofix.ci] apply automated fixes

30a63cd

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

claude bot reviewed Mar 18, 2026

View reviewed changes

src/bun.js/AsyncModule.zig Outdated Show resolved Hide resolved

src/js_printer.zig Show resolved Hide resolved

claude bot reviewed Mar 18, 2026

View reviewed changes

src/js_printer.zig Show resolved Hide resolved

claude bot reviewed Mar 18, 2026

View reviewed changes

Conversation

robobun commented Mar 18, 2026

Reproduction

Root Cause

Fix

Verification

Uh oh!

robobun commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alii left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robobun commented Mar 18, 2026 •

edited

Loading

coderabbitai bot commented Mar 18, 2026 •

edited

Loading