Preserve non-ASCII in tagged template literals and regex source#28214
Preserve non-ASCII in tagged template literals and regex source#28214
Conversation
The transpiler's printRawTemplateLiteral was escaping non-ASCII characters to \uXXXX sequences, but raw template literals must preserve the exact source bytes since they are exposed to JavaScript via String.raw and the .raw property. Same issue in printRegExpLiteral for regex .source property. Replace the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field. When printing raw template content or regex source containing non-ASCII, disable ASCII-only mode so bytes are preserved verbatim. Switch cloneLatin1 to cloneUTF8 for transpiler output since it may now contain non-ASCII, and bump the transpiler cache version. Closes #18115 Closes #8745 Closes #15492 Closes #16763 Closes #8207
|
Updated 1:51 PM PT - Mar 18th, 2026
❌ @robobun, your commit b9c8d67 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 28214That installs a local version of the PR into your bun-28214 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughTightens watcher-only path to require ASCII output; switches cloning from Latin‑1 to UTF‑8 across transpilation and cache paths (cache version bumped, UTF‑8 hash checks added); adjusts JS printing to emit raw bytes for template/RegExp literals; adds/enables tests covering non‑ASCII handling. Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/RuntimeTranspilerCache.zig`:
- Around line 690-698: Entry.load()'s .utf8 branch currently accepts cached
output without verifying metadata.output_hash, allowing same-length corruption
to pass; update the .utf8 load path to compute the output hash for the restored
UTF‑8 bytes (using the same hash function/format used by the .latin1 and .utf16
branches) and compare it against metadata.output_hash, rejecting the entry on
mismatch (same behavior/cleanup path as used in the .latin1/.utf16 branches) so
that the new entries created where output_code = bun.String.cloneUTF8(...) are
validated before being returned.
In `@src/js_printer.zig`:
- Around line 1968-1974: The function printRawTemplateLiteral currently flips
the printer-global flag p.prefers_ascii for all subsequent output; change this
to a temporary/local override: save the original p.prefers_ascii, set
p.prefers_ascii = false only while emitting the raw template bytes with
p.print(bytes), then restore the original value immediately after; apply the
same pattern to the RegExp emission code referenced around the other occurrence
(the RegExp emission function) so neither printRawTemplateLiteral nor the RegExp
emitter mutate the global prefers_ascii state permanently.
In `@test/regression/issue/18115.test.ts`:
- Around line 6-16: All Bun.spawn invocations (creating variable proc) should
use the resource-safe pattern "await using proc = Bun.spawn(...)" instead of
"const proc = Bun.spawn(...)" so processes are disposed automatically; after
awaiting outputs replace the existing Promise.all usage with reads that capture
stdout and stderr (e.g., new Response(proc.stdout).text(), new
Response(proc.stderr).text()) and assert that stderr is empty on success paths
to catch warnings, and continue to await proc.exited for exitCode; additionally,
for the build step that consumes both streams, make sure to fully drain both
stdout and stderr before asserting the exit code (i.e., read both streams to
completion then check exitCode) and keep references to stdout/stderr variables
when making the assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4a167012-b7d5-42a7-b0b5-7f9407a082ea
📒 Files selected for processing (8)
src/bun.js/AsyncModule.zigsrc/bun.js/ModuleLoader.zigsrc/bun.js/RuntimeTranspilerCache.zigsrc/bun.js/RuntimeTranspilerStore.zigsrc/js_printer.zigtest/js/bun/shell/bunshell.test.tstest/regression/issue/14976/14976.test.tstest/regression/issue/18115.test.ts
…dation - Save and restore prefers_ascii in printRawTemplateLiteral and printRegExpLiteral so the flag only affects the specific literal being printed, not all subsequent output - Skip the Latin-1 ref-counted watcher path when output contains non-ASCII bytes, falling through to the cloneUTF8 path - Add hash validation to the UTF-8 cache load path to match the Latin-1 and UTF-16 branches
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/js_printer.zig (1)
5893-5909:⚠️ Potential issue | 🔴 CriticalFix
prefers_asciiwiring for ASCII format variants.The change at lines 5908 and 6121 binds
prefers_asciitois_bun_platform, but this breaks the intended behavior ofFormat.esm_asciiandFormat.cjs_ascii. These formats should always force ASCII-only output (escaping non-ASCII characters), regardless of target platform.Current issue: In
src/transpiler.zigline 849–856, the.esm_asciicase passesis_bunas theis_bun_platformparameter toprintAst(), which then becomesprefers_ascii. On non-Bun targets, this results inprefers_ascii=false, causing non-ASCII characters to be emitted when they should always be escaped.Expected:
Format.esm_asciishould map toprefers_ascii=trueandFormat.esmtoprefers_ascii=false, independent of platform. Similarly forcjsvs.cjs_ascii. The format information must be preserved through the call chain, not lost.Also note: No corresponding handling exists for
Format.cjs_asciiin thetranspiler.zigformat switch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/js_printer.zig` around lines 5893 - 5909, The printer is being given is_bun_platform as the prefers_ascii flag (see PrinterType.init call passing is_bun_platform), which loses the Format.*_ascii semantics; compute a dedicated prefers_ascii boolean from the selected Format (e.g. true for Format.esm_ascii and Format.cjs_ascii, false for Format.esm and Format.cjs) and pass that boolean into PrinterType.init instead of is_bun_platform; update the call in printAst()/transpiler code paths (reference the printAst callsite and the Format switch in transpiler.zig) to set prefers_ascii correctly and add the missing handling for Format.cjs_ascii so ASCII-forcing formats always set prefers_ascii = true independent of platform.
♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)
12-18:⚠️ Potential issue | 🟡 MinorAssert
stderron successful runs.Each process already drains
stderr, but the value is discarded. Keep it and assertexpect(stderr).toBe("")so warnings/regressions on the success path fail this suite.Also applies to: 28-34, 44-50, 60-66, 77-83, 98-103, 112-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The tests currently discard stderr in the Promise.all destructuring (const [stdout, , exitCode] = ...) which hides warnings; change the destructuring to capture stderr (const [stdout, stderr, exitCode] = await Promise.all([...])) and add an assertion expect(stderr).toBe("") immediately after the stdout assertion in each test block (e.g., the block containing the Promise.all and expect(stdout).toBe("a\n中\n")). Apply the same change to the other similar blocks mentioned (the ones around lines 28–34, 44–50, 60–66, 77–83, 98–103, 112–118) so all runs verify stderr is empty on success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Line 2: Add a new regression test in test/regression/issue/18115.test.ts that
creates a temp directory and a >50 KB temp file, then invokes the binary twice
with bunEnv overridden to set only BUN_RUNTIME_TRANSPILER_CACHE_PATH to that
temp dir (use bunExe and tempDir fixtures), asserting the first run writes the
cache and the second run restores from it; this ensures the
RuntimeTranspilerCache path in src/bun.js/RuntimeTranspilerCache.zig
(MINIMUM_CACHE_SIZE = 50 KB) is exercised. Also duplicate the same cache-enabled
case for the lines noted around 87-89 so both variants run with the cache
override. Ensure you only override BUN_RUNTIME_TRANSPILER_CACHE_PATH in the
environment and keep other bunEnv behavior unchanged.
---
Outside diff comments:
In `@src/js_printer.zig`:
- Around line 5893-5909: The printer is being given is_bun_platform as the
prefers_ascii flag (see PrinterType.init call passing is_bun_platform), which
loses the Format.*_ascii semantics; compute a dedicated prefers_ascii boolean
from the selected Format (e.g. true for Format.esm_ascii and Format.cjs_ascii,
false for Format.esm and Format.cjs) and pass that boolean into PrinterType.init
instead of is_bun_platform; update the call in printAst()/transpiler code paths
(reference the printAst callsite and the Format switch in transpiler.zig) to set
prefers_ascii correctly and add the missing handling for Format.cjs_ascii so
ASCII-forcing formats always set prefers_ascii = true independent of platform.
---
Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The tests currently discard stderr in the Promise.all
destructuring (const [stdout, , exitCode] = ...) which hides warnings; change
the destructuring to capture stderr (const [stdout, stderr, exitCode] = await
Promise.all([...])) and add an assertion expect(stderr).toBe("") immediately
after the stdout assertion in each test block (e.g., the block containing the
Promise.all and expect(stdout).toBe("a\n中\n")). Apply the same change to the
other similar blocks mentioned (the ones around lines 28–34, 44–50, 60–66,
77–83, 98–103, 112–118) so all runs verify stderr is empty on success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f04d29fc-00fd-46f3-88c4-4740c764a464
📒 Files selected for processing (5)
src/bun.js/AsyncModule.zigsrc/bun.js/ModuleLoader.zigsrc/bun.js/RuntimeTranspilerCache.zigsrc/js_printer.zigtest/regression/issue/18115.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)
12-18:⚠️ Potential issue | 🟡 MinorAssert
stderris empty on success paths.These tests read
stderrbut never assert it, so warnings/regressions can slip through while exit code remains0. Please assertstderr === ""in each success path (including the build step).Suggested patch pattern
-const [stdout, , exitCode] = await Promise.all([ +const [stdout, stderr, exitCode] = await Promise.all([ new Response(proc.stdout).text(), new Response(proc.stderr).text(), proc.exited, ]); expect(stdout).toBe("a\n中\n"); +expect(stderr).toBe(""); expect(exitCode).toBe(0);-const [, , buildExitCode] = await Promise.all([ +const [, buildStderr, buildExitCode] = await Promise.all([ new Response(buildProc.stdout).text(), new Response(buildProc.stderr).text(), buildProc.exited, ]); +expect(buildStderr).toBe(""); expect(buildExitCode).toBe(0);As per coding guidelines, "Always check exit codes and test error scenarios when spawning processes in tests."
Also applies to: 28-35, 44-51, 60-67, 77-84, 101-108, 116-123, 137-143, 151-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/regression/issue/18115.test.ts` around lines 12 - 18, The test collects stdout/stderr/exitCode from the spawned process but never asserts stderr is empty; update the assertions where the variables are collected (the destructuring of new Response(proc.stdout).text(), new Response(proc.stderr).text(), proc.exited) to also assert expect(stderr).toBe("") alongside expect(stdout).toBe(...) and expect(exitCode).toBe(0), and apply the same added assertion to all other success-path blocks in this file (the similar blocks at the ranges called out in the review) so every successful spawn verifies stderr === "".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 88-90: The test's padding currently uses String.repeat on the
"index.ts" source to exceed MINIMUM_CACHE_SIZE; replace that repeat usage with a
Buffer-based allocation per guideline: use Buffer.alloc with the same byte count
and a space fill, then call toString(), so the "index.ts" value uses
Buffer.alloc(..., ' ').toString() for padding instead of " ".repeat(64 * 1024);
update the single occurrence in the test file regression/issue/18115.test.ts
where the "index.ts" entry is constructed.
---
Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-18: The test collects stdout/stderr/exitCode from the spawned
process but never asserts stderr is empty; update the assertions where the
variables are collected (the destructuring of new Response(proc.stdout).text(),
new Response(proc.stderr).text(), proc.exited) to also assert
expect(stderr).toBe("") alongside expect(stdout).toBe(...) and
expect(exitCode).toBe(0), and apply the same added assertion to all other
success-path blocks in this file (the similar blocks at the ranges called out in
the review) so every successful spawn verifies stderr === "".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2a10f862-a2f2-47b8-9839-da3448ba3698
📒 Files selected for processing (1)
test/regression/issue/18115.test.ts
There was a problem hiding this comment.
All three bugs from my previous review have been fixed in the latest commits (save/restore prefers_ascii, watcher ASCII gate, UTF-8 hash validation). However, this PR refactors a comptime parameter to a runtime field across the entire JS printer and changes the encoding model for module loading -- a human should verify the encoding transition from Latin-1 to UTF-8 doesn't have subtle downstream effects.
Extended reasoning...
Overview
This PR fixes non-ASCII character preservation in tagged template literals (String.raw) and regex .source by replacing the comptime ascii_only parameter on NewPrinter with a runtime prefers_ascii field. It touches 5 core source files (js_printer.zig, ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerCache.zig, RuntimeTranspilerStore.zig) and adds/updates 3 test files. The encoding model changes from cloneLatin1 to cloneUTF8 across all transpiler output paths, and the cache version is bumped to 19.
Security risks
No direct security risks identified. The changes are about character encoding correctness, not auth, crypto, or permissions. The cache hash validation was missing for UTF-8 but has been added.
Level of scrutiny
This deserves high scrutiny. The comptime-to-runtime refactor of ascii_only to prefers_ascii is a fundamental change to the JS printer that affects every code path that prints identifiers, strings, templates, and regex. The Latin-1 to UTF-8 encoding switch in module loading affects how JSC interprets transpiled source code. Three bugs were found during review (watcher path mojibake, permanent prefers_ascii flip, missing cache hash validation), all now fixed, but this pattern suggests the change has significant surface area for subtle issues.
Other factors
- CI is showing build-zig failures across multiple platforms on the latest commit, though these appear to point at
scripts/build/ci.tsrather than the changed files. - The PR is AI-generated (claude label) and has gone through multiple fix iterations.
- The approach is based on PR #15047, which provides some design precedent.
- Good regression test coverage was added, including a >50KB cache test that exercises the RuntimeTranspilerCache UTF-8 path.
- All inline review comments (from both CodeRabbit and my previous review) have been addressed with code fixes.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 1-158: The test uses tempDir (which returns an AsyncDisposable)
but declares its disposables with plain using; change the tempDir disposables to
use await using to ensure proper async disposal. Replace the three declarations
that use tempDir—"using dir = tempDir(\"string-raw-rtc\", ...)", "using cacheDir
= tempDir(\"string-raw-rtc-cache\", ...)", and the later "using dir =
tempDir(\"string-raw-unicode\", ...)"—with await using, keeping the same
variable names (dir, cacheDir) and surrounding logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7d6b3508-ec65-4e69-8755-2ec912f6f2d4
📒 Files selected for processing (1)
test/regression/issue/18115.test.ts
alii
left a comment
There was a problem hiding this comment.
The behavior fix is correct, but the comptime→runtime conversion in js_printer.zig is unnecessary and can be reverted.
The save/restore pattern is dead code. In both printRawTemplateLiteral and the regex block:
const saved = p.prefers_ascii;
defer p.prefers_ascii = saved;
if (p.prefers_ascii and !strings.isAllASCII(bytes)) p.prefers_ascii = false;
p.print(bytes);p.print() forwards to p.writer.print() → BufferWriter.writeAll → buffer.append() — no access to prefers_ascii anywhere in the chain. The old escaping happened inline inside these functions (the deleted CodepointIterator loops); p.print() was always a raw byte write. Deleting the escaping loop and calling p.print(bytes) is the entire fix; toggling prefers_ascii around it has zero effect.
(Note: pfg's #15047 set prefers_ascii = false permanently, which at least affects subsequent printIdentifier calls. The save/restore added here scopes it to the one line that ignores it — strictly less effect.)
Because the runtime flag is never usefully mutated, the comptime→runtime conversion is pure complexity. All four PrinterType.init() callers pass compile-time constants. The field could go back to being the comptime ascii_only parameter.
Suggested minimal fix:
printRawTemplateLiteral: body becomes justp.print(bytes);(keep the comment explaining why we don't escape)- Regex print: delete the
is_bun_platformescaping block, leavep.print(e.value); - Revert
ascii_onlyback to acomptimeparameter onNewPrinter— undo the propagation throughcanPrintWithoutEscape,writePreQuotedString,printIdentifier,printIdentifierUTF16,canPrintIdentifierUTF16 - Keep everything else:
cloneLatin1→cloneUTF8, watcherisAllASCIIguard, cache version bump, cache UTF-8 hash check, tests. These are all correct and necessary.
Nit: tests use new Response(proc.stdout).text() — proc.stdout.text() is the simpler form.
Per review: the comptime ascii_only parameter doesn't need to become a runtime field. The fix is just removing the escaping loops from printRawTemplateLiteral and printRegExpLiteral — p.print(bytes) was always a raw byte write that never checked ascii_only. Also use proc.stdout.text() instead of new Response(proc.stdout).text() in tests per nit.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/regression/issue/18115.test.ts (1)
12-16:⚠️ Potential issue | 🟡 MinorAssert
stderris empty on success-path spawns.At Line 12, Line 28, Line 44, Line 60, Line 77, Line 101, Line 116, Line 137, and Line 151,
stderris read but discarded. That weakens regression signal because warnings/errors on stderr can slip through with exit code0.Suggested patch pattern
- const [stdout, , exitCode] = await Promise.all([ + const [stdout, stderr, exitCode] = await Promise.all([ proc.stdout.text(), proc.stderr.text(), proc.exited, ]); expect(stdout).toBe("a\n中\n"); + expect(stderr).toBe(""); expect(exitCode).toBe(0);- const [, , buildExitCode] = await Promise.all([ + const [, buildStderr, buildExitCode] = await Promise.all([ buildProc.stdout.text(), buildProc.stderr.text(), buildProc.exited, ]); + expect(buildStderr).toBe(""); expect(buildExitCode).toBe(0);Based on learnings: "In oven-sh/bun regression tests, for files under test/regression/issue/*.test.ts, prefer explicit per-test setup and per-test assertions (including repeated tempDir, Bun.spawn, bunExe/bunEnv, and checks of stdout, stderr, and exitCode) rather than extracting shared helpers."
Also applies to: 28-32, 44-48, 60-64, 77-81, 101-105, 116-120, 137-141, 151-155
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/regression/issue/18115.test.ts` around lines 12 - 16, The current Promise.all destructuring discards stderr (using proc.stderr.text()) which lets warnings slip through; change each instance where you await [stdout, , exitCode] from Promise.all to capture stderr (e.g., [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited])) and add an explicit assertion that stderr is empty on a success exit (for example expect(exitCode).toBe(0); expect(stderr).toBe('') or equivalent), keeping the existing stdout assertions; apply this change for the proc usage sites (proc.stdout.text(), proc.stderr.text(), proc.exited) in the test so successful spawns fail if anything was written to stderr.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/regression/issue/18115.test.ts`:
- Around line 12-16: The current Promise.all destructuring discards stderr
(using proc.stderr.text()) which lets warnings slip through; change each
instance where you await [stdout, , exitCode] from Promise.all to capture stderr
(e.g., [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(),
proc.stderr.text(), proc.exited])) and add an explicit assertion that stderr is
empty on a success exit (for example expect(exitCode).toBe(0);
expect(stderr).toBe('') or equivalent), keeping the existing stdout assertions;
apply this change for the proc usage sites (proc.stdout.text(),
proc.stderr.text(), proc.exited) in the test so successful spawns fail if
anything was written to stderr.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6fff0fb9-6f61-4c31-aa57-620abb0de7c6
📒 Files selected for processing (2)
src/js_printer.zigtest/regression/issue/18115.test.ts
- DevServer.zig: change cloneLatin1 to cloneUTF8 on the no-source-map HMR path (line 2564), matching the with-source-map path - AsyncModule.zig: move addFile watcher registration outside the ASCII guard so files are always registered for hot reload regardless of whether output contains non-ASCII bytes
- StandaloneModuleGraph.zig: use .utf8 encoding for JS output in bun build --compile, since bundler output may contain raw UTF-8 - OutputFile.zig: fall back to cloneUTF8 when output contains non-ASCII, since createExternal with isLatin1=true can't handle multi-byte UTF-8 sequences
There was a problem hiding this comment.
All bugs found during review have been addressed, but this PR touches encoding-sensitive paths across 11 files (module loader, watcher, HMR, compile, caching) and required 7+ fix commits during review — it would benefit from a human look to confirm no edge cases remain.
Extended reasoning...
Overview
This PR fixes non-ASCII character handling in tagged template literals (String.raw) and regex .source by removing the \uXXXX escape logic from printRawTemplateLiteral and printRegExpLiteral in js_printer.zig. Since output can now contain raw UTF-8 multi-byte sequences, all downstream consumers were updated from cloneLatin1 to cloneUTF8: ModuleLoader.zig, AsyncModule.zig, RuntimeTranspilerStore.zig, RuntimeTranspilerCache.zig, DevServer.zig, StandaloneModuleGraph.zig, and OutputFile.zig. The watcher paths use an isAllASCII guard to preserve the ref-counted Latin-1 fast path when possible.
Security risks
No direct security concerns — this is an encoding correctness fix. The change ensures runtime string values match source intent, which is arguably more correct from a security perspective (no silent data transformation).
Level of scrutiny
High scrutiny warranted. This PR modifies encoding assumptions across the entire module loading stack. During review, 7 distinct bugs were found and fixed (3 from my reviews, 3 from CodeRabbit, 1 from a human reviewer), spanning: missing UTF-8 hash validation in the cache, watcher paths skipping addFile registration, permanent prefers_ascii flag mutation, DevServer HMR using wrong encoding, and StandaloneModuleGraph/OutputFile hardcoding Latin-1. The volume of issues found suggests the initial implementation was incomplete, and while all were addressed, the pattern raises concern about untested edge cases in less common code paths.
Other factors
The approach was revised mid-review (commit 546b4d1 reverted the comptime-to-runtime prefers_ascii conversion in favor of simply always emitting raw bytes for template literals and regex). This is a cleaner design. Test coverage is comprehensive — runtime, build, cache (with >50KB file to exceed MINIMUM_CACHE_SIZE), and the previously-skipped 14976 test is now enabled. The cache version was bumped to 19. However, the claude label indicates this was AI-authored, and the number of bugs found during review suggests a human should verify the final state.
Reproduction
Root Cause
In
src/js_printer.zig, theprintRawTemplateLiteralfunction escapes non-ASCII characters to\uXXXXsequences whenascii_onlyis true (which it is for all bun platform output). But raw template literals must preserve the exact source bytes, since they are exposed to JavaScript viaString.rawand the.rawproperty of tagged template arguments. Escaping中to\u4E2Dchanges the runtime string value from one character to six.The same issue exists in
printRegExpLiteral, where non-ASCII characters in regex source are escaped even though the.sourceproperty is visible to JavaScript.Fix
ascii_onlyparameter onNewPrinterwith a runtimeprefers_asciifieldprefers_ascii = falseso bytes are preserved verbatimcloneLatin1→cloneUTF8for transpiler output since it may now contain non-ASCII bytesBased on the approach from #15047 by @pfg.
Verification
USE_SYSTEM_BUN=1 bun test test/regression/issue/18115.test.ts→ 6 failures (bug exists)bun bd test test/regression/issue/18115.test.ts→ 6 passes (fix works)test/regression/issue/14976/14976.test.tsnow passesbunshell.test.tsupdated with correct expectationFixes #18115, fixes #8745, fixes #15492, fixes #16763, fixes #8207