Skip to content

fix(heredoc): handle unbalanced single quotes and backticks in heredoc bodies inside $(…) (#420)#1067

Merged
reubeno merged 1 commit intoreubeno:mainfrom
StudioLE:#420-fix-unbalanced-single-quote-backtick-in-heredoc-command-sub
Mar 28, 2026
Merged

fix(heredoc): handle unbalanced single quotes and backticks in heredoc bodies inside $(…) (#420)#1067
reubeno merged 1 commit intoreubeno:mainfrom
StudioLE:#420-fix-unbalanced-single-quote-backtick-in-heredoc-command-sub

Conversation

@StudioLE
Copy link
Copy Markdown
Contributor

@StudioLE StudioLE commented Mar 17, 2026

Fixes #420

Test results without fix

---
source: brush-parser/src/word.rs
expression: "test_parse(\"\\\"$(cat <<'EOF'\\na ` b\\nEOF\\n)\\\"\")?"
---
ParseTestResults(
  input: "\"$(cat <<\'EOF\'\na ` b\nEOF\n)\"",
  result: [
    WordPieceWithSource(
      piece: DoubleQuotedSequence([
        WordPieceWithSource(
          piece: Text("$"),
          start_index: 1,
          end_index: 2,
        ),
        WordPieceWithSource(
          piece: Text("(cat <<\'EOF\'\na ` b\nEOF\n)"),
          start_index: 2,
          end_index: 26,
        ),
      ]),
      start_index: 0,
      end_index: 27,
    ),
  ],
)
---
source: brush-parser/src/word.rs
expression: "test_parse(\"\\\"$(cat <<'EOF'\\nit's here\\nEOF\\n)\\\"\")?"
---
ParseTestResults(
  input: "\"$(cat <<\'EOF\'\nit\'s here\nEOF\n)\"",
  result: [
    WordPieceWithSource(
      piece: DoubleQuotedSequence([
        WordPieceWithSource(
          piece: Text("$"),
          start_index: 1,
          end_index: 2,
        ),
        WordPieceWithSource(
          piece: Text("(cat <<\'EOF\'\nit\'s here\nEOF\n)"),
          start_index: 2,
          end_index: 30,
        ),
      ]),
      start_index: 0,
      end_index: 31,
    ),
  ],
)
* Test case: [Here doc with unpaired single quote in command substitution]... 
    Oracle comparison:
      status matches (exit status: 0) ✔️
      stdout DIFFERS:
          ------ Oracle <> Test: stdout ---------------------------------
          + $(cat <<'EOF'
            it's here
          + EOF
          + )
            
          ---------------------------------------------------------------
      stderr matches ✔️
      temp dir matches ✔️
    FAILED.
* Test case: [Here doc with unpaired backtick in command substitution]... 
    Oracle comparison:
      status matches (exit status: 0) ✔️
      stdout DIFFERS:
          ------ Oracle <> Test: stdout ---------------------------------
          + $(cat <<'EOF'
            a ` b
          + EOF
          + )
            
          ---------------------------------------------------------------
      stderr matches ✔️
      temp dir matches ✔️
    FAILED.
* Test case: [Here doc with three single quotes in command substitution]... 
    Oracle comparison:
      status matches (exit status: 0) ✔️
      stdout DIFFERS:
          ------ Oracle <> Test: stdout ---------------------------------
          + $(cat <<'EOF'
            it's Bob's it's
          + EOF
          + )
            
          ---------------------------------------------------------------
      stderr matches ✔️
      temp dir matches ✔️
    FAILED.
* Test case: [Here doc with unpaired single quote in command substitution (#420)]... 
    Oracle comparison:
      status matches (exit status: 0) ✔️
      stdout DIFFERS:
          ------ Oracle <> Test: stdout ---------------------------------
          - that's "cool"
          + 
            
          ---------------------------------------------------------------
      stderr DIFFERS:
          ------ Oracle <> Test: stderr ---------------------------------
          + error: failed to parse word '$(
          +   sed 's/nice/cool/' - <<-EOF
          +   that's
          +   "nice"
          + EOF
          + )'
          + 
          ---------------------------------------------------------------
      temp dir matches ✔️
    FAILED.

Context

The tokenizer flattens heredoc bodies into $(...) token text. When the body contains unbalanced single quotes or backticks, the word parser's quote-matching rules fail to consume them, causing the entire command substitution parse to fail.

If you compare the snapshots of parse_command_sub_with_unbalanced_single_quote and parse_command_sub_with_unbalanced_backtick prior to the fix with the snapshot of brush_parser__word__tests__parse_command_sub_with_balanced_backticks you can clearly see the parser is failing to match the CommandSubstitution.

Solution

The solution is to add a fallback in command_piece so orphan ' and ` are consumed as literal characters when word_piece cannot match them.

Caution

I believe this is safe because of PEG's ordered choice semantics: properly paired quotes are consumed by word_piece first; the fallback only fires for characters no other rule can handle.

However, it would be worth a maintainer validating this

Note

Unbalanced " inside heredoc bodies within "$(…)" is a separate issue tracked in #1066. The " is consumed by the outer double_quoted_sequence rule before command_piece is reached, so this fix does not address it.


RE: Contributor Policy

Please be transparent about your use of AI.

It also remains your responsibility to review, test, understand, and vouch for all code and content that you submit. Accordingly, you must fully understand all code and other content in your contributions and be able to explain their behavior and purpose.

  • I have reviewed, tested and understand WHY the fix works.

Caution

  • However, the fix is inside peg::parser! macro and I definitely do not have knowledge of how the macro works. Therefore I'm heavily relying on existing tests to identify if this causes regressions.

reubeno#420)

The tokenizer flattens heredoc bodies into `$(...)` token text. When the
body contains unbalanced single quotes or backticks, the word parser's
quote-matching rules fail to consume them, causing the entire command
substitution parse to fail.

Add a fallback in `command_piece` so orphan `'` and `` ` `` are consumed
as literal characters when `word_piece` cannot match them.
@github-actions
Copy link
Copy Markdown

Performance Benchmark Report

Benchmark name Baseline (μs) Test/PR (μs) Delta (μs) Delta %
clone_shell_object 17.01 μs 17.17 μs 0.16 μs 🟠 +0.93%
eval_arithmetic 0.15 μs 0.15 μs -0.00 μs ⚪ Unchanged
expand_one_string 1.51 μs 1.52 μs 0.01 μs ⚪ Unchanged
for_loop 27.05 μs 26.77 μs -0.28 μs 🟢 -1.05%
full_peg_complex 57.85 μs 57.68 μs -0.17 μs ⚪ Unchanged
full_peg_for_loop 6.25 μs 6.20 μs -0.05 μs 🟢 -0.85%
full_peg_nested_expansions 16.57 μs 16.57 μs -0.01 μs ⚪ Unchanged
full_peg_pipeline 4.26 μs 4.21 μs -0.05 μs 🟢 -1.13%
full_peg_simple 1.82 μs 1.79 μs -0.03 μs 🟢 -1.37%
function_call 2.94 μs 2.97 μs 0.02 μs ⚪ Unchanged
instantiate_shell 56.13 μs 55.63 μs -0.50 μs ⚪ Unchanged
instantiate_shell_with_init_scripts 26429.60 μs 27457.51 μs 1027.91 μs 🟠 +3.89%
parse_peg_bash_completion 2086.11 μs 2075.02 μs -11.10 μs ⚪ Unchanged
parse_peg_complex 19.35 μs 19.94 μs 0.59 μs 🟠 +3.04%
parse_peg_for_loop 2.04 μs 2.00 μs -0.05 μs 🟢 -2.35%
parse_peg_pipeline 2.16 μs 2.15 μs -0.01 μs ⚪ Unchanged
parse_peg_simple 1.12 μs 1.10 μs -0.01 μs 🟢 -1.25%
run_echo_builtin_command 18.17 μs 18.54 μs 0.37 μs ⚪ Unchanged
tokenize_sample_script 3.43 μs 3.36 μs -0.07 μs 🟢 -2.07%

Code Coverage Report: Only Changed Files listed

Package Base Coverage New Coverage Difference
brush-core/src/results.rs 🟢 83.33% 🟢 80.16% 🔴 -3.17%
brush-parser/src/word.rs 🟢 93.7% 🟢 94% 🟢 0.3%
Overall Coverage 🟢 75.39% 🟢 75.4% 🟢 0.01%

Minimum allowed coverage is 70%, this run produced 75.4%

Test Summary: bash-completion test suite

Outcome Count Percentage
✅ Pass 1583 75.06
❗️ Error 18 0.85
❌ Fail 154 7.30
⏩ Skip 339 16.07
❎ Expected Fail 13 0.62
✔️ Unexpected Pass 2 0.09
📊 Total 2109 100.00

@reubeno reubeno added the state: needs review PR or issue author is waiting for a review label Mar 27, 2026
@reubeno
Copy link
Copy Markdown
Owner

reubeno commented Mar 28, 2026

@StudioLE Thanks for another contribution, and for the detailed summary 😄

Thanks also for your patience -- I've been extremely busy and hadn't yet had time to give the change the review it deserves. I aim to get to it within the next couple of days.

Copy link
Copy Markdown
Owner

@reubeno reubeno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic behind the fix seems sound. I think there are more cases that will be problematic for us in this area (e.g., perhaps unmatched parentheses chars inside the embedded here doc?) -- but let's at least get in this fix.

@reubeno reubeno merged commit dd57e07 into reubeno:main Mar 28, 2026
43 checks passed
takeshiD pushed a commit to takeshiD/brush that referenced this pull request Mar 30, 2026
…c bodies inside `$(…)` (reubeno#1067)

Adds a fallback in `command_piece` so orphan `'` and `` ` `` are consumed
as literal characters when `word_piece` cannot match them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state: needs review PR or issue author is waiting for a review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failed to parse undelimited quote inside of here-doc

2 participants