refactor: replace shell-utils string parsing with tree-sitter AST by LaZzyMan · Pull Request #2652 · QwenLM/qwen-code

LaZzyMan · 2026-03-25T03:49:23Z

TLDR

Replace all string-based shell command parsing in shell-utils.ts, rule-parser.ts, and shell-semantics.ts with tree-sitter AST-first approach, using the existing shellAstParser.ts infrastructure. This significantly improves parsing robustness for quoted strings, heredocs, nested constructs, and complex compound commands.

Screenshots / Video Demo

N/A — no user-facing change. This is an internal refactor of shell command parsing logic.

Dive Deeper

Architecture: Sync/Async Hybrid with Lazy Initialization

The key challenge is that tree-sitter WASM initialization is async (Parser.init() + Language.load()), but most callers are synchronous. The solution:

ensureParserInitStarted() — All sync AST functions call this when the parser isn't ready, triggering a fire-and-forget initParser(). The current call falls back to the legacy implementation, but the parser starts loading in the background.
Subsequent calls — Once the WASM loads (typically within milliseconds), all sync functions use the AST path.
Legacy fallback preserved — Original string-based implementations are retained as *Legacy() functions, ensuring correctness even if the parser fails to load.

Changes by File

File	Change
`shellAstParser.ts`	Added sync AST functions: `splitCommandsAST`, `getCommandRootAST`, `getCommandRootsAST`, `detectCommandSubstitutionAST`, `tokenizeCommandAST`, `isShellCommandReadOnlySync`, `ensureParserInitStarted`
`shell-utils.ts`	Migrated `splitCommands`, `getCommandRoot`, `getCommandRoots`, `detectCommandSubstitution`, `isCommandNeedsPermission` to AST-first with legacy fallback. Added `BASIC_READ_ONLY_COMMANDS` lightweight fallback set.
`rule-parser.ts`	Migrated `splitCompoundCommand` to AST-first with legacy fallback
`shell-semantics.ts`	Migrated `extractShellOperations` to AST-first tokenization with legacy fallback
`shellReadOnlyChecker.ts`	Deleted (superseded by `isShellCommandReadOnlySync` in shellAstParser)
`shellAstParser.test.ts`	Added 42 new tests for sync AST functions
`shell-semantics.test.ts`	Added 29 new AST-path integration tests

Three-tier fallback for `isCommandNeedsPermission`

AST full analysis (isShellCommandReadOnlySync) — when parser is ready
BASIC_READ_ONLY_COMMANDS set (getCommandRoot + set lookup) — lightweight sync fallback
Conservative deny — requires permission if neither of the above can determine safety

Reviewer Test Plan

Run unit tests: npx vitest run packages/core/src/utils/shellAstParser.test.ts packages/core/src/utils/shell-utils.test.ts packages/core/src/permissions/shell-semantics.test.ts packages/core/src/permissions/permission-manager.test.ts packages/core/src/tools/shell.test.ts
Verify all 500+ related tests pass
Run full test suite: npm run test
Run typecheck: npx tsc --noEmit --project packages/core/tsconfig.json
Test interactively: run npm start, execute shell commands, verify permission prompts work correctly for both read-only and mutating commands

Testing Matrix

	🍏	🪟	🐧
npm run	✅	❓	❓
npx	❓	❓	❓
Docker	❓	❓	❓
Podman	❓	-	-
Seatbelt	❓	-	-

Linked issues / bugs

N/A — proactive refactor to improve shell parsing robustness.

- Add sync AST functions to shellAstParser.ts (splitCommandsAST, getCommandRootAST, getCommandRootsAST, detectCommandSubstitutionAST, tokenizeCommandAST, isShellCommandReadOnlySync) - Add ensureParserInitStarted() for fire-and-forget lazy initialization - Migrate shell-utils.ts functions to AST-first with legacy fallback - Migrate rule-parser.ts splitCompoundCommand to AST-first - Migrate shell-semantics.ts extractShellOperations to AST-first - Add BASIC_READ_ONLY_COMMANDS fallback set for isCommandNeedsPermission - Delete deprecated shellReadOnlyChecker.ts and its tests - Add comprehensive tests for all new sync AST functions - Add AST-path integration tests for shell-semantics

github-actions · 2026-03-25T03:50:29Z

📋 Review Summary

This PR replaces string-based shell command parsing with a tree-sitter AST-first approach across multiple files (shellAstParser.ts, shell-utils.ts, rule-parser.ts, shell-semantics.ts). The architecture uses a clever lazy initialization pattern with graceful fallback to legacy implementations. The changes are well-tested with 71 new tests and demonstrate thoughtful handling of the async/sync hybrid requirements.

🔍 General Feedback

Excellent architecture: The fire-and-forget lazy initialization pattern (ensureParserInitStarted()) elegantly solves the async WASM loading problem while maintaining sync APIs
Comprehensive testing: 42 new tests for AST functions and 29 integration tests provide strong coverage
Good fallback strategy: Three-tier fallback for isCommandNeedsPermission shows careful consideration of edge cases
Clean deprecation: Removing shellReadOnlyChecker.ts in favor of consolidated AST-based logic reduces code duplication
Well-documented: JSDoc comments clearly explain the purpose and behavior of new functions

🎯 Specific Feedback

🟡 High

File: shellAstParser.ts:1177 - The collectLeafCommands function handles negated_command by including the whole text, but this could miss nested command substitutions within the negation. Consider recursively processing the inner command to detect substitutions.
File: shell-semantics.ts:1617 - The check if (cmdName.includes('=')) for skipping environment variable assignments is fragile. A command like my=cmd arg1 would be incorrectly skipped. Consider using a more robust pattern like /^[A-Za-z_][A-Za-z0-9_]*=$/ to match pure assignments.

🟢 Medium

File: shellAstParser.ts:1428 - The WRITE_REDIRECT_OPERATORS set is imported from the top-level constant but used in extractFileRedirect without explicit import. Verify this is correctly scoped within the module.
File: shell-utils.ts:1052 - The BASIC_READ_ONLY_COMMANDS set is intentionally conservative but lacks some commands that are read-only (e.g., file, stat, realpath). Consider whether this should be more comprehensive or if the AST path will handle these cases quickly enough after initialization.
File: shellAstParser.ts:1203 - The getCommandRootAST function returns quoted command names as-is (e.g., "my-cmd"). While the test acknowledges this behavior, consider stripping outer quotes to maintain consistency with the legacy implementation which returns the unquoted basename.

🔵 Low

File: shellAstParser.ts:1088 - The comment mentions "fire-and-forget" initialization but doesn't document the expected warm-up time. Consider adding a note about typical WASM load times (milliseconds) to set expectations for developers.
File: shellAstParser.test.ts:784 - The test comment mentions omitted "parser not ready" tests due to unreliable reset behavior. Consider adding a unit test that mocks parserInstance to null to verify the null-guard code paths return null as expected.
File: shell-utils.ts - The import of isShellCommandReadOnly from the deleted shellReadOnlyChecker.js module should be removed to avoid potential confusion (though it appears to still exist as a deprecated wrapper).

✅ Highlights

Smart async handling: The ensureParserInitStarted() pattern is a textbook solution for lazy async initialization with sync callers
Thorough test coverage: Tests cover edge cases like quoted operators, heredocs, nested substitutions, and complex compound commands
Clean migration path: Legacy functions preserved as *Legacy() variants ensure backward compatibility during transition
Security-conscious: The conservative "deny by default" approach when parser isn't ready is the right security posture
Good documentation: The "Dive Deeper" section in the PR body clearly explains the architecture and changes by file

LaZzyMan requested review from DennisYu07, DragonnZhang, Mingholy, gwinthis, pomelo-nwu and tanzhenxin as code owners March 25, 2026 03:49

github-actions bot mentioned this pull request Mar 26, 2026

📊 AI CLI 工具社区动态日报 2026-03-26 gsscsd/big_model_radar#96

Open

LaZzyMan added 0.13.2 0.14.0 and removed 0.13.2 0.14.0 labels Mar 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: replace shell-utils string parsing with tree-sitter AST#2652

refactor: replace shell-utils string parsing with tree-sitter AST#2652
LaZzyMan wants to merge 1 commit intoQwenLM:mainfrom
LaZzyMan:feat-shell-utils-ast

LaZzyMan commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

LaZzyMan commented Mar 25, 2026

TLDR

Screenshots / Video Demo

Dive Deeper

Architecture: Sync/Async Hybrid with Lazy Initialization

Changes by File

Three-tier fallback for isCommandNeedsPermission

Reviewer Test Plan

Testing Matrix

Linked issues / bugs

Uh oh!

github-actions bot commented Mar 25, 2026

📋 Review Summary

🔍 General Feedback

🎯 Specific Feedback

🟡 High

🟢 Medium

🔵 Low

✅ Highlights

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Three-tier fallback for `isCommandNeedsPermission`