Skip to content

refactor: replace shell-utils string parsing with tree-sitter AST#2652

Open
LaZzyMan wants to merge 1 commit intoQwenLM:mainfrom
LaZzyMan:feat-shell-utils-ast
Open

refactor: replace shell-utils string parsing with tree-sitter AST#2652
LaZzyMan wants to merge 1 commit intoQwenLM:mainfrom
LaZzyMan:feat-shell-utils-ast

Conversation

@LaZzyMan
Copy link
Copy Markdown
Collaborator

TLDR

Replace all string-based shell command parsing in shell-utils.ts, rule-parser.ts, and shell-semantics.ts with tree-sitter AST-first approach, using the existing shellAstParser.ts infrastructure. This significantly improves parsing robustness for quoted strings, heredocs, nested constructs, and complex compound commands.

Screenshots / Video Demo

N/A — no user-facing change. This is an internal refactor of shell command parsing logic.

Dive Deeper

Architecture: Sync/Async Hybrid with Lazy Initialization

The key challenge is that tree-sitter WASM initialization is async (Parser.init() + Language.load()), but most callers are synchronous. The solution:

  1. ensureParserInitStarted() — All sync AST functions call this when the parser isn't ready, triggering a fire-and-forget initParser(). The current call falls back to the legacy implementation, but the parser starts loading in the background.
  2. Subsequent calls — Once the WASM loads (typically within milliseconds), all sync functions use the AST path.
  3. Legacy fallback preserved — Original string-based implementations are retained as *Legacy() functions, ensuring correctness even if the parser fails to load.

Changes by File

File Change
shellAstParser.ts Added sync AST functions: splitCommandsAST, getCommandRootAST, getCommandRootsAST, detectCommandSubstitutionAST, tokenizeCommandAST, isShellCommandReadOnlySync, ensureParserInitStarted
shell-utils.ts Migrated splitCommands, getCommandRoot, getCommandRoots, detectCommandSubstitution, isCommandNeedsPermission to AST-first with legacy fallback. Added BASIC_READ_ONLY_COMMANDS lightweight fallback set.
rule-parser.ts Migrated splitCompoundCommand to AST-first with legacy fallback
shell-semantics.ts Migrated extractShellOperations to AST-first tokenization with legacy fallback
shellReadOnlyChecker.ts Deleted (superseded by isShellCommandReadOnlySync in shellAstParser)
shellAstParser.test.ts Added 42 new tests for sync AST functions
shell-semantics.test.ts Added 29 new AST-path integration tests

Three-tier fallback for isCommandNeedsPermission

  1. AST full analysis (isShellCommandReadOnlySync) — when parser is ready
  2. BASIC_READ_ONLY_COMMANDS set (getCommandRoot + set lookup) — lightweight sync fallback
  3. Conservative deny — requires permission if neither of the above can determine safety

Reviewer Test Plan

  1. Run unit tests: npx vitest run packages/core/src/utils/shellAstParser.test.ts packages/core/src/utils/shell-utils.test.ts packages/core/src/permissions/shell-semantics.test.ts packages/core/src/permissions/permission-manager.test.ts packages/core/src/tools/shell.test.ts
  2. Verify all 500+ related tests pass
  3. Run full test suite: npm run test
  4. Run typecheck: npx tsc --noEmit --project packages/core/tsconfig.json
  5. Test interactively: run npm start, execute shell commands, verify permission prompts work correctly for both read-only and mutating commands

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Linked issues / bugs

N/A — proactive refactor to improve shell parsing robustness.

- Add sync AST functions to shellAstParser.ts (splitCommandsAST, getCommandRootAST,
  getCommandRootsAST, detectCommandSubstitutionAST, tokenizeCommandAST,
  isShellCommandReadOnlySync)
- Add ensureParserInitStarted() for fire-and-forget lazy initialization
- Migrate shell-utils.ts functions to AST-first with legacy fallback
- Migrate rule-parser.ts splitCompoundCommand to AST-first
- Migrate shell-semantics.ts extractShellOperations to AST-first
- Add BASIC_READ_ONLY_COMMANDS fallback set for isCommandNeedsPermission
- Delete deprecated shellReadOnlyChecker.ts and its tests
- Add comprehensive tests for all new sync AST functions
- Add AST-path integration tests for shell-semantics
@github-actions
Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR replaces string-based shell command parsing with a tree-sitter AST-first approach across multiple files (shellAstParser.ts, shell-utils.ts, rule-parser.ts, shell-semantics.ts). The architecture uses a clever lazy initialization pattern with graceful fallback to legacy implementations. The changes are well-tested with 71 new tests and demonstrate thoughtful handling of the async/sync hybrid requirements.

🔍 General Feedback

  • Excellent architecture: The fire-and-forget lazy initialization pattern (ensureParserInitStarted()) elegantly solves the async WASM loading problem while maintaining sync APIs
  • Comprehensive testing: 42 new tests for AST functions and 29 integration tests provide strong coverage
  • Good fallback strategy: Three-tier fallback for isCommandNeedsPermission shows careful consideration of edge cases
  • Clean deprecation: Removing shellReadOnlyChecker.ts in favor of consolidated AST-based logic reduces code duplication
  • Well-documented: JSDoc comments clearly explain the purpose and behavior of new functions

🎯 Specific Feedback

🟡 High

  • File: shellAstParser.ts:1177 - The collectLeafCommands function handles negated_command by including the whole text, but this could miss nested command substitutions within the negation. Consider recursively processing the inner command to detect substitutions.

  • File: shell-semantics.ts:1617 - The check if (cmdName.includes('=')) for skipping environment variable assignments is fragile. A command like my=cmd arg1 would be incorrectly skipped. Consider using a more robust pattern like /^[A-Za-z_][A-Za-z0-9_]*=$/ to match pure assignments.

🟢 Medium

  • File: shellAstParser.ts:1428 - The WRITE_REDIRECT_OPERATORS set is imported from the top-level constant but used in extractFileRedirect without explicit import. Verify this is correctly scoped within the module.

  • File: shell-utils.ts:1052 - The BASIC_READ_ONLY_COMMANDS set is intentionally conservative but lacks some commands that are read-only (e.g., file, stat, realpath). Consider whether this should be more comprehensive or if the AST path will handle these cases quickly enough after initialization.

  • File: shellAstParser.ts:1203 - The getCommandRootAST function returns quoted command names as-is (e.g., "my-cmd"). While the test acknowledges this behavior, consider stripping outer quotes to maintain consistency with the legacy implementation which returns the unquoted basename.

🔵 Low

  • File: shellAstParser.ts:1088 - The comment mentions "fire-and-forget" initialization but doesn't document the expected warm-up time. Consider adding a note about typical WASM load times (milliseconds) to set expectations for developers.

  • File: shellAstParser.test.ts:784 - The test comment mentions omitted "parser not ready" tests due to unreliable reset behavior. Consider adding a unit test that mocks parserInstance to null to verify the null-guard code paths return null as expected.

  • File: shell-utils.ts - The import of isShellCommandReadOnly from the deleted shellReadOnlyChecker.js module should be removed to avoid potential confusion (though it appears to still exist as a deprecated wrapper).

✅ Highlights

  • Smart async handling: The ensureParserInitStarted() pattern is a textbook solution for lazy async initialization with sync callers
  • Thorough test coverage: Tests cover edge cases like quoted operators, heredocs, nested substitutions, and complex compound commands
  • Clean migration path: Legacy functions preserved as *Legacy() variants ensure backward compatibility during transition
  • Security-conscious: The conservative "deny by default" approach when parser isn't ready is the right security posture
  • Good documentation: The "Dive Deeper" section in the PR body clearly explains the architecture and changes by file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant