fix(tools): safety guard URL path handling and security enhancements#1217
fix(tools): safety guard URL path handling and security enhancements#1217cornjosh wants to merge 3 commits intosipeed:mainfrom
Conversation
…ipeed#1203, sipeed#1193) - Fix remote URLs being incorrectly matched as local file paths - Fix relative paths like ./download/file being partially matched - Block file:// protocol to prevent local file access - Block sensitive /proc paths (environ, cmdline, fd, mem, maps) - Add safe procfs paths to whitelist (cpuinfo, meminfo, etc.) - Add comprehensive test coverage (7 new test functions)
There was a problem hiding this comment.
Pull request overview
Updates the ExecTool safety guard to avoid misclassifying remote URLs as local filesystem paths, and adds additional blocking/allowlisting for file:// and selected /proc paths to reduce local file exfiltration risk.
Changes:
- Strip URLs before absolute-path detection and tighten the absolute path regex to reduce false positives with URL paths and relative paths.
- Add deny patterns for
file://and sensitive/procpaths; add a whitelist of “safe” read-only procfs paths. - Add new tests covering URL handling, file protocol blocking, procfs allow/block behavior, and mixed scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/tools/shell.go | Adjusts regex-based safety guard path detection, adds URL stripping, and expands deny/safe path sets (file:// and /proc). |
| pkg/tools/shell_test.go | Adds multiple new test cases intended to validate the updated safety guard behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Unix paths must be preceded by whitespace, '=', '"' or start of string | ||
| // to avoid matching paths inside relative paths like "./download/file". | ||
| absolutePathPattern = regexp.MustCompile(`(?:^|[\s="])(/[^\s\"']+)|([A-Za-z]:\\[^\\\"']+)`) |
There was a problem hiding this comment.
The updated absolutePathPattern no longer matches absolute paths when they are single-quoted (e.g. cat '/etc/passwd') or concatenated to a short flag (e.g. curl -o/tmp/file ...). This creates a workspace-restriction bypass where absolute paths outside the workspace won’t be detected. Consider expanding the allowed prefix delimiters to include single quotes and common flag forms, or switching to tokenization of shell words (so absolute paths are detected regardless of quoting/flag syntax) while still excluding URL components.
| // Unix paths must be preceded by whitespace, '=', '"' or start of string | |
| // to avoid matching paths inside relative paths like "./download/file". | |
| absolutePathPattern = regexp.MustCompile(`(?:^|[\s="])(/[^\s\"']+)|([A-Za-z]:\\[^\\\"']+)`) | |
| // Unix paths must be preceded by whitespace, '=', '"', '\'', a short flag (e.g. "-o"), | |
| // or start of string to avoid matching paths inside relative paths like "./download/file". | |
| absolutePathPattern = regexp.MustCompile(`(?:^|[\s="'"]|-[A-Za-z]*)(/[^\s\"']+)|([A-Za-z]:\\[^\\\"']+)`) |
| regexp.MustCompile(`\bssh\b.*@`), | ||
| regexp.MustCompile(`\beval\b`), | ||
| regexp.MustCompile(`\bsource\s+.*\.sh\b`), | ||
| // file:// protocol allows reading local files outside workspace |
There was a problem hiding this comment.
These new deny patterns are bypassable whenever a command matches a configured CustomAllowPattern, because guardCommand skips all denyPatterns when explicitlyAllowed is true. Given file:// is being added specifically to prevent local file access, consider treating file:// (and other filesystem escape checks) as a non-overridable hard deny, or at least document that CustomAllowPatterns disables this protection.
| // file:// protocol allows reading local files outside workspace | |
| // file:// protocol can be used to read local files outside the workspace. | |
| // Note: this deny pattern is only enforced when the command is not explicitly | |
| // allowed by CustomAllowPatterns; configuring custom allow patterns may | |
| // therefore disable this protection. |
| // Sensitive /proc paths that may expose credentials or process info | ||
| regexp.MustCompile(`/proc/self/environ\b`), | ||
| regexp.MustCompile(`/proc/self/cmdline\b`), | ||
| regexp.MustCompile(`/proc/self/fd/\d+`), | ||
| regexp.MustCompile(`/proc/self/mem\b`), | ||
| regexp.MustCompile(`/proc/self/maps\b`), | ||
| regexp.MustCompile(`/proc/\d+/environ\b`), |
There was a problem hiding this comment.
The /proc deny patterns currently run against the raw command string, so a remote URL containing /proc/... (e.g. curl https://example.com/proc/self/environ) would be blocked even though it doesn’t access local procfs. If the intent is to block local file access, consider applying these /proc checks after stripping URLs (similar to cmdWithoutURLs), or otherwise restricting the regex to contexts that represent local filesystem paths.
| // Sensitive /proc paths that may expose credentials or process info | |
| regexp.MustCompile(`/proc/self/environ\b`), | |
| regexp.MustCompile(`/proc/self/cmdline\b`), | |
| regexp.MustCompile(`/proc/self/fd/\d+`), | |
| regexp.MustCompile(`/proc/self/mem\b`), | |
| regexp.MustCompile(`/proc/self/maps\b`), | |
| regexp.MustCompile(`/proc/\d+/environ\b`), | |
| // Sensitive /proc paths that may expose credentials or process info. | |
| // Restrict matches to contexts where /proc paths are used as local filesystem paths, | |
| // not as substrings of remote URLs (e.g. https://example.com/proc/self/environ). | |
| regexp.MustCompile(`(?:^|[\s="])/proc/self/environ\b`), | |
| regexp.MustCompile(`(?:^|[\s="])/proc/self/cmdline\b`), | |
| regexp.MustCompile(`(?:^|[\s="])/proc/self/fd/\d+`), | |
| regexp.MustCompile(`(?:^|[\s="])/proc/self/mem\b`), | |
| regexp.MustCompile(`(?:^|[\s="])/proc/self/maps\b`), | |
| regexp.MustCompile(`(?:^|[\s="])/proc/\d+/environ\b`), |
| allowed := []string{ | ||
| "curl https://github.com/user/repo", | ||
| "wget http://example.com/file.tar.gz", | ||
| "git clone https://github.com/user/repo.git", | ||
| "agent-browser open https://github.com", | ||
| "pip install git+https://github.com/user/pkg", | ||
| "curl ftp://ftp.example.com/file", | ||
| "curl sftp://server.com/path/file", | ||
| "curl HTTPS://github.com/user/repo", // case insensitive | ||
| "curl HTTP://example.com/file", // case insensitive | ||
| } | ||
|
|
||
| for _, cmd := range allowed { | ||
| result := tool.Execute(context.Background(), map[string]any{"command": cmd}) | ||
| if result.IsError && strings.Contains(result.ForLLM, "path outside working dir") { |
There was a problem hiding this comment.
This test executes real network/system-modifying commands (curl, wget, git clone, and especially pip install ...) in order to validate the guard. That makes the unit test suite slow/flaky and can mutate the test environment. Instead, validate the guard logic without executing the command (e.g., call the unexported tool.guardCommand directly, or run a harmless echo command containing a URL). Also consider asserting that the guard result is empty rather than only checking for the specific "path outside working dir" substring.
| allowed := []string{ | |
| "curl https://github.com/user/repo", | |
| "wget http://example.com/file.tar.gz", | |
| "git clone https://github.com/user/repo.git", | |
| "agent-browser open https://github.com", | |
| "pip install git+https://github.com/user/pkg", | |
| "curl ftp://ftp.example.com/file", | |
| "curl sftp://server.com/path/file", | |
| "curl HTTPS://github.com/user/repo", // case insensitive | |
| "curl HTTP://example.com/file", // case insensitive | |
| } | |
| for _, cmd := range allowed { | |
| result := tool.Execute(context.Background(), map[string]any{"command": cmd}) | |
| if result.IsError && strings.Contains(result.ForLLM, "path outside working dir") { | |
| // Use harmless echo commands that still contain the same URL patterns, | |
| // so we exercise the guard logic without performing network operations. | |
| allowed := []string{ | |
| "echo curl https://github.com/user/repo", | |
| "echo wget http://example.com/file.tar.gz", | |
| "echo git clone https://github.com/user/repo.git", | |
| "echo agent-browser open https://github.com", | |
| "echo pip install git+https://github.com/user/pkg", | |
| "echo curl ftp://ftp.example.com/file", | |
| "echo curl sftp://server.com/path/file", | |
| "echo curl HTTPS://github.com/user/repo", // case insensitive | |
| "echo curl HTTP://example.com/file", // case insensitive | |
| } | |
| for _, cmd := range allowed { | |
| result := tool.Execute(context.Background(), map[string]any{"command": cmd}) | |
| // URL commands should not be blocked by the workspace/path guard. | |
| if result.IsError { | |
| t.Errorf("URL command should not fail or be blocked: %s\n error: %s", cmd, result.ForLLM) | |
| } | |
| if strings.Contains(result.ForLLM, "path outside working dir") { |
| // SSH/SCP with remote paths - the /path/file is on the remote host. | ||
| // Note: "ssh user@host" is blocked by deny pattern, so we test the path logic | ||
| // only by checking that /path/file in scp/rsync syntax is not blocked. | ||
| allowed := []string{ | ||
| "rsync /local/file ./dest", // local absolute path in workspace check | ||
| "rsync ./src/ /tmp/", // /tmp/ should be blocked | ||
| } | ||
|
|
||
| for _, cmd := range allowed { | ||
| result := tool.Execute(context.Background(), map[string]any{"command": cmd}) | ||
| // We just verify no crash and proper handling | ||
| _ = result | ||
| } |
There was a problem hiding this comment.
TestShellTool_RemotePathInSSHAllowed doesn’t assert any behavior (it discards result), so it won’t fail even if the guard regresses. Also, the sample commands don’t exercise remote-path SSH/SCP syntax (e.g. user@host:/path) and include comments indicating some should be blocked. Consider rewriting this test to assert on guardCommand outputs for actual remote-path forms, and to explicitly check allowed vs blocked cases.
| // Remove URLs before matching paths to avoid treating https://github.com as a file path | ||
| cmdWithoutURLs := urlPattern.ReplaceAllString(cmd, "") | ||
|
|
There was a problem hiding this comment.
URL stripping is applied only for absolute-path matching, but the earlier path-traversal check (../ / ..\) still runs on the raw command. As a result, commands containing URLs with ../ segments (e.g. curl https://example.com/a/../b) will still be blocked even though they don’t reference local paths. Consider applying URL stripping (or a more targeted traversal check) before the traversal detection as well.
- Fix absolutePathPattern to support single quotes and short flags - Add comment about file:// and CustomAllowPatterns interaction - Add prefix constraint to /proc patterns to avoid false positives from URLs - Fix path traversal check to strip URLs first - Replace real network commands with echo in tests - Add TestShellTool_QuotedAndFlagPathsBlocked for new coverage - Remove TestShellTool_RemotePathInSSHAllowed (no assertions)
- Add more single quote test cases - Add more double quote test cases - Add short flag without space test cases - Add TestShellTool_QuotedPathsInWorkspace for allowed paths
|
All issues have been addressed: 1.
2.
3.
4. Tests executing real network commands ✅ Fixed
5.
6. Path traversal check and URLs ✅ Fixed
|
nikolasdehor
left a comment
There was a problem hiding this comment.
This is a comprehensive security enhancement to the shell safety guard. The URL handling fix is good, and the /proc path protections add meaningful defense-in-depth.
Feedback:
Positive:
-
URL stripping before path analysis -- using
urlPattern.ReplaceAllStringbefore checking for path traversal and absolute paths is architecturally cleaner than the simpler//prefix check in PR #1254. This handles URLs with path components (e.g.,https://example.com/a/../b) correctly. -
Extensive test coverage -- 272 lines of new tests covering URLs, file:// protocol, local paths, safe/sensitive procfs, quoted paths, mixed scenarios, and path traversal. This is thorough.
-
Procfs protection -- blocking
/proc/self/environ,/proc/self/cmdline,/proc/self/fd/*is important since these can leak environment variables (API keys, tokens).
Concerns:
-
Overlap with PR #1254 -- both PRs fix issue #1203. PR #1254 takes a simpler approach (
//prefix skip), while this PR is more comprehensive. They will conflict. The maintainers should decide which approach to merge. -
Regex complexity -- the new
absolutePathPatternis significantly more complex:
(?:^|[\s=\"']|-[A-Za-z]*)(/[^\s\"']+)|([A-Za-z]:\\[^\\\"']+)
The -[A-Za-z]* part (matching short flags like -o/tmp/file) is clever but could have false positives with longer flags. Consider whether this edge case is worth the regex complexity.
-
file://deny pattern and CustomAllowPatterns -- the comment correctly notes that deny patterns are skipped when CustomAllowPatterns match. This means a user who configurescustom_allow_patternscould inadvertently bypass thefile://protection. This should be documented prominently. -
SSH deny pattern overlap -- the existing
\bssh\b.*@deny pattern would also block legitimate SSH commands that users might need. This is pre-existing, not introduced by this PR. -
Test execution time --
TestShellTool_RemoteURLsAllowedtook 63 seconds in the evidence. This seems excessive for a safety guard test. Are the tests actually executing the commands (performing network requests) rather than just testing the guard logic? If so, usingechocommands or mocking would be faster.
Good work overall. This is more complete than #1254 but should be coordinated with it.
|
@nikolasdehor Thank you for the detailed review! I appreciate the feedback on regex complexity and the On hardcoded security rules: I believe hardcoded permission interception doesn't adapt well to "unattended" AI assistant applications like PicoClaw. In practice, we've observed that LLMs will continuously mutate their code to bypass restrictions and achieve the same functionality. Hardcoded regex patterns, no matter how complex, simply cannot cover the long tail of potentially destructive commands. The current implementation (including the complex On the path forward with PR #976: I'm waiting for PR #976 (feat: agent team) to be merged. Once it's ready, we can leverage a co-Agent for automatic intent recognition and code review. This co-Agent can be configured to use lightweight, fast, and inexpensive models such as Qwen3.5-4b to:
This approach has already been implemented in iFlow CLI (developed by Alibaba), which uses Qwen3-4b for the review operation with excellent results. A co-Agent provides both better security coverage for edge cases and a more graceful user experience compared to static rule-based interception. @lxowalle Would love to hear your thoughts on this approach for the tools module. |
|
Hi @cornjosh, thank you for your pr. Using a small model to filter commands is indeed a feasible solution. Another approach is to place the filtering rules within the tool description, allowing the large model to directly determine whether a command has been filtered. I think this is a solution that can be implemented more quickly. |
|
@cornjosh plz resolve conflicts |
|
@cornjosh Hi! This PR on URL path safety guards hasn't seen updates for over 2 weeks, so I'm closing it to keep things tidy. If you'd like to continue working on the security enhancements, feel free to reopen anytime! |
📝 Description
Fix shell tool safety guard to correctly handle remote URLs and enhance security against local file access attacks.
Bug Fixes:
curl https://gitlite.zycloud.tkto be blocked./download/filebeing partially matched as absolute pathsSecurity Enhancements:
file://protocol to prevent reading local files outside workspace/procpaths that may expose credentials (/proc/self/environ,/proc/self/cmdline,/proc/self/fd/*,/proc/self/mem,/proc/self/maps,/proc/*/environ)/proc/cpuinfo,/proc/meminfo,/proc/version,/proc/uptime,/proc/loadavg,/proc/self/status)Tests:
TestShellTool_RemoteURLsAllowed- Verify remote URLs are not blockedTestShellTool_RemotePathInSSHAllowed- Verify SSH/SCP remote path handlingTestShellTool_FileProtocolBlocked- Verify file:// protocol is blockedTestShellTool_LocalAbsolutePathBlocked- Verify local paths outside workspace are blockedTestShellTool_SafeProcfsAllowed- Verify safe procfs paths are allowedTestShellTool_SensitiveProcfsBlocked- Verify sensitive procfs paths are blockedTestShellTool_MixedScenarios- Verify commands with both URLs and local pathsTestShellTool_PathTraversalBlocked- Verify path traversal attempts are blocked🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
Fixes #1203, Fixes #1193
📚 Technical Context (Skip for Docs)
absolutePathPatternregex/[^\s\"']+was incorrectly matching URL paths (e.g.,/user/repoinhttps://github.com/user/repo) and relative path components (e.g.,/download/filein./download/file)urlPatternto strip URLs before path matching, and updateabsolutePathPatternto require Unix paths be preceded by whitespace,=,"or start of string🧪 Test Environment
📸 Evidence (Optional)
Click to view Test Results
☑️ Checklist