Merged

readme.md: 81 changes (54 additions, 27 deletions)
@@ -54,33 +54,60 @@ gpt-copy /path/to/directory -o output.md
 ```
 
 ### Advanced File Filtering
-Fine-tune which files are processed using include and exclude options.
-
-- **Include Files (`-i` or `--include`):**
-  Specify one or more glob patterns (with optional brace expansion) to include only matching files.
-
-  **Examples:**
-  - Include all Python files in the `src` folder:
-    ```sh
-    gpt-copy /path/to/directory -i "src/*.py"
-    ```
-  - Include specific modules:
-    ```sh
-    gpt-copy /path/to/directory -i "src/{module1,module2}.py"
-    ```
-
-- **Exclude Files (`-e` or `--exclude`):**
-  Specify one or more glob patterns to exclude files. Exclusion takes precedence over inclusion.
-
-  **Examples:**
-  - Exclude all files in the `tests` folder:
-    ```sh
-    gpt-copy /path/to/directory -e "tests/*"
-    ```
-  - Exclude a specific file:
-    ```sh
-    gpt-copy /path/to/directory -i "src/*.py" -e "src/__init__.py"
-    ```
+Fine-tune which files are processed using include and exclude options. Patterns follow gitignore-style glob syntax with support for `*`, `**`, and brace expansion.
+
+#### Filter Options
+
+- **`-i` or `--include`:** Include files/directories matching the pattern
+- **`-e` or `--exclude`:** Exclude files/directories matching the pattern
+- **`--exclude-dir`:** Exclude directories (automatically adds trailing `/`)
+
+#### Pattern Matching Rules
+
+1. **Last Match Wins:** If multiple patterns match a file, the last matching pattern determines whether it's included or excluded.
+2. **Directory Patterns:** Patterns ending with `/` match directories and all their contents.
+   - `node_modules/` excludes the directory and everything inside it
+   - `build/` excludes the build directory and all files/subdirectories
+3. **Wildcard Patterns:**
+   - `*` matches any characters except `/`
+   - `**` matches any characters including `/` (any depth)
+   - `tests/*` matches direct children of tests directory
+   - `**/*.log` matches all .log files at any depth
+4. **Directory-Only Wildcards:** Patterns with wildcards ending in `/` match only directories
+   - `tmp/**/` matches all directories under tmp/ at any depth, but not files
+
+#### Examples
+
+- **Exclude directories with all their contents:**
+  ```sh
+  gpt-copy . --exclude-dir tests --exclude-dir node_modules
+  # or equivalently:
+  gpt-copy . -e "tests/" -e "node_modules/"
+  ```
+
+- **Exclude specific directories but include subdirectories:**
+  ```sh
+  gpt-copy . -e "tests/*" -i "tests/**/"
+  # Excludes direct children of tests/ but includes nested directories
+  ```
+
+- **Exclude all files then include specific ones:**
+  ```sh
+  gpt-copy . -e "**" -i "src/**/*.py"
+  # Excludes everything, then includes Python files under src/
+  ```
+
+- **Complex filtering with multiple patterns:**
+  ```sh
+  gpt-copy . -e "build/**" -e "**/*.log" -i "build/reports/**"
+  # Excludes build directory and all .log files, but includes build/reports/
+  ```
+
+- **Include only specific folder:**
+  ```sh
+  gpt-copy . -e "app/" -e "tests/" -e "notebooks/" -i "deployment/"
+  # Excludes app, tests, notebooks, includes only deployment
+  ```
 
 ### Force Mode (`-f` or `--force`)
 Ignore `.gitignore` and Git-tracked file restrictions to process **all** files:
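
The pattern rules in the new README text can be modeled with a few lines of standard-library Python. This is a simplified sketch under stated assumptions, not gpt-copy's implementation (the `filter.py` diff shows it compiling patterns with PathSpec): patterns are anchored at the project root, only `*`, `**` and `?` are handled (no brace expansion or character classes), and a path that no rule matches is assumed to be included. `glob_to_regex`, `pattern_matches` and this standalone `effective_action` are illustrative names; `Action` only mirrors the enum name used in the diff.

```python
import re
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"


Rule = Tuple[Action, str]  # (action, pattern), in command-line order


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified, root-anchored glob into a regex.

    "**/" -> zero or more leading directories, "**" -> anything (including "/"),
    "*" -> anything except "/", "?" -> one character except "/".
    """
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def pattern_matches(pattern: str, relpath: str, is_dir: bool) -> bool:
    """Does one pattern match a root-relative POSIX path?"""
    if not pattern.endswith("/"):
        return glob_to_regex(pattern).fullmatch(relpath) is not None
    body = pattern.rstrip("/")
    regex = glob_to_regex(body)
    if "*" in body:
        # Directory-only wildcard such as "tmp/**/": directories only, never files.
        return is_dir and regex.fullmatch(relpath) is not None
    # Plain directory pattern such as "node_modules/": the directory itself
    # (if relpath is a directory) or any path that has it as an ancestor.
    parts = relpath.split("/")
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts))]
    if is_dir:
        candidates.append(relpath)
    return any(regex.fullmatch(c) for c in candidates)


def effective_action(rules: List[Rule], relpath: str, is_dir: bool) -> Action:
    """Evaluate every rule; the last one that matches decides."""
    action = Action.INCLUDE  # assumed default when nothing matches
    for rule_action, pattern in rules:
        if pattern_matches(pattern, relpath, is_dir):
            action = rule_action
    return action


rules = [(Action.EXCLUDE, "**"), (Action.INCLUDE, "src/**/*.py")]
print(effective_action(rules, "src/pkg/mod.py", is_dir=False))  # Action.INCLUDE
print(effective_action(rules, "README.md", is_dir=False))  # Action.EXCLUDE
```

Scanning every rule and keeping the last match, rather than stopping at the first, is what lets a later `-i` re-include a path that an earlier `-e` excluded.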

shell.nix: 2 changes (1 addition, 1 deletion)
@@ -6,7 +6,7 @@ let
   pythonPackages = pkgs.python311Packages; # Change to Python 3.10
 in
 pkgs.mkShell rec {
-  name = "concatenate-files";
+  name = "gpt-copy";
 
   buildInputs = with pkgs; [
     gcc # Required for crates needing C compilers

src/gpt_copy/filter.py: 73 changes (57 additions, 16 deletions)
@@ -74,21 +74,33 @@ def matches(self, pattern: str, relpath: str, is_dir: bool) -> bool:
         Returns:
             True if the pattern matches the path
         """
-        # If pattern ends with /, only match directories
+        # Use PathSpec for glob matching with ** support
+        spec = self._compiled_specs.get(pattern)
+        if not spec:
+            return False
+
+        # If pattern ends with /, it should only match directories
+        # For patterns like "dir/", match the directory and contents
+        # For patterns like "dir/**/", match only directories at any depth under dir/
         if pattern.endswith("/"):
             if not is_dir:
-                return False
-            # Match the pattern against the path with trailing /
-            match_path = relpath + "/"
+                # Files should not match directory-only patterns
+                # UNLESS the pattern also matches the file path (e.g., "dir/" matches "dir/file.txt")
+                # Check if this is a simple directory pattern or has wildcards
+                if "**" in pattern or "*" in pattern.rstrip("/"):
+                    # Pattern has wildcards - only match if this is a directory
+                    return False
+                # Pattern is a simple directory like "node_modules/"
+                # This should match contents too
+                match_path = relpath
+            else:
+                # Match the directory with trailing /
+                match_path = relpath + "/"
         else:
-            # For files, match as-is. For dirs, try both with and without /
+            # For non-directory patterns, match as-is
             match_path = relpath
 
-        # Use PathSpec for glob matching with ** support
-        spec = self._compiled_specs.get(pattern)
-        if spec:
-            return spec.match_file(match_path)
-        return False
+        return spec.match_file(match_path)
 
     def effective_action(self, relpath: str, is_dir: bool) -> Action:
         """
@@ -154,19 +166,48 @@ def _include_can_match_descendant(self, pattern: str, dir_relpath: str) -> bool:
         Returns:
             True if the pattern could potentially match a descendant
         """
-        # If pattern contains **, it might match deep descendants
-        if "**" in pattern:
-            return True
 
         # If pattern starts with the directory path, it targets descendants
         if pattern.startswith(dir_relpath + "/"):
             return True
 
         # If the pattern has no directory component and dir is not nested,
         # it could match direct children
+        # If dir is root level, any pattern could potentially match something under it
         if not dir_relpath or dir_relpath == ".":
             return True
+
+        # If pattern contains ** at the start (like **/foo), it might match anywhere
+        if pattern.startswith("**/"):
+            return True
+
+        # If pattern is just **, it matches everything
+        if pattern == "**":
+            return True
+
+        # If the pattern has no directory component, it could match direct children
+        if "/" not in pattern.rstrip("/"):
+            return True
+
+        # For patterns with directory components (like "build/reports/**"),
+        # check if the pattern could possibly match under dir_relpath
+        # Extract the first directory component of the pattern
+        pattern_first_dir = pattern.split("/")[0]
+
+        # Check if dir_relpath could contain this directory
+        # For example:
+        # dir_relpath="node_modules", pattern="build/reports/**" -> False (different first dirs)
+        # dir_relpath="build", pattern="build/reports/**" -> True (pattern is under build)
+        # dir_relpath="", pattern="build/reports/**" -> True (pattern could be anywhere)
+
+        # If the directory path starts with the pattern's first directory, it could match
+        if dir_relpath.startswith(pattern_first_dir + "/") or dir_relpath == pattern_first_dir:
+            return True
+
+        # If the pattern's first directory starts with dir_relpath, it could match
+        if pattern_first_dir.startswith(dir_relpath + "/"):
+            return True
+
+        # Otherwise, the paths are incompatible
+        return False
 
-        # For other cases, be conservative - allow traversal
-        # This includes patterns like "data/*.csv" which might match if dir_relpath is "" or "data"
-        return True
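
The updated `matches` method in `filter.py` first decides which string, if any, reaches `spec.match_file`. A minimal sketch of just that decision, with the PathSpec call left out; `match_target` is a hypothetical name. The original's `"**" in pattern` test is folded into `"*" in pattern.rstrip("/")`, which already covers it because stripping trailing slashes never removes a `*`.

```python
from typing import Optional


def match_target(pattern: str, relpath: str, is_dir: bool) -> Optional[str]:
    """Return the path string the compiled spec would be asked to match,
    or None when the pattern is rejected before any glob matching.

    Mirrors the branch structure of the updated `matches` method (hypothetical name).
    """
    if pattern.endswith("/"):
        if not is_dir:
            if "*" in pattern.rstrip("/"):
                # Wildcard directory pattern such as "tmp/**/": files never match.
                return None
            # Plain directory pattern such as "node_modules/": test the file path
            # itself so that the directory's contents can match.
            return relpath
        # Directories are tested with a trailing slash.
        return relpath + "/"
    # Patterns without a trailing slash are tested against the path as-is.
    return relpath


if __name__ == "__main__":
    for pattern, path, is_dir in [
        ("tmp/**/", "tmp/cache/a.bin", False),
        ("tmp/**/", "tmp/cache", True),
        ("node_modules/", "node_modules/x/index.js", False),
        ("**/*.log", "logs/app.log", False),
    ]:
        print(pattern, path, is_dir, "->", match_target(pattern, path, is_dir))
```

Moving the `_compiled_specs` lookup to the top, as the diff does, lets every branch end in a single `spec.match_file(match_path)` call instead of repeating the lookup.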
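
The rewritten `_include_can_match_descendant` drops the blanket `"**" in pattern` shortcut and instead compares the pattern's first path component with the directory, so a pattern like `build/reports/**` no longer forces a walk of `node_modules`. A standalone sketch of that heuristic (`include_can_match_descendant` is a hypothetical free-function version). It omits the `pattern_first_dir.startswith(dir_relpath + "/")` branch: `pattern_first_dir` comes from `split("/")` and never contains `/`, so that check cannot succeed. Because the comparison is literal, a pattern whose first component is itself a wildcard (for example `*/foo.py`) returns False for a directory such as `lib`, even though `lib/foo.py` would match; the sketch keeps that behavior.

```python
def include_can_match_descendant(pattern: str, dir_relpath: str) -> bool:
    """Could an include pattern match something inside dir_relpath?

    Returning False lets a caller skip walking the directory.
    """
    # The pattern names a path under this directory.
    if pattern.startswith(dir_relpath + "/"):
        return True
    # At the root, any pattern could match something below.
    if not dir_relpath or dir_relpath == ".":
        return True
    # A leading "**/" or a bare "**" can match at any depth.
    if pattern.startswith("**/") or pattern == "**":
        return True
    # No directory component: the pattern could match a direct child by name.
    if "/" not in pattern.rstrip("/"):
        return True
    # Otherwise compare the pattern's first path component with the directory.
    first = pattern.split("/")[0]
    return dir_relpath == first or dir_relpath.startswith(first + "/")


if __name__ == "__main__":
    for pattern, directory in [
        ("build/reports/**", "node_modules"),
        ("build/reports/**", "build"),
        ("build/reports/**", ""),
        ("src/**/*.py", "tests"),
    ]:
        print(pattern, repr(directory), include_can_match_descendant(pattern, directory))
```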

src/gpt_copy/gpt_copy.py: 24 changes (15 additions, 9 deletions)
@@ -415,20 +415,26 @@ def collect_recursive(dir_path: Path):
                     )
                 )
                 # Collect direct children for compression
+                # Only add children that aren't excluded by filter rules
                 try:
                     children = sorted(entry.iterdir())
                     for child in children:
                         if not is_ignored(
                             child, gitignore_specs, root_path, tracked_files
                         ):
                             child_rel = child.relative_to(root_path).as_posix()
-                            file_infos.append(
-                                FileInfo(
-                                    path=child,
-                                    relative_path=child_rel,
-                                    is_directory=child.is_dir(),
-                                )
+                            # Check if child is excluded by filter rules
+                            child_action = filter_engine.effective_action(
+                                child_rel, child.is_dir()
+                            )
+                            if child_action == Action.INCLUDE:
+                                file_infos.append(
+                                    FileInfo(
+                                        path=child,
+                                        relative_path=child_rel,
+                                        is_directory=child.is_dir(),
+                                    )
                                 )
                 except OSError:
                     pass
                 continue
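
The change to `collect_recursive` passes each direct child of a compressed directory through the filter engine before adding it. A self-contained sketch of that step using only `pathlib`; `visible_children`, `keep` and `outside_exclude_dir` are hypothetical names, `keep` stands in for `filter_engine.effective_action(...) == Action.INCLUDE`, and the `is_ignored` (`.gitignore` / tracked-file) check is left out. The result matches what the updated `test_generate_tree_excluded` asserts: the excluded directory is still listed, but none of its children are.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

# (root-relative POSIX path, is_directory) -> should the entry be shown?
Keep = Callable[[str, bool], bool]


def visible_children(entry: Path, root: Path, keep: Keep) -> List[str]:
    """Root-relative paths of entry's direct children that pass `keep`."""
    kept: List[str] = []
    try:
        for child in sorted(entry.iterdir()):
            rel = child.relative_to(root).as_posix()
            if keep(rel, child.is_dir()):
                kept.append(rel)
    except OSError:
        pass  # unreadable or missing directory: list it without children
    return kept


def outside_exclude_dir(rel: str, is_dir: bool) -> bool:
    # Hypothetical rule with the effect of `--exclude-dir exclude_dir`.
    return not (rel == "exclude_dir" or rel.startswith("exclude_dir/"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"exclude_dir": ["file3.txt", "file4.txt"], "include_dir": ["a.txt", "b.txt"]}
    for directory, files in layout.items():
        (root / directory).mkdir()
        for name in files:
            (root / directory / name).write_text("x")
    print(visible_children(root / "exclude_dir", root, outside_exclude_dir))  # []
    print(visible_children(root / "include_dir", root, outside_exclude_dir))  # ['include_dir/a.txt', 'include_dir/b.txt']
```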
@@ -728,20 +734,20 @@ def write_output(
     "--include",
     "include_patterns",
     multiple=True,
-    help="Glob pattern(s) to mark files/directories as included (repeatable)",
+    help="Glob pattern(s) to include files/directories (e.g., 'src/**/*.py'). Last match wins. Repeatable.",
 )
 @click.option(
     "-e",
     "--exclude",
     "exclude_patterns",
     multiple=True,
-    help="Glob pattern(s) to mark files/directories as excluded (repeatable)",
+    help="Glob pattern(s) to exclude files/directories (e.g., 'tests/', '**/*.log'). Patterns ending with / exclude the directory and all contents. Repeatable.",
 )
 @click.option(
     "--exclude-dir",
     "exclude_dir_patterns",
     multiple=True,
-    help="Glob pattern(s) to mark directories as excluded (repeatable)",
+    help="Exclude directories by name/pattern (e.g., 'node_modules', 'dist'). Automatically adds trailing /. Repeatable.",
 )
 @click.option(
     "--no-number",
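
With `multiple=True`, click collects `--include`, `--exclude` and `--exclude-dir` into three separate tuples (`include_patterns`, `exclude_patterns`, `exclude_dir_patterns`), so the relative order of different flags is not recorded in those values, while the README describes last-match-wins rules. How gpt-copy combines the three tuples is not shown in this diff. Purely as an illustration, a standard-library `argparse` sketch (not click, and not gpt-copy's parser) that keeps one ordered rule list and applies the trailing-`/` normalization described in the `--exclude-dir` help text; `build_parser`, `parse_rules` and the `gpt-copy-sketch` program name are hypothetical.

```python
import argparse
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (action, pattern)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gpt-copy-sketch")  # hypothetical name
    parser.add_argument("directory")
    # All three flags append to the same destination, so their relative
    # order on the command line is preserved.
    parser.add_argument("-i", "--include", dest="rules", action="append",
                        type=lambda p: ("include", p))
    parser.add_argument("-e", "--exclude", dest="rules", action="append",
                        type=lambda p: ("exclude", p))
    # --exclude-dir turns a bare name into a directory pattern by adding "/".
    parser.add_argument("--exclude-dir", dest="rules", action="append",
                        type=lambda p: ("exclude", p.rstrip("/") + "/"))
    return parser


def parse_rules(argv: Optional[List[str]] = None) -> List[Rule]:
    args = build_parser().parse_args(argv)
    return args.rules or []  # None when no filter flag was given


if __name__ == "__main__":
    print(parse_rules([".", "-e", "**", "-i", "src/**/*.py", "--exclude-dir", "node_modules"]))
    # [('exclude', '**'), ('include', 'src/**/*.py'), ('exclude', 'node_modules/')]
```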

tests/test_tree_compression.py: 13 changes (5 additions, 8 deletions)
@@ -46,15 +46,12 @@ def test_generate_tree_excluded(tmp_path: Path):
     # Debug print (optional)
     print(tree_output)
 
-    # Verify that the excluded directory is shown in compressed form.
+    # Verify that the excluded directory is shown (without children, as they're excluded).
     assert "exclude_dir" in tree_output
-    # The compressed view should show at most 3 children of the excluded directory.
-    assert "file3.txt" in tree_output
-    assert "file4.txt" in tree_output
-    assert "file5.txt" in tree_output
-    # There should be an ellipsis indicating additional files.
-    assert "[...]" in tree_output
-    # Ensure that the fourth child (file6.txt) is not shown.
+    # Children of excluded directories should not be shown (they're filtered out).
+    assert "file3.txt" not in tree_output
+    assert "file4.txt" not in tree_output
+    assert "file5.txt" not in tree_output
     assert "file6.txt" not in tree_output
 
     # Also verify that the included directory is fully expanded.