GPT Copy is a command-line tool that recursively scans a directory, collects readable files, and concatenates them into a single structured markdown stream. The output can be printed to stdout or written to a file, making it easy to feed codebases, documentation, or notes into language models like GPT.
- Recursive Directory Scanning: Respects `.gitignore` rules to selectively process files.
- Structured Output: Concatenates file contents into a structured markdown document with file-specific code fences.
- File Filtering: Supports glob-style include (`-i`/`--include`) and exclude (`-e`/`--exclude`) patterns for precise file selection.
- Force Mode: The `-f`/`--force` option bypasses ignore rules and Git-tracked file restrictions.
- Line Numbering: Zero-padded line numbers are added to each file's content by default (similar to `cat -n`). Use `--no-number` to disable.
- Token Counting: Includes a separate `tokens` CLI command that counts the tokens in a text using OpenAI’s `tiktoken` library with the GPT-4o model encoding.
- Integrated Token Analysis: Use `--tokens` to display token counts for each file in the tree structure, with `--top-n` to show only the files with the most tokens.
Install globally using UV's tool system:

    uv tool install git+https://github.com/simone-viozzi/gpt-copy.git

Install directly from Git:

    pip install git+https://github.com/simone-viozzi/gpt-copy.git

For development, clone the repository and install in editable mode:

    git clone https://github.com/simone-viozzi/gpt-copy.git
    cd gpt-copy
    uv sync --dev  # or pip install -e .[dev]

Run the tool by specifying the target directory:

    gpt-copy /path/to/directory

Redirect the output to a file:

    gpt-copy /path/to/directory -o output.md

Fine-tune which files are processed using include and exclude options. Patterns follow gitignore-style glob syntax with support for `*`, `**`, and brace expansion.

- `-i` or `--include`: Include files/directories matching the pattern
- `-e` or `--exclude`: Exclude files/directories matching the pattern
- `--exclude-dir`: Exclude directories (automatically adds trailing `/`)
Pattern matching rules:

- Last Match Wins: If multiple patterns match a file, the last matching pattern determines whether it is included or excluded.
- Directory Patterns: Patterns ending with `/` match directories and all their contents.
  - `node_modules/` excludes the directory and everything inside it
  - `build/` excludes the build directory and all files/subdirectories
- Wildcard Patterns:
  - `*` matches any characters except `/`
  - `**` matches any characters including `/` (any depth)
  - `tests/*` matches direct children of the tests directory
  - `**/*.log` matches all .log files at any depth
- Directory-Only Wildcards: Patterns with wildcards ending in `/` match only directories.
  - `tmp/**/` matches all directories under tmp/ at any depth, but not files
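The last-match-wins rule can be illustrated with a small Python sketch. It is not gpt-copy's implementation: `compile_pattern` and `is_included` are hypothetical helpers, patterns are assumed to be anchored at the root directory, and only `*`, `**`, and plain trailing-`/` directory patterns are modeled (brace expansion and directory-only wildcards such as `tmp/**/` are left out).

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex over relative POSIX file paths.

    Hypothetical helper: covers '*', '**', and trailing-'/' directory patterns only.
    """
    is_dir = pattern.endswith("/")
    core = pattern.rstrip("/")
    parts = []
    i = 0
    while i < len(core):
        if core.startswith("**/", i):
            parts.append("(?:.*/)?")   # '**/' spans zero or more directories
            i += 3
        elif core.startswith("**", i):
            parts.append(".*")         # '**' crosses directory separators
            i += 2
        elif core[i] == "*":
            parts.append("[^/]*")      # '*' stays inside one path segment
            i += 1
        else:
            parts.append(re.escape(core[i]))
            i += 1
    regex = "".join(parts)
    if is_dir:
        regex += "/.*"                 # a directory pattern covers everything inside it
    return re.compile(regex)


def is_included(path: str, rules: list[tuple[str, str]], default: bool = True) -> bool:
    """Apply rules in order; the last rule whose pattern matches decides."""
    included = default
    for action, pattern in rules:
        if compile_pattern(pattern).fullmatch(path):
            included = action == "include"
    return included


rules = [
    ("exclude", "build/**"),
    ("exclude", "**/*.log"),
    ("include", "build/reports/**"),
]
print(is_included("build/out/app.o", rules))             # False
print(is_included("build/reports/summary.html", rules))  # True
print(is_included("debug.log", rules))                   # False
```

Because the rules are walked in order and each match overwrites the previous decision, placing a broad exclude first and a narrower include after it carves out an exception.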
Examples:

- Exclude directories with all their contents:

      gpt-copy . --exclude-dir tests --exclude-dir node_modules
      # or equivalently:
      gpt-copy . -e "tests/" -e "node_modules/"

- Exclude specific directories but include subdirectories:

      gpt-copy . -e "tests/*" -i "tests/**/"
      # Excludes direct children of tests/ but includes nested directories

- Exclude all files then include specific ones:

      gpt-copy . -e "**" -i "src/**/*.py"
      # Excludes everything, then includes Python files under src/

- Complex filtering with multiple patterns:

      gpt-copy . -e "build/**" -e "**/*.log" -i "build/reports/**"
      # Excludes build directory and all .log files, but includes build/reports/

- Include only specific folder:

      gpt-copy . -e "app/" -e "tests/" -e "notebooks/" -i "deployment/"
      # Excludes app, tests, notebooks, includes only deployment
Ignore `.gitignore` and Git-tracked file restrictions to process all files:

    gpt-copy /path/to/directory -f

Line numbering is enabled by default for the content of each file. Each line is prefixed with a zero-padded line number, similar to the Unix `cat -n` command.

Basic usage (line numbers included):

    gpt-copy /path/to/directory

Disable line numbers:

    gpt-copy /path/to/directory --no-number

Count the number of tokens in a given text using GPT-4o encoding. The `tokens` command reads from a file or standard input.
Examples:
- Count tokens in a file:

      tokens file.txt

- Pipe output from `gpt-copy` into `tokens`:

      gpt-copy /path/to/directory | tokens
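For readers who want to reproduce the count in their own scripts, here is a minimal Python sketch of the same idea. It is not the code behind the `tokens` command: `count_tokens` and `read_input` are hypothetical names, and the optional `encode` parameter is an addition of this sketch so that the function can be tried without `tiktoken` installed.

```python
import sys
from typing import Callable, Optional, Sequence


def count_tokens(text: str, encode: Optional[Callable[[str], Sequence]] = None) -> int:
    """Return the number of tokens in `text`.

    By default the GPT-4o encoding from tiktoken is used; any callable that
    maps a string to a sequence of tokens can be passed instead.
    """
    if encode is None:
        # Imported lazily so a stand-in encoder works without tiktoken installed.
        import tiktoken

        encode = tiktoken.encoding_for_model("gpt-4o").encode
    return len(encode(text))


def read_input(path: Optional[str] = None) -> str:
    """Read text from `path`, or from standard input when no path is given."""
    if path is None:
        return sys.stdin.read()
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Stand-in encoder for illustration only: whitespace splitting is NOT how
# GPT-4o tokenizes text, so real counts will differ.
print(count_tokens("one two three", encode=str.split))  # 3
```

Calling `count_tokens(read_input("file.txt"))` without an `encode` argument uses `tiktoken`, which may need to download the encoding data on first use.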
Display token counts for each file in the directory tree using the `--tokens` option:

    gpt-copy /path/to/directory --tokens

Filter by Top N Files by Token Count: Show only the files with the highest token counts:

    gpt-copy /path/to/directory --tokens --top-n 5

Combine with File Filtering: Use with include/exclude patterns to count tokens only for specific file types:

    gpt-copy /path/to/directory --tokens --include "*.py" --top-n 3
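The top-N selection amounts to ranking files by their token count and keeping the largest few. The sketch below shows that idea in Python under stated assumptions: `top_n_by_tokens` is a hypothetical function, the file contents are passed in as a dictionary, and the counter is pluggable (a whitespace splitter stands in for a real tokenizer here).

```python
import heapq
from typing import Callable, Dict, List, Tuple


def top_n_by_tokens(
    files: Dict[str, str],
    n: int,
    count: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Return the `n` (path, token_count) pairs with the highest counts, largest first."""
    counts = {path: count(text) for path, text in files.items()}
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


# Illustrative data; the whitespace counter is a stand-in, not GPT-4o tokenization.
files = {
    "main.py": "print ( 'Hello' )",
    "utils.py": "def helper ( ) : return 1",
    "README.md": "GPT Copy",
}
for path, tokens in top_n_by_tokens(files, 2, lambda text: len(text.split())):
    print(f"{tokens:>5}  {path}")
```

`heapq.nlargest` avoids sorting the full list when only a handful of entries are needed, which matters little for small projects but keeps the intent explicit.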
How it works:

- Collects `.gitignore` Rules: Scans the directory for `.gitignore` files and applies the rules to skip ignored files unless force mode is enabled.
- Generates a Structured File Tree: Creates a visual representation of the directory structure.
- Reads and Formats Files:
  - Detects file type based on extension.
  - Wraps file contents in appropriate markdown code fences.
  - Adds line numbers by default (can be disabled with `--no-number`).
  - Skips binary or unrecognized file types.
- Applies File Filtering: Uses include and exclude glob patterns to determine which files to process, based on their paths relative to the root directory.
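The read-and-format step can be sketched in a few lines of Python. This is an illustration, not gpt-copy's code: the extension map `FENCE_LANGUAGES`, the helper names, and the exact line-number layout (zero-padded number, one space, then the line) are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension-to-language map; the real tool may recognize many more types.
FENCE_LANGUAGES = {".py": "python", ".yaml": "yaml", ".js": "javascript", ".md": "markdown"}

FENCE = "`" * 3  # built programmatically so this example nests cleanly inside a fence


def number_lines(text: str) -> str:
    """Prefix each line with a zero-padded line number (assumed layout: 'NN line')."""
    lines = text.splitlines()
    width = len(str(len(lines)))
    return "\n".join(f"{index:0{width}d} {line}" for index, line in enumerate(lines, start=1))


def format_file(rel_path: str, text: str, number: bool = True) -> Optional[str]:
    """Wrap a file's content in a language-tagged fence, or return None to skip it."""
    path = PurePosixPath(rel_path)
    language = FENCE_LANGUAGES.get(path.suffix)
    if language is None:
        return None  # unrecognized file type: skipped
    body = number_lines(text) if number else text.rstrip("\n")
    return "\n".join(
        [
            f"### File: `{path.name}`",
            f"*(Relative Path: `{rel_path}`)*",
            f"{FENCE}{language}",
            body,
            FENCE,
        ]
    )


print(format_file("subdir/config.yaml", "version: 1.0\nenabled: true\n"))
```

Passing `number=False` mirrors the effect of `--no-number` by emitting the content unchanged inside the fence.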
Example output:

# Folder Structure
```
project_root
├── main.py
├── README.md
└── subdir
    ├── config.yaml
    └── script.js
```
## File Contents
### File: `main.py`
*(Relative Path: `main.py`)*
```python
print("Hello, World!")
```
### File: `config.yaml`
*(Relative Path: `subdir/config.yaml`)*
```yaml
version: 1.0
enabled: true
```
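A folder-structure block like the one in the sample above could be produced along the following lines. This is a sketch under assumptions, not the tool's implementation: `render_tree` is a hypothetical function that takes relative POSIX paths of files, and entries are sorted case-insensitively by name, which happens to reproduce the ordering shown in the sample.

```python
from typing import Dict, List


def render_tree(root_name: str, paths: List[str]) -> str:
    """Render relative file paths as a tree with box-drawing connectors."""
    # Build a nested dict: directories map to dicts, files map to empty dicts.
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines = [root_name]

    def walk(node: Dict[str, dict], prefix: str) -> None:
        names = sorted(node, key=str.lower)  # assumed ordering: case-insensitive by name
        for index, name in enumerate(names):
            last = index == len(names) - 1
            connector = "└── " if last else "├── "
            lines.append(f"{prefix}{connector}{name}")
            # Children of the last entry are indented with spaces, others with a guide bar.
            walk(node[name], prefix + ("    " if last else "│   "))

    walk(tree, "")
    return "\n".join(lines)


print(render_tree("project_root", ["main.py", "README.md", "subdir/config.yaml", "subdir/script.js"]))
```

Since the tree is built from file paths only, empty directories do not appear in this sketch.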
Contributions are welcome! If you'd like to contribute, please open a pull request with your proposed changes.
This project is licensed under the MIT License.