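Most transcoding tools convert files in bulk without checking them first, so they can misread a file that is already UTF-8 as GBK and re-encode it to UTF-8, which produces garbled text. This tool detects each file's encoding first and only then decides whether to convert, which reduces such mistakes.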
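Detection strategy:
- UTF-8 is detected with `std::str::from_utf8` from the Rust standard library
- Non-UTF-8 input is passed to `chardetng` for an encoding guess
- Conversion runs only when the guess is GBK and the confidence reaches the threshold (default 0.8)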
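
The first step of the strategy can be illustrated with the standard library alone. This is a minimal sketch, not the tool's source: `is_utf8` is a hypothetical helper name, and the two byte arrays are the hand-written encodings of 中文 in UTF-8 and in GBK, used here so that no external crate is needed.

```rust
/// Returns true when `bytes` form a valid UTF-8 sequence.
/// Hypothetical helper name; it only wraps `std::str::from_utf8`.
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// "中文" encoded as UTF-8 (three bytes per character).
const ZHONGWEN_UTF8: [u8; 6] = [0xE4, 0xB8, 0xAD, 0xE6, 0x96, 0x87];

/// "中文" encoded as GBK (two bytes per character).
const ZHONGWEN_GBK: [u8; 4] = [0xD6, 0xD0, 0xCE, 0xC4];
```

With these definitions, `is_utf8(&ZHONGWEN_UTF8)` is true and `is_utf8(&ZHONGWEN_GBK)` is false, so only the second input would go on to the encoding guess. Pure ASCII input is also valid UTF-8 and therefore needs no conversion.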
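Install with Cargo first:

    cargo install gbk2utf8

Verify the installation:

    gbk2utf8 --help

Force English output:

    gbk2utf8 --lang en --help

By default, .txt, .c, and .h files are processed. To process only txt files and back them up automatically:

    gbk2utf8 -e txt -b

After installation, the executable is located in `~/.cargo/bin` (on Windows, usually `%USERPROFILE%\.cargo\bin`). Make sure that directory is in your PATH.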
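Choose one of the following three paths:
- Install with Cargo (recommended for most users)

  If Cargo is already installed, run:

      cargo install gbk2utf8

  Upgrade:

      cargo install gbk2utf8 --force

  If Cargo is not installed yet (you do not need to do Rust development to install it):
  - Visit https://rustup.rs and install Rustup (it also installs Cargo)
  - Reopen your terminal and confirm that Cargo is available:

        cargo --version

- Use the prebuilt EXE (without installing Cargo)
  - Open GitHub Releases: https://github.com/GenesisAN/gbk2utf8/releases
  - Download the Windows artifact (for example gbk2utf8-windows-x86_64.exe)
  - Rename it to gbk2utf8.exe (optional)
  - Put it in any directory and run it, or add that directory to PATH to call it globally
- Developer install (local source / Git)

  Install from local source:

      cargo install --path .
      cargo install --path . --force

  Install from the Git repository:

      cargo install --git https://github.com/GenesisAN/gbk2utf8.git gbk2utf8
      cargo install --git https://github.com/GenesisAN/gbk2utf8.git gbk2utf8 --force

Uninstall: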
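
    cargo uninstall gbk2utf8

Process the default extensions (.txt, .c, .h) in the current directory:

    gbk2utf8

Scan without converting:

    gbk2utf8 -d ./src -i -s

Process only txt files, with backup:

    gbk2utf8 -e txt -b

Process only code files:

    gbk2utf8 -e c,h

Use an ignore rules file:

    gbk2utf8 -d ./src --ignore-file .gbk2utf8ignore

Example .gbk2utf8ignore:

    build/
    target/
    legacy/old.c
    *.bak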
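
To make the three rule shapes in the example above concrete, here is a deliberately simplified matcher. It is a sketch under stated assumptions: `is_ignored` is a hypothetical function, it handles only a trailing-slash directory rule, a leading-`*` suffix rule, and an exact relative path, and it is not how gbk2utf8 or real gitignore matching is implemented (negation, `**`, and anchoring are not covered).

```rust
/// Simplified, hypothetical matcher for three gitignore-like rule shapes.
/// `path` is a relative path using `/` as the separator.
fn is_ignored(path: &str, rules: &[&str]) -> bool {
    // Directory components of the path, without the file name itself.
    let mut dirs: Vec<&str> = path.split('/').collect();
    dirs.pop();

    rules.iter().any(|rule| {
        if let Some(dir) = rule.strip_suffix('/') {
            // `build/`: ignore anything below a directory with that name.
            dirs.contains(&dir)
        } else if let Some(suffix) = rule.strip_prefix('*') {
            // `*.bak`: ignore by file-name suffix.
            path.ends_with(suffix)
        } else {
            // `legacy/old.c`: ignore one exact relative path.
            path == *rule
        }
    })
}
```

Under these assumptions, `build/out.txt`, `legacy/old.c`, and `notes/readme.txt.bak` are ignored by the example rules, while `src/main.c` is not.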
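| Option | Description |
|---|---|
| `-d, --dir <DIR>` | Directory to scan (default: current directory); subdirectories are processed recursively |
| `-e, --extensions <EXTENSIONS,...>` | Extensions to process (default: txt,c,h) |
| `-s, --scan-only` | Scan only, do not convert |
| `-b, --backup` | Back up to .bak before converting |
| `-i, --show-info` | Show the encoding guess and confidence |
| `-m, --min-confidence <VALUE>` | GBK confidence threshold (default: 0.8) |
| `--t <TLD>` / `--tld <TLD>` | TLD hint passed to chardetng (for example cn, jp); used only to improve guess accuracy (default: cn) |
| `--ignore-file <PATH>` | Ignore rules file in gitignore syntax (default: .gbk2utf8ignore) |
| `--lang <auto\|zh\|en>` | Output language (default: auto, detected automatically) |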
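- Automatic encoding detection (UTF-8 / GBK)
- Avoids converting UTF-8 files by mistake
- Recursive directory scanning
- gitignore-style ignore rules
- Extension filtering
- Backup before conversion
- Encoding detection details on request
- Conversion statistics output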
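When the program finishes, it prints the number of files in each category:
- Converted successfully
- Failed to convert
- No conversion needed

Files that match an ignore rule are not counted in these statistics.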
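
The counting rule can be sketched as a small tally. The names `Outcome`, `Stats`, and `record` are hypothetical and chosen only for illustration; the source does not describe the tool's internal types.

```rust
/// Hypothetical per-file result; not taken from the tool's source.
#[derive(Debug, PartialEq)]
enum Outcome {
    Converted,
    Failed,
    Unchanged,
    Ignored,
}

/// Hypothetical counters for the three reported categories.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    converted: usize,
    failed: usize,
    unchanged: usize,
}

impl Stats {
    /// Adds one file to the tally; ignored files leave every counter untouched.
    fn record(&mut self, outcome: &Outcome) {
        match outcome {
            Outcome::Converted => self.converted += 1,
            Outcome::Failed => self.failed += 1,
            Outcome::Unchanged => self.unchanged += 1,
            Outcome::Ignored => {}
        }
    }
}
```

In this sketch, recording an `Ignored` outcome changes nothing, which mirrors the statement that ignored files are excluded from the totals.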
Build locally:

    git clone https://github.com/GenesisAN/gbk2utf8.git
    cd gbk2utf8
    cargo build --release
    ./target/release/gbk2utf8 --help

The repository includes a GitHub Release workflow: .github/workflows/release.yml.
Many batch converters convert files blindly and can corrupt files that are already UTF-8.
gbk2utf8 detects the encoding first, then converts only when appropriate.
Detection strategy:
- UTF-8 check via Rust stdlib `std::str::from_utf8`
- Non-UTF-8 detection via `chardetng`
- Convert only when encoding is GBK and confidence is above threshold (default 0.8)
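
The three rules above can be combined into one decision function. This is a sketch under assumptions, not the tool's implementation: `Guess`, `Action`, and `decide` are hypothetical names, the guess is passed in as plain data instead of being produced by chardetng, and the comparison uses `>=` on the assumption that a confidence equal to the threshold is accepted.

```rust
/// Hypothetical result of an encoding guess; this is not a chardetng type.
#[derive(Debug, PartialEq)]
struct Guess {
    encoding: String,
    confidence: f32,
}

/// What to do with one file.
#[derive(Debug, PartialEq)]
enum Action {
    /// Leave the file alone: already UTF-8, or the guess is not trusted.
    Skip,
    /// Re-encode the file from GBK to UTF-8.
    Convert,
}

/// Applies the detection strategy to one file's bytes and its guess.
fn decide(bytes: &[u8], guess: &Guess, min_confidence: f32) -> Action {
    // Rule 1: valid UTF-8 is never converted, whatever the guess says.
    if std::str::from_utf8(bytes).is_ok() {
        return Action::Skip;
    }
    // Rules 2 and 3: convert only on a sufficiently confident GBK guess.
    // Assumption: a confidence equal to the threshold counts as sufficient.
    if guess.encoding.eq_ignore_ascii_case("GBK") && guess.confidence >= min_confidence {
        Action::Convert
    } else {
        Action::Skip
    }
}
```

The UTF-8 check runs first so that a wrong GBK guess can never cause a valid UTF-8 file to be rewritten, which is the failure mode the tool is meant to avoid.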
Install from crates.io:

    cargo install gbk2utf8

Verify:

    gbk2utf8 --help

Force English output:

    gbk2utf8 --lang en --help

Default file extensions are .txt, .c, .h.
Convert only txt files with backup:

    gbk2utf8 -e txt -b

Binary location is usually `~/.cargo/bin` (Windows: `%USERPROFILE%\.cargo\bin`).
Make sure it is in your PATH.
Choose one of these 3 paths:
- Install with Cargo (recommended for most users)

  If Cargo is already installed:

      cargo install gbk2utf8

  Upgrade:

      cargo install gbk2utf8 --force

  If Cargo is not installed yet (no Rust development needed):
  - Install Rustup from https://rustup.rs (it also installs Cargo)
  - Reopen your terminal and verify:

        cargo --version

- Use prebuilt executable (without Cargo)
  - Open GitHub Releases: https://github.com/GenesisAN/gbk2utf8/releases
  - Download the Windows artifact (for example gbk2utf8-windows-x86_64.exe)
  - Optionally rename it to gbk2utf8.exe
  - Run it directly, or add its folder to PATH for global usage
- Developer install (local source / Git)

  Install from local source:

      cargo install --path .
      cargo install --path . --force

  Install from Git repository:

      cargo install --git https://github.com/GenesisAN/gbk2utf8.git gbk2utf8
      cargo install --git https://github.com/GenesisAN/gbk2utf8.git gbk2utf8 --force

Uninstall:

    cargo uninstall gbk2utf8

Process default extensions in current directory:

    gbk2utf8

Scan only (no conversion):

    gbk2utf8 -d ./src -i -s

Only txt with backup:

    gbk2utf8 -e txt -b

Only C headers/sources:

    gbk2utf8 -e c,h

Use ignore rules:

    gbk2utf8 -d ./src --ignore-file .gbk2utf8ignore

| Option | Description |
|---|---|
| `-d, --dir <DIR>` | Directory to scan recursively (default: current directory) |
| `-e, --extensions <EXTENSIONS,...>` | File extensions to process (default: txt,c,h) |
| `-s, --scan-only` | Scan only, do not convert |
| `-b, --backup` | Create .bak before conversion |
| `-i, --show-info` | Show detected encoding and confidence |
| `-m, --min-confidence <VALUE>` | GBK confidence threshold (default: 0.8) |
| `--t <TLD>` / `--tld <TLD>` | TLD hint for chardetng (for example cn, jp), used only to improve detection accuracy (default: cn) |
| `--ignore-file <PATH>` | Ignore rules file in gitignore syntax (default: .gbk2utf8ignore) |
| `--lang <auto\|zh\|en>` | Output language (default: auto, auto-detected) |
- Encoding-aware conversion (UTF-8 / GBK)
- Avoids accidental conversion of UTF-8 files
- Recursive directory traversal
- gitignore-style ignore rules
- Extension filtering
- Optional backup before write
- Per-file detection details
- Final conversion statistics
Build locally:

    git clone https://github.com/GenesisAN/gbk2utf8.git
    cd gbk2utf8
    cargo build --release
    ./target/release/gbk2utf8 --help

GitHub Release workflow is available at .github/workflows/release.yml.