Here are specific details that are useful when you want to contribute to the BPE crates. Make sure to read the repository's contribution guidelines as well.
This project has a slightly unusual structure to resolve some dependency issues.
- This directory contains bpe, the BPE code itself.
- A sibling directory contains bpe-openai, which exposes tokenizers for OpenAI token sets, and depends on bpe.
- Tests are located in the tests subdirectory, and benchmarks in the benchmarks subdirectory. Both of these are separate crates so they can depend on bpe-openai without causing a cyclic dependency.
Only the bpe and bpe-openai crates are meant to be published. The other ones are for development use only.
Change the working directory to the benchmarks directory:
cd benchmarks
Run the benchmark as follows (requires cargo-criterion installed):
cargo criterion
(Using cargo bench ignores the settings in criterion.toml!)
Open the full report, which should be located at target/criterion/reports/index.html.
Update the figures in this repo as follows (requires rsvg-convert from librsvg installed):
script/copy-results
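
Both the benchmark run and the figure update depend on external tools (cargo-criterion and rsvg-convert), so it can help to confirm they are available before starting. The following is a minimal sketch of such a preflight check. The function names require_tool and check_tools are hypothetical and not part of this repository, and the check assumes cargo-criterion was installed in a way that puts a cargo-criterion executable on PATH, as cargo install normally does.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not part of the repository.

# Succeeds if the named executable is on PATH; otherwise prints a
# message to stderr and fails.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required tool: $1" >&2
    return 1
}

# Checks every tool given as an argument so that all missing tools are
# reported at once. Returns 0 only if every tool was found.
check_tools() {
    status=0
    for tool in "$@"; do
        require_tool "$tool" || status=1
    done
    return "$status"
}

# Example (assumed tool names, taken from the steps above):
#   check_tools cargo cargo-criterion rsvg-convert && echo "ready to benchmark"
```

Checking all tools before returning, rather than stopping at the first missing one, means a single run lists everything that still needs to be installed.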