QoL: use uv for package installs and limit fineweb download to 500M tokens#222
Open
alexandermorgan wants to merge 3 commits intoKellerJordan:masterfrom
Open
QoL: use uv for package installs and limit fineweb download to 500M tokens#222alexandermorgan wants to merge 3 commits intoKellerJordan:masterfrom
alexandermorgan wants to merge 3 commits intoKellerJordan:masterfrom
Conversation
…mands, and limit fineweb download to 500M tokens
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a simple quality of life pr, not a new record. This changes the two bash commands in the README so that they use uv to install the python packages. For the docker case, this also required minor changes to the Dockerfile. These changes took the docker package install time from 3m30s to 59s for me. This pr also limits the number of tokens downloaded from fineweb to 500M, which is sufficient given the recent token efficiency improvements.
¿Why use --system in the
uv pip installcommand? Because uv will otherwise default to installing things in a virtualenv which is not needed in this case (assuming people are renting a hardware instance for the runs).¿Why doesn't the docker command pip install uv like the other command does? Because this is the way the Astral team recommends to install uv in a docker container.
¿Why use
uv pip installinstead ofuv sync+ a pyproject.toml file? The performance difference is negligible souv pip installis preferred because it is a simple drop-in replacement.