[ci] simplify CI configurations, parallelize compilation, test CUDA on Ubuntu 22.04 #6458
Merged
borchero (Collaborator) approved these changes on May 21, 2024 and left a comment:
Thank you for spending so much effort to improve the CI here @jameslamb! 🙏🏼
Author (Collaborator): Sure, happy to do it! Thanks for all the reviews!
This was referenced May 23, 2024
Contributor: This pull request has been automatically locked since there has not been any recent activity since it was closed.
Proposes the following changes to the CI setup:
In the CUDA jobs:
- nvidia-docker and restart the docker daemon" on every CUDA build
- docker run in a script: block
- /tmp instead of a directory that's mounted in from the self-hosted runner
- /home/github/miniforge already exists" on the next build
- actions/checkout from v1 to v3
- v4 yet because GLIBC in the container images used in this job isn't new enough
On most of the CI jobs:
- GITHUB_ACTIONS=true
- GITHUB_ACTIONS anyway (docs)
- VERSION.txt" into the 2 CI scripts that need it, instead of having it defined as inline shell code across most of the CI configs
- CMAKE_BUILD_PARALLEL_LEVEL=4 environment variable (see Notes)
If any of these generate a lot of discussion, I'll split this up into smaller PRs. But I thought the sum total was small enough to do as a single PR.
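The two shell-level changes on most jobs (reading VERSION.txt inside the scripts that need it, and relying on GitHub Actions to set GITHUB_ACTIONS) can be sketched in bash roughly as follows. This is a minimal illustration, not code from LightGBM's CI scripts: the function names read_lgb_version and on_github_actions are hypothetical. The only external fact it relies on is that GitHub Actions always sets GITHUB_ACTIONS=true for steps it runs.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Hypothetical helper: print the first line of VERSION.txt in the given
# repository root. Reading it inside the script that needs it replaces
# inline shell code repeated across several CI configs.
read_lgb_version() {
    local repo_root="$1"
    head -n 1 "${repo_root}/VERSION.txt"
}

# Hypothetical helper: succeed only when running under GitHub Actions.
# GitHub Actions sets GITHUB_ACTIONS=true for every step on its own, so a
# workflow does not need to export it; outside Actions it falls back to "false".
on_github_actions() {
    [[ "${GITHUB_ACTIONS:-false}" == "true" ]]
}
```

A CI script would then call something like LGB_VER="$(read_lgb_version "${BUILD_DIRECTORY}")", where BUILD_DIRECTORY is assumed to hold the repository root.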
Notes for Reviewers
Why set CMAKE_BUILD_PARALLEL_LEVEL?
This environment variable is the equivalent of passing e.g. -j4 to cmake --build or make.
It tells that build tool (Ninja, in most of our builds here) to compile multiple objects at a time.
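A minimal sketch of how this plays out in a CI shell script, assuming bash. The cmake --build lines are shown only as comments so the snippet runs without a configured build directory, and the child shell is just a stand-in for a wrapper like build-python.sh that starts its own CMake build.

```shell
#!/usr/bin/env bash
set -e -u -o pipefail

# Export the variable once, early in the CI script. Every process started
# from here on (including wrapper scripts that run CMake themselves)
# inherits it through the environment.
export CMAKE_BUILD_PARALLEL_LEVEL=4

# With the variable exported, these two commands build with the same
# parallelism, because "cmake --build" falls back to
# CMAKE_BUILD_PARALLEL_LEVEL when --parallel / -j is not given:
#   cmake --build build -j4
#   cmake --build build

# A child shell stands in for a wrapper such as build-python.sh.
# It sees the exported value without anyone passing -j to it.
child_level="$(bash -c 'echo "${CMAKE_BUILD_PARALLEL_LEVEL:-unset}"')"
echo "CMAKE_BUILD_PARALLEL_LEVEL seen by child process: ${child_level}"
```

Setting the variable in the environment, rather than adding -j4 to each command line, is what lets it reach builds that the CI scripts don't invoke directly.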
We set that in builds that separately invoke cmake, like here: LightGBM/.ci/test.sh, line 56 in 3e9ab53.
But currently any builds that are just running sh build-python.sh or Rscript build_r.R are performing serial compilation. Setting this to a value greater than 1 should speed up builds.
I chose 4 because we're already using -j4 in lots of places, and it seems to be working well.
References:
- scikit-build-core docs recommending this (link)
- CMAKE_BUILD_PARALLEL_LEVEL (link)
Why update Ubuntu versions?
It helps with the GitHub Actions Node 16/20 situation: #6453 (comment).
But more importantly, I think it's more likely to match the set of operating systems and library versions that lightgbm users are using in their environments.
Ubuntu 22.04 has been available for 2 years (Ubuntu release history) and all of RAPIDS CI uses Ubuntu 20.04 and 22.04:
https://github.com/rapidsai/shared-workflows/blob/19d17957e59cf81574f214e043adf8cff7db9447/.github/workflows/wheels-test.yaml#L81-L85
Other References
Some related PRs explaining the history of the CUDA jobs: