fix(ci): add swap and limit CPU cores to prevent arm64 runner OOM #6957
Merged
Conversation
Use taskset to restrict colcon build to (nproc - 1) cores, reserving one core for OS/Docker/BuildKit overhead. This prevents the ARM64 public runners (4 vCPUs / 16 GB RAM) from being starved during heavy C++ compilation, which causes the runner to lose communication with the server and fail the job.

Closes #6956

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
This was referenced Mar 28, 2026
de59c08 to ca7fc93
The arm64 health-check runner (4 vCPU / 16 GB RAM) intermittently hits OOM kills during heavy C++ compilation. Adding an 8 GB swapfile increases available virtual memory from ~20 GB to ~28 GB, providing headroom for transient memory spikes.

Signed-off-by: Mete Fatih Cırıt <mfc@autoware.org>
ca7fc93 to ee0f44c
mitsudome-r (Member) approved these changes on Mar 28, 2026 and left a comment:
Looks good to me.
It might be more useful if we could add a workflow dispatch argument that limits the number of cores further whenever CI fails because it runs out of resources. But that doesn't have to be done in this PR.


Changes
- Add 8 GB swapfile for arm64 health-check builds -- the arm64 runner (4 vCPU / 16 GB RAM) intermittently OOM-kills during heavy C++ compilation. Adding swap increases virtual memory from ~18 GB to ~26 GB, providing headroom for transient memory spikes.
- Use `taskset` to restrict `colcon build` to `nproc - 1` cores -- reserves one core for OS/Docker/BuildKit overhead. Under `taskset`, `nproc` returns 3, so colcon builds 3 packages in parallel instead of 4.
- Print runner info (CPU, memory, swap, disk) before the build for easier debugging.
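The three changes above could look roughly like the following shell helpers. This is a minimal sketch, not the workflow code from this PR: the function names, the `/swapfile` path, and the exact diagnostic commands are assumptions made for illustration.

```shell
# Hypothetical helpers; names and paths are illustrative, not taken from the workflow.

# Print a taskset CPU list that leaves one core free, e.g. 4 cores -> "0-2".
cpu_list_reserving_one() {
  local total="$1"
  if [ "$total" -le 2 ]; then
    echo "0"                      # 1 or 2 cores: build on core 0 only
  else
    echo "0-$((total - 2))"       # cores 0..N-2, leaving the last core free
  fi
}

# Create and enable a swapfile (requires root). Size is given in GiB.
# Note: swapon can reject fallocate-created files on some filesystems.
add_swapfile() {
  local path="$1" size_gib="$2"
  sudo fallocate -l "${size_gib}G" "$path"
  sudo chmod 600 "$path"
  sudo mkswap "$path"
  sudo swapon "$path"
}

# Print CPU, memory, swap, and disk information for debugging.
print_runner_info() {
  nproc
  free -h
  swapon --show
  df -h /
}

# Example use in a CI step (not executed here):
#   add_swapfile /swapfile 8
#   print_runner_info
#   taskset -c "$(cpu_list_reserving_one "$(nproc)")" colcon build
```

Because `nproc` reports only the CPUs available to the calling process, anything launched under `taskset` sees the reduced core count without extra flags.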
Why
The `docker-build (main-arm64)` job intermittently fails because `colcon build` defaults to using all available cores. With multiple packages compiling in parallel, each spawning multiple cmake compile jobs, the runner exceeds its memory budget and loses communication with the server.

`taskset` alone was insufficient (see #6956 comment) because it only limits CPU affinity, not memory. 9 concurrent compiler processes (3 packages x 3 jobs) can still exceed 16 GB. The additional swap provides the extra headroom needed.

If this is still not enough, the next step is to set `CMAKE_BUILD_PARALLEL_LEVEL` to directly limit per-package compile jobs.

Test plan
- `docker-build (main-arm64)` health-check job passes without runner communication loss
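
To make the arithmetic in the Why section concrete, here is a rough sketch of the average memory each compiler process gets under the 3 x 3 case, and under a hypothetical limit of 2 compile jobs per package. The helper names are made up for illustration, the figures count RAM only (no swap), and the second case assumes the build actually honours the per-package job limit.

```shell
# Hypothetical helpers for back-of-the-envelope sizing; not part of the workflow.

# Worst-case number of simultaneous compiler processes.
max_compiler_procs() {
  echo $(( $1 * $2 ))   # parallel packages x compile jobs per package
}

# Average memory available per process, in MiB (integer division).
per_proc_budget_mib() {
  echo $(( $1 / $2 ))   # total MiB / process count
}

ram_mib=$((16 * 1024))  # 16 GB runner, treated here as 16 GiB

procs=$(max_compiler_procs 3 3)
echo "3 packages x 3 jobs: $procs processes, $(per_proc_budget_mib "$ram_mib" "$procs") MiB each"

procs=$(max_compiler_procs 3 2)
echo "3 packages x 2 jobs: $procs processes, $(per_proc_budget_mib "$ram_mib" "$procs") MiB each"
```

With 9 processes each one averages under 2 GiB of RAM, which a template-heavy C++ translation unit can exceed. That is consistent with the observation that CPU affinity alone did not prevent the failures.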