Use new apptainer singularity install to avoid transient errors #63
Open
NCI have recently installed an Apptainer-based container engine on Gadi, which uses a different driver for mounting container images. In most cases the change just requires swapping `module load singularity` with `module load apptainer`. This will hopefully fix the transient errors (see #26), and so far no errors have surfaced in my local testing. It would be good to get this change into at least `payu/dev` so it can be tested further.

Ben Menadue also picked up that the short-circuit in the launcher scripts that detects whether they are running inside a container doesn't correctly detect Apptainer containers. They suggested that a more reliable way would be to inspect that process's status directly:
This will return 0 if running outside a container or 1 if inside. (Or more precisely, 0 if launching a container will work and 1 if it won't.)
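The exact check is not reproduced in this description, so the following is only a rough sketch of what inspecting a process's status could look like. It assumes the relevant field is `NoNewPrivs` in `/proc/<pid>/status`; that field choice and the helper names `status_field` and `in_container` are hypothetical and are not taken from the launcher scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from a /proc-style
# status file. Defaults to the status of the current shell process.
status_field() {
    field="$1"
    file="${2:-/proc/$$/status}"
    # Lines look like "Name:<tab>value"; print the value for the matching name.
    awk -v key="${field}:" '$1 == key { print $2; exit }' "$file"
}

# Assumed check (NoNewPrivs is an assumption, not confirmed by this PR):
# succeed when the field reads 1, i.e. "already inside a container".
in_container() {
    [ "$(status_field NoNewPrivs "${1:-/proc/$$/status}")" = "1" ]
}
```

A launcher could then guard the container start with something like `if in_container; then exec "$@"; fi`, falling through to the normal launch path otherwise.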
So far in my tests there hasn't been any "FATAL: container creation failed" error when using Apptainer, so we could maybe also remove the retry logic when launching the container?