Defer docker image URL accessibility check to srun when not caching locally #164
Summary
Defer the docker image URL accessibility check to srun when not caching locally.

This PR fixes a bug reported by @srivatsankrishnan. CloudAI has an install mode that installs test templates. During the install stage, docker images are cached locally when a user sets cache_docker_images_locally to true. Even when cache_docker_images_locally is set to false, the accessibility of the given docker image URL is still tested. The problem arises when the head node does not have enroot. In install mode, CloudAI calls DockerImageCacheManager's check_docker_image_exists. check_docker_image_exists uses enroot, so when the head node does not have enroot, the check fails and an unknown error is reported, as shown below.

While DockerImageCacheManager has _check_prerequisites to check prerequisites such as enroot and srun, it is only called in cache_docker_image and not in check_docker_image_exists. Therefore, if check_docker_image_exists is called directly, _check_prerequisites is not called, and the URL accessibility check fails. This PR fixes the error by always returning True in check_docker_image_exists. The rationale: if a docker image URL is not accessible, the actual sbatch or srun command will report it, so DockerImageCacheManager does not need to check it.

Test Plan
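The behavior described in the summary could be exercised with a check along these lines. This is a minimal sketch, not CloudAI's code: SketchImageCacheManager is a hypothetical stand-in for DockerImageCacheManager, the bool return type and the method bodies are assumptions, and the image URL is only an example. A missing enroot is simulated by patching shutil.which.

```python
import shutil
from unittest import mock


class SketchImageCacheManager:
    """Hypothetical stand-in for DockerImageCacheManager (not the real class)."""

    def _check_prerequisites(self) -> bool:
        # Assumption: prerequisites are met when enroot and srun are both on PATH.
        return all(shutil.which(tool) is not None for tool in ("enroot", "srun"))

    def check_docker_image_exists(self, docker_image_url: str) -> bool:
        # Behavior described in the summary: always return True and leave it to
        # the later sbatch or srun command to report an inaccessible URL.
        return True


def test_check_passes_without_enroot() -> None:
    # Simulate a head node where neither enroot nor srun is installed.
    with mock.patch("shutil.which", return_value=None):
        manager = SketchImageCacheManager()
        assert manager._check_prerequisites() is False
        assert manager.check_docker_image_exists("docker.io/library/ubuntu:22.04") is True


test_check_passes_without_enroot()
```

The sketch only checks that the existence check no longer depends on enroot being present; it does not cover the caching path, where the prerequisite check still applies.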