
Remove environment variables eval for --bind, --nv and --rocm for build command #6211

Merged (2 commits) on Oct 12, 2021

Conversation


@cclerget cclerget commented Oct 7, 2021


@cclerget cclerget added the ci:e2e label Oct 7, 2021

@DrDaveD DrDaveD left a comment


Please add a description of the changed behavior to CHANGELOG.md


DrDaveD commented Oct 7, 2021

How exactly does this affect issue #6181?


cclerget commented Oct 8, 2021

@DrDaveD It doesn't fix the initial issue (that one is a won't-fix, expected behavior that will be documented by apptainer/singularity-userdocs#411), but it covers the one highlighted in #6181 (comment).


DrDaveD commented Oct 8, 2021

So to be clear, you're saying that this stops accepting $SINGULARITY_BINDPATH, right? Ok, I see: BINDPATH was an additional variable that was included with the --bind option. If that's the case, how was SINGULARITY_BINDPATH honored prior to 3.8.0? I looked through the PR #5911 changeset and I don't see it removing support for SINGULARITY_BINDPATH. In fact, it looks like it adds settings for these variables. Is this change going to cause those to break? I think I'm going to have to test this PR.


DrDaveD commented Oct 8, 2021

Using the example Singularity file from the comment on issue #6181, if I do sudo singularity build --nv test.sif Singularity with a singularity built from the current code in this PR, on a host that has the nvidia files, I get no error, contrary to what I expected. I expected there to be errors because the mount points don't exist in the container. Oh, and I get the same behavior with 3.8.3. There's something I am not understanding about this feature.

For that matter, I don't even understand what the purpose of these options is. Is it only for having these paths bound in during image build time? I tried using it with a %setup command of mkdir $SINGULARITY_ROOTFS/input and bind-mounting something into it during build, but nothing from inside that directory got into the container, just the empty directory. Why might someone want to have the nvidia or rocm commands and libraries available inside the container during the build? They don't need GPU operations at that time, right? Maybe they need to do different things to build the image depending on what they probe then, but I can't think of what those things might be.

@eburgueno

eburgueno commented Oct 9, 2021

I would be against having two different behaviors for --bind and SINGULARITY_BINDPATH. The real issue here is that build --bind does not behave the same as run --bind. At runtime, --bind creates the bind mount point inside the container if one does not exist. This is the expected behavior, because the build-time container should not make assumptions about the environment where the image will be run. However, this doesn't happen during build, which triggers the error in #6181.

The actual solution would be to make the behavior of --bind consistent between build and run, and create the destination mount point if it does not exist.

@eburgueno

eburgueno commented Oct 9, 2021

@DrDaveD

Why might someone want to have the nvidia or rocm commands and libraries available inside the container during the build? They don't need gpu operations at that time, right?

The only use case that I can think of is something like guppy. I have to use rpm instead of yum because the cuda drivers are not needed at build time but the rpm expects them as a dependency.

Edit: that said, I can't remember now if the rpm expects the cuda rpms to be installed, or if it checks for the cuda drivers directly. If it's the former, having build --nv probably won't help in that case.


cclerget commented Oct 9, 2021

If that's the case, how was SINGULARITY_BINDPATH honored prior to 3.8.0?

@DrDaveD --bind/--nv/--rocm were not supported before 3.8.0 for the build command, so they were simply ignored.

I would be against having two different behaviors for --bind and SINGULARITY_BINDPATH. The real issue here is that build --bind does not behave the same as run --bind.

@eburgueno This requires some clarification: --bind/--nv/--rocm for the build command work exactly the same way as with run/shell/exec coupled with --writable. The build command basically just runs singularity exec --writable --bind ... /tmp/sandbox/build/image /post-script, so if you try to bind to a non-existent directory inside a writable container with the --writable option, you will get the same error. There is no different behavior here, just my mistake with the introduction of environment variable support for those options and the side effects that you reported.

At runtime, --bind creates the binded mount point inside the container if one does not exist

This is only valid when running a non-writable container, not a writable one, as mentioned above. Singularity doesn't create anything inside a writable container to honor bind mounts, and it has always worked this way AFAIR.


DrDaveD commented Oct 9, 2021

Ok, now I see I had a misunderstanding: I thought that prior to 3.8.0 SINGULARITY_BINDPATH was honored.

Since that is the case, I really don't see any reason to stop honoring these environment variables. I don't think they cause any harm, and I don't see any negative side effect.

@eburgueno

@cclerget thank you for the clarification. I must acknowledge that I never used the run --writable option, so I was not aware that bind mounts did not create the destination mount point if one doesn't exist. What is the rationale for not doing so? The fact that you would be changing a container non-explicitly?

@DrDaveD The negative side effect of continuing to honor the environment variables at build time is that, if the behavior is kept consistent with that of run --writable, end users will get a build error where they previously did not. The most common usage pattern in HPC environments is that sysadmins configure SINGULARITY_BINDPATH (or the equivalent option in the global config file) so that local paths are present in all containers at runtime. A small portion of users will want to build containers in that same HPC environment and suddenly find themselves unable to because of a behavior change. And of those, only a fraction will actually need to bind paths at build time.

This is quite literally "it's not a bug, it's a [new] feature!". It is, however, a feature that introduces a subtle breaking change. I would be fine with it if it were for a very obvious and widely used use case, but I don't think that's what we're dealing with here.

From where I stand, there might be two options moving forward:

  • Change the behavior of bind for build and run --writable so that the destination mount point is created if it doesn't exist. This would make it consistent across all commands (build, run, and run --writable), and mimic what other container runtimes do (e.g. Docker, podman, etc.).
  • Change the bind failure into a warning, but let the build finish anyway. The nonexistence of the bind mount is likely inconsequential to the image build, and for the few cases where it might be needed, at least the user gets a warning.


DrDaveD commented Oct 11, 2021

Ok, it didn't occur to me that system administrators would set SINGULARITY_BINDPATH. That's a good reason not to have this feature honor that variable. I would think that more system admins would set the default in singularity.conf, which wouldn't have this side effect.

I don't have a strong opinion on your first option, to change --writable and the build command to allow binding to non-existent mount points. It may be technically challenging because it may require an extra overlay layer between the container and the area that users write into; I'm not sure. I don't like the second proposed option, to change the error into a warning.

Now I think this PR is probably good to do, independent of whether or not the ability to bind to non-existent mountpoints is added separately. It does still need an update to CHANGELOG.md to indicate that SINGULARITY_BINDPATH is also no longer honored for the build command.

@cclerget

What is the rationale for not doing so? the fact that you would be changing a container non-explicitly?

@eburgueno Yes, Singularity avoids messing with the container filesystem in writable mode and creating unexpected empty directories/files. But the most important argument is security: contrary to overlay/underlay mode (which allows non-existent binds), writable mode exposes the container filesystem directly to Singularity, placing Singularity in an untrusted filesystem when dealing with file/directory manipulations. It took several v3 versions to handle that securely, and the writable behavior hasn't been changed since.

A way to avoid the need to create the bind directory during build is to use /mnt, as it is generally available in most container distributions and is dedicated to that purpose. On another note, Singularity supports nested binding, so you could mount a temporary directory or a specific directory layout from your build directory by doing this:

singularity build -B build/data:/mnt -B /another/useful/datadir:/mnt/useful ...

After the build you will notice that Singularity created the directory build/data/useful if it didn't exist prior to running build.

For most use cases, I think binding into /mnt should be sufficient, and there is a workaround for build which will be documented (I will also add the /mnt suggestion). For multiple binds, I would recommend using nested binding on top of /mnt.

Like @DrDaveD, I'm against option 2. In the past there were complaints about Singularity reporting errors as warnings, or the reverse; Singularity must be consistent on that, and it would also make things more difficult for maintainers and tests.

For option 1, I don't see much reason to change this behavior to match the build use cases (as suggested above, /mnt should be sufficient for those). And if an image is intended to be used in writable mode outside build with specific binds, creating the necessary layout from the definition file during build is the way to go (also valid for build itself, with the advantage of providing some "documentation" in the definition file regarding the expected binds during build).

@cclerget cclerget added this to the 3.9.0 milestone Oct 12, 2021
@DrDaveD DrDaveD merged commit 1aaebe1 into apptainer:master Oct 12, 2021

Successfully merging this pull request may close these issues.

Unable to bind directory during build to location that does not already exist