Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of description on what the permissions on files in the input root are #43

Open
EdSchouten opened this issue Dec 23, 2018 · 5 comments

Comments

@EdSchouten
Copy link
Collaborator

(Note: This is a continuation of a remark I made in #40)

It seems to be the case that the protocol doesn't document what the permissions of files and directories in the input root are in relation to the credentials of the build action.

  • Are build actions permitted to overwrite input files? When using Sandboxfs, this would be easy to achieve. Without using Sandboxfs, it's also possible, but has the downside that optimizations built around caching input files and hardlinking them is out of the question.
  • Related to the previous question, is it permitted to hardlink input files to some other location? Recent versions of Linux have fs.protected_hardlinks enabled by default. This implies that if input files are read-only (due to the use of hardlinking caches), we cannot guarantee that the kernel will allow the creation of hardlinks to input files.
  • In the general sense: is there even any guarantee that hardlinks can be created between two distinct directories in the input root? For example, may a worker place the output directories on a separate file system (tmpfs)? If so, this means you can't hardlink input files into an output directory.
  • What are permissions on directories? Where may the build action create temporary files? In any directory in the input root, or only inside of a directory containing one or more outputs?

Buildbarn's workers don't use anything like FUSE (yet!), for the reason that I initially aimed at using Buildbarn on Kubernetes, where you can't simply make mounts inside of containers. After giving it enough tweaks, I eventually concluded that:

  • All directories in the input root need to be writable to appease the build rules out there. Build rules should be allowed to rename and remove any file in the input root, and to create files in any directory in the input root.
  • Input files may be read-only. They may be replaced by removing them first and creating a new one with the old name.
  • Disallowing input files to be hardlinked (fs.protected_hardlinks == 1) causes a very small number of build rules to break, but those may be easy to fix.

Do we want to document this in the .proto file somewhere?

@EdSchouten
Copy link
Collaborator Author

Friendly ping? :-)

EdSchouten pushed a commit to EdSchouten/remote-apis that referenced this issue Feb 1, 2019
At the build event at Bloomberg, several users of REv2 (Bazel,
Buildbarn, BuildGrid) discussed what file permissions of the input root
ought to be. Basically, there were two camps:

- Camp A: reduce the number of writable files and directories to the
  absolute minumum (Bazel and Buildbarn).

- Camp B: keep files read-only to allow caching based on hardlinks,
  while making all directories writable (BuildGrid). This is desired to
  support the execution of very large build actions that invoke
  traditional build systems that tend to clobber the input root all over
  the place.

It was eventually decided to support the latter. In the long run a
separate effort could be started to lock this down further, if it turns
out this causes significant performance overhead.

Until recently, Buildbarn also followed the second approach. A recent
change caused Buildbarn to only make directories with one or more
outputs to be writable and all of the parent directories. This change
will be reverted to match the semantics proposed by this commit.

Fixes: bazelbuild#43
@ola-rozenfeld
Copy link
Contributor

Does #90 fully address this, or is there anything remaining?

@juergbi
Copy link
Contributor

juergbi commented Oct 22, 2019

Does #90 fully address this, or is there anything remaining?

We still need to document the default permissions, i.e., when UnixMode is not specified as node property.

@EricBurnett
Copy link
Collaborator

I don't think we can document the default permissions just yet, for the simple reason that there isn't currently an agreed-upon default. The tradeoffs you outline above are a good example: I agree that directories need to be writeable, but I'm not sure what will/won't break if we make files readonly (today RBE marks all input files as writeable; no actions modify their inputs today, but I don't know if any are opening them readwrite and so would be broken by readonly - wouldn't put it past random tools/scripts tools to do so). At the same time, to document that files must be writeable precludes you from some reasonable optimizations, so I'm not going to push for it.

If we do document a default, my vote is for '777' for the simple reason that that's what we use today, and I don't have a viable way to validate whether changing it would break anyone. If you (and other implementations) are interested in standardizing on the same, then I certainly won't object, but otherwise I think it's probably a case where we should declare it to be "implementation-defined" and leave it to tests of cross-compatibility to flag if the divergence actually matters to anything.

For #90 there's an independent concern that defaults may be context-specific: the default permission for inputs should probably be different from the default permission of outputs?

@juergbi
Copy link
Contributor

juergbi commented Nov 6, 2019

If we do document a default, my vote is for '777' for the simple reason that that's what we use today, and I don't have a viable way to validate whether changing it would break anyone. If you (and other implementations) are interested in standardizing on the same, then I certainly won't object, but otherwise I think it's probably a case where we should declare it to be "implementation-defined" and leave it to tests of cross-compatibility to flag if the divergence actually matters to anything.

If these are the only two options, we have to keep it implementation-defined for now. BuildBox (and as far as I know also Buildbarn) exposes cached files to action commands via hardlinks by default. To prevent corruption of the cached files, these hardlinks have to be read-only to the process/user executing the action commands.

To my knowledge, this is the only cross-platform approach that avoids having to copy each input file for every action. Some platforms provide more flexible options such as FUSE or reflinks but these options are not universally available.

For #90 there's an independent concern that defaults may be context-specific: the default permission for inputs should probably be different from the default permission of outputs?

Do you mean for the special case of output paths that already exist in the input tree? As I normally expect outputs to be created by the action command, in which case the permissions are defined by the action command (+ possibly umask) and not by the RE server/worker. Or am I misunderstanding your question?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants