Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnixFileSystem: read cached hashes from extended attributes #11662

Conversation

EdSchouten
Copy link
Contributor

@EdSchouten EdSchouten commented Jun 29, 2020

There are certain workloads where Bazel's running time gets dominated by
checksum computation. Examples include:

  • People adding local_repository()s to their project that point to
    networked file shares.
  • The use of repositories that contain very large input files.

When using remote execution, we need to compute digests to be able to
place such files in input roots. In many cases, a centralized CAS will
already contain these files. It would be nice if Bazel could efficiently
check for existence of such objects without needing to scan the file
locally.

This change extends UnixFileSystem to call getxattr() on attribute
"user.checksum.${algo}" prior to falling back to reading file contents.
There is no true standard on how these extended attributes should be
called, but "user.checksum.${algo}" already has some precedent. It is,
for example, used by BuildGrid internally:

https://gitlab.com/BuildGrid/buildbox/buildbox-fuse/-/merge_requests/9

EDIT: The name of the extended attribute is now configurable.

Using extended attributes to store this information also seems to be a
fairly common approach. Apparently it is also used within Google itself:

https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ

So far no code has been added to let Bazel write these attributes to
disk. The main goal so far is to speed up access to read-only corpora,
where the maintainers have spent the effort adding these attributes.

@ulfjack
Copy link
Contributor

ulfjack commented Jun 29, 2020

Using user-settable attributes is not safe, and I would be against having this enabled by default because it's footgun territory. I'm not even sure this should be merged at all.

Filesystem implementations that expose hashes through attributes, e.g., FUSE-based filesystems, can use a non-user-settable namespace. I'd prefer we aligned with Google on the name, but they may be using a 'google.' prefix, which would be undesirable (and it's not the end of the world if it differs, except that it has a silent failure mode if the attribute name doesn't match exactly).

@ulfjack
Copy link
Contributor

ulfjack commented Jun 29, 2020

Aligning with Google means that they can use bazel unmodified with their existing filesystem implementations.

@EdSchouten
Copy link
Contributor Author

EdSchouten commented Jun 29, 2020

Using user-settable attributes is not safe, and I would be against having this enabled by default because it's footgun territory. I'm not even sure this should be merged at all.

Could you explain why this would be unsafe? My takeaway has been that getting/setting user.* extended attributes is subject to the same permission checking as reading/writing its contents. This means that if you want to ensure these two remain in sync, you can simply remove the +w bit from such files.

This means that if using user.* extended attributes is unsafe, the act of storing data in files as we do right now would be unsafe as well.

Filesystem implementations that expose hashes through attributes, e.g., FUSE-based filesystems, can use a non-user-settable namespace. I'd prefer we aligned with Google on the name, but they may be using a 'google.' prefix, which would be undesirable (and it's not the end of the world if it differs, except that it has a silent failure mode if the attribute name doesn't match exactly).

This is also fine by me. Any convention will do. Even google.*, though it would be a bit awkward. I'm merely kicking the tires here to see what's possible.

@ulfjack
Copy link
Contributor

ulfjack commented Jun 29, 2020

Bazel would rely on an external contract for correctness: "the user has to make sure that anything modifying any file in the source tree will either update or remove the extended attribute". In most situations, there is nothing enforcing this contract, so it's clearly not safe for Bazel to rely on it, especially in the default configuration.

Yes, there are cases where you can provide such guarantees, e.g., through a FUSE filesystem, or by controlling the writer and making the filesystem read-only. Since you're requiring significant additional infrastructure anyway, why would you then prefer user-settable attributes over non-user-settable attributes? In a FUSE filesystem, you can actually handle read/write access correctly, so you definitely don't want users to set them, so why use user-settable attributes? If you're going the other route, why not allow the writer to write non-user-settable attributes?

@EdSchouten
Copy link
Contributor Author

Since you're requiring significant additional infrastructure anyway, [...]

Note that you only require this 'significant additional infrastructure' in case you require that this is not being tampered with. Even though Bazel has 'correctness' in its slogan, there is always a certain ceiling/limit where correctness breaks down, and that's fine. That's where the authors of a tool such as Bazel come up with a boundary. Here are some examples:

  • Do we expect that Bazel remains 'correct' in case of hardware memory corruption? Of course not.
  • Do we expect that build actions are still hermetic in case they do symlink expansion, thereby finding the actual location of the files, and start sniffing around there? Nope!
  • Do we expect that Bazel remains 'correct' if I run a program in the background that keeps on resetting file timestamps using utimensat(), thereby making it harder for build systems to detect changes? Likely not.

With that same reasoning, you could argue that having a read-only file that has user.checksum.sha256 set, but making it writable again and modifying its contents without adjusting the attribute as well is also invalid.

[...], why would you then prefer user-settable attributes over non-user-settable attributes?

As the name implies: it limits this functionality to super users (or FUSE file systems, or ...). Why would I need administrative rights to create a data set on my system and annotate it with some checksums, just so I can instruct Bazel to use them? Should I file tickets against my sysadmin to get those added?

Again, I have no strong opinion on it. As long as at the end of the day, we get a feature like this added. Any concrete proposals for a naming scheme other than what is suggested thus far?

@ulfjack
Copy link
Contributor

ulfjack commented Jun 29, 2020 via email

@EdSchouten
Copy link
Contributor Author

Again, I have no strong opinion on it. As long as at the end of the day, we get a feature like this added. Any concrete proposals for a naming scheme other than what is suggested thus far?

@ulfjack
Copy link
Contributor

ulfjack commented Jun 29, 2020

As I said above, I'd prefer to sync with Google. Not sure who, though. @jmmv @EricBurnett

@jmillikin-stripe
Copy link
Contributor

Drive-by comment to say that I am also uncomfortable with user-modifiable xattrs being used for this, but would love the ability to opt-in to something a FUSE filesystem can set.

@EdSchouten
Copy link
Contributor Author

EdSchouten commented Jun 30, 2020

On macOS, attribute names are completely freeform. Even as a non-root user, I can do this:

$ touch foo
$ xattr -w horse.goat.duck woof foo
$ xattr foo                        
horse.goat.duck

On Linux, it's a different story. Attribute names have to be placed in namespaces. Only four namespaces are supported: security, system, trusted and user. At least on my system (stock Ubuntu 18.04 with ext4), permissions for those namespaces seem to be as follows:

Namespace setxattr() as root setxattr() as !root getxattr() as !root
security
system
trusted
user

This means that the properties as proposed by @ulfjack are seemingly impossible to achieve on macOS. On Linux, we can get that by calling the attributes security.*, even though that namespace is documented as being intended for purposes other than ours.

EDIT: The Linux xattr(7) man page also says this about security.*:

When no security module is loaded, all processes have read access to extended security attributes, and write access is limited to processes that have the CAP_SYS_ADMIN capability.

This means that even access to security.* is not guaranteed as !root.

Maybe the best option would be to stick to user.*, so that we at least get consistent behaviour between macOS and Linux, but make it an opt-in feature using a command line flag?

EDIT 2: If we're going to use a command line flag anyway, maybe we can avoid the entire naming discussion by letting that command line flag take the name of the extended attribute to use... Hmm....

@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch 3 times, most recently from baf2383 to 27847a4 Compare June 30, 2020 10:31
@sstriker
Copy link

sstriker commented Jul 2, 2020

This is also discussed in https://groups.google.com/g/bazel-dev/c/SH-Dnm5e0TE/m/uRUSD9UrBgAJ.
The patch looks a lot like what is being suggested: https://groups.google.com/g/bazel-dev/c/SH-Dnm5e0TE/m/buxEm-AgBQAJ

FWIW, from my perspective this is a useful addition, I actually assumed it was already in given the thread above :).

@EricBurnett
Copy link

Looking at the patch as presented, it only reads attributes and does not write them. This seems pretty reasonable to me, regardless of namespace used: when directed to do so, bazel is trusting an external (unspecified) system to update extended attributes on files to match the hashes of their contents. Most likely to be used with FUSE filesystems and similar, which should be good for maintaining correctness in the event that inputs change, but if anything else is used to set these attributes, that system must also maintain the invariant that they do not get stale (however it chooses to achieve that).

Yes, the user could shoot themselves in the foot on this, but that seems similar to me in how a user could use a bad filesystem that provides insufficiently atomic operations, or run some tool to fetch inputs that failed to update them in cases. Their ability to get themselves into a state where the extended attributes are wrong is similar in flavour to their ability to get themselves into a state where the input file bytes are wrong. And in particular, use of a protected namespace doesn't actually defend against it - just because it reduces potential writers of the attributes in question doesn't mean it prevents them from going stale any better?

I think Ulf's line of argument is very valid in the event that bazel writes attributes, I'll note - bazel should not write and re-read attributes without strong expectation it can keep them consistent in the event of file changes, and may want to take precautions to isolate them further from outside corruption via namespaces.

Couple comments on the patch though (as a drive-by non-expert on bazel code):

  • Is there any way to combine the getxattr call with the stat bazel does on the file already? Having a whole bunch more filesystem operations seems like it could be a performance hit.
  • Could the attr be cached? Looks to me like we'll re-stat the file every time getDigest is called, whereas if we computed the digest I believe we cache that value and reuse it.
  • I'm a little unclear on encoding - seems like a byte[] is being extracted and so I'm guessing it's stored as an unencoded digest, but attrs are documented as null-terminated strings. Wouldn't this result in truncated digests for all files where the hash contains a 0 byte?
  • Before commit, I assume unit tests will be added?

@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch from 27847a4 to 1adc7f9 Compare July 2, 2020 13:47
@EdSchouten
Copy link
Contributor Author

EdSchouten commented Jul 2, 2020

Hi Eric,

Thanks for the review!

Couple comments on the patch though (as a drive-by non-expert on bazel code):

  • Is there any way to combine the getxattr call with the stat bazel does on the file already? Having a whole bunch more filesystem operations seems like it could be a performance hit.

I don't think Linux/UNIX provides any facilities to stat() and getxattr() a file at once. It'll need to remain two separate system calls regardless of whether they are grouped.

  • Could the attr be cached? Looks to me like we'll re-stat the file every time getDigest is called, whereas if we computed the digest I believe we cache that value and reuse it.

I think that caching is done at a higher level. UnixFileSystem.getDigest() normally forwards the call super.super.super.[....].getDigest(), which eventually ends up at FileSystem.getDigest(). That function does the heavy work of checksumming the data that's on disk:

  protected byte[] getDigest(final Path path) throws IOException {
    return new ByteSource() {
      @Override
      public InputStream openStream() throws IOException {
        return getInputStream(path);
      }
    }.hash(digestFunction.getHashFunction()).asBytes();
  }

So my assumption is that this change does not disable/bypass caching.

  • I'm a little unclear on encoding - seems like a byte[] is being extracted and so I'm guessing it's stored as an unencoded digest, but attrs are documented as null-terminated strings. Wouldn't this result in truncated digests for all files where the hash contains a 0 byte?

Keys of extended attributes are null-terminated strings, while their values are supposedly binary safe. Tools like xattr(1) are also capable of displaying binary values well, which is why I decided to not do any hexadecimal -> binary conversion.

  • Before commit, I assume unit tests will be added?

Yeah, that's a fair point, and also where I'd like your and the other Bazel folks' input.

My mindset going into this was to put this logic in a decorator class that wraps an existing FileSystem, while overriding getDigest(). It looks like this isn't straight-forward, because FileSystem is not an interface. Coming from Go, Java also doesn't seem to have any facilities for easily forwarding methods, like Go's way of doing composition. That means we'll need to forward all the other methods manually. Letting this be part of UnixFileSystem makes it harder to unit test, because it directly calls into the system calls (through JNI, I suppose).

What would be the best way for me to test this?

@EricBurnett
Copy link

Thanks for the quick answers Ed! Those all make sense to me. I'll defer to the bazel folk on the thread for the question of how to test though, as I don't know best practices here myself. (I mostly interact with bazel as a client tool that I look into for debugging, but don't really know how to update myself). In the abstract, a wrapper around getxattr would at least let you test against the interface you believe it holds, but probably you'll also need an integration test to test the round-trip against a real filesystem?

help =
"The name of an extended attribute that can be placed on files to store a precomputed "
+ "copy of the file's hash, corresponding with --digest_function. This option "
+ "can be used to reduce disk I/O and CPU load caused by hash computation.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be clearer (and probably longer).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulfjack any suggestions on how to improve the description?

@sstriker
Copy link

sstriker commented Jul 8, 2020

@janakdr thoughts?

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having the xattr name as a startup option is reasonable. Inside Google, we'll need to force-set this option to the Google-internal xattr, but I don't think that should block this. If there's enough documentation/stern warnings around this, letting people specify the xattr themselves seems ok. Could potentially warn if the xattr has an "insecure shape", with another option to turn off that warning, although that may be overkill.

In terms of tests, you could write a Java integration test with a custom FileSystem that recorded calls, and assert that the desired ones were made. One example with a custom filesystem. Could also write a shell test that turned on profiling

add_to_blazerc "common --record_full_profiler_data --profile=$PROF"
add_to_blazerc "common --experimental_generate_json_trace_profile"
add_to_blazerc "common --noslim_profile"

and then counted the number of system call operations for each file. We have an internal test that does that.

@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch 2 times, most recently from 46dd576 to 531c3a3 Compare July 14, 2020 15:18
@EdSchouten
Copy link
Contributor Author

In terms of tests, you could write a Java integration test with a custom FileSystem that recorded calls, and assert that the desired ones were made. One example with a custom filesystem.

Thanks for that link! One of the things I completely forgot is that it's valid in Java to inherit from a class and override one of the methods deep down. Calls to sibling methods in a base class also do a 'vtable lookup' (or however Java would call that). You tend to forget that as a Go programmer...

Anyway, I've added a basic unit test that inherits from UnixFileSystem and overrides getxattr() to return a constant value. Calls to DigestUtils.getDigestWithManualFallback() should then succeed without doing any file I/O.

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on which Bazel "roots" you expect to have this xattr? Specifically, will this xattr be available for source files, output files, or both? If just one, then this will wastefully try to compute fast digests for every file under the other root. This is related to your concern about not using the file cache when using fast digests.

Inside Google, the same xattr provides a digest for both source and outputs. Unfortunately, at the FileSystem level, the distinction is lost, so I'm not sure how you could switch off between them here if only one has fast digests.

@EdSchouten
Copy link
Contributor Author

Hi Janak!

Can you comment on which Bazel "roots" you expect to have this xattr? Specifically, will this xattr be available for source files, output files, or both? If just one, then this will wastefully try to compute fast digests for every file under the other root. This is related to your concern about not using the file cache when using fast digests.

Inside Google, the same xattr provides a digest for both source and outputs. Unfortunately, at the FileSystem level, the distinction is lost, so I'm not sure how you could switch off between them here if only one has fast digests.

Initially I'm only interested in using this on output files (#11703). In the long run I also want to use it for source files. So for me completely acceptable for me to call getxattr() globally.

@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch from 6e074e4 to 1034111 Compare July 27, 2020 12:55
@dhanna11
Copy link

dhanna11 commented Aug 2, 2020

Drive by comment: might want to check out Facebooks piper/srcfs clone. They have xattrs implemented for buck https://github.com/facebookexperimental/eden/

@janakdr
Copy link
Contributor

janakdr commented Aug 3, 2020

Sorry, I was OOO all last week. I'll try to take a look this week, but it may be a few days as I dig out from my backlog.

Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably good to document explicitly that setting this flag means that Bazel will try to get this xattr for every file, including both source and outputs, so that users aren't surprised if Bazel starts issuing more filesystem calls with this flag.

@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch 3 times, most recently from 3cbdcf8 to fab90a6 Compare August 6, 2020 13:16
@EdSchouten
Copy link
Contributor Author

It's probably good to document explicitly that setting this flag means that Bazel will try to get this xattr for every file, including both source and outputs, so that users aren't surprised if Bazel starts issuing more filesystem calls with this flag.

Done!

EdSchouten added a commit to buildbarn/bb-storage that referenced this pull request Aug 8, 2020
In this PR for Bazel I am adding support for reading hashes from
extended attributes:

bazelbuild/bazel#11662

This can speed up Bazel's file system access, as it no longer needs to
read the full file contents to compute the digest.

The PR for Bazel does not set any standard for extended attribute
naming. Let's go ahead and pick a naming scheme on our end. We could
have also used Buildbox's "user.checksum.*", but the format of those
extended attributes is different. Bazel expects binary hashes, while
Buildbox uses base16 (hexadecimal) encoding.
@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch 5 times, most recently from bb83657 to eb855f8 Compare August 10, 2020 21:23
@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch from eb855f8 to ce61a16 Compare August 14, 2020 14:54
There are certain workloads where Bazel's running time gets dominated by
checksum computation. Examples include:

- People adding local_repository()s to their project that point to
  networked file shares.
- The use of repositories that contain very large input files.

When using remote execution, we need to compute digests to be able to
place such files in input roots. In many cases, a centralized CAS will
already contain these files. It would be nice if Bazel could efficiently
check for existence of such objects without needing to scan the file
locally.

This change extends UnixFileSystem to call getxattr() on an attribute
prior to falling back to reading file contents. The name of the extended
attribute that is used is configurable through a command line flag.

Using extended attributes to store this information also seems to be a
fairly common approach. Apparently it is also used within Google itself:

https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ

So far no code has been added to let Bazel write these attributes to
disk. The main goal so far is to speed up access to read-only corpora,
where the maintainers have spent the effort adding these attributes.
@EdSchouten EdSchouten force-pushed the eschouten/20200629-xattr-checksum branch from ce61a16 to 3fa149a Compare September 3, 2020 14:19
@EdSchouten
Copy link
Contributor Author

Hey! I've just rebased the change on top of latest master once again. Is there anything else I need to do on my end?

@janakdr
Copy link
Contributor

janakdr commented Sep 3, 2020

Whoops, my apologies, forgot that I needed to manually import! I've imported the pending change and will work on harmonizing this change with Google's internal hash setup. I'll update here as I go.

@janakdr
Copy link
Contributor

janakdr commented Sep 11, 2020

Update: I've gotten rid of the gross global digest function state that we were using
recent commits

I currently have a bunch of commits in flight to stop using Path#getDigest (using DigestUtils instead, or getting metadata from elsewhere). Once that's done, I'll get rid of the FileSystem#getDigest overrides, like RemoteActionFileSystem (we have more internally). However, I think that this change is somewhat orthogonal to that clean-up, so I will continue to work on getting it into shape in parallel.

bazel-io pushed a commit that referenced this pull request Sep 11, 2020
…with DigestUtils#getDigestWithManualFallback or document why Path#getFastDigest is not useful to them.

This is inspired by cleaning up in preparation for #11662: currently, some FileSystem implementations call Path#getFastDigest inside Path#getDigest because they have a fast digest implementation and want to guard against expensive digest computations.

After this change, ~all overrides of FileSystem#getDigest can be removed, since the callers will already have called FileSystem#getFastDigest.

PiperOrigin-RevId: 331142210
@EdSchouten
Copy link
Contributor Author

Awesome work, @janakdr! Would the changes you're working on also allow us to implement this feature using a plain option, instead of a startup option? Or would that still be a bit of a stretch?

@janakdr
Copy link
Contributor

janakdr commented Sep 13, 2020

We never really recreate the FileSystem on a running server, so it might be tough even after these changes. We could discuss how to make that happen, and what the priority should be, though.

@bazel-io bazel-io closed this in 05650ff Sep 17, 2020
@EdSchouten
Copy link
Contributor Author

Awesome! \o/

@EdSchouten EdSchouten deleted the eschouten/20200629-xattr-checksum branch September 17, 2020 18:34
Yannic pushed a commit to Yannic/bazel that referenced this pull request Oct 5, 2020
There are certain workloads where Bazel's running time gets dominated by
checksum computation. Examples include:

- People adding local_repository()s to their project that point to
  networked file shares.
- The use of repositories that contain very large input files.

When using remote execution, we need to compute digests to be able to
place such files in input roots. In many cases, a centralized CAS will
already contain these files. It would be nice if Bazel could efficiently
check for existence of such objects without needing to scan the file
locally.

This change extends UnixFileSystem to call getxattr() on attribute
~~"user.checksum.${algo}" prior to falling back to reading file contents.~~
~~There is no true standard on how these extended attributes should be~~
~~called, but "user.checksum.${algo}" already has some precedent. It is,~~
~~for example, used by BuildGrid internally:~~

~~https://gitlab.com/BuildGrid/buildbox/buildbox-fuse/-/merge_requests/9~~

**EDIT:** The name of the extended attribute is now configurable.

Using extended attributes to store this information also seems to be a
fairly common approach. Apparently it is also used within Google itself:

https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ

So far no code has been added to let Bazel write these attributes to
disk. The main goal so far is to speed up access to read-only corpora,
where the maintainers have spent the effort adding these attributes.

Closes bazelbuild#11662.

(@janakdr made some modifications from the original pull request, mainly to
deal with merge conflicts and address Google-internal style.)

PiperOrigin-RevId: 332256967
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants