UnixFileSystem: read cached hashes from extended attributes #11662

Commits on Sep 3, 2020

  1. UnixFileSystem: read cached hashes from extended attributes

    There are certain workloads where Bazel's running time gets dominated by
    checksum computation. Examples include:
    
    - People adding local_repository()s to their project that point to
      networked file shares.
    - The use of repositories that contain very large input files.
    
    When using remote execution, we need to compute digests to be able to
    place such files in input roots. In many cases, a centralized CAS will
    already contain these files. It would be nice if Bazel could efficiently
    check for existence of such objects without needing to scan the file
    locally.
    
    This change extends UnixFileSystem to call getxattr() on an attribute
    prior to falling back to reading file contents. The name of the extended
    attribute that is used is configurable through a command line flag.
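
The lookup order described above can be sketched roughly as follows. This is a minimal Python illustration, not Bazel's actual Java implementation. The attribute name `user.checksum.sha256`, the helper names, and the assumption that the attribute holds the raw 32-byte SHA-256 digest are all hypothetical choices made for the example.

```python
import hashlib
import os

# Hypothetical attribute name; the change makes the real name configurable
# through a command line flag.
DIGEST_XATTR = "user.checksum.sha256"


def _read_xattr(path, name):
    """Return the raw attribute value, or None if missing or unsupported."""
    getxattr = getattr(os, "getxattr", None)  # os.getxattr exists on Linux only
    if getxattr is None:
        return None
    try:
        return getxattr(path, name)
    except OSError:
        # Attribute not set, or the file system does not support xattrs.
        return None


def file_digest(path, xattr_name=DIGEST_XATTR, read_xattr=_read_xattr):
    """Return the SHA-256 digest of a file, preferring a cached xattr value."""
    cached = read_xattr(path, xattr_name)
    # Assumption: the attribute stores the raw digest bytes. Anything with an
    # unexpected length is ignored and the file is hashed instead.
    if cached is not None and len(cached) == hashlib.sha256().digest_size:
        return cached

    # Fallback: read the file contents and hash them in 1 MiB chunks.
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.digest()
```

The `read_xattr` parameter exists only so the sketch can be exercised on systems without extended attribute support. Note that a cached value is trusted as-is, so it is only as accurate as whoever wrote the attribute.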
    
    Storing this information in extended attributes seems to be a fairly
    common approach. According to this thread, it is also used within Google:
    
    https://groups.google.com/g/bazel-discuss/c/6VmjSOLySnY/m/v2dpwt8jBgAJ
    
    This change does not add any code to let Bazel write these attributes
    to disk. For now, the main goal is to speed up access to read-only
    corpora whose maintainers have already added these attributes.
    EdSchouten committed Sep 3, 2020
    3fa149a