Middle-ground vendoring option using a local registry #46

Open
wants to merge 9 commits into
base: main

@cormacrelf cormacrelf commented May 22, 2024

The problem is a Goldilocks one.

  1. `vendor = false` makes managing deps and buckifying quick and easy, but is pretty bad day to day.

    • Doesn't work offline as buck2 issues HEAD requests constantly
    • Terrible DX annoyance with "too many open files" errors due to buck trying to download 1000 crates at once. The standard start to your day looks like "run buck2 build about 15 times until by random chance the scheduler manages to get past those errors"
    • Those crates get downloaded again, and again, and again
    • reindeer buckify takes 2 seconds or so. Pretty convenient.
  2. `[vendor] ...` is slow for managing deps and buckifying.

    • Neat for small projects
    • Also probably neat for Meta with y'all's funky EdenFS etc.
    • But the middle ground is bad
      • Middle = vendor directory of 1000 crates, 1.2 GB, 50k source files. Mostly from dupes of the windows crates which can't be pinned to one single version etc.
      • reindeer vendor takes 35 seconds
      • reindeer buckify takes 20 seconds
      • git status takes 150ms
      • The vendor folder wrecks git performance simply by its existence.
    • Build experience is perfect, works offline, etc.

I think we need a solution for the middle ground:

  • vendor = "local-registry", using https://github.com/dhovart/cargo-local-registry
  • reindeer vendor ultimately just writes a bunch of .crate files, which are just gzipped tarballs, into vendor/
  • .crate files stored in git, but using git-lfs if you like. Suddenly windows-0.48.0.crate is just a blob. Your diffs are much simpler when you modify deps. Etc.
  • Some buck2 rule to extract them. (There is no prelude rule that can do this with strip_prefix and sub_targets support, but prelude's extract_archive could probably have that added.)
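
To make the "gzipped tarball" point concrete, here is a minimal Python sketch (not part of this PR) of what extracting a .crate with a stripped prefix amounts to. It assumes the usual `name-version/` top-level directory that `cargo package` produces; `make_fake_crate` and `extract_crate` are hypothetical helper names, and the crate contents are made up so the example is self-contained.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_fake_crate(name: str, version: str) -> bytes:
    """Build an in-memory stand-in for a .crate file (a gzipped tarball)."""
    prefix = f"{name}-{version}"  # assumed top-level directory inside the archive
    files = {
        f"{prefix}/Cargo.toml": f'[package]\nname = "{name}"\nversion = "{version}"\n',
        f"{prefix}/src/lib.rs": "pub fn hello() {}\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in files.items():
            data = text.encode()
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_crate(crate_bytes: bytes, strip_prefix: str, dest: Path) -> list[str]:
    """Extract regular files under strip_prefix into dest, dropping the prefix."""
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(crate_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.startswith(strip_prefix + "/"):
                continue
            rel = member.name[len(strip_prefix) + 1:]
            if ".." in Path(rel).parts:
                continue  # skip anything that would escape dest
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            extracted.append(rel)
    return sorted(extracted)


with tempfile.TemporaryDirectory() as tmp:
    crate = make_fake_crate("demo", "0.1.0")
    print(extract_crate(crate, "demo-0.1.0", Path(tmp)))
    # prints ['Cargo.toml', 'src/lib.rs']
```

A buck2 rule would do the equivalent with `tar` inside an action, so only the crates a build actually needs get unpacked.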

Outcomes:

  • Offline works (although it doesn't handle cargo `package = { git = "..." }` deps yet).
  • reindeer vendor and reindeer buckify both take 2 seconds
  • git status takes 20ms
  • Buck builds are a compromise, but a pretty great one. It still has to extract the tarballs when you want to build things. But at least buck won't ever actually extract windows-0.48.0.crate on linux, and you only pay for what you build.
  • The DX annoyance factor during builds is back to zero. No more too many open files errors.
  • DX annoyance when updating deps is acceptable.

Problems:

  • Relies on https://github.com/dhovart/cargo-local-registry being installed. Note however this is a single-file binary. I think if you rewrote it without the dependency on the cargo crate it would be maybe a 2-file crate. And we could use it as a library.
  • I think storing the local registry's index folder (nested ab/ ac/ ah/ ... folders) might be a little bit annoying if you're making concurrent edits on different branches. But you can always regenerate.
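
For context on those nested folders: assuming the local registry follows the standard cargo registry index layout (which is what the ab/ ac/ ah/ directories suggest, but is not verified here), the index file for a crate lives at a path derived from its lowercased name. A small sketch, with `index_path` as a hypothetical helper name:

```python
def index_path(crate_name: str) -> str:
    """Return the index file path for a crate under the standard cargo index layout."""
    name = crate_name.lower()
    if not name:
        raise ValueError("crate name must be non-empty")
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Longer names: first two characters, then the next two, then the full name.
    return f"{name[:2]}/{name[2:4]}/{name}"


for crate in ["windows", "ahash", "syn", "cc"]:
    print(crate, "->", index_path(crate))
```

Under that layout each crate gets its own index file, so concurrent branches would conflict only when both touch the same crate's file.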

@facebook-github-bot added the CLA Signed label May 22, 2024

cormacrelf commented May 22, 2024

And here is an `extract_archive` rule based on prelude's `http_archive` that makes all this work.
def _tar_strip_prefix_flags(strip_prefix: [str, None]) -> list[str]:
    if strip_prefix:
        # count nonempty path components in the prefix
        count = len(filter(lambda c: c != "", strip_prefix.split("/")))
        return ["--strip-components=" + str(count), strip_prefix]
    return []

def _unarchive_cmd(
        # ext_type: str,
        # exec_is_windows: bool,
        archive: Artifact,
        strip_prefix: [str, None]) -> (cmd_args, bool):
    unarchive_cmd = cmd_args(
        "tar",
        "-xzf",
        archive,
        _tar_strip_prefix_flags(strip_prefix),
    )
    return unarchive_cmd, False

def _extract_archive_impl(ctx: AnalysisContext) -> list[Provider]:
    archive = ctx.attrs.src

    # No need to prefer local execution: unlike http_archive, the source is not a downloaded object.
    prefer_local = False
    unarchive_cmd, needs_strip_prefix = _unarchive_cmd(archive, ctx.attrs.strip_prefix)
    exec_is_windows = False

    output_name = ctx.label.name
    output = ctx.actions.declare_output(output_name, dir = True)
    script_output = ctx.actions.declare_output(output_name + "_tmp", dir = True) if needs_strip_prefix else output
    if exec_is_windows:
        ext = "bat"
        mkdir = "md {}"
        interpreter = []
    else:
        ext = "sh"
        mkdir = "mkdir -p {}"
        interpreter = ["/bin/sh"]
    exclude_flags = []
    script, _ = ctx.actions.write(
        "unpack.{}".format(ext),
        [
            cmd_args(script_output, format = mkdir),
            cmd_args(script_output, format = "cd {}"),
            cmd_args([unarchive_cmd] + exclude_flags, delimiter = " ").relative_to(script_output),
        ],
        is_executable = True,
        allow_args = True,
    )
    exclude_hidden = []
    ctx.actions.run(
        cmd_args(interpreter + [script]).hidden(exclude_hidden + [archive, script_output.as_output()]),
        category = "extract_archive",
        prefer_local = prefer_local,
    )

    if needs_strip_prefix:
        ctx.actions.copy_dir(output.as_output(), script_output.project(ctx.attrs.strip_prefix))

    return [DefaultInfo(
        default_output = output,
        sub_targets = {
            path: [DefaultInfo(default_output = output.project(path))]
            for path in ctx.attrs.sub_targets
        },
    )]

extract_archive = rule(
    impl = _extract_archive_impl,
    attrs = {
        "src": attrs.source(),
        "strip_prefix": attrs.option(attrs.string(), default = None),
        "sub_targets": attrs.list(attrs.string(), default = [], doc = """
            A list of filepaths within the archive to be made accessible as sub-targets.
            For example if we have an extract_archive with `name = "archive"` and
            `sub_targets = ["src/lib.rs"]`, then other targets would be able to refer
            to that file as `":archive[src/lib.rs]"`.
        """),
    },
)
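
As a quick sanity check of the flag-building helper above, here is a plain-Python port (a sketch only: the rule itself is Starlark, where `filter` returns a list, so the port uses a list comprehension instead). `tar_strip_prefix_flags` is the same logic without the leading underscore. The trailing `strip_prefix` argument is passed to `tar` as a member name, so only that subtree is extracted, and `--strip-components` then drops the leading path components. Since `_unarchive_cmd` always returns `False` for `needs_strip_prefix`, the `copy_dir` branch in the rule is never taken and `tar` does all the stripping.

```python
from typing import Optional


def tar_strip_prefix_flags(strip_prefix: Optional[str]) -> list[str]:
    """Python port of _tar_strip_prefix_flags, for illustration only."""
    if not strip_prefix:
        return []
    # Count non-empty path components, so "a//b/" counts as 2.
    count = len([c for c in strip_prefix.split("/") if c != ""])
    return [f"--strip-components={count}", strip_prefix]


print(tar_strip_prefix_flags("windows-0.48.0"))
# prints ['--strip-components=1', 'windows-0.48.0']
print(tar_strip_prefix_flags("a//b/"))
# prints ['--strip-components=2', 'a//b/']
print(tar_strip_prefix_flags(None))
# prints []
```

For a crate extracted this way, the resulting command would be roughly `tar -xzf <archive> --strip-components=1 windows-0.48.0`, run from inside the output directory.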

@cormacrelf cormacrelf force-pushed the feature/local-registry branch from 2eb5ad4 to d8c14c7 Compare May 22, 2024 12:35
@cormacrelf cormacrelf force-pushed the feature/local-registry branch from d8c14c7 to e1d2c53 Compare May 22, 2024 12:40
@cormacrelf cormacrelf force-pushed the feature/local-registry branch from cb858a3 to a7b0188 Compare September 4, 2024 07:52

dmezh commented Sep 26, 2024

@cormacrelf I think your analysis is on point; there is a gap here that the tarballs fill nicely.


dmezh commented Sep 26, 2024

Maybe @jsgf?
