Redefine CARGO_TARGET_DIR to be only an artifacts directory #14125

Open
kornelski opened this issue Jun 21, 2024 · 15 comments
Labels
A-caching Area: caching of dependencies, repositories, and build artifacts C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing. Z-out-dir Nightly: --out-dir

Comments

@kornelski
Contributor

kornelski commented Jun 21, 2024

See #14125 (comment)


Problem

There are a couple of issues with the CARGO_TARGET_DIR that are seemingly in conflict with each other:

  1. Multiple locations of target dirs complicate excluding them from backups and full-disk search, cleanup of the temp files, moving temp files to dedicated partitions, out of slow network drives or container mounts, etc. Users don't like that the target dir is huge, and multiple instances of it add up to a lot of disk space. Users would prefer a central location to ease management of the temp files, and also to dedupe/reuse dependencies across many projects.

  2. People (and tools) are relying on a relative ./target directory being present to copy or run built files out of there. Additionally, users may not want to configure a shared CARGO_TARGET_DIR due to risk of file name conflicts between projects.

However, the dilemma between 1 and 2 exists only because Cargo uses CARGO_TARGET_DIR for two different roles:

  1. A cache for all intermediate build products (a place where crates.io crates are built, where compiler-private temp files are) which aren't project-specific, and/or files that users don't need to access directly.
  2. A location for user-facing final build products (artifacts) that users expect to be there and need to access.

Proposed Solution

So to satisfy both uses, I suggest changing the thinking about what the role of CARGO_TARGET_DIR should be. Instead of thinking about where to put the same huge all-purpose mixed CARGO_TARGET_DIR, think about how to deduplicate and slim CARGO_TARGET_DIR, and move everything non-user-facing out of it.

Instead of merging or sharding the CARGO_TARGET_DIR as-is with all of its current content, and adding --artifact-dir as a separate place where final products are copied to, make CARGO_TARGET_DIR itself the artifact dir (without copying).

As long as the CARGO_TARGET_DIR dir is the place for all of the build files, of all crates including all the crates.io and local builds, with all the caches, all the temp junk, then this is going to be a problematic large directory that needs to be managed. But if the purpose of the ./target dir was changed to be only for user-facing files (files that users can name, and would access via ./target path themselves), then this directory would be relatively small, with a good reason to stay workspace-relative.

What isn't an intermediate build product? (and should stay in ./target)

  • linked (and stripped) binaries of the current workspace, including binaries for the examples,
  • libraries of the current workspace as .a/.so, where lib.crate-type calls for them. Possibly .rlib/.rmeta in the future if there's a stable ABI.
  • linked binaries for tests and benches of the current workspace (to make it easy to launch them under a debugger/profiler, and so they can use relative file paths to read workspace assets).
  • debug symbols for all of the above.
  • .d files for all of the above (so that IDEs and other build systems know when to rebuild the artifacts).
  • if Cargo adds some "staging" directory (a non-private OUT_DIR for build.rs, see Allow build scripts to stage final artifacts #13663), then for build scripts belonging to the current workspace it would be inside ./target as well.

So generally files that users build intentionally and may want to access directly (run themselves, or package up for distribution), and files that users may need to configure their IDE and debugger to find inside the project.

Crates in [patch.crates-io] with a path are a gray area, and might also have their artifacts included in the ./target dir (but in some way that avoids clobbering workspaces' files).

What isn't a final build product, and doesn't belong in ./target:

  • anything related to building crates from crates.io, or any other registry (packages with source = "registry+…")
  • all .fingerprint and incremental dir content of all crates. These are implementation details of the compiler, and nobody should be accessing these directly via ./target/….
  • .o files. Users are not supposed to use them directly either (Rust has static libs for this).
  • proc macro libs. They're not useful without rustc present.

All of these should be built in some other shared build cache dir (one that is not inside CARGO_TARGET_DIR), configurable by a new option/env var.
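The two lists above can be read as a small decision table. The following is only a sketch of that reading: the `Source` and `Output` enums and the `stays_in_target` function are hypothetical names invented for illustration, not Cargo internals, and the gray areas (patched path crates, a possible staging directory) are left out. Treating `.rlib` files as intermediate is an assumption based on the remark that they might move only once there is a stable ABI.

```rust
/// Where a package comes from (hypothetical, simplified).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Source {
    Workspace,
    Registry,
}

/// Kinds of files a build produces (hypothetical, simplified).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Output {
    Binary,
    Example,
    TestOrBench,
    NativeLib, // .a / .so requested via lib.crate-type
    DebugSymbols,
    DepInfo, // .d files
    Rlib,
    Object, // .o files
    ProcMacro,
    Fingerprint,
    Incremental,
}

/// Returns true if the file would stay in ./target under the proposal.
fn stays_in_target(source: Source, output: Output) -> bool {
    use Output::*;
    // Nothing built for a registry package is user-facing.
    if source == Source::Registry {
        return false;
    }
    matches!(
        output,
        Binary | Example | TestOrBench | NativeLib | DebugSymbols | DepInfo
    )
}

fn main() {
    for output in [Output::Binary, Output::Object, Output::Incremental] {
        println!(
            "{:?} from workspace stays in ./target: {}",
            output,
            stays_in_target(Source::Workspace, output)
        );
    }
}
```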

Registry dependencies would get unique paths derived from rustc version + package IDs + enabled features (so that different crates using different features don't invalidate each others' caches all the time). This would enable sharing built crates.io dependencies across all projects for the same local user, without also causing local workspaces to clobber each others' CARGO_TARGET_DIR/profile/product paths. Temp directories for local projects would need some hashed paths in the shared build/temp dir too.
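As a rough illustration of such a derived path, the sketch below hashes the rustc version, package ID, and feature set into a directory name. The `DepBuild` struct, `shared_build_path`, and the example values are assumptions made up for this sketch. `DefaultHasher` is used only to keep it dependency-free; its output is not guaranteed to be stable across Rust releases, so a real cache would need a hash with a fixed specification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Hypothetical inputs that identify one build of a registry dependency.
struct DepBuild<'a> {
    rustc_version: &'a str,
    package_id: &'a str,
    features: &'a [&'a str],
}

/// Derive a directory under `cache_root` that is unique per
/// (rustc version, package ID, enabled feature set).
fn shared_build_path(cache_root: &str, dep: &DepBuild<'_>) -> PathBuf {
    // Sort and dedupe so that the order of features does not change the key.
    let mut features: Vec<&str> = dep.features.to_vec();
    features.sort_unstable();
    features.dedup();

    // Illustration only: DefaultHasher's algorithm may change between Rust releases.
    let mut hasher = DefaultHasher::new();
    dep.rustc_version.hash(&mut hasher);
    dep.package_id.hash(&mut hasher);
    features.hash(&mut hasher);

    PathBuf::from(cache_root).join(format!("{:016x}", hasher.finish()))
}

fn main() {
    let with_derive = DepBuild {
        rustc_version: "1.79.0",
        package_id: "serde 1.0.203",
        features: &["std", "derive"],
    };
    let without_derive = DepBuild {
        rustc_version: "1.79.0",
        package_id: "serde 1.0.203",
        features: &["std"],
    };
    // Different feature sets land in different directories,
    // so they do not invalidate each other's caches.
    println!("{}", shared_build_path("/cache/build", &with_derive).display());
    println!("{}", shared_build_path("/cache/build", &without_derive).display());
}
```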

Advantages

  • Such a split removes about 90% of the weight from ./target dirs (for cargo itself, it makes ./target/debug with binaries and tests take 415MB, instead of 4.2GB). This makes cleanup of all the scattered target dirs less of a pressing problem.
  • The ./target keeps relatively few files, and removes high-frequency-churning files out of it, which makes it less of a problem for real-time disk indexing (like search and backups on macOS).
  • I/O latency of ./target stops being critical for build speeds, unlike I/O of the incremental cache and rewrites of thousands of .o files. It becomes feasible to have project directory on a network drive without overriding CARGO_TARGET_DIR (network filesystems are used by non-Linux systems where tools like Vagrant and Docker have to run full-fat VMs, and can't cheaply share the file system).
  • It makes ./target contain only workspace-unique files, which makes it justified for every workspace to have one.
  • It enables moving registry deps to a shared build directory, without side effect of local projects overwriting each others' files. Sharing of dependencies matches users' expectation that the same dependencies shouldn't be redundantly rebuilt for each local project.
  • It's almost entirely backwards compatible. Users can get the benefits without breaking their existing workflows, post-build scripts, and integrations. It doesn't invalidate documentation/books/tutorials that refer to target/release/exe etc.
  • It could be the default behavior, so it could benefit all users without friction of adding --artifact-dir or .cargo/config.

Notes

No response

@kornelski kornelski added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage. labels Jun 21, 2024
@kornelski
Contributor Author

In terms of incompatibilities, only a few come to my mind:

  • Hacks that recursively search ./target looking for a file built by some build script in its OUT_DIR. Ideally this should be solved by adding proper support for building arbitrary assets in Cargo, but for the sake of keeping compatibility with CARGO_TARGET_DIR, this could be made to work by keeping OUT_DIRs as subdirectories or symlinks in ./target.

  • Binaries trying to link to .so/.dylib of their Rust dependencies, where the deps are built by Cargo, and are not installed on the system. This is currently basically unsupported in Cargo, because there's no way to set rpath to anything that works (an absolute path is too build-specific, and a relative one doesn't work for tests and examples that are in a different dir than bins). But if users are hacking around it with custom post-build steps, they may expect to find the 3rd party shared libs in ./target. This again would be better solved by making Rust/Cargo aware of dylib dependencies, so that it copies/hardlinks dylibs as needed and sets rpath accordingly. Not many deps have crate-type = "cdylib", so these also could be kept/symlinked in ./target for back-compat.

  • Build containers could have very strict read-only filesystems with only CARGO_TARGET_DIR and CARGO_HOME subdirs being writeable. This could be made backwards-compatible by reverting back to the current everything-in-one-place behavior when CARGO_TARGET_DIR is set, and picking other env vars for configuring locations of the trimmed ./target and the intermediate products cache dir.

@epage
Contributor

epage commented Jun 22, 2024

@poliorcetics from rust-lang/rfcs#3664 (comment)

Yes, the issue should be moved to cargo I think.

I'm not convinced at all this won't break backwards compatibility in some way.

It makes ./target contain only workspace-unique files, which makes it justified for every workspace to have one.

And I don't want one in any cargo project while still keeping isolation, which is entirely different from what you are proposing.

It enables moving registry deps to a shared build directory, without side effect of local projects overwriting each others' files. Sharing of dependencies matches users' expectation that the same dependencies shouldn't be redundantly rebuilt for each local project.

Once again, the RFC I wrote and the original issue I drew inspiration from do not ask for that; they ask for the opposite: myself and many others want separate target dirs for every project.

There are probably as many reasons as there are users for it but common ones are different sets of features amongst projects, sharing of build caches for specific projects, CI builds wanting to separate projects for security, or one project pinning 1.2.3 in a dep B of a dep A and the other project pinning 1.2.4: A can have the same version for both but its dependencies won't, and cargo is not made to handle that case at the moment.

You are fundamentally solving a different issue, one that the RFC I posted is not trying to solve.

@epage
Contributor

epage commented Jun 22, 2024

Overall, I see this as a solution alternative to #6790 and had recommended we have that conversation there (or on internals).

What isn't an intermediate build product? (and should stay in ./target)

This is likely going to be the most difficult topic to work through and we'll need to make sure we get wide input on this from #6790 users and others.

Registry dependencies would get unique paths derived from rustc version + package IDs + enabled features (so that different crates using different features don't invalidate each others' caches all the time). This would enable sharing built crates.io dependencies across all projects for the same local user, without also causing local workspaces to clobber each others' CARGO_TARGET_DIR/profile/product paths. Temp directories for local projects would need some hashed paths in the shared build/temp dir too.

imo this is out of scope for this proposal (see #5931) and we should keep this focused so as not to get distracted.

@epage
Contributor

epage commented Jun 22, 2024

I see this as a re-framing of the problem, addressing #6790 and rust-lang/rfcs#3371

Instead of us defining a new artifact-dir, we say target-dir is the artifact directory and move everything else out into a "working directory".

  • The move can be gradual as these files are considered implementation details.
  • We avoid the "yet another flag" problem of --artifact-dir because controlling the location of the "working directory" is too low level for the CLI and should be reserved for config
  • A big downside compared to --artifact-dir is that target-dir is not the final directory files are put under but the parent directory. This is especially annoying for cargo build vs cargo build --target, dealing with profile names, etc

Potential names

  • working-dir (arbitrarily chose this one to refer to the concept moving forward)
  • build-dir

Cargo script would default its target-dir as its working-dir

This would need input from

  • artifact-dir users
  • cache / cache clean up users
  • random other systems that dig into the target dir

@epage
Contributor

epage commented Jun 22, 2024

This would need an audit of ways we publicly treat the target dir as a working dir, like exposing CARGO_TARGET_TMPDIR

@ensc

ensc commented Jun 23, 2024

Sharing sources over NFS adds another incompatibility when the ability to move ./target out of the sources is eliminated.

Every modern build system allows keeping sources and build results separate (and users and tools do not have problems with it). I do not think that cargo should go backwards and enforce a fixed ./target directory.

@kornelski
Contributor Author

I'm not suggesting to force it to always be ./target. The CARGO_TARGET_DIR can continue to move this directory elsewhere.
The main point is to reduce severity of problems that the current high-churn high-volume content of this dir causes.

@epage
Contributor

epage commented Jul 30, 2024

We talked about this in today's Cargo team meeting.

Our care abouts include

  • Moving intermediate artifacts away
    • Having a transition plan for tool authors
  • Providing a profile-agnostic location for final artifacts
  • Resolving the "elide target when host" within the final artifact path
  • Having a coherent story for users, for example
    • Users today mostly interact with target-dir for dealing with final artifacts, so we likely want to preserve that
    • If we create a new directory for intermediate artifacts, we might not want to templatize target-dir: that would push people down one path for moving intermediate artifacts, only for us to later change which path they should take, rather than giving them a stable story for how to handle this

While we acknowledged the potential for user confusion with CARGO_TARGET_TMPDIR, we were fine with it not being associated with target-dir

The general shape of what we proposed in the meeting is...

Shiny future

target-work-dir: Home of intermediate artifacts

  • Goals:
    • Move intermediate artifacts out of source
    • Don't force a transition for what term/variable people reach for to access final artifacts
  • name is a placeholder
  • No CLI flag for target-work-dir
  • Precedence (note the lack of target-dir)
    • target-work-dir config
    • target-work-dir default ("{cargo-cache}/target/{workspace-manifest-path-hash}")

target-artifact-dir: Home of final artifacts

  • Goals:
    • Provide profile-agnostic path
    • Provide a way for users to opt-out of the target-platform from being elided when building for the host-platform
  • name is a placeholder (e.g. do we want the target prefix)
  • --artifact-dir CLI flag is added
  • will error if {platform} or {legacy-platform} is not present with multi-target builds
  • Precedence
    • target-artifact-dir config / cli
    • target-dir config / cli
    • target-artifact-dir default ("{workspace-root}/target/{legacy-platform}/{profile}")
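The precedence list and the multi-target rule above could be sketched roughly as follows. `Settings`, `artifact_dir_template`, and `check_multi_target` are hypothetical names used only for illustration. One assumption is labeled in the code: a legacy target-dir is taken to behave as it does today, with the platform and profile directories appended beneath it.

```rust
/// Hypothetical view of the settings involved; field names mirror the
/// placeholder option names from the proposal.
#[derive(Default)]
struct Settings {
    target_artifact_dir: Option<String>, // config / CLI
    target_dir: Option<String>,          // legacy config / CLI
}

const DEFAULT_ARTIFACT_DIR: &str = "{workspace-root}/target/{legacy-platform}/{profile}";

/// Pick the artifact dir template following the listed precedence.
fn artifact_dir_template(s: &Settings) -> String {
    s.target_artifact_dir
        .clone()
        // Assumption: a legacy target-dir keeps today's layout beneath it.
        .or_else(|| {
            s.target_dir
                .as_ref()
                .map(|t| format!("{t}/{{legacy-platform}}/{{profile}}"))
        })
        .unwrap_or_else(|| DEFAULT_ARTIFACT_DIR.to_string())
}

/// Multi-target builds need a platform placeholder, otherwise outputs for
/// different targets would be written to the same directory.
fn check_multi_target(template: &str, target_count: usize) -> Result<(), String> {
    let has_platform =
        template.contains("{platform}") || template.contains("{legacy-platform}");
    if target_count > 1 && !has_platform {
        return Err(format!(
            "`{template}` needs {{platform}} or {{legacy-platform}} for multi-target builds"
        ));
    }
    Ok(())
}

fn main() {
    let settings = Settings {
        target_dir: Some("/tmp/build".to_string()),
        ..Default::default()
    };
    let template = artifact_dir_template(&settings);
    println!("{template}");
    println!("{:?}", check_multi_target(&template, 2));
    println!("{:?}", check_multi_target("out", 2));
}
```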

Legacy target-dir

  • --target-dir is hidden on CLI. Maybe shows up in man pages
  • No templating supported

Other

  • CARGO_TARGET_TMPDIR points within target-work-dir

Path to Shiny Future

target-work-dir

In theory, we could trivially do this by

  • locking target-work-dir and target-dir (if different)
  • doing all work in target-work-dir and only when we do the "hardlink or cp" do we reference target-dir
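The "hardlink or cp" step mentioned above might look something like this minimal sketch. `link_or_copy` is a hypothetical helper, not Cargo's actual implementation, and the locking step is omitted entirely. It relies only on `std::fs::hard_link` and `std::fs::copy`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Expose `built` (inside the working dir) at `dest` (inside the artifact dir).
fn link_or_copy(built: &Path, dest: &Path) -> io::Result<()> {
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // A stale file at the destination would make hard_link fail, so remove it first.
    match fs::remove_file(dest) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // Hard links are cheap but cannot cross filesystems, so fall back to a copy.
    if fs::hard_link(built, dest).is_err() {
        fs::copy(built, dest)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Throwaway directories standing in for the working dir and the artifact dir.
    let base = std::env::temp_dir().join(format!("link-or-copy-demo-{}", std::process::id()));
    let built = base.join("work/debug/app");
    fs::create_dir_all(built.parent().unwrap())?;
    fs::write(&built, b"binary")?;

    let dest = base.join("artifacts/debug/app");
    link_or_copy(&built, &dest)?;
    println!("artifact available at {}", dest.display());

    fs::remove_dir_all(&base)
}
```

The copy fallback matters for exactly the setups discussed in this thread, where the working dir lives on a different filesystem or subvolume than the workspace.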

Initial default is "{workspace-root}/target"

Template supports

  • {workspace-root}
  • {cargo-cache} (pointing to CARGO_HOME for now)
  • {workspace-manifest-path-hash}

Steps

  1. Implement
  2. Call for testing
  3. Stabilize with opt-in to new location
  4. Call for testing
  5. Switch to opt-out

Notes:

  • Throughout those steps, we would work closely with tool authors to make sure they had a smooth transition
  • The opt-in/opt-out could either be
    • .cargo/config.toml target-work-dir field
    • A one-off environment variable if we don't want to stabilize the above yet

target-artifact-dir

Assumption: target-work-dir takes some pressure off of target-artifact-dir

Defaulted to final location ("{workspace-root}/target/{legacy-platform}/{profile}")

Template supports

  • {workspace-root}
  • {cargo-cache} (pointing to CARGO_HOME for now)
  • {workspace-manifest-path-hash}
  • {platform}
  • {legacy-platform} (elides host-target)
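To make the placeholders concrete, here is a minimal sketch of how such a template might be expanded. The `Context` struct and `expand` function are hypothetical, the manifest path hash is taken as a given input, and `{profile}` is included because it appears in the default template even though the list above does not mention it. The only behavior taken from the proposal is that `{legacy-platform}` is empty when building for the host.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical values available when expanding a directory template.
#[derive(Clone, Copy)]
struct Context<'a> {
    workspace_root: &'a str,
    cargo_cache: &'a str,
    manifest_path_hash: &'a str,
    host: &'a str,
    target: &'a str,
    profile: &'a str,
}

fn expand(template: &str, cx: &Context<'_>) -> PathBuf {
    // {legacy-platform} elides the triple when building for the host.
    let legacy_platform = if cx.target == cx.host { "" } else { cx.target };
    let expanded = template
        .replace("{workspace-root}", cx.workspace_root)
        .replace("{cargo-cache}", cx.cargo_cache)
        .replace("{workspace-manifest-path-hash}", cx.manifest_path_hash)
        .replace("{legacy-platform}", legacy_platform)
        .replace("{platform}", cx.target)
        .replace("{profile}", cx.profile);
    // Rebuilding from components drops the empty segment left behind
    // when {legacy-platform} expands to nothing.
    Path::new(&expanded).components().collect()
}

fn main() {
    let host_build = Context {
        workspace_root: "/ws",
        cargo_cache: "/home/user/.cargo",
        manifest_path_hash: "abc123",
        host: "x86_64-unknown-linux-gnu",
        target: "x86_64-unknown-linux-gnu",
        profile: "debug",
    };
    let cross_build = Context { target: "aarch64-apple-darwin", ..host_build };

    let template = "{workspace-root}/target/{legacy-platform}/{profile}";
    println!("{}", expand(template, &host_build).display());
    println!("{}", expand(template, &cross_build).display());
}
```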

Needs all of the details in the tracking issue to be finalized.

Contingencies

If target-work-dir takes more than N time (1 year?) to stabilize, then we re-evaluate approving rust-lang/rfcs#3371. This is to try to balance the needs of the people who want something like rust-lang/rfcs#3371 now vs (1) the long-term inapplicability of that RFC and (2) the lack of stable "blessed" workflow for users (telling users to use solution X for several months and then telling them that is no longer "right" and they need to use solution Y).

Alternatives

  • Maintain target-dir as the "artifact base dir" and provide a query command (like buck2 analyze) to ask what the "artifact dir" is
  • Don't expose target-artifact-dir in CLI, leaving it for more advanced uses
  • Have only target-dir and target-artifact-dir
    • Downside: harder end-user transition as they reach out to target-dir for final artifacts. While we eventually de-emphasize target-dir, those workflows will work just fine
  • Have only target-dir and target-work-dir
    • Downside: access to final artifacts is still annoying
    • Only append final directories ({platform}/{profile}) if no templates are available
      • Downside: this would be confusing
      • We do this for {dl} but that is a more limited / off in the weeds use case rather than front and center for users
  • Change the meaning of target-dir if target-work-dir or target-artifact-dir is present
    • This gets confusing

@epage
Contributor

epage commented Aug 1, 2024

Something we overlooked in the above analysis is other "artifacts". In particular, I'm thinking of cargo package which places files in $CARGO_TARGET_DIR/packages. I'm assuming at least the .crates location is part of our stable API. We'd need to decide about the files laid out on disk next to it.

Ways of solving this

  • Like examples going to examples/, we could make the profile and target template dirs blank and always put this under packages/
  • We could have a high level {default} variable that is artifact-specific.

@clarfonthey
Contributor

clarfonthey commented Sep 24, 2024

Just commenting here as I'm dealing with my own issues regarding the target dir, but personally, while it's nice for target to contain final build products only by default, I still will want all of these build products out of the target directory for the sake of excluding them from backups and snapshotting.

For some context, I use ZFS snapshots as a form of fast local backups; not long-term backups in case of hardware or extreme software failure, but decent short-term backups in case I accidentally delete a file or mess up an update. However, I explicitly go out of my way to exclude as many things as possible from auto-snapshotting that qualify as "cache" because they can very quickly clog up my disk if I'm not careful.

(Also: since snapshotting is a filesystem-level feature, I can't just say "don't save files of this type in snapshots" since snapshotting works by instantly freezing the state of the FS into a snapshot, and doesn't copy files over like a long-term backup would.)

For example, today I just deleted 200 GiB of snapshots of target directories. Not the current target directories, but past versions of them from previous snapshots. Snapshots are good for incremental stuff like code because they're copy-on-write, but the contents of a binary are effectively random to any snapshotting tool and they'll end up being fully duplicated every time they're snapshotted, and that means you can end up with several times that amount of data in snapshots until everything eventually gets old enough to be deleted. The "effectively random" part also applies especially to the final products, since while crates that don't change won't change in their compiled artifacts, the final linked products definitely will.

So, as far as I'm concerned, moving the final build products back into the workspace without also having the option to keep them out effectively un-solves the problem that moving the target directory was meant to solve. After all, the final build products, modulo LTO (which isn't really going to happen for debug builds) will effectively be the same size as all the intermediate products, so, that means that about half the disk usage will not be saved. (I'm extremely approximating here; the point is that it's a considerable amount of the disk usage, even if it's not half. Even 10% of the size is still a lot when you consider that these are being multiplied across several snapshots.)

And note that yes, other languages like Node and Python also have this exact same problem, but I don't think that other languages' inability to solve this problem forgives Rust not solving it. Also, even though node_modules can be massive, hundreds of GiB of binary artifacts is pretty hard to beat.

I love the idea of keeping intermediate products deduplicated and in one place. I just don't want that to obscure the goal of having the final products also somewhere else too.

@kornelski
Contributor Author

kornelski commented Sep 24, 2024

After all, the final build products, modulo LTO (which isn't really going to happen for debug builds) will effectively be the same size as all the intermediate products

No, the intermediate products are usually many many times larger. Not just double, they can be 1000× larger! On the project I'm currently working on, a clean debug build of a 20MB executable creates 2300MB of junk in target/. After working on it for a while, it grows to 13GB of temp data for a 20MB result.

There are often many duplicate copies of libstd and other dependencies in each .rlib file. There is a lot of duplication across code units. There's plenty of completely unused objects included in the dependencies, and stripped even without LTO (rust relies on --as-needed flag). There are also often separate copies for build dependencies, builds with cfg(test), and incremental build cache.

@epage
Contributor

epage commented Sep 24, 2024

@clarfonthey the plan calls for both target-work-dir and target-artifact-dir to be templated so you can move their content out. It does not call out templating of target-dir as it calls for phasing that out. If we wanted to templatize it as a convenient way of setting both of the above, we'd likely want to wait for the above so we set the precedent for what people are generally expected to work with, rather than shifting expectations around on the user.

@mathstuf
Contributor

I still will want all of these build products out of the target directory for the sake of excluding them from backups and snapshotting.

FWIW, I've had decent results with target being a symlink to somewhere that is not subject to backups/snapshotting. cargo clean will remove the symlink, but everything else I've used is largely fine. Note that this puts intermediate and final artifacts into the same bucket.

@nazar-pc

nazar-pc commented Oct 1, 2024

I'm in the same boat as @clarfonthey, but with BTRFS snapshots, which I create every 15 minutes and then stream to longer-term storage. My debug builds are easily 2.5G and I clean up target that is 300-700G basically every week.

I created ~/.cache/cargo/{git,registry,target} for this reason, where ~/.cache is in a separate subvolume that is not subject to snapshotting/backups. ~/.cargo/{git,registry} are symlinks now (still hoping cargo starts respecting XDG one day) because they also grow to substantial sizes (currently 1.08M files and 24.1G together).

The proposed separation (especially templating for both new options) should work nicely for such a use case, whereas setting CARGO_TARGET_DIR in .profile has major consequences for build times when jumping between projects.

Excited!

@epage epage added A-caching Area: caching of dependencies, repositories, and build artifacts Z-out-dir Nightly: --out-dir S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing. and removed S-triage Status: This issue is waiting on initial triage. labels Oct 23, 2024
@Nemo157
Member

Nemo157 commented Nov 29, 2024

Personally I also don't care about having the artifacts copied anywhere, I access them via cargo run etc. directly from the work-dir anyway. It'd be nice if we could do something like target-artifact-dir = false to just disable it.


7 participants