From cc53ecad0b870159bc3720a58ec927da5e72e90f Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 8 Apr 2020 09:57:12 -0700 Subject: [PATCH 1/3] Add some more exposition on fingerprints. --- src/cargo/core/compiler/fingerprint.rs | 146 ++++++++++++++++++++----- 1 file changed, 118 insertions(+), 28 deletions(-) diff --git a/src/cargo/core/compiler/fingerprint.rs b/src/cargo/core/compiler/fingerprint.rs index 4c1674b1493..46139c9bfee 100644 --- a/src/cargo/core/compiler/fingerprint.rs +++ b/src/cargo/core/compiler/fingerprint.rs @@ -36,8 +36,7 @@ //! Fingerprints and Metadata are similar, and track some of the same things. //! The Metadata contains information that is required to keep Units separate. //! The Fingerprint includes additional information that should cause a -//! recompile, but it is desired to reuse the same filenames. Generally the -//! items in the Metadata do not need to be in the Fingerprint. A comparison +//! recompile, but it is desired to reuse the same filenames. A comparison //! of what is tracked: //! //! Value | Fingerprint | Metadata @@ -54,8 +53,7 @@ //! __CARGO_DEFAULT_LIB_METADATA[^4] | | ✓ //! package_id | | ✓ //! authors, description, homepage, repo | ✓ | -//! Target src path | ✓ | -//! Target path relative to ws | ✓ | +//! Target src path relative to ws | ✓ | //! Target flags (test/bench/for_host/edition) | ✓ | //! -C incremental=… flag | ✓ | //! mtime of sources | ✓[^3] | @@ -64,12 +62,18 @@ //! //! [^1]: Build script and bin dependencies are not included. //! -//! [^3]: The mtime is only tracked for workspace members and path -//! dependencies. Git dependencies track the git revision. +//! [^3]: See below for details on mtime tracking. //! //! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the //! release channel (bootstrap/stable/beta/nightly) in libstd. //! +//! When deciding what should go in the Metadata vs the Fingerprint, consider +//! that some files (like dylibs) do not have a hash in their filename. Thus, +//! if a value changes, only the fingerprint will detect the change. Fields +//! that are only in Metadata generally aren't relevant to the fingerprint +//! because they fundamentally change the output (like target vs host changes +//! the directory where it is emitted). +//! //! ## Fingerprint files //! //! Fingerprint information is stored in the @@ -83,9 +87,7 @@ //! `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be //! used to display this log information. //! - A "dep-info" file which contains a list of source filenames for the -//! target. This is produced by reading the output of `rustc -//! --emit=dep-info` and packing it into a condensed format. Cargo uses this -//! to check the mtime of every file to see if any of them have changed. +//! target. See below for details. //! - An `invoked.timestamp` file whose filesystem mtime is updated every time //! the Unit is built. This is an experimental feature used for cleaning //! unused artifacts. @@ -110,6 +112,103 @@ //! all dependencies, when it is updated, by using `Arc` clones, it //! automatically picks up the updates to its dependencies. //! +//! ### dep-info files +//! +//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will +//! generate a "dep info" file (with the `.d` extension). This is a +//! Makefile-like syntax that includes all of the source files used to build +//! the crate. This file is used by Cargo to know which files to check to see +//! if the crate will need to be rebuilt. +//! +//! After `rustc` exits successfully, Cargo will read the dep info file and +//! translate it into a binary format that is stored in the fingerprint +//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info +//! file itself is used as the reference for comparing the source files to +//! determine if any of the source files have been modified (see below for +//! more detail). +//! +//! There is also a third dep-info file. Cargo will extend the file created by +//! rustc with some additional information and saves this into the output +//! directory. This is intended for build system integration. See the +//! `output_depinfo` module for more detail. +//! +//! #### -Zbinary-dep-depinfo +//! +//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes +//! `rustc` to include binary files (like rlibs) in the dep-info file. This is +//! primarily to support rustc development, so that Cargo can check the +//! implicit dependency to the standard library (which lives in the sysroot). +//! We want Cargo to recompile whenever the standard library rlib/dylibs +//! change, and this is a generic mechanism to make that work. +//! +//! ### Mtime comparison +//! +//! The use of modification timestamps is the most common way a unit will be +//! determined to be dirty or fresh between builds. There are many subtle +//! issues and edge cases with mtime comparisons. This gives a high-level +//! overview, but you'll need to read the code for the gritty details. Mtime +//! handling is different for different unit kinds. The different styles are +//! driven by the `Fingerprint.local` field, which is set based on the unit +//! kind. +//! +//! The status of whether or not the mtime is "stale" or "up-to-date" is +//! stored in `Fingerprint.fs_status`. +//! +//! All units will compare the mtime of its newest output file with the mtimes +//! of the outputs of all its dependencies. If any output file is missing, +//! then the unit is stale. If any dependency is newer, the unit is stale. +//! +//! #### Normal package mtime handling +//! +//! `LocalFingerprint::CheckDepinfo` is used for checking the mtime of +//! packages. It compares the mtime of the input files (the source files) to +//! the mtime of the dep-info file (which is written last after a build is +//! finished). If the dep-info is missing, the unit is stale (it has never +//! been built). The list of input files comes from the dep-info file. See the +//! section above for details on dep-info files. +//! +//! Also note that although registry and git packages use `CheckDepInfo`, none +//! of their source files are included in the dep-info (see +//! `translate_dep_info`), so for those kinds no mtime checking is done +//! (unless `-Zbinary-dep-depinfo` is used). Repository and git packages are +//! static, so there is no need to check anything. +//! +//! When a build is complete, the mtime of the dep-info file in the +//! fingerprint directory is modified to rewind it to the time when the build +//! started. This is done by creating an `invoked.timestamp` file when the +//! build starts to capture the start time. The mtime is rewound to the start +//! to handle the case where the user modifies a source file while a build is +//! running. Cargo can't know whether or not the file was included in the +//! build, so it takes a conservative approach of assuming the file was *not* +//! included, and it should be rebuilt during the next build. +//! +//! #### Rustdoc mtime handling +//! +//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively +//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is +//! used for rustdoc units. For registry packages, this is the package +//! version. For git packages, it is the git hash. For path packages, it is +//! the a string of the mtime of the newest file in the package. +//! +//! There are some known bugs with how this works, so it should be improved at +//! some point. +//! +//! #### Build script mtime handling +//! +//! Build script mtime handling runs in different modes. There is the "old +//! style" where the build script does not emit any `rerun-if` directives. In +//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the +//! "rustdoc" section above how it works. +//! +//! In the new-style, each `rerun-if` directive is translated to the +//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant +//! compares the mtime of the given filenames against the mtime of the +//! "output" file. +//! +//! Similar to normal units, the build script "output" file mtime is rewound +//! to the time just before the build script is executed to handle mid-build +//! modifications. +//! //! ## Considerations for inclusion in a fingerprint //! //! Over time we've realized a few items which historically were included in @@ -484,9 +583,8 @@ impl<'de> Deserialize<'de> for DepFingerprint { #[derive(Debug, Serialize, Deserialize, Hash)] enum LocalFingerprint { /// This is a precalculated fingerprint which has an opaque string we just - /// hash as usual. This variant is primarily used for git/crates.io - /// dependencies where the source never changes so we can quickly conclude - /// that there's some string we can hash and it won't really change much. + /// hash as usual. This variant is primarily used for rustdoc where we + /// don't have a dep-info file to compare against. /// /// This is also used for build scripts with no `rerun-if-*` statements, but /// that's overall a mistake and causes bugs in Cargo. We shouldn't use this @@ -1072,19 +1170,16 @@ fn calculate_normal<'a, 'cfg>( .collect::>>()?; deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id)); - // Afterwards calculate our own fingerprint information. We specially - // handle `path` packages to ensure we track files on the filesystem - // correctly, but otherwise upstream packages like from crates.io or git - // get bland fingerprints because they don't change without their - // `PackageId` changing. + // Afterwards calculate our own fingerprint information. let target_root = target_root(cx); - let local = if use_dep_info(unit) { + let local = if unit.mode.is_doc() { + // rustdoc does not have dep-info files. + let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?; + vec![LocalFingerprint::Precalculated(fingerprint)] + } else { let dep_info = dep_info_loc(cx, unit); let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf(); vec![LocalFingerprint::CheckDepInfo { dep_info }] - } else { - let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?; - vec![LocalFingerprint::Precalculated(fingerprint)] }; // Figure out what the outputs of our unit is, and we'll be storing them @@ -1128,12 +1223,6 @@ fn calculate_normal<'a, 'cfg>( }) } -/// Whether or not the fingerprint should track the dependencies from the -/// dep-info file for this unit. -fn use_dep_info(unit: &Unit<'_>) -> bool { - !unit.mode.is_doc() -} - /// Calculate a fingerprint for an "execute a build script" unit. This is an /// internal helper of `calculate`, don't call directly. fn calculate_run_custom_build<'a, 'cfg>( @@ -1588,7 +1677,8 @@ impl DepInfoPathType { /// included. If it is false, then package-relative paths are skipped and /// ignored (typically used for registry or git dependencies where we assume /// the source never changes, and we don't want the cost of running `stat` on -/// all those files). +/// all those files). See the module-level docs for the note about +/// `-Zbinary-dep-depinfo` for more details on why this is done. /// /// The serialized Cargo format will contain a list of files, all of which are /// relative if they're under `root`. or absolute if they're elsewhere. From cd396f340a9f8250414a9e6c6c1ed127e5e31c63 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 8 Apr 2020 12:45:15 -0700 Subject: [PATCH 2/3] Fix freshness when linking is interrupted. --- src/cargo/core/compiler/fingerprint.rs | 23 ++++++- tests/testsuite/freshness.rs | 85 ++++++++++++++++++++++++++ 2 files changed, 107 insertions(+), 1 deletion(-) diff --git a/src/cargo/core/compiler/fingerprint.rs b/src/cargo/core/compiler/fingerprint.rs index 46139c9bfee..8ce062d675e 100644 --- a/src/cargo/core/compiler/fingerprint.rs +++ b/src/cargo/core/compiler/fingerprint.rs @@ -376,6 +376,24 @@ pub fn prepare_target<'a, 'cfg>( return Ok(Job::new(Work::noop(), Fresh)); } + // Clear out the old fingerprint file if it exists. This protects when + // compilation is interrupted leaving a corrupt file. For example, a + // project with a lib.rs and integration test: + // + // 1. Build the integration test. + // 2. Make a change to lib.rs. + // 3. Build the integration test, hit Ctrl-C while linking (with gcc). + // 4. Build the integration test again. + // + // Without this line, then step 4 will think the integration test is + // "fresh" because the mtime of the output file is newer than all of its + // dependencies. But the executable is corrupt and needs to be rebuilt. + // Clearing the fingerprint ensures that Cargo never mistakes it as + // up-to-date until after a successful build. + if loc.exists() { + paths::write(&loc, b"")?; + } + let write_fingerprint = if unit.mode.is_run_custom_build() { // For build scripts the `local` field of the fingerprint may change // while we're executing it. For example it could be in the legacy @@ -1501,7 +1519,10 @@ fn compare_old_fingerprint( let old_fingerprint_json = paths::read(&loc.with_extension("json"))?; let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json) .chain_err(|| internal("failed to deserialize json"))?; - debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short); + // Fingerprint can be empty after a failed rebuild (see comment in prepare_target). + if !old_fingerprint_short.is_empty() { + debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short); + } let result = new_fingerprint.compare(&old_fingerprint); assert!(result.is_err()); result diff --git a/tests/testsuite/freshness.rs b/tests/testsuite/freshness.rs index 409fb9e14fb..1098cd9d1cc 100644 --- a/tests/testsuite/freshness.rs +++ b/tests/testsuite/freshness.rs @@ -6,6 +6,7 @@ use std::io; use std::io::prelude::*; use std::net::TcpListener; use std::path::{Path, PathBuf}; +use std::process::Stdio; use std::thread; use std::time::SystemTime; @@ -2323,3 +2324,87 @@ LLVM version: 9.0 assert_eq!(check("beta1", true), beta1_name); assert_eq!(check("nightly1", true), nightly1_name); } + +#[cargo_test] +fn linking_interrupted() { + // Interrupt during the linking phase shouldn't leave test executable as "fresh". + + let listener = TcpListener::bind("127.0.0.1:0").unwrap(); + let addr = listener.local_addr().unwrap(); + + // Create a linker that we can interrupt. + let linker = project() + .at("linker") + .file("Cargo.toml", &basic_manifest("linker", "1.0.0")) + .file( + "src/main.rs", + &r#" + use std::io::Read; + + fn main() { + // Figure out the output filename. + let output = match std::env::args().find(|a| a.starts_with("/OUT:")) { + Some(s) => s[5..].to_string(), + None => { + let mut args = std::env::args(); + loop { + if args.next().unwrap() == "-o" { + break; + } + } + args.next().unwrap() + } + }; + std::fs::remove_file(&output).unwrap(); + std::fs::write(&output, "").unwrap(); + // Tell the test that we are ready to be interrupted. + let mut socket = std::net::TcpStream::connect("__ADDR__").unwrap(); + // Wait for the test to tell us to exit. + let _ = socket.read(&mut [0; 1]); + } + "# + .replace("__ADDR__", &addr.to_string()), + ) + .build(); + linker.cargo("build").run(); + + // Build it once so that the fingerprint gets saved to disk. + let p = project() + .file("src/lib.rs", "") + .file("tests/t1.rs", "") + .build(); + p.cargo("test --test t1 --no-run").run(); + // Make a change, start a build, then interrupt it. + p.change_file("src/lib.rs", "// modified"); + let linker_env = format!( + "CARGO_TARGET_{}_LINKER", + rustc_host().to_uppercase().replace('-', "_") + ); + let mut cmd = p + .cargo("test --test t1 --no-run") + .env(&linker_env, linker.bin("linker")) + .build_command(); + let mut child = cmd + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .spawn() + .unwrap(); + // Wait for linking to start. + let mut conn = listener.accept().unwrap().0; + + // Interrupt the child. + child.kill().unwrap(); + // Note: rustc and the linker are still running, let them exit here. + conn.write(b"X").unwrap(); + + // Build again, shouldn't be fresh. + p.cargo("test --test t1") + .with_stderr( + "\ +[COMPILING] foo [..] +[FINISHED] [..] +[RUNNING] target/debug/deps/t1[..] +", + ) + .run(); +} From 14e86cc7037ba6a552c0a248e72747e1dec03702 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Thu, 9 Apr 2020 15:30:41 -0700 Subject: [PATCH 3/3] More fingerprint and metadata comments. --- .../compiler/context/compilation_files.rs | 17 +++- src/cargo/core/compiler/fingerprint.rs | 90 ++++++++++++------- 2 files changed, 76 insertions(+), 31 deletions(-) diff --git a/src/cargo/core/compiler/context/compilation_files.rs b/src/cargo/core/compiler/context/compilation_files.rs index 1db15b12bc6..56b3c3c615b 100644 --- a/src/cargo/core/compiler/context/compilation_files.rs +++ b/src/cargo/core/compiler/context/compilation_files.rs @@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit}; use crate::core::{Target, TargetKind, Workspace}; use crate::util::{self, CargoResult}; -/// The `Metadata` is a hash used to make unique file names for each unit in a build. +/// The `Metadata` is a hash used to make unique file names for each unit in a +/// build. It is also use for symbol mangling. +/// /// For example: /// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name. /// - Similarly a project may depend on two versions of `A`, so the version must be in the file name. +/// /// In general this must include all things that need to be distinguished in different parts of /// the same build. This is absolutely required or we override things before /// we get chance to use them. /// +/// It is also used for symbol mangling, because if you have two versions of +/// the same crate linked together, their symbols need to be differentiated. +/// /// We use a hash because it is an easy way to guarantee /// that all the inputs can be converted to a valid path. /// @@ -39,6 +45,15 @@ use crate::util::{self, CargoResult}; /// more space than needed. This makes not including something in `Metadata` /// a form of cache invalidation. /// +/// You should also avoid anything that would interfere with reproducible +/// builds. For example, *any* absolute path should be avoided. This is one +/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has +/// absolute paths (like `--remap-path-prefix` which is fundamentally used for +/// reproducible builds and has absolute paths in it). Also, in some cases the +/// mangled symbols need to be stable between different builds with different +/// settings. For example, profile-guided optimizations need to swap +/// `RUSTFLAGS` between runs, but needs to keep the same symbol names. +/// /// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a /// rebuild is needed. #[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)] diff --git a/src/cargo/core/compiler/fingerprint.rs b/src/cargo/core/compiler/fingerprint.rs index 8ce062d675e..12dceaed1ba 100644 --- a/src/cargo/core/compiler/fingerprint.rs +++ b/src/cargo/core/compiler/fingerprint.rs @@ -5,23 +5,30 @@ //! (needs to be recompiled) or "fresh" (it does not need to be recompiled). //! There are several mechanisms that influence a Unit's freshness: //! -//! - The `Metadata` hash isolates each Unit on the filesystem by being -//! embedded in the filename. If something in the hash changes, then the -//! output files will be missing, and the Unit will be dirty (missing -//! outputs are considered "dirty"). -//! - The `Fingerprint` is another hash, saved to the filesystem in the -//! `.fingerprint` directory, that tracks information about the inputs to a -//! Unit. If any of the inputs changes from the last compilation, then the -//! Unit is considered dirty. A missing fingerprint (such as during the -//! first build) is also considered dirty. -//! - Whether or not input files are actually present. For example a build -//! script which says it depends on a nonexistent file `foo` is always rerun. -//! - Propagation throughout the dependency graph of file modification time -//! information, used to detect changes on the filesystem. Each `Fingerprint` -//! keeps track of what files it'll be processing, and when necessary it will -//! check the `mtime` of each file (last modification time) and compare it to -//! dependencies and output to see if files have been changed or if a change -//! needs to force recompiles of downstream dependencies. +//! - The `Fingerprint` is a hash, saved to the filesystem in the +//! `.fingerprint` directory, that tracks information about the Unit. If the +//! fingerprint is missing (such as the first time the unit is being +//! compiled), then the unit is dirty. If any of the fingerprint fields +//! change (like the name of the source file), then the Unit is considered +//! dirty. +//! +//! The `Fingerprint` also tracks the fingerprints of all its dependencies, +//! so a change in a dependency will propagate the "dirty" status up. +//! +//! - Filesystem mtime tracking is also used to check if a unit is dirty. +//! See the section below on "Mtime comparison" for more details. There +//! are essentially two parts to mtime tracking: +//! +//! 1. The mtime of a Unit's output files is compared to the mtime of all +//! its dependencies' output file mtimes (see `check_filesystem`). If any +//! output is missing, or is older than a dependency's output, then the +//! unit is dirty. +//! 2. The mtime of a Unit's source files is compared to the mtime of its +//! dep-info file in the fingerprint directory (see `find_stale_file`). +//! The dep-info file is used as an anchor to know when the last build of +//! the unit was done. See the "dep-info files" section below for more +//! details. If any input files are missing, or are newer than the +//! dep-info, then the unit is dirty. //! //! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking //! is notoriously imprecise and problematic. Only a small part of the @@ -33,6 +40,12 @@ //! //! ## Fingerprints and Metadata //! +//! The `Metadata` hash is a hash added to the output filenames to isolate +//! each unit. See the documentation in the `compilation_files` module for +//! more details. NOTE: Not all output files are isolated via filename hashes +//! (like dylibs), but the fingerprint directory always has the `Metadata` +//! hash in its directory name. +//! //! Fingerprints and Metadata are similar, and track some of the same things. //! The Metadata contains information that is required to keep Units separate. //! The Fingerprint includes additional information that should cause a @@ -69,10 +82,11 @@ //! //! When deciding what should go in the Metadata vs the Fingerprint, consider //! that some files (like dylibs) do not have a hash in their filename. Thus, -//! if a value changes, only the fingerprint will detect the change. Fields -//! that are only in Metadata generally aren't relevant to the fingerprint -//! because they fundamentally change the output (like target vs host changes -//! the directory where it is emitted). +//! if a value changes, only the fingerprint will detect the change (consider, +//! for example, swapping between different features). Fields that are only in +//! Metadata generally aren't relevant to the fingerprint because they +//! fundamentally change the output (like target vs host changes the directory +//! where it is emitted). //! //! ## Fingerprint files //! @@ -378,19 +392,35 @@ pub fn prepare_target<'a, 'cfg>( // Clear out the old fingerprint file if it exists. This protects when // compilation is interrupted leaving a corrupt file. For example, a - // project with a lib.rs and integration test: + // project with a lib.rs and integration test (two units): // - // 1. Build the integration test. - // 2. Make a change to lib.rs. - // 3. Build the integration test, hit Ctrl-C while linking (with gcc). + // 1. Build the library and integration test. + // 2. Make a change to lib.rs (NOT the integration test). + // 3. Build the integration test, hit Ctrl-C while linking. With gcc, this + // will leave behind an incomplete executable (zero size, or partially + // written). NOTE: The library builds successfully, it is the linking + // of the integration test that we are interrupting. // 4. Build the integration test again. // - // Without this line, then step 4 will think the integration test is - // "fresh" because the mtime of the output file is newer than all of its - // dependencies. But the executable is corrupt and needs to be rebuilt. - // Clearing the fingerprint ensures that Cargo never mistakes it as - // up-to-date until after a successful build. + // Without the following line, then step 3 will leave a valid fingerprint + // on the disk. Then step 4 will think the integration test is "fresh" + // because: + // + // - There is a valid fingerprint hash on disk (written in step 1). + // - The mtime of the output file (the corrupt integration executable + // written in step 3) is newer than all of its dependencies. + // - The mtime of the integration test fingerprint dep-info file (written + // in step 1) is newer than the integration test's source files, because + // we haven't modified any of its source files. + // + // But the executable is corrupt and needs to be rebuilt. Clearing the + // fingerprint at step 3 ensures that Cargo never mistakes a partially + // written output as up-to-date. if loc.exists() { + // Truncate instead of delete so that compare_old_fingerprint will + // still log the reason for the fingerprint failure instead of just + // reporting "failed to read fingerprint" during the next build if + // this build fails. paths::write(&loc, b"")?; }