
[WIP] mtime+content tracking #8623

Closed. Wants to merge 41 commits.

Commits:
68c94e8
mtime+content tracking
gilescope Aug 16, 2020
df16720
v2: Take advantage of rustc's precalulated src hashes.
gilescope Aug 21, 2020
642cf9c
Put hash back in cache.
gilescope Aug 21, 2020
fecc6da
Fix existing tests.
gilescope Aug 22, 2020
0675c0e
Optimisation: No need to figure out if bin files are up to date if th…
gilescope Sep 5, 2020
7ca4fbd
WIP object reading
gilescope Oct 29, 2020
8073aac
Read SVH in bin object files and rlibs
gilescope Oct 30, 2020
5e94d25
SvhInBin => svh. No need for svh in filename.
gilescope Oct 30, 2020
29cef89
Use hex encoding for src hashes
gilescope Oct 31, 2020
e3c98e4
cargo fmt
gilescope Oct 31, 2020
1f369a2
use from le bytes
gilescope Oct 31, 2020
3a47d6b
rlib rather than rmeta
gilescope Oct 31, 2020
edc38a9
bugfix
gilescope Nov 1, 2020
d2875fb
Read svh from .rmeta
gilescope Nov 1, 2020
88869b9
First cut
gilescope Nov 3, 2020
36fd73d
Use existing cache. Introduce dep_info_cache to stop parsing the same…
gilescope Nov 3, 2020
c25d820
Signs that the caching of build.rs might be working but now output of…
gilescope Nov 7, 2020
4e917b7
Tentitve output hashing
gilescope Nov 8, 2020
a9cd845
Less clones
gilescope Nov 9, 2020
15a8f24
Simpler eq
gilescope Nov 9, 2020
46bb454
no need for to_string_lossy
gilescope Nov 9, 2020
73352cd
No point hashing a hash
gilescope Nov 9, 2020
b4000e9
use to_le_bytes
gilescope Nov 9, 2020
0f84c40
Merge branch 'master' into endmtime
gilescope Nov 9, 2020
b6425c1
Just use derived hash impl
gilescope Nov 9, 2020
05b9edf
Everything using dep_info_cache now.
gilescope Nov 9, 2020
46c6953
People only care if there's a miss
gilescope Nov 9, 2020
d498f08
Merge branch 'master' into endmtime
gilescope Nov 10, 2020
f289f82
loop to find
gilescope Nov 10, 2020
eb5c061
match to if let
gilescope Nov 10, 2020
5d63d44
Less prints
gilescope Nov 10, 2020
03dc307
Better log messages
gilescope Nov 10, 2020
e73de2a
less print stmts
gilescope Nov 10, 2020
4b63f8a
size and hash optional
gilescope Nov 12, 2020
1411aa5
size and hash optional
gilescope Nov 12, 2020
87600cc
Only activate when switched on
gilescope Nov 12, 2020
1379263
Updates to serialiseation format in tests.
gilescope Nov 12, 2020
58478ec
reduced duplication
gilescope Nov 13, 2020
615dd81
Working following format change of hashes
gilescope Nov 16, 2020
a750137
cargo fmt + fix tests
gilescope Nov 16, 2020
533a597
Break out to separate file as fingerprint.rs is big
gilescope Nov 17, 2020
9 changes: 9 additions & 0 deletions Cargo.toml
@@ -45,6 +45,7 @@ lazycell = "1.2.0"
libc = "0.2"
log = "0.4.6"
libgit2-sys = "0.12.14"
md-5 = "0.9"
memchr = "2.1.3"
num_cpus = "1.0"
opener = "0.4"
@@ -55,6 +56,8 @@ semver = { version = "0.10", features = ["serde"] }
serde = { version = "1.0.82", features = ["derive"] }
serde_ignored = "0.1.0"
serde_json = { version = "1.0.30", features = ["raw_value"] }
sha-1 = "0.9"
sha2 = "0.9"
shell-escape = "0.1.4"
strip-ansi-escapes = "0.1.0"
tar = { version = "0.4.26", default-features = false }
@@ -68,12 +71,18 @@ clap = "2.31.2"
unicode-width = "0.1.5"
openssl = { version = '0.10.11', optional = true }
im-rc = "15.0.0"
ar = "0.8"

# A noop dependency that changes in the Rust repository, it's a bit of a hack.
# See the `src/tools/rustc-workspace-hack/README.md` file in `rust-lang/rust`
# for more information.
rustc-workspace-hack = "1.0.0"

[dependencies.object]
version = "0.20.0"
default-features = false
features = ['read_core', 'elf', 'macho', 'pe', 'unaligned']

[target.'cfg(target_os = "macos")'.dependencies]
core-foundation = { version = "0.9.0", features = ["mac_os_10_7_support"] }

327 changes: 327 additions & 0 deletions src/cargo/core/compiler/content_hash.rs
@@ -0,0 +1,327 @@
use std::fmt;
use std::fs;
use std::io::{self, Read};
use std::num::NonZeroU64;
use std::path::Path;
use std::path::PathBuf;
use std::str::FromStr;

use filetime::FileTime;
use log::debug;
use md5::{Digest, Md5};
use object::Object;
use serde;
use serde::{Deserialize, Serialize};
use sha1::Sha1;
use sha2::Sha256;

/// A file location with identifying properties: size and hash.
#[derive(Ord, PartialOrd, Eq, PartialEq, Clone, Debug, Hash, Serialize, Deserialize)]
pub struct Fileprint {
pub path: PathBuf, // TODO: is this field needed here?
pub size: Option<FileSize>,
pub hash: Option<FileHash>,
}

impl Fileprint {
pub(crate) fn from_md5(path: PathBuf) -> Self {
let size = CurrentFileprint::calc_size(&path);
let hash = CurrentFileprint::calc_hash(&path, FileHashAlgorithm::Md5);
Self { path, size, hash }
}
}

#[derive(Clone, Copy, Ord, PartialOrd, Eq, PartialEq, Debug, Serialize, Deserialize, Hash)]
pub enum FileHashAlgorithm {
/// The SVH is embedded as a symbol in object files; for rmeta it is part of the .rmeta filename inside an .rlib.
Svh,
Md5,
Sha1,
Sha256,
}

impl FromStr for FileHashAlgorithm {
type Err = anyhow::Error;

fn from_str(s: &str) -> Result<FileHashAlgorithm, Self::Err> {
match s {
"md5" => Ok(FileHashAlgorithm::Md5),
"svh" => Ok(FileHashAlgorithm::Svh),
"sha1" => Ok(FileHashAlgorithm::Sha1),
"sha256" => Ok(FileHashAlgorithm::Sha256),
_ => Err(anyhow::Error::msg("Unknown hash type")),
}
}
}

impl std::fmt::Display for FileHashAlgorithm {
fn fmt(&self, fmt: &mut std::fmt::Formatter<'_>) -> std::result::Result<(), std::fmt::Error> {
match self {
Self::Md5 => fmt.write_fmt(format_args!("md5"))?,
Self::Svh => fmt.write_fmt(format_args!("svh"))?,
Self::Sha1 => fmt.write_fmt(format_args!("sha1"))?,
Self::Sha256 => fmt.write_fmt(format_args!("sha256"))?,
};
Ok(())
}
}

// While source files can't currently be > 4 GB, bin files could be.
pub type FileSize = NonZeroU64;

#[derive(Clone, Debug, Ord, PartialOrd, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct FileHash {
kind: FileHashAlgorithm,
// Arrays longer than 32 bytes don't yet implement the common traits, so the hash is split in two.
hash_front: [u8; 32],
hash_back: [u8; 32],
}

impl FileHash {
pub fn from_hex_rev(kind: FileHashAlgorithm, hash: &str) -> Option<FileHash> {
let mut decoded = hex::decode(hash).ok()?;
decoded.reverse(); // The slice is stored as little endian.
Some(Self::from_slice(kind, &decoded[..]))
}

pub fn from_hex(kind: FileHashAlgorithm, hash: &str) -> Option<FileHash> {
let decoded = hex::decode(hash).ok()?;
Some(Self::from_slice(kind, &decoded[..]))
}

pub fn from_slice_rev(kind: FileHashAlgorithm, hash: &[u8]) -> FileHash {
let mut v = hash.to_vec();
v.reverse();
Self::from_slice(kind, &v)
}

pub fn from_slice(kind: FileHashAlgorithm, hash: &[u8]) -> FileHash {
let mut result = FileHash {
kind,
hash_front: [0u8; 32],
hash_back: [0u8; 32],
};
let len = hash.len();
let front_len = std::cmp::min(len, 32);
(&mut result.hash_front[..front_len]).copy_from_slice(&hash[..front_len]);
if len > 32 {
let back_len = std::cmp::min(len, 64);
(&mut result.hash_back[..back_len - 32]).copy_from_slice(&hash[32..back_len]);
}
result
}

pub fn write_to_vec(&self, vec: &mut Vec<u8>) {
vec.push(match self.kind {
FileHashAlgorithm::Md5 => 1,
FileHashAlgorithm::Sha1 => 2,
FileHashAlgorithm::Sha256 => 3,
FileHashAlgorithm::Svh => 4,
});
vec.extend_from_slice(&self.hash_front[..]);
vec.extend_from_slice(&self.hash_back[..]);
}
}

impl fmt::Display for FileHash {
fn fmt(&self, formatter: &mut std::fmt::Formatter<'_>) -> Result<(), fmt::Error> {
write!(
formatter,
"{}:{}{}",
self.kind,
hex::encode(self.hash_front),
hex::encode(self.hash_back)
)
}
}

fn get_svh_from_ar<R: Read>(reader: R) -> Option<FileHash> {
let mut ar = ar::Archive::new(reader);
while let Some(file) = ar.next_entry() {
match file {
Ok(file) => {
let s = String::from_utf8_lossy(&file.header().identifier());
if s.ends_with(".rmeta") {
if let Some(index) = s.rfind('-') {
return FileHash::from_hex_rev(
FileHashAlgorithm::Svh,
&s[index + 1..(s.len() - ".rmeta".len())],
);
}
}
}
Err(err) => debug!("Error reading ar: {}", err),
}
}
debug!("HASH svh not found in archive file.");
None
}
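The archive path above relies on the SVH being encoded as the hex suffix of the `.rmeta` member's filename inside the `.rlib`. A minimal sketch of that name parsing (the member names used here are illustrative, not taken from a real rlib):

```python
def svh_from_rmeta_name(name: str):
    """Extract the hex SVH suffix from an rlib member name such as
    'std-078a683ec9972052.rmeta'; return None when no suffix is present."""
    suffix = ".rmeta"
    if not name.endswith(suffix):
        return None
    stem = name[: -len(suffix)]
    dash = stem.rfind("-")  # the hash sits after the last '-'
    if dash == -1:
        return None
    return stem[dash + 1:]
```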

// While this looks expensive, it is only invoked for dylibs whose
// timestamp is wrong but whose file size is as expected.
fn get_svh_from_object_file<R: Read>(mut reader: R) -> Option<FileHash> {
let mut data = vec![];
reader.read_to_end(&mut data).ok()?;
let obj = object::read::File::parse(&data).ok()?;

for (_idx, sym) in obj.symbols() {
if let Some(name) = sym.name() {
if name.starts_with("_rust_svh") {
if let Some(index) = name.rfind('_') {
return FileHash::from_hex_rev(FileHashAlgorithm::Svh, &name[index + 1..]);
}
}
}
}
debug!("HASH svh not found in object file");
None
}

fn get_svh_from_rmeta_file<R: Read>(mut reader: R) -> Option<FileHash> {
let mut data = vec![0u8; 128];
reader.read_exact(&mut data).ok()?;
parse_svh(&data)
}

fn parse_svh(data: &[u8]) -> Option<FileHash> {
debug!("HASHXX {:?}", data);
const METADATA_VERSION_LOC: usize = 7;

if data[METADATA_VERSION_LOC] < 6 {
debug!("svh not available as compiler not recent enough.");
return None;
}
let rust_svh_len_pos = 12;
assert_eq!(data[rust_svh_len_pos], 64_u8);
let data = &data[rust_svh_len_pos + 1..];
Some(FileHash::from_slice(FileHashAlgorithm::Svh, &data[..64]))
}
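The byte offsets `parse_svh` depends on can be summarized in a small sketch. The offsets and the version-6 cutoff are assumptions read off the Rust code above, not a stable documented rmeta format:

```python
def parse_svh_sketch(data: bytes):
    """Mirror of parse_svh above: return the 64 SVH bytes, or None when
    the metadata format predates version 6 (no SVH embedded)."""
    METADATA_VERSION_LOC = 7   # version byte after the b"rust\x00\x00\x00" magic
    SVH_LEN_POS = 12           # byte holding the SVH length (expected 64)
    if data[METADATA_VERSION_LOC] < 6:
        return None            # compiler too old to embed an SVH
    assert data[SVH_LEN_POS] == 64, "unexpected SVH length"
    start = SVH_LEN_POS + 1
    return data[start : start + 64]
```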

/// Cache of file properties that we know to be true.
pub struct CurrentFileprint {
pub(crate) mtime: FileTime,
/// This will be None if not yet looked up.
size: Option<FileSize>,
/// This will be None if not yet calculated for this file.
hash: Option<FileHash>,
}

impl CurrentFileprint {
pub(crate) fn new(mtime: FileTime) -> Self {
CurrentFileprint {
mtime,
size: None,
hash: None,
}
}

pub(crate) fn size(&mut self, file: &Path) -> Option<&FileSize> {
if self.size.is_none() {
self.size = Self::calc_size(file);
}
self.size.as_ref()
}

pub(crate) fn calc_size(file: &Path) -> Option<FileSize> {
std::fs::metadata(file)
.map(|metadata| NonZeroU64::new(metadata.len()))
.ok()
.flatten()
}

pub(crate) fn file_hash(&mut self, path: &Path, reference: &FileHash) -> Option<&FileHash> {
if self.hash.is_none() {
self.hash = Self::calc_hash(path, reference.kind);
}
self.hash.as_ref()
}

fn invoke_digest<D, R>(reader: &mut R, kind: FileHashAlgorithm) -> Option<FileHash>
where
D: Digest,
R: Read,
{
let mut hasher = D::new();
let mut buffer = [0; 1024];
loop {
let count = reader.read(&mut buffer).ok()?;
if count == 0 {
break;
}
hasher.update(&buffer[..count]);
}
Some(FileHash::from_slice_rev(kind, &hasher.finalize()[..]))
}

pub(crate) fn calc_hash(path: &Path, algo: FileHashAlgorithm) -> Option<FileHash> {
if let Ok(file) = fs::File::open(path) {
let mut reader: io::BufReader<fs::File> = io::BufReader::new(file);

match algo {
FileHashAlgorithm::Md5 => Self::invoke_digest::<Md5, _>(&mut reader, algo),
FileHashAlgorithm::Sha1 => Self::invoke_digest::<Sha1, _>(&mut reader, algo),
FileHashAlgorithm::Sha256 => Self::invoke_digest::<Sha256, _>(&mut reader, algo),
FileHashAlgorithm::Svh => {
if path.extension() == Some(std::ffi::OsStr::new("rlib")) {
get_svh_from_ar(reader)
} else if path.extension() == Some(std::ffi::OsStr::new("rmeta")) {
get_svh_from_rmeta_file(reader)
} else {
get_svh_from_object_file(reader)
}
}
}
} else {
debug!("HASH failed to open path {:?}", path);
None
}
}
}

#[cfg(test)]
mod test {
use super::{parse_svh, FileHash, FileHashAlgorithm};

#[test]
fn test_no_svh_below_metadata_version_6() {
let vec: Vec<u8> = vec![
114, 117, 115, 116, 0, 0, 0, 5, 0, 13, 201, 29, 16, 114, 117, 115, 116, 99, 32, 49, 46,
52, 57, 46, 48, 45, 100, 101, 118, 16, 49, 100, 54, 102, 97, 101, 54, 56, 102, 54, 100,
52, 99, 99, 98, 102, 3, 115, 116, 100, 241, 202, 128, 159, 207, 146, 173, 243, 204, 1,
0, 2, 17, 45, 48, 55, 56, 97, 54, 56, 51, 101, 99, 57, 57, 55, 50, 48, 53, 50, 4, 99,
111, 114, 101, 190, 159, 241, 243, 142, 194, 224, 233, 82, 0, 2, 17, 45, 51, 101, 97,
54, 98, 97, 57, 97, 57, 56, 99, 50, 57, 51, 54, 100, 17, 99, 111, 109, 112, 105, 108,
101, 114, 95, 98, 117, 105, 108,
];
// r u s t / metadata version | base | r u s t c ' ' 1 . 4 9 . 0 - d e v |size| svh-->
assert!(parse_svh(&vec).is_none());
}

#[test] // TODO: update the bytes so the SVH comes before the rust version!
fn test_svh_in_metadata_version_6() {
let vec: Vec<u8> = vec![
114, 117, 115, 116, 0, 0, 0, 6, 0, 17, 73, 215, 64, 29, 94, 138, 62, 252, 69, 252, 224,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16,
114, 117, 115, 116, 99, 32, 49, 46, 53, 48, 46, 48, 45, 100, 101, 118, 3, 115, 116,
100, 220, 173, 135, 163, 173, 242, 162, 182, 228, 1, 0, 2, 17, 45, 48, 55, 56, 97, 54,
56, 51, 101, 99, 57, 57, 55, 50, 48, 53, 50,
];
// r u s t / metadata version | base | size=64 | svh | size_of_version | r u s t c ' ' 1 . 5 0 . 0 - d e v | base_pointer_points_here
assert_eq!(
parse_svh(&vec),
FileHash::from_hex(FileHashAlgorithm::Svh, "1d5e8a3efc45fce0")
);
}

#[test]
fn file_hash() {
let from_str = FileHash::from_hex(FileHashAlgorithm::Svh, "0102030405060708");
let from_slice = Some(FileHash::from_slice(
FileHashAlgorithm::Svh,
&[1, 2, 3, 4, 5, 6, 7, 8],
));
assert_eq!(from_str, from_slice);
}
}
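The overall strategy this module supports — checking cheap file properties before expensive ones — can be sketched as follows. The function and parameter names are illustrative only, not cargo's API; MD5 stands in for whichever algorithm `FileHashAlgorithm` selects:

```python
import hashlib
import os

def up_to_date(path, known_mtime_ns, known_size, known_md5_hex):
    """Cheap-to-expensive freshness check: mtime first, then file size,
    then a full content hash only when the first two are inconclusive."""
    st = os.stat(path)
    if st.st_mtime_ns == known_mtime_ns:
        return True   # timestamp unchanged: trust it, no I/O on content
    if st.st_size != known_size:
        return False  # size changed: content definitely changed
    with open(path, "rb") as f:  # timestamp lied; compare actual content
        return hashlib.md5(f.read()).hexdigest() == known_md5_hex
```

A touched-but-unchanged file (the case mtime-only tracking gets wrong) falls through to the hash comparison and is still reported as up to date.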
8 changes: 6 additions & 2 deletions src/cargo/core/compiler/context/mod.rs
@@ -2,9 +2,10 @@ use std::collections::{BTreeSet, HashMap, HashSet};
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

use filetime::FileTime;
use jobserver::Client;

use crate::core::compiler::content_hash::CurrentFileprint;
use crate::core::compiler::fingerprint::RustcDepInfo;
use crate::core::compiler::{self, compilation, Unit};
use crate::core::PackageId;
use crate::util::errors::{CargoResult, CargoResultExt};
@@ -38,7 +39,9 @@ pub struct Context<'a, 'cfg> {
/// Fingerprints used to detect if a unit is out-of-date.
pub fingerprints: HashMap<Unit, Arc<Fingerprint>>,
/// Cache of file mtimes to reduce filesystem hits.
pub mtime_cache: HashMap<PathBuf, FileTime>,
pub mtime_cache: HashMap<PathBuf, CurrentFileprint>,
/// Cache of dep_info to reduce filesystem hits.
pub dep_info_cache: HashMap<PathBuf, RustcDepInfo>,
/// A set used to track which units have been compiled.
/// A unit may appear in the job graph multiple times as a dependency of
/// multiple packages, but it only needs to run once.
Expand Down Expand Up @@ -107,6 +110,7 @@ impl<'a, 'cfg> Context<'a, 'cfg> {
build_script_outputs: Arc::new(Mutex::new(BuildScriptOutputs::default())),
fingerprints: HashMap::new(),
mtime_cache: HashMap::new(),
dep_info_cache: HashMap::new(),
compiled: HashSet::new(),
build_scripts: HashMap::new(),
build_explicit_deps: HashMap::new(),