Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove glob to collect all artifacts introduced by cargo kani #2246

Merged
merged 6 commits into from
Mar 3, 2023

Conversation

celinval
Copy link
Contributor

@celinval celinval commented Feb 28, 2023

Description of changes:

This change allows us to pin-point exactly which build artifacts are related to a cargo build run. This is required to re-enable the build cache (#2036) where multiple artifacts can co-exist in the same folder.

The solution implemented here is rather hacky, but it's more reliable than other alternatives I've tried (see Call-outs). Now, kani-compiler will generate stubs in the place of binaries, shared, and static libraries when those types are requested. Those stubs will contain a JSON representation of the new type CompilerArtifactStub, which basically contain the path for the metadata.json file where the compiler stores the metadata related to a given crate.

kani-driver will parse CompilerArtifact messages to figure out which artifacts were built for the given build. For libraries, it will derive the name of the metadata file from the rmeta filepath. For other types of artifacts, it will parse the output file, convert it to CompilerArtifactStub, and extract the path for the rmeta filepath.

Resolved issues:

Resolves #1478
Resolves #2234

Related RFC:

N/A

Call-outs:

I'm not proud of this solution, and I am opened to suggestions. This is very hacky, but it is very similar to how MIRI solves this problem. MIRI also writes a JSON in the place of the binary that is meant to be generated by the compiler. This solution also works when all artifacts are fresh, since Cargo uses hardlink / copy to generate the final artifact, so the content of the file should point to the correct metadata file.

I investigated a few other options, but here is why they were not suitable:

  1. Parse CompilerArtifact and derive the name of the metadata file from it. Unfortunately, for things like static libraries, the only artifact reported is different than the one generated by the compiler which does not include the file hash.
  2. Emit a compiler message that kani-driver could parse and retrieve the name of the metadata. For this to work, we wouldn't be able to cache the final compilation artifacts, since we would need the compiler to be invoked again.
  3. Use the deprecated cargo build --build-plan. Besides being deprecated, this option doesn't quite work for cargo rustc, and the results actually differed when we build with tests enabled.
  4. Implement our own cache by using different target directory. This would be overly complicated, it would require fine tuning and it would be more error prone.
  5. Use a similar mechanism to the implemented one, and override the definition of --print=file-names to include .json to the artifact names. This would fix the main annoyance I have with the current solution which is to write plain text to file paths that are meant to be ELF. However, there is no easy to extend the compiler to do that, and this would require us to intercept the cargo call to the compiler for the call that gathers multiple information like file name, system configuration and others. IMO, not worth it.
  6. Create dummy ELF files with some information about the build. We could either extract information from the ELF or even just compare the inode / file hash of the artifact reported and all the artifacts available in the build directory. This solution would likely be as reliable as the one we implemented, but it is also way more complicated.

For more context, see this Zulip thread I created: https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Files.20generated.20by.20the.20compiler

Testing:

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring out a solution @celinval! Just have a few comments.

| Message::BuildScriptExecuted(_)
| Message::BuildFinished(_) => {
Message::CompilerArtifact(rustc_artifact) => {
artifact = Some(rustc_artifact);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed that there's a single artifact per cargo invocation? If so, I suggest adding an assertion that checks that, e.g:

assert_eq!(artifact, None);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Every package will generate at least one artifact. However, we only care about the last one, which is the one that we pass the codegen flag to. Thus, I am only running the map in the last artifact that was collected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Perhaps add a comment to mention this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -236,15 +231,6 @@ fn print_msg(diagnostic: &Diagnostic, use_rendered: bool) -> Result<()> {
Ok(())
}

/// Given a `path` with glob characters in it (e.g. `*.json`), return a vector of matching files
fn glob(path: &Path) -> Result<Vec<PathBuf>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo!

celinval added 2 commits March 1, 2023 21:08
Looking at the artifacts doesn't always work since Kani doesn't
produce any artifact for binary, tests, static / shared library.

Need to find a better solution.
@celinval
Copy link
Contributor Author

celinval commented Mar 3, 2023

I was finally able to debug this flaky issue!! It seems that we are getting the artifacts out of order. In this log, it seems that we are using the librand artifact:

2023-03-03T01:45:58.1861520Z TRACE kani_driver::call_cargo map_kani_artifact, rmeta="/Users/runner/work/kani-dev/kani-dev/build/tests/cargo-kani/dependencies/check_dummy/target/kani/x86_64-apple-darwin/debug/deps/librand-51ad98e9ae258aeb.rmeta", kani_meta="/Users/runner/work/kani-dev/kani-dev/build/tests/cargo-kani/dependencies/check_dummy/target/kani/x86_64-apple-darwin/debug/deps/rand-51ad98e9ae258aeb.kani-metadata.json

instead of the foo

2023-03-03T01:45:58.1854750Z DEBUG kani_compiler::codegen_cprover_gotoc::compiler_interface write_json, filename="/Users/runner/work/kani-dev/kani-dev/build/tests/cargo-kani/dependencies/check_dummy/target/kani/x86_64-apple-darwin/debug/deps/foo-c9378ac126d6db94.kani-metadata.json"

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@celinval celinval merged commit ccec549 into model-checking:main Mar 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants