Rust cache invalidated between native and Wasm builds #971

tiziano88 · 2020-05-11T22:43:59Z

To reproduce:

cargo build --release --target=wasm32-unknown-unknown --package=aggregator
- Finished release [optimized] target(s) in 43.39s (slow)
cargo build --release --target=wasm32-unknown-unknown --package=aggregator
- Finished release [optimized] target(s) in 0.14s (fast)
cargo build --release --package=aggregator_backend
- Finished release [optimized] target(s) in 1m 15s (slow)
cargo build --release --package=aggregator_backend
- Finished release [optimized] target(s) in 0.15s (fast)
cargo build --release --target=wasm32-unknown-unknown --package=aggregator
- Finished release [optimized] target(s) in 43.40s (expected: fast; actual: slow again)
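
The sequence above can be scripted so that each step prints its wall-clock time next to the command. This is only a sketch: `timed` and `reproduce` are hypothetical helper names, not part of the Oak scripts. The cargo commands are the ones from the report, and the whole-second resolution of `date +%s` is assumed to be enough to tell a cached build from a rebuild.

```shell
#!/usr/bin/env bash
# Run a command, discard its output, and print how long it took.
# `timed` is a hypothetical helper, not part of the Oak scripts.
timed() {
  local start end status
  start=$(date +%s)
  if "$@" >/dev/null 2>&1; then status=0; else status=$?; fi
  end=$(date +%s)
  echo "$((end - start))s (exit ${status}): $*"
}

# Alternate the Wasm and native builds from the report.
# A fast final step would mean the Wasm cache survived the native build.
reproduce() {
  timed cargo build --release --target=wasm32-unknown-unknown --package=aggregator
  timed cargo build --release --target=wasm32-unknown-unknown --package=aggregator
  timed cargo build --release --package=aggregator_backend
  timed cargo build --release --package=aggregator_backend
  timed cargo build --release --target=wasm32-unknown-unknown --package=aggregator
}
```

Calling `reproduce` from the workspace root runs the five steps in order. Nothing is executed when the file is only sourced.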

I am guessing that switching compilation target causes some dependencies to be rebuilt (looks like libc may be a potential culprit).

@rbehjati could you look into it? It will become more and more relevant as we switch to the Rust version of main for the oak_loader, as we will interleave compiling native and Wasm code even more often then.

tiziano88 · 2020-05-14T10:43:39Z

Perhaps we should have two separate Cargo workspaces, one for x86 and one for wasm?

cc @project-oak/core

tiziano88 · 2020-05-14T21:13:12Z

We could have the following top-level directories:

runtime (Cargo workspace, compiled to x86) -- basically the current oak top-level directory, after removing the remaining C++ code and renaming
sdk (Cargo workspace, compiled to Wasm) -- we could consider dropping the rust subfolder at some point, but we can keep it for the time being
abi (stand-alone crate, no need for workspace) -- moved from oak_abi which is currently within server
examples (Cargo workspace, compiled to Wasm, except extra binaries such as the aggregator back-end, which will be a stand-alone crate not nested in the examples workspace)

Thoughts?

daviddrysdale · 2020-05-15T06:12:50Z

Couple of cross-refs:

Link to earlier discussion.
Dividing up top-level directories would presumably help with "Generated Rust docs include detritus from examples" (#765).

tiziano88 · 2020-05-15T10:27:12Z

@rbehjati could you try to pull out the abi crate to be top level in a PR to start with? Then we can split the top-level Cargo workspace file into smaller parts, and eventually do any other moves / renames.

tiziano88 · 2020-05-28T08:35:10Z

I am not sure things are in a consistent state after #1034; is oak_abi part of the workspace or not now? It seems I am not able to build or do anything with it directly:

cargo build
error: current package believes it's in a workspace when it's not:
current:   /home/tzn/src/oak/oak_abi/Cargo.toml
workspace: /home/tzn/src/oak/Cargo.toml

this may be fixable by adding `oak_abi` to the `workspace.members` array of the manifest located at: /home/tzn/src/oak/Cargo.toml
Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.

tiziano88 · 2020-05-28T08:36:15Z

This may also be the cause of #1037

tiziano88 · 2020-05-28T13:56:44Z

FYI I just realised that cargo build has a --target-dir flag that may be sufficient to fix this issue in the short term, we can just use different dirs for wasm and native code.

It should help with project-oak#971, until a more proper solution is in place.

blaxill · 2020-05-28T19:26:09Z

@tiziano88 25c1d06#diff-35e128d19b2a49f8257e7a6ed82e3f44
There was an issue causing temporary files to accumulate when doing this previously, although it might have been due to using a fresh temp directory each time (and not manually erasing it), rather than using a consistent --target-dir


tiziano88 · 2020-05-28T21:53:02Z

Good catch @blaxill, thanks! I think (hopefully) it is solved by reusing the same directory.

It should help with #971, until a more proper solution is in place. In my experiments, this brings the time to run `./scripts/run_examples` without having made any changes from 440s down to 39s.

tiziano88 · 2020-06-05T09:41:50Z

Now that examples is a separate workspace (#1045), can we remove the separate target dir (#1044), and still benefit from separate caches?

rbehjati · 2020-06-05T11:29:18Z

I am not sure if --target-dir=examples/target has been of much help. But specifying a --target prevents invalidation of the cache. No target-dir is specified in the following commands, all targets are generated in examples/target, and cache is not invalidated:

$ cargo build --release --target=x86_64-unknown-linux-musl --manifest-path="examples/aggregator/backend/Cargo.toml"
    Finished release [optimized] target(s) in 0.15s
$ cargo build --release --target=wasm32-unknown-unknown --manifest-path="examples/translator/module/rust/Cargo.toml"
   Compiling same-file v1.0.6
   Compiling maplit v1.0.2
   Compiling cfg-if v0.1.10
   Compiling bytes v0.5.4
   Compiling byteorder v1.3.4
   Compiling serde v1.0.111
   Compiling fmt v0.1.0
   Compiling getrandom v0.1.14
   Compiling log v0.4.8
   Compiling serde_derive v1.0.111
   Compiling walkdir v2.3.1
   Compiling rand_core v0.5.1
   Compiling oak_utils v0.1.0 (/opt/my-project/oak_utils)
   Compiling prost v0.6.1
   Compiling prost-types v0.6.1
   Compiling oak_abi v0.1.0 (/opt/my-project/oak_abi)
   Compiling oak v0.1.0 (/opt/my-project/sdk/rust/oak)
   Compiling translator_common v0.1.0 (/opt/my-project/examples/translator/common)
   Compiling translator v0.1.0 (/opt/my-project/examples/translator/module/rust)
    Finished release [optimized] target(s) in 17.64s
$ cargo build --release --target=wasm32-unknown-unknown --manifest-path="examples/translator/module/rust/Cargo.toml"
    Finished release [optimized] target(s) in 0.13s
$ cargo build --release --target=x86_64-unknown-linux-musl --manifest-path="examples/aggregator/backend/Cargo.toml"
    Finished release [optimized] target(s) in 0.15s

If instead of cargo build --release --target=x86_64-unknown-linux-musl --manifest-path="examples/aggregator/backend/Cargo.toml" I use cargo build --release --manifest-path="examples/aggregator/backend/Cargo.toml" (which does not specify a target), then the cache will be invalidated after each alternating command.

--target-dir can still help if we use different dirs (e.g., examples/target/aggregator and examples/target/backend):

$ cargo build --release --target-dir=examples/target/aggregator --target=wasm32-unknown-unknown --manifest-path="examples/translator/module/rust/Cargo.toml"
    Finished release [optimized] target(s) in 31.73s
$ cargo build --release --target-dir=examples/target/backend --manifest-path="examples/aggregator/backend/Cargo.toml"
    Finished release [optimized] target(s) in 57.85s
$ cargo build --release --target-dir=examples/target/backend --manifest-path="examples/aggregator/backend/Cargo.toml"
    Finished release [optimized] target(s) in 0.16s
$ cargo build --release --target-dir=examples/target/aggregator --target=wasm32-unknown-unknown --manifest-path="examples/translator/module/rust/Cargo.toml"
    Finished release [optimized] target(s) in 0.13s

So, most of the changes in #1044 are still relevant. We could remove --target-dir=examples/target, but that would not really make a difference.

[edit: I noticed that I've included results from experimenting with translator here, but aggregator behaves similarly.]

tiziano88 · 2020-06-05T11:34:01Z

I think if you undo your changes, then #1044 definitely makes a difference. Now that your changes are in, as you say, it does not seem to make a difference any more, since target dirs are already separated by workspace anyways. Hence my point that we can now remove that flag. But I haven't actually tried this myself.

rbehjati · 2020-06-05T11:52:57Z

My hypothesis is that #1044 would not have made a huge difference if you had not added --target=x86_64-unknown-linux-musl (but I have not tried all combinations either 😄). Right now, we have to keep that flag, but we can remove --target-dir=examples/target, if that is what you mean. Does it make sense to keep --target=x86_64-unknown-linux-musl for the aggregator backend?

tiziano88 · 2020-06-05T11:57:56Z

Could you try to remove --target-dir but keep --target and see if the cache gets invalidated? I think it would be good to know. Is your hypothesis that the aggregator backend was the only thing that was invalidating the cache then?

rbehjati · 2020-06-05T13:20:20Z

The cache does not seem to get invalidated in that case.

--target-dir kept, --target kept (currently in master):

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    4m22.916s
user    31m14.892s
sys     0m42.810s
$ time ./scripts/run_examples
real    0m43.430s
user    0m7.713s
sys     0m3.313s

--target-dir removed, --target kept (timing is similar to the previous case):

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    4m15.375s
user    31m21.495s
sys     0m42.732s
$ time ./scripts/run_examples
real    0m43.149s
user    0m7.744s
sys     0m3.265s

--target-dir kept, --target removed (much slower):

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    4m33.951s
user    33m22.486s
sys     0m44.127s
$ time ./scripts/run_examples
real    2m51.976s
user    16m34.688s
sys     0m22.346s

--target-dir removed, --target removed:

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    4m25.595s
user    33m28.209s
sys     0m44.055s
$ time ./scripts/run_examples
real    2m52.324s
user    16m36.129s
sys     0m22.476s

I noticed that the change you made to aggregator backend improved things, but I was not timing the commands before. So I could not really measure the improvement... all I have is just a hunch!

tiziano88 · 2020-06-05T13:24:47Z

If the tests were done in master, after your change that separated the examples workspace already, then I don't think this experiment is particularly conclusive though, is it?

rbehjati · 2020-06-05T16:30:10Z

Yes. The tests were done in master. What do we want to reach a conclusion about? Whether to remove --target-dir, or how cargo build and its flags work?
I think the tests give enough evidence that --target-dir can be removed. I included the last two cases with --target removed just for info, not as evidence for removing --target-dir.

tiziano88 · 2020-06-05T16:36:49Z

I guess I'm still confused about how this interacted with --target=x86_64-unknown-linux-musl, as you mentioned in #971 (comment).

rbehjati · 2020-06-05T18:23:17Z

Yeah. Me too. Here is more data that may or may not help with the confusion. Please don't ask me to do a full factorial experiment!

Before #1044 (on commit 0b133c0: no --target-dir, and no --target for backend):

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    5m30.142s
user    44m1.707s
sys     0m58.771s
$ time ./scripts/run_examples
real    3m41.917s
user    27m42.284s
sys     0m35.379s

Still on commit 0b133c0, but after adding --target for backend:

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    3m32.346s
user    22m22.393s
sys     0m34.561s
$ time ./scripts/run_examples
real    0m39.415s
user    0m6.175s
sys     0m3.168s

On #1044 (both --target-dir, and --target are present):

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    3m52.818s
user    25m40.070s
sys     0m38.064s
$ time ./scripts/run_examples
real    0m39.501s
user    0m6.185s
sys     0m3.084s

Still on #1044, but --target for backend is removed:

$ rm -rf target && rm -rf examples/target
$ time ./scripts/run_examples
real    5m40.568s
user    47m30.893s
sys     1m3.099s
$ time ./scripts/run_examples
real    2m53.625s
user    23m40.563s
sys     0m29.746s
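
The `real` figures in these runs are easier to compare once converted to seconds. The helpers below are a small sketch for that. `to_seconds` and `warm_percent` are hypothetical names, and the input is assumed to be in the `XmY.ZZZs` form that `time` prints above.

```shell
#!/usr/bin/env bash
# Convert a `time`-style duration such as 4m22.916s into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the second (warm) run as a percentage of the first (cold) run.
# A low value means the second run was mostly served from the cache.
warm_percent() {
  awk -v cold="$(to_seconds "$1")" -v warm="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", 100 * warm / cold }'
}
```

For example, `warm_percent 4m22.916s 0m43.430s` covers the run with both flags kept, and `warm_percent 4m33.951s 2m51.976s` covers the run with --target removed. The first comes out at roughly 17% and the second at roughly 63%, which matches the observation that --target is the flag that matters.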

rbehjati · 2020-06-05T18:32:43Z

Getting back to your earlier question... the aggregator backend seems to have a significant impact on invalidating the cache, even if it is not the only cause. Do you agree?

tiziano88 · 2020-06-08T12:55:53Z

Does it mean that compiling the aggregator to x86 and then wasm is what is causing the issue?

tiziano88 · 2020-06-08T15:16:32Z

Apart from the cache issue, I think we should at least split out the crates so that the runtime / loader have a dedicated Cargo.lock file, which is the list of dependencies that are actually part of the TCB. Specifically, I think this would mean trimming down this list so that it only contains oak_loader and oak_runtime:

oak/Cargo.toml, lines 1 to 10 in a2f74a4:

[workspace]
members = [
  "oak/server/rust/oak_loader",
  "oak/server/rust/oak_runtime",
  "runner",
  "sdk/rust/oak",
  "sdk/rust/oak_tests",
  "third_party/roughenough",
]
exclude = ["oak_abi", "oak_utils"]

@rbehjati does this make sense?

rbehjati · 2020-06-08T17:42:26Z

I agree. I'll make SDK and runner separate crates. I suppose we don't want to make third_party a separate workspace. Do we?

After this we can perform a thorough analysis to understand what is contributing to the invalidation of the cache.

tiziano88 · 2020-06-19T18:03:09Z

Note this is still an issue: try running the following command twice in a row: ./scripts/run_example -e trusted_information_retrieval. This is probably because of the backend, which is x86 but invalidates the wasm cache, since its target-dir is set to examples/target.

cc @ipetr0v

tiziano88 · 2020-06-19T18:34:43Z

@rbehjati if possible, let's try to build an understanding of what's happening, rather than just trying to get the numbers down.

My theory (not tested) is still that some dependency has a feature flag that causes it to be compiled differently in wasm vs x86, and / or has optional dependencies that cause part of the cache to go missing in one case.

rbehjati · 2020-06-19T21:24:53Z

Your theory seems consistent with my observation that specifying the target helps. From what I can see the change in #1179 solves the problem with trusted_information_retrieval (at least temporarily). Clearly fewer/no crates are recompiled when specifying the appropriate --target.

Apparently, trusted_information_retrieval cannot be compiled for --target=x86_64-unknown-linux-musl (which we previously had), but it can be compiled for --target=x86_64-unknown-linux-gnu (I don't know what the difference between these two targets is).

Getting back to your theory, is it expected that everything should be compiled the same for wasm and x86? In other words, do we have a requirement for excluding anything that compiles differently for different architectures?

tiziano88 · 2020-06-19T21:43:51Z

Thanks for putting together #1179, AFAICT it is doing two things at once:

- remove the --target-dir flag
- add the --target flag

Which one is actually having the desired effect? Or is it really the combination of --target and no --target-dir that solves the issue?

tiziano88 · 2020-06-19T23:38:02Z

For reference, I think this is pretty much what I thought was happening: https://stackoverflow.com/questions/60869985/why-is-cargo-build-cache-invalidating

Though it does not really explain why #1179 actually makes things work correctly 😅

rbehjati · 2020-06-22T16:00:52Z

> Which one is actually having the desired effect? Or is it really the combination of --target and no --target-dir that solves the issue?

I have not seen --target-dir=./examples/target have any impact (at least after separating the workspaces).

> Though it does not really explain why #1179 actually makes things work correctly 😅

I think when --target is specified for each architecture, the files that are specific to that architecture go into an examples/target/<arch-name>/release directory, as opposed to being included in examples/target/release. So, in this case, examples/target/release only contains the files that are compiled the same for different architectures. However, I could not find an explanation confirming this in any of the cargo documentation I looked at. This is just my conclusion based on observations from the experiments reported below.
I have included the file structures for each case:

1. Using --target-dir=./examples/target and without specifying a target for the backend. In our current setup, this would be the same as not specifying a --target-dir. In this case many crates are recompiled when running ./scripts/run_example -e trusted_information_retrieval for the second time.

398M  examples/target/release
 26M  examples/target/wasm32-unknown-unknown
---------------------------
424M  examples/target/

246M  oak/server/target/release
214M  oak/server/target/x86_64-unknown-linux-musl

2. Using --target=x86_64-unknown-linux-gnu for the backend. In this case none of the crates are recompiled when running ./scripts/run_example -e trusted_information_retrieval twice in a row.

227M  examples/target/release
 26M  examples/target/wasm32-unknown-unknown
183M  examples/target/x86_64-unknown-linux-gnu
---------------------------
436M  examples/target/

246M  oak/server/target/release
214M  oak/server/target/x86_64-unknown-linux-musl

The difference between this case and the previous one is that (almost) everything that is now in examples/target/x86_64-unknown-linux-gnu was in examples/target/release in the previous case. I don't know exactly how cargo works, but it seems that by specifying --target for the backend, we are forcing backend-specific files, which compile differently based on the target architecture, to go into a separate directory.

3. Using --target-dir="examples/target/${EXAMPLE}/wasm" for the Oak module, and --target-dir="examples/target/${EXAMPLE}/backend" for the backend, without specifying a target for the backend. None of the crates are recompiled in this case.

104M  examples/target/trusted_information_retrieval/wasm/release
 26M  examples/target/trusted_information_retrieval/wasm/wasm32-unknown-unknown

381M  examples/target/trusted_information_retrieval/backend/release
---------------------------
511M  examples/target/

246M  oak/server/target/release
214M  oak/server/target/x86_64-unknown-linux-musl

In this case nothing is shared between the backend and the wasm module. It is the same as running cargo build for the Oak module and the backend separately, each in a clean environment (similar to case 4 below).

4. If I run only cargo build --release --target=x86_64-unknown-linux-gnu --manifest-path="examples/trusted_information_retrieval/backend/Cargo.toml", I get the following:

210M    examples/target/release
183M    examples/target/x86_64-unknown-linux-gnu

tiziano88 · 2020-06-22T21:51:45Z

So is --target=x86_64-unknown-linux-gnu the same or different than no --target flag, on an x86-linux machine?

rbehjati · 2020-06-23T06:47:51Z

It is the same.

The following command (source) gives the default target (which is used when --target is not specified):

rustc -Z unstable-options --print target-spec-json | grep llvm-target

On my linux machine, and our docker image, the output from this command is

"llvm-target": "x86_64-unknown-linux-gnu",

This is also the target specified in scripts/run_tests_tsan and our .cargo file.
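
The `-Z unstable-options` form above needs a nightly toolchain. As an alternative sketch, the `host:` line printed by `rustc -vV` carries the same triple and is available on stable as well. `parse_host_triple` and `host_triple` are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Extract the host triple from `rustc -vV` output read on stdin.
parse_host_triple() {
  sed -n 's/^host: //p'
}

# Print the host triple of the installed toolchain (requires rustc on PATH).
host_triple() {
  rustc -vV | parse_host_triple
}
```

A build script could then pass the result explicitly, for example `cargo build --release --target="$(host_triple)" --manifest-path=examples/aggregator/backend/Cargo.toml`, instead of hard-coding x86_64-unknown-linux-gnu.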

tiziano88 · 2020-06-24T13:27:53Z

> It is the same.

In that case, does it mean that we actually don't need to specify --target in #1179 ? Just removing --target-dir would have been enough?

rbehjati · 2020-06-24T13:43:41Z

I don't think so. If we don't specify --target all the target files go to the same directory, and the cache will be invalidated again. We are using --target=XYZ to put the target files in different dirs.
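
The directory rule described here can be written down as a small function. This is a sketch of the layout as observed in the listings above (with --target, final artifacts land under a per-triple subdirectory; without it, they share the top-level profile directory). `artifact_dir` is a hypothetical name, not a cargo command.

```shell
#!/usr/bin/env bash
# Print the directory where cargo is expected to place final artifacts.
#   $1: target dir (e.g. examples/target)
#   $2: target triple, or empty when --target is not passed
#   $3: profile directory name (release or debug)
artifact_dir() {
  local target_dir=$1 triple=$2 profile=$3
  if [ -n "$triple" ]; then
    # With --target, artifacts for that triple get their own subtree.
    echo "${target_dir}/${triple}/${profile}"
  else
    # Without --target, they share the directory used for host-side
    # build scripts and proc macros.
    echo "${target_dir}/${profile}"
  fi
}
```

With this, `artifact_dir examples/target "" release` and `artifact_dir examples/target wasm32-unknown-unknown release` give different directories, while a native build without --target falls into the same examples/target/release that the Wasm build uses for its host-side pieces.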

rbehjati · 2020-07-07T10:52:16Z

I have been digging deeper into this and here are some of my findings.

The following is the list of packages that are recompiled when running cargo build --release --target=wasm32-unknown-unknown --manifest-path=examples/aggregator/module/rust/Cargo.toml -Z unstable-options after running cargo build --release --manifest-path=examples/aggregator/backend/Cargo.toml.

# in examples/target/release/build

anyhow-f3bb683c8c9d193b/
getrandom-3c98b8e535bc4756/
indexmap-aaacfd4ae6512dee/
libc-079e747c65dddbcb/
log-85d8f5fb872ec954/
proc-macro2-06ed86eb6c32d03e/
prost-build-bae3641e35f69cf3/
syn-f0a8bab71fd7020d/

After this, running cargo build --release --manifest-path=examples/aggregator/backend/Cargo.toml again results in rebuilding the same packages.

Each of these folders has the following content:

build-script-build
build_script_build-f0a8bab71fd7020d
build_script_build-f0a8bab71fd7020d.d

The binary files build-script-build and build_script_build-f0a8bab71fd7020d are rewritten when switching between the cargo build ... commands.
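
One way to observe which of these binaries are rewritten is to checksum them before and after switching commands. The following is a sketch under the assumption that paths contain no spaces. `snapshot_build_scripts` and `changed_build_scripts` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print one "<checksum> <size> <path>" line per compiled build script under $1.
snapshot_build_scripts() {
  find "$1" -type f -name 'build-script-build' -exec cksum {} + | LC_ALL=C sort
}

# Given two snapshot files (before, after), print the paths whose
# checksum line is new or different in the second snapshot.
changed_build_scripts() {
  LC_ALL=C comm -13 "$1" "$2" | awk '{ print $3 }'
}
```

A typical use would be to save `snapshot_build_scripts examples/target/release/build` to a file, run the other cargo build command, take a second snapshot, and pass both files to `changed_build_scripts`.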

Corresponding to each of the folders, there is another folder inside examples/target/release/build with the same crate name, but a different fingerprint (e.g., anyhow-3ef89ee82156c6e6). All these folders have the following content, and are not rewritten when switching between cargo build ... commands:

invoked.timestamp  
out/
output  
root-output  
stderr

These seem to be the output from some build.rs script. When specifying --target in a cargo build (or cargo run) command, the subdirectories in examples/target/<TARGET>/release/build are all of the second form (i.e., are the output from a Rust build script).
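
The two kinds of folders can be told apart by their contents. The sketch below classifies them using only the file names listed above. `classify_build_dir` and `classify_all` are hypothetical helpers, and the labels they print are made up for this illustration.

```shell
#!/usr/bin/env bash
# Classify one directory under <target-dir>/.../build by its contents:
#   compiled-script: holds the build script executable (build-script-build)
#   script-output:   holds what the build script produced when it ran (output, out/)
classify_build_dir() {
  if [ -f "$1/build-script-build" ]; then
    echo "compiled-script"
  elif [ -f "$1/output" ] || [ -d "$1/out" ]; then
    echo "script-output"
  else
    echo "unknown"
  fi
}

# Print "<kind> <dir>" for every directory directly under the build dir $1.
classify_all() {
  local dir
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    echo "$(classify_build_dir "${dir%/}") ${dir%/}"
  done
}
```

Running `classify_all examples/target/release/build` and `classify_all examples/target/<TARGET>/release/build` side by side would show whether the per-target directory really contains only folders of the second kind.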

I am still not entirely sure how cargo decides whether to rebuild a package or not, however the following note from the cargo book might be relevant.

> When not using --target, this has a consequence that Cargo will share your dependencies with build scripts and proc macros. RUSTFLAGS will be shared with every rustc invocation. With the --target flag, build scripts and proc macros are built separately (for the host architecture), and do not share RUSTFLAGS.

More specifically it is advised that:

> If you have args that you do not want to pass to build scripts or proc macros and are building for the host, pass --target with the host triple.

We set RUSTFLAGS in some of our scripts, but I am not sure if they are causing the problem. For now, the best solution seems to be to keep specifying --target with the host triple.

rbehjati · 2020-07-09T21:44:42Z

For now we are happy with the solution using --target, so I am closing this. I have shared a more detailed report of my investigation with the team.

tiziano88 added lang/Rust P1 labels May 11, 2020

tiziano88 assigned rbehjati May 11, 2020

tiziano88 mentioned this issue May 26, 2020

Restructure Oak directories #1023

Closed

rbehjati mentioned this issue May 27, 2020

Make oak_abi a standalone crate #1034

Merged


tiziano88 mentioned this issue May 28, 2020

Exclude oak_abi and oak_utils from the workspace #1039

Merged


tiziano88 added a commit to tiziano88/oak that referenced this issue May 28, 2020

Use separate target dir when compiling Wasm code

ed1f318

It should help with project-oak#971, until a more proper solution is in place.

tiziano88 mentioned this issue May 28, 2020

Use separate target dir when compiling Wasm code #1044

Merged

tiziano88 added a commit to tiziano88/oak that referenced this issue May 28, 2020

Use separate target dir when compiling Wasm code

a2da4be

It should help with project-oak#971, until a more proper solution is in place.

tiziano88 added a commit to tiziano88/oak that referenced this issue May 28, 2020

Use separate target dir when compiling Wasm code

21fa650

It should help with project-oak#971, until a more proper solution is in place.

This was referenced Jun 1, 2020

Make examples a separate workspace #1045

Merged

Update scripts after reorganization of the code #1064

Merged

This was referenced Jun 9, 2020

Move the remaining crates to separate workspaces #1122

Merged

Move top-level Cargo.toml to oak/server #1139

Merged

rbehjati mentioned this issue Jun 23, 2020

Specify build target for backend components in examples #1179

Merged

rbehjati mentioned this issue Jun 29, 2020

Downgrade error on gRPC client node invocation channel closure #1211

Merged


rbehjati mentioned this issue Jul 7, 2020

Specify the host triple as the target #1233

Merged

rbehjati closed this as completed Jul 9, 2020

rbehjati mentioned this issue Jan 11, 2022

Unify Rust workspaces #2475

Closed
