Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mmap the incremental data instead of reading it. #83214

Merged
merged 4 commits into from
Sep 7, 2021

Conversation

cjgillot
Copy link
Contributor

@cjgillot cjgillot commented Mar 16, 2021

Instead of reading the full incremental state using fs::read_file, we memmap it using a private read-only file-backed map.
This allows the system to reclaim any memory we are not using, while ensuring we are not polluted by
outside modifications to the file.

Suggested in #83036 (comment) by @bjorn3

@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 16, 2021
@cjgillot
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 16, 2021
@bors
Copy link
Contributor

bors commented Mar 16, 2021

⌛ Trying commit b199292a80eee54cbdfe86579363fc8dd159abf0 with merge ae8e99fab35dd8ad92e5e9fcf0f0a16437ce106a...

@rust-log-analyzer

This comment has been minimized.

@cjgillot
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented Mar 16, 2021

⌛ Trying commit 7996f05282fa03d5a1c055c0a0314cf82d7cb0a7 with merge 50118f3b1ac7884989bbef838e63e6f348478a59...

@bors
Copy link
Contributor

bors commented Mar 16, 2021

☀️ Try build successful - checks-actions
Build commit: 50118f3b1ac7884989bbef838e63e6f348478a59 (50118f3b1ac7884989bbef838e63e6f348478a59)

@rust-timer
Copy link
Collaborator

Queued 50118f3b1ac7884989bbef838e63e6f348478a59 with parent 1d6754d, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (50118f3b1ac7884989bbef838e63e6f348478a59): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 17, 2021
@bjorn3
Copy link
Member

bjorn3 commented Mar 17, 2021

Instruction count regressions up to 0.4%. max-rss wins up to 40% on ctfe-stress and up to 15% excluding ctfe-stress. task-clock improvements up to 35% on ctfe-stress and up to 3.6% excluding ctfe-stress with a few regressions up to 2.8%, which is within noise for task-clock.

@cjgillot
Copy link
Contributor Author

r? @michaelwoerister

@michaelwoerister
Copy link
Member

This looks good to me! We are already using memory mapped files for crate metadata.

One question: why use memmap2 even though we already have a dependency on memmap? Is the data readonly? In that case we don't need the MmapOptions::map_copy_read_only() -- the MmapOptions::map() call also available in memmap should do the trick.

@bjorn3
Copy link
Member

bjorn3 commented Mar 17, 2021

memmap is no longer maintained. memmap2 is a fork that is still maintained. https://rustsec.org/advisories/RUSTSEC-2020-0077.html

The file should not be written to, but map_copy_read_only is a safe guard that may or may not prevent crashes or other UB when it is written to by another process. The man page says "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.". It seems to not help on Linux, but may on other systems.

@michaelwoerister
Copy link
Member

memmap is no longer maintained. memmap2 is a fork that is still maintained. https://rustsec.org/advisories/RUSTSEC-2020-0077.html

Good to know! I think in that case we should switch the entire codebase.

The file should not be written to, but map_copy_read_only is a safe guard that may or may not prevent crashes or other UB when it is written to by another process. The man page says "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.". It seems to not help on Linux, but may on other systems.

I wonder how to interpret this. I suspect that neither map_copy_read_only nor a simple map can safeguard against other processes modifying the underlying file. I guess the only way to protect against the file being modified is to create a copy of it's data somewhere (unless there's a copy-on-write file system underneath -- but we can't assume that). But the creation of our private copy of the file's data is exactly we'd like to avoid with this PR.

I think the current implementation of incr. comp. makes sure that no rustc processes will try to modify files in the cache directory once they are "finalized" but we should review that before merging this PR.

@rust-log-analyzer

This comment has been minimized.

@cjgillot
Copy link
Contributor Author

memmap is no longer maintained. memmap2 is a fork that is still maintained. https://rustsec.org/advisories/RUSTSEC-2020-0077.html

Good to know! I think in that case we should switch the entire codebase.

Filed #83236

The file should not be written to, but map_copy_read_only is a safe guard that may or may not prevent crashes or other UB when it is written to by another process. The man page says "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.". It seems to not help on Linux, but may on other systems.

I wonder how to interpret this. I suspect that neither map_copy_read_only nor a simple map can safeguard against other processes modifying the underlying file. I guess the only way to protect against the file being modified is to create a copy of it's data somewhere (unless there's a copy-on-write file system underneath -- but we can't assume that). But the creation of our private copy of the file's data is exactly we'd like to avoid with this PR.

I think the current implementation of incr. comp. makes sure that no rustc processes will try to modify files in the cache directory once they are "finalized" but we should review that before merging this PR.

rustc is not supposed to modify the incremental cache for another instance. This can only happen if the uses starts several rustc/cargo instances at the same moment on the same crate.

During one given session, the dep-graph and work product index are deserialized and the memory map is dropped. The only tricky case is the query result cache, which is kept as-is for the entire compilation. As the memory map needs to be dropped before removing the backing file, the cache promotions are moved to rustc_incremental::persist::save::save_dep_graph before dropping the memmap.

The current state passes tests in Linux. I don't have a Windows box to test on.

@michaelwoerister
Copy link
Member

rustc is not supposed to modify the incremental cache for another instance. This can only happen if the uses starts several rustc/cargo instances at the same moment on the same crate.

rustc actually should be handle to synchronize even these cases via file locks:

/// Allocate the lock-file and lock it.
fn lock_directory(sess: &Session, session_dir: &Path) -> Result<(flock::Lock, PathBuf), ()> {
let lock_file_path = lock_file_path(session_dir);
debug!("lock_directory() - lock_file: {}", lock_file_path.display());
match flock::Lock::new(
&lock_file_path,
false, // don't wait
true, // create the lock file
true,
) {
// the lock should be exclusive
Ok(lock) => Ok((lock, lock_file_path)),
Err(err) => {
sess.err(&format!(
"incremental compilation: could not create \
session directory lock file: {}",
err
));
Err(())
}
}
}

As the memory map needs to be dropped before removing the backing file, the cache promotions are moved to rustc_incremental::persist::save::save_dep_graph before dropping the memmap.

I wonder if we can encode a proper protocol at the type level here: The memmap gets wrapped in a type with a Drop implementation. By default the drop implementation does nothing, but if the compiler decides that there's a proper replacement file, it will (1) drop the memmap and (2) replace the existing file with the replacement. This way we can write all new versions of a given file to a temporary file and then let the Drop impl take care of swapping things out. That would make it less likely that someone write to the old file while the memmap is still live.

I'm not sure how compatible that would be the current setup.

In general I think it would be great if we made sure that it's OK to keep an open memmap to the previous versions of all cache files. That would open up the possibility to encode the dep-graph in a way that does not need any up-front decoding at all and still be lazy about actually loading data from the disk.

@cjgillot cjgillot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 28, 2021
@michaelwoerister
Copy link
Member

OK, let's give it a try.

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Aug 30, 2021

📌 Commit bcefd48 has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 30, 2021
@bors
Copy link
Contributor

bors commented Aug 31, 2021

⌛ Testing commit bcefd48 with merge 544326c21b625fcf159686fe790f1672fba194e9...

@m-ou-se
Copy link
Member

m-ou-se commented Aug 31, 2021

I think it would be sensible to

  • Merge this PR at the beginning of a cycle so there's enough time to react to errors popping up in the wild, and

Note that we're at the very end of a cycle right now. The beta cutoff is in just a few days. So this change is going almost directly into beta.

@cjgillot
Copy link
Contributor Author

@bors r-
Let's wait a few days.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 31, 2021
@bors
Copy link
Contributor

bors commented Aug 31, 2021

☀️ Try build successful - checks-actions
Build commit: 544326c21b625fcf159686fe790f1672fba194e9 (544326c21b625fcf159686fe790f1672fba194e9)

@cjgillot
Copy link
Contributor Author

cjgillot commented Sep 6, 2021

Beta has branched.
@bors r=michaelwoerister rollup=never

@bors
Copy link
Contributor

bors commented Sep 6, 2021

📌 Commit bcefd48 has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 6, 2021
@bors
Copy link
Contributor

bors commented Sep 6, 2021

⌛ Testing commit bcefd48 with merge 11bbb52...

@bors
Copy link
Contributor

bors commented Sep 7, 2021

☀️ Test successful - checks-actions
Approved by: michaelwoerister
Pushing 11bbb52 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 7, 2021
@bors bors merged commit 11bbb52 into rust-lang:master Sep 7, 2021
@rustbot rustbot added this to the 1.57.0 milestone Sep 7, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (11bbb52): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@cjgillot cjgillot deleted the dep-map branch September 9, 2021 07:08
Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
Err(err) => return Err(err),
};
// SAFETY: This process must not modify nor remove the backing file while the memory map lives.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it sound like it is okay when other processes modify or remove the file. Shouldn't the comment be more like

// SAFETY: No process must modify or remove the backing file while the memory map lives.
// We ensure this to be the case for rustc itself. There is no way to prevent another
// process from modifying this file, so UB can arise if they do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.