-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream files into the local store while capturing them #12563
Conversation
918e4ce
to
585cae4
Compare
585cae4
to
3eed58e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting results! Is this cherry-pickable to 2.6? Seems fine to not cherry-pick the benchmarks portion.
Should be, yea. |
3eed58e
to
17f845e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimisation!
Also FWIW if the problem is the number of allocations, rather than size, we could re-use a pool of |
Refactor the test harness to allow for creating larger files (without allocating their full contents in memory), and add benchmarks for a series of file counts, sizes, and repetitions. Preparation for #12563. [ci skip-build-wheels]
…in two passes). [ci skip-build-wheels]
…ation via Node. [ci skip-build-wheels]
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
17f845e
to
2e87455
Compare
) A private repository experienced OOM issues in CI with a large number of tests. Allocation profiling highlighted that a large batch of `32MB`/`64MB` allocations was occurring around the same time, all caused by the fact that `PosixFS::read_file` slurps an entire file into memory as `FileContent`. Because local process execution uses `PosixFS::read_file` to capture sandbox outputs, a bunch of processes finishing at once could lead to memory usage spikes. To fix that, this change removes `PosixFS::read_file` (which was only ever used to digest files), and replaces it with `{Store,local::ByteStore,ShardedLmdb}::store*` methods which make two-ish passes over a `Read` instance in order to digest and capture it. The methods take an `immutable_data` parameter so that callers can indicate that they know that data will not change, which allows for avoiding hashing it on the second pass. Captures from process sandboxes are treated as immutable, while memoized captures from the workspace are not. ---- ``` snapshot_capture/snapshot_capture((100, 100, false, 100)) time: [1.8909 s 1.9000 s 1.9109 s] change: [-27.459% -25.841% -24.085%] (p = 0.00 < 0.05) Performance has improved. snapshot_capture/snapshot_capture((20, 10000000, true, 10)) time: [1.2539 s 1.2655 s 1.2782 s] change: [-62.138% -61.318% -60.505%] (p = 0.00 < 0.05) Performance has improved. snapshot_capture/snapshot_capture((1, 200000000, true, 10)) time: [3.5281 s 3.5773 s 3.6299 s] change: [-13.420% -11.985% -10.434%] (p = 0.00 < 0.05) Performance has improved. ```
…12563) (#12572) A private repository experienced OOM issues in CI with a large number of tests. Allocation profiling highlighted that a large batch of `32MB`/`64MB` allocations was occurring around the same time, all caused by the fact that `PosixFS::read_file` slurps an entire file into memory as `FileContent`. Because local process execution uses `PosixFS::read_file` to capture sandbox outputs, a bunch of processes finishing at once could lead to memory usage spikes. To fix that, this change removes `PosixFS::read_file` (which was only ever used to digest files), and replaces it with `{Store,local::ByteStore,ShardedLmdb}::store*` methods which make two-ish passes over a `Read` instance in order to digest and capture it. The methods take an `immutable_data` parameter so that callers can indicate that they know that data will not change, which allows for avoiding hashing it on the second pass. Captures from process sandboxes are treated as immutable, while memoized captures from the workspace are not. ---- ``` snapshot_capture/snapshot_capture((100, 100, false, 100)) time: [1.8909 s 1.9000 s 1.9109 s] change: [-27.459% -25.841% -24.085%] (p = 0.00 < 0.05) Performance has improved. snapshot_capture/snapshot_capture((20, 10000000, true, 10)) time: [1.2539 s 1.2655 s 1.2782 s] change: [-62.138% -61.318% -60.505%] (p = 0.00 < 0.05) Performance has improved. snapshot_capture/snapshot_capture((1, 200000000, true, 10)) time: [3.5281 s 3.5773 s 3.6299 s] change: [-13.420% -11.985% -10.434%] (p = 0.00 < 0.05) Performance has improved. ``` [ci skip-rust] [ci skip-build-wheels]
pub async fn store<F, R>( | ||
&self, | ||
initial_lease: bool, | ||
data_is_immutable: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only scary part of the review. The eyeball glazes over the bool literals at the final call sites where this is plumbed. I like a bool enum to force spelling out what choice is being made more clearly, but maybe you consider that overkill here:
pants/src/rust/engine/client/src/options/args.rs
Lines 18 to 21 in 7da328c
enum Negate { | |
True, | |
False, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, if you're modifying this code and not reading closely in the 1st place, I'm sympathetic to the idea you'll get what you deserve.
A private repository experienced OOM issues in CI with a large number of tests. Allocation profiling highlighted that a large batch of
32MB
/64MB
allocations was occurring around the same time, all caused by the fact thatPosixFS::read_file
slurps an entire file into memory asFileContent
. Because local process execution usesPosixFS::read_file
to capture sandbox outputs, a bunch of processes finishing at once could lead to memory usage spikes.To fix that, this change removes
PosixFS::read_file
(which was only ever used to digest files), and replaces it with{Store,local::ByteStore,ShardedLmdb}::store*
methods which make two-ish passes over aRead
instance in order to digest and capture it.The methods take an
immutable_data
parameter so that callers can indicate that they know that data will not change, which allows for avoiding hashing it on the second pass. Captures from process sandboxes are treated as immutable, while memoized captures from the workspace are not.#12569 adds relevant benchmarks. Profiling allocation during those benchmarks shows peak memory usage of
945MB
onmain
, and362MB
on the branch. Additionally, the benchmarks all run faster (likely due to reduced allocation):