-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delay include_bytes
to AST lowering
#103812
Conversation
r? @TaKO8Ki (rustbot has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Some changes occurred in src/tools/clippy cc @rust-lang/clippy Some changes occurred in src/tools/rustfmt cc @rust-lang/rustfmt |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 261c30cc4c962434dc5608ba767ed7b3cd7a35be with merge f3fc6066fb6e47915c9c0f888fbb7a66f620e14f... |
☀️ Try build successful - checks-actions |
Queued f3fc6066fb6e47915c9c0f888fbb7a66f620e14f with parent 95a3a72, future comparison URL. |
Finished benchmarking commit (f3fc6066fb6e47915c9c0f888fbb7a66f620e14f): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
This looks great! Could you please try to benchmark some code that includes a large binary file with and without this PR? We don't have a benchmark for this present in the |
With this program: static BYTES: &[u8] = include_bytes!("out");
fn main() {
println!("{}", BYTES.len());
} Using GNU time to record the time and max RSS: > dd if=/dev/urandom of=src/out bs=500M count=1
...
> cargo clean && /usr/bin/time -f 'mem %M, user %Us' cargo +nightly build
Compiling big_inc v0.1.0 (/tmp/big_inc)
Finished dev [unoptimized + debuginfo] target(s) in 10.52s
mem 4265648, user 9.55s
> cargo clean && /usr/bin/time -f 'mem %M, user %Us' cargo +stage1 build
Compiling big_inc v0.1.0 (/tmp/big_inc)
Finished dev [unoptimized + debuginfo] target(s) in 3.39s
mem 2755012, user 2.70s |
Looking at this a bit further, around 50% of the remaining compile time after this patch is spent in https://github.com/llvm/llvm-project/blob/244331ae833aaf33503bbd36890e704afb66a237/llvm/lib/IR/Constants.cpp#L2970-L2971 - not sure if that could be improved at all. EDIT: To the best of my knowledge, this is due to LLVM hashing the entire constant byte by byte. This is also done on the Rust side at some points (hashing query results), but the impact is less severe. |
@clubby789 people have sent LLVM patches in the past to improve things for Rust (edit: in fact this occurs all the time). If you want to change llvm too and need help, you could maybe ask the llvm working group on zulip. |
Hopefully the LLVM code could be improved (e.g. a faster hash function), but I wouldn't block this PR on this now, as this PR already seems to improve the situation for the Rust side a lot. |
This is very similar to adding a new AST node for What this node needs to have is a "canonical token representation", into which it turns one any attempt at tokenization, which includes pretty printing. Also, every use of |
r? @petrochenkov |
The initial diff just changed some matches until it compiled - I've now gone through all references to |
Thanks! |
e3d8936
to
b2da155
Compare
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (8ef2485): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
…etrochenkov Delay `include_bytes` to AST lowering Hopefully addresses rust-lang#65818. This PR introduces a new `ExprKind::IncludedBytes` which stores the path and bytes of a file included with `include_bytes!()`. We can then create a literal from the bytes during AST lowering, which means we don't need to escape the bytes into valid UTF8 which is the cause of most of the overhead of embedding large binary blobs.
FWIW, I tried to replace djbHash with xxHash on the benchmark posted above, here are the results:
(I shouldn't also measure the clean step, but w/e, this is unscientific anyway :) ). There is a clear improvement, but it's nothing really big (given the fact that we hash a stress test - 500 MiB file!). So I'm not sure if it's worth it to switch it. |
Hopefully addresses #65818.
This PR introduces a new
ExprKind::IncludedBytes
which stores the path and bytes of a file included withinclude_bytes!()
. We can then create a literal from the bytes during AST lowering, which means we don't need to escape the bytes into valid UTF8 which is the cause of most of the overhead of embedding large binary blobs.