-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
We are std. #67
We are std. #67
Conversation
In doing so, simplify bit_reader_reverse::refill_slow a tad.
09e1c3e
to
74b0634
Compare
74b0634
to
7c4211c
Compare
Stacked the commits since both are necessary anyways and I noticed merge conflicts without them rebased on each other, annoyingly. |
Hi sorry for the late response, I was on travels. This is very exciting! Did you benchmark the change in the bitreader? This is one of the hottest code paths in this library. I am not opposed to removing the byteorder library, it's an artifact from way back where I'll be rather busy catching up the next few days but I am very motivated to get this PR moving forward. |
Honestly, I was more concerned about correctness, given the nature of the change. That said, having done so, I noticed something interesting. Nightly$ rustup default nightly
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'
nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.82.0-nightly (0d634185d 2024-08-29)
$ rustup update
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: syncing channel updates for 'beta-x86_64-unknown-linux-gnu'
info: latest update on 2024-08-29, rust version 1.81.0-beta.7 (4a765c0b8 2024-08-28)
#
# snipped
#
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: latest update on 2024-09-01, rust version 1.82.0-nightly (a7399ba69 2024-08-31)
#
# snipped
#
info: syncing channel updates for '1.75-x86_64-unknown-linux-gnu'
info: syncing channel updates for '1.76-x86_64-unknown-linux-gnu'
stable-x86_64-unknown-linux-gnu unchanged - rustc 1.80.1 (3f5fd8dd4 2024-08-06)
beta-x86_64-unknown-linux-gnu updated - rustc 1.81.0-beta.7 (4a765c0b8 2024-08-28) (from rustc 1.81.0-beta.5 (6c199c89c 2024-08-15))
nightly-x86_64-unknown-linux-gnu updated - rustc 1.82.0-nightly (a7399ba69 2024-08-31) (from rustc 1.82.0-nightly (0d634185d 2024-08-29))
1.75-x86_64-unknown-linux-gnu unchanged - rustc 1.75.0 (82e1608df 2023-12-21)
1.76-x86_64-unknown-linux-gnu unchanged - rustc 1.76.0 (07dca489a 2024-02-04)
info: cleaning up downloads & tmp directories
info: self-update is disabled for this build of rustup
info: any updates to rustup will need to be fetched with your system package manager
$ cargo clean
Removed 2502 files, 872.1MiB total
$ cargo bench
Compiling proc-macro2 v1.0.86
#
# snipped
#
Compiling criterion v0.5.1
Finished `bench` profile [optimized] target(s) in 7.04s
Running unittests src/lib.rs (target/release/deps/ruzstd-a67070fb02525630)
running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored
test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd.rs (target/release/deps/zstd-d2ffb1a8ff286e95)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-03f9974b2b34743b)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-34f9df705d959984)
Gnuplot not found, using plotters backend
reversed bitreader time: [2.4965 ms 2.4991 ms 2.5021 ms]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe
$ git branch
* master
remove-byteorder-dep
your-crate-will-adapt-to-service-us
$ git switch your-crate-will-adapt-to-service-us
Switched to branch 'your-crate-will-adapt-to-service-us'
Your branch is up to date with 'origin/your-crate-will-adapt-to-service-us'.
$ cargo bench
Updating crates.io index
Locking 3 packages to latest compatible versions
Adding compiler_builtins v0.1.124
Adding rustc-std-workspace-alloc v1.0.0
Adding rustc-std-workspace-core v1.0.0
Compiling ruzstd v0.7.1 (/home/jubilee/rust/ruzstd)
Finished `bench` profile [optimized] target(s) in 1.45s
Running unittests src/lib.rs (target/release/deps/ruzstd-7ff6417e4589357f)
running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored
test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd.rs (target/release/deps/zstd-e735dd04ebb08a91)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-cddb555eb1629f42)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-2e0563546441c387)
Gnuplot not found, using plotters backend
reversed bitreader time: [2.3927 ms 2.3950 ms 2.3979 ms]
change: [-4.3128% -4.1639% -4.0105%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
$ Stable$ git switch master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
$ cargo clean
Removed 596 files, 138.9MiB total
$ rustup default stable
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'
stable-x86_64-unknown-linux-gnu unchanged - rustc 1.80.1 (3f5fd8dd4 2024-08-06)
$ cargo bench
Compiling proc-macro2 v1.0.86
#
# snipped
#
Compiling criterion v0.5.1
Finished `bench` profile [optimized] target(s) in 7.30s
Running unittests src/lib.rs (target/release/deps/ruzstd-03704bf2cd47f473)
running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored
test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd.rs (target/release/deps/zstd-3bb7de58025fde57)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-795faa6cc7116ab9)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-7685c61267242384)
Gnuplot not found, using plotters backend
reversed bitreader time: [2.4838 ms 2.4878 ms 2.4916 ms]
$ git switch your-crate-will-adapt-to-service-us
Switched to branch 'your-crate-will-adapt-to-service-us'
Your branch is up to date with 'origin/your-crate-will-adapt-to-service-us'.
$ cargo bench
Updating crates.io index
Locking 3 packages to latest compatible versions
Adding compiler_builtins v0.1.124
Adding rustc-std-workspace-alloc v1.0.0
Adding rustc-std-workspace-core v1.0.0
Compiling ruzstd v0.7.1 (/home/jubilee/rust/ruzstd)
Finished `bench` profile [optimized] target(s) in 1.57s
Running unittests src/lib.rs (target/release/deps/ruzstd-7bbc3b5a0fd4b443)
running 19 tests
test decoding::decodebuffer::tests::short_writer ... ignored
# snipped, all ignored
test tests::test_streaming ... ignored
test result: ok. 0 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd.rs (target/release/deps/zstd-4e597d9467301f4a)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/bin/zstd_stream.rs (target/release/deps/zstd_stream-2914c4f1bf4a1210)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/reversedbitreader_bench.rs (target/release/deps/reversedbitreader_bench-6f83a5878c30f49b)
Gnuplot not found, using plotters backend
reversed bitreader time: [2.6718 ms 2.6734 ms 2.6753 ms]
change: [+7.2807% +7.4595% +7.6427%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
$ ConclusionI did run cargo bench a few more times on both and got similar-ish results, though every now and then criterion would say performance has improved despite it being a no-change. Either this benchmark is too subject to variance for a quick run of criterion to tell us anything, or this is liable to be an improvement on nightly and a regression on stble. I'd be happy to investigate further performance improvements, but I am inherently reluctant to make strong, general conclusions based on a single bench target, and I don't want to compare my results against e.g. a generic "decode enwik9.zstd" directive. I either need to be able to download a precise file, or have a set of reconstructible steps that can be referred back to later, so that e.g. things like the encoding compression level can also be controlled for. With everything precisely reconstructible, I'd be able to actually examine improvements with metrics that aren't likely to be strongly affected by things like e.g. I ran the bench on a CPU with a 100 MB cache... or just be confident that I'm actually running the same bench if I boot a different CPU with a different OS. |
Obviously I'll have a very good look at the changes for correctness as well, but even if correct the changes need to not regress the performance further. As for the benchmarks: I should have been more precise, the single benchmark included in this repo is admittedly not very good. I am mostly using enwik9 because it is highly compressible. The actual compression level does not really matter though, zstd will generate a lot of offset/matchlength/literallength triplets on every level, precisely because enwik9 is so easly compressable. Decoding these triplets is what stresses the bitreader. What I'm usually doing to benchmark changes like this is to build the binaries in the repo with --release and then call something along the lines of
I am aware that that also measures a lot of stuff that isn't the bitreader but writing a benchmark that simulates a realistic usage pattern isn't trivial (and I also don't want to include a copy of enwik9 in this repo because that's pretty big). The bitreader does a significant (~40%) amount of work in this measurement and changes in performance are noticable even if there is some noise in there. |
to be clear, I don't really care if something "doesn't matter" to you, because that means I have to guess how exactly you assembled the enwik9.zstd candidate text file. |
Multiple sources can claim to offer a download of the "enwik9" dataset, and yet offer slightly varying datasets. It doesn't have to be in the repo: a URL would be sufficient to disambiguate. Multiple commands issued to a zstd encoder can produce different .zstd files. These are variables. They should be controlled. To do otherwise is to not perform an experiment. |
The compressor cannot be this crate as it does not implement compression yet, so it has to be a binary, and that binary has a version. If I install a version from my distro, that version may actually have arbitrary patches that severely alter its behavior. I know this because distros regularly patch the Rust compiler to alter its behavior in ways that make it unsound. So I also would prefer an unambiguous version of the compressor. |
compression$ zstd -9 enwik9 -o enwik9.zstd
enwik9 : 27.23% ( 954 MiB => 260 MiB, enwik9.zstd) decompressionmaster$ time ./target/release/zstd -c -d enwik9.zstd > /dev/null
File: enwik9.zstd
100 % done
Decoded frames: 1 bytes: 1000000000
1 of 1 checksums are ok!
real 0m2.365s
user 0m2.320s
sys 0m0.033s we are std.$ time ./target/release/zstd -c -d enwik9.zstd > /dev/null
File: enwik9.zstd
100 % done
Decoded frames: 1 bytes: 1000000000
1 of 1 checksums are ok!
real 0m2.364s
user 0m2.324s
sys 0m0.030s |
I'm not sure I understand the concerns you have. The biggest variable in benchmarking this crate is the hardware used to run the benchmark. The CPU, compressor/compression level and underlying data don't really matter as long as they represent a somewhat realistic scenario that stresses the bitreader. Since you are not comparing against any benchmarks I have made (because you obviously do not have the hardware I use for benchmarking, because that hardware is sitting under my desk) but comparing runs of different implementations on your machine with these variables controlled by yourself, this is a perfectly valid experiment. Which choices you make for these variables doesn't really matter, not just to me, it literally just don't matter, as long as you keep them consistent for the runs you are comparing. Otherwise, yes, that wouldn't be an experiment but it would also be just nonsense... Thanks for running the benchmark, this looks promising. |
Hi I checked out the code, the changes to the bitreader look good, not doing the evening out is actually an improvement. This was just done because the byteorder library did not offer odd sized reads. On my machine the change in the hot path is a slight performance regression but it's acceptable for the purposes of moving this PR along. If you could alter the github CI to do this, that would be great. The rustc-dep-of-std feature breaks the build on stable.
|
Nevermind, I just saw that clippy is also complaining in unrelated code, I'll have to touch this anyways to make the CI green... |
Thank you! |
Ah one more question, do I need to keep anything in mind maintaining the special dependencies this crate has now with the |
@KillingSpark As long as nothing happens that introduces a direct dependency on |
Apologies if this has already been requested but could you cut an ruzstd release with this PR? This is the last thing blocking supporting zstd-compressed debug sections in backtrace (rust-lang/backtrace-rs#626). |
Oh sorry I wasn't aware that was time critical. I'll make a release this evening |
Thanks! |
With #66 this will complete the steps in this crate necessary for std to be able to assimilate it.