feat: specialize `SpecFromElem` for `()` #118094
r? @thomcc (rustbot has picked a reviewer for you, use r? to override)
Force-pushed: b7a85c7 to c6c00cd
While a better approach would be to implement it for all ZSTs which are `Copy` and have trivial `Clone`, the last property cannot be detected for now. Signed-off-by: Petr Portnov <me@progrm-jarvis.ru>
Force-pushed: c6c00cd to 2fd9442
@bors r+ I'm not sure how useful this is but it seems harmless.
FYI we already do this kind of optimization elsewhere, where we assume that if a type has both Copy and Clone, that they're semantically equivalent. That said, I'm not suggesting you should make that change, it seems hard to justify for ZSTs.
Rollup of 7 pull requests

Successful merges:

- rust-lang#117790 (CFI: Add missing use core::ffi::c_int)
- rust-lang#118059 (Explicitly unset $CARGO for compiletest)
- rust-lang#118081 (`rustc_ty_utils` cleanups)
- rust-lang#118094 (feat: specialize `SpecFromElem` for `()`)
- rust-lang#118097 (Update books)
- rust-lang#118115 (Fix occurrences of old fn names in comment and tracing)
- rust-lang#118121 (`rustc_hir` cleanups)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#118094 - JarvisCraft:SpecFromElem-for-empty-tuple, r=thomcc

feat: specialize `SpecFromElem` for `()`

# Description

This PR adds a specialization `SpecFromElem for ()`, which significantly reduces the time taken by `vec![(); N]` in debug builds (specifically, tests), turning it from an observable $O(n)$ into $O(1)$.

# Observing the change

The problem this PR explicitly aims to fix is slow `vec![(); N]` for big `N`s, which may appear in tests (see the [Background section](#Background) for more details).

The minimal example to see the problem:

```rust
#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    const HUGE_SIZE: usize = i32::MAX as usize + 1;

    #[bench]
    fn bench_vec_literal(b: &mut test::Bencher) {
        b.iter(|| vec![(); HUGE_SIZE]);
    }

    #[bench]
    fn bench_vec_repeat(b: &mut test::Bencher) {
        b.iter(|| [(); 1].repeat(HUGE_SIZE));
    }
}
```

<details><summary>Output</summary>
<p>

```bash
cargo +nightly test -- --report-time -Zunstable-options
   Compiling huge-zst-vec-literal-bench v0.1.0 (/home/progrm_jarvis/RustroverProjects/huge-zst-vec-literal-bench)
    Finished test [unoptimized + debuginfo] target(s) in 0.31s
     Running unittests src/lib.rs (target/debug/deps/huge_zst_vec_literal_bench-e43b1ef287ba8b36)

running 2 tests
test tests::bench_vec_repeat ... ok <0.000s>
test tests::bench_vec_literal ... ok <14.382s>

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.38s

   Doc-tests huge-zst-vec-literal-bench

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

</p>
</details>

> [!IMPORTANT]
> This problem is only observable in Debug (unoptimized) builds; Release (optimized) builds do not exhibit it. It is still worth fixing, IMO, since the original scenario hits the problem in tests, for which optimizations are disabled by default, and it seems unreasonable to override this for the whole project while the problem is very local.
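For readers without a nightly toolchain, the comparison above can be approximated on stable Rust with `std::time::Instant`. This is only a rough sketch: the helper names (`via_literal`, `via_repeat`, `via_set_len`, `timed`) and the reduced size `N` are assumptions made for illustration, and `via_set_len` is a hypothetical stand-in for the idea of an $O(1)$ construction, not the code added by this PR.

```rust
use std::time::{Duration, Instant};

/// Builds a `Vec<()>` of length `n` through the `vec!` literal.
fn via_literal(n: usize) -> Vec<()> {
    vec![(); n]
}

/// Builds the same vector through `slice::repeat`, which requires `T: Copy`.
fn via_repeat(n: usize) -> Vec<()> {
    [(); 1].repeat(n)
}

/// Hypothetical O(1) construction for `()`, written for illustration only.
fn via_set_len(n: usize) -> Vec<()> {
    let mut v: Vec<()> = Vec::with_capacity(n);
    // SAFETY: `()` is zero-sized, so there are no bytes to initialize, and
    // `with_capacity(n)` guarantees `capacity() >= n`.
    unsafe { v.set_len(n) };
    v
}

/// Runs `build` once and returns the resulting length with the elapsed time.
fn timed(build: fn(usize) -> Vec<()>, n: usize) -> (usize, Duration) {
    let start = Instant::now();
    let len = build(n).len();
    (len, start.elapsed())
}

fn main() {
    // Much smaller than `i32::MAX` so that an unoptimized run stays quick.
    const N: usize = 1 << 22;

    let (len, elapsed) = timed(via_literal, N);
    println!("vec![(); N]        len = {len}, took {elapsed:?}");

    let (len, elapsed) = timed(via_repeat, N);
    println!("[(); 1].repeat(N)  len = {len}, took {elapsed:?}");

    let (len, elapsed) = timed(via_set_len, N);
    println!("set_len sketch     len = {len}, took {elapsed:?}");
}
```

Running it with plain `cargo run` (unoptimized) is what matters here; the timings depend on the machine and on whether the standard library in use already contains this specialization, so no expected numbers are given.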
# Background

While working on a crate for a custom data format which has an `i32::MAX` limit on its lists' sizes, I wrote the following test to ensure that this invariant is upheld:

```rust
#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize]).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize + 1]).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}
```

Soon I discovered that this takes $\approx 3$ to $4$ s per assertion on my machine, almost all of which is spent on `vec![..]`.

While this would be logical for a non-ZST vector (which would require an actual $O(n)$ allocation), here `()` was used intentionally, considering that for ZSTs size-changing operations should be $O(1)$ anyway (at least from the allocator's perspective). Yet this "overhead" is logical if we consider that in the general case `clone()` (which is used by the `Vec` literal) may have a non-trivial implementation, and thus each element actually has to be visited (even if it occupies no space).

In my specific case, the following (contextual) equivalent solved the issue:

```rust
#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize)).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize + 1)).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}
```

which works since `repeat` explicitly requires `T: Copy` and so does not have to think about a non-trivial `Clone`.

But it may still be counter-intuitive for users to observe such a long time for the "canonical" vec literal, hence the PR.

# Generic solution

This change is explicitly non-generic.
Initially I considered it possible to implement it generically, but this would require the specialization to have the following type requirements:

- ✅ the type must be a ZST: easily done via

  ```rust
  if core::mem::size_of::<T>() == 0 {
      todo!("specialization")
  }
  ```

  or

  ```rust
  use core::mem::SizedTypeProperties;

  if T::IS_ZST {
      todo!("specialization")
  }
  ```

- ✅ the type must implement `Copy`: implementable without conflicts via a separate specialization:

  ```rust
  trait IsCopyZst: Sized {
      fn is_copy_zst() -> bool;
  }

  impl<T> IsCopyZst for T {
      // `default` allows the more specific `T: Copy` impl to override this.
      default fn is_copy_zst() -> bool {
          false
      }
  }

  impl<T: Copy> IsCopyZst for T {
      fn is_copy_zst() -> bool {
          Self::IS_ZST
      }
  }
  ```

- ❌ the type should have a trivial `Clone` implementation: since `vec![t; n]` is specified to use `clone()`, we can only use this "performance optimization" when we are guaranteed that `clone()` does nothing except for copying.

The last requirement is the real blocker for a generic fix, since I am unaware of any way to get this information in a compiler-guaranteed way. While there may be a fix for this (my friend `@CertainLach` has suggested a potential solution via a perma-unstable fn in `Clone` like `is_trivially_cloneable()`, defaulting to `false` and only overridable by `rustc` on derive), this is surely out of this PR's scope.
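To make the third requirement concrete, here is a small stable-Rust sketch of a type that is zero-sized and `Copy`, yet has a `Clone` implementation with an observable side effect. The type `Noisy`, the `CLONES` counter and the helper `clones_for_vec_literal` are hypothetical names invented for this illustration. How many times `vec![elem; n]` calls `clone()` is an implementation detail of the standard library, so the sketch only reports the count rather than assuming an exact value.

```rust
use std::cell::Cell;

thread_local! {
    // Per-thread count of `Noisy::clone` calls (hypothetical helper).
    static CLONES: Cell<usize> = Cell::new(0);
}

/// Hypothetical zero-sized type that is `Copy` but whose `Clone`
/// does more than copy: it bumps a counter.
struct Noisy;

impl Clone for Noisy {
    fn clone(&self) -> Self {
        CLONES.with(|c| c.set(c.get() + 1));
        Noisy
    }
}

impl Copy for Noisy {}

/// Builds `vec![Noisy; n]` and returns `(length, observed clone() calls)`.
fn clones_for_vec_literal(n: usize) -> (usize, usize) {
    CLONES.with(|c| c.set(0));
    let v = vec![Noisy; n];
    (v.len(), CLONES.with(|c| c.get()))
}

fn main() {
    // `Noisy` occupies no space, like `()`.
    assert_eq!(std::mem::size_of::<Noisy>(), 0);

    let (len, clones) = clones_for_vec_literal(1_000);
    println!("len = {len}, clone() calls = {clones}");
}
```

A generic "ZST and `Copy`" specialization that skipped `clone()` would silently change what this program prints, which is why the PR restricts itself to `()`.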