Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: specialize SpecFromElem for () #118094

Merged
merged 3 commits into from
Nov 21, 2023

Conversation

JarvisCraft
Copy link
Contributor

@JarvisCraft JarvisCraft commented Nov 20, 2023

Description

This PR adds a specialization SpecFromElem for () which allows to significantly reduce vec![(), N] time in debug builds (specifically, tests) turning it from observable $O(n)$ to $O(1)$.

Observing the change

The problem this PR aims to fix explicitly is slow vec![(), N] on big Ns which may appear in tests (see Background section for more details).

The minimal example to see the problem:

#![feature(test)]

extern crate test;

#[cfg(test)]
mod tests {
    const HUGE_SIZE: usize = i32::MAX as usize + 1;

    #[bench]
    fn bench_vec_literal(b: &mut test::Bencher) {
        b.iter(|| vec![(); HUGE_SIZE]);
    }

    #[bench]
    fn bench_vec_repeat(b: &mut test::Bencher) {
        b.iter(|| [(); 1].repeat(HUGE_SIZE));
    }
}
Output

cargo +nightly test -- --report-time -Zunstable-options
   Compiling huge-zst-vec-literal-bench v0.1.0 (/home/progrm_jarvis/RustroverProjects/huge-zst-vec-literal-bench)
    Finished test [unoptimized + debuginfo] target(s) in 0.31s
     Running unittests src/lib.rs (target/debug/deps/huge_zst_vec_literal_bench-e43b1ef287ba8b36)

running 2 tests
test tests::bench_vec_repeat  ... ok <0.000s>
test tests::bench_vec_literal ... ok <14.382s>

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.38s

   Doc-tests huge-zst-vec-literal-bench

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Important

This problem is only observable in Debug (unoptimized) builds, while Release (optimized) ones do not observe this problem. It still is worth fixing it, IMO, since the original scenario observes the problem in tests for which optimizations are disabled by default and it seems unreasonable to override this for the whole project while the problem is very local.

Background

While working on a crate for a custom data format which has an i32::MAX limit on its list's sizes, I wrote the following test to ensure that this invariant is upheld:

#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize]).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize + 1]).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}

Soon I discovered that this takes $\approx 3--4s$ per assertion on my machine, almost all of which is spent on vec![..].
While this would be logical for a non-ZST vector (which would require actual $O(n)$ allocation), here () was used intentionally considering that for ZSTs size-changing operations should anyway be $O(1)$ (at least from allocator perspective). Yet, this "overhead" is logical if we consider that in general case clone() (which is used by Vec literal) may have a non-trivial implementation and thus each element has to actually be visited (even if they occupy no space).

In my specific case, the following (contextual) equivalent solved the issue:

#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize)).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize + 1)).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}

which works since repeat explicitly uses T: Copy and so does not have to think about non-trivial Clone.

But it still may be counter-intuitive for users to observe such long time on the "canonical" vec literal thus the PR.

Generic solution

This change is explicitly non-generic. Initially I considered it possible to implement in generically, but this would require the specialization to have the following type requirements:

  • ✅ the type must be a ZST: easily done via
    if core::mem::size_of::<T>() == 0 {
      todo!("specialization")
    }
    or
    use core::mem::SizedTypeProperties;
    if T::IS_ZST {
      todo!("specialization")
    }
  • ✅ the type must implement Copy: implementable non-conflictable via a separate specialization:
    trait IsCopyZst: Sized {
      fn is_copy_zst() -> bool;
    }
    impl<T> IsCopyZst for T {
      fn is_copy_zst() -> bool {
          false
      }
    }
    impl<T: Copy> IsCopyZst for T {
      fn is_copy_zst() -> bool {
          Self::IS_ZST
      }
    }
  • ❌ the type should have a trivial Clone implementation: since vec![t; n] is specified to use clone(), we can only use this "performance optimization" when we are guaranteed that clone() does nothing except for copying.

The latter is the real blocker for a generic fix since I am unaware of any way to get this information in a compiler-guaranteed way.

While there may be a fix for this (my friend @CertainLach has suggested a potential solution by an perma-unstable fn in Clone like is_trivially_cloneable() defaulting to false and only overridable by rustc on derive), this is surely out of this PRs scope.

@rustbot
Copy link
Collaborator

rustbot commented Nov 20, 2023

r? @thomcc

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 20, 2023
@rust-log-analyzer

This comment has been minimized.

@JarvisCraft JarvisCraft marked this pull request as draft November 20, 2023 15:28
While a better approach would be to implement it for all ZSTs
which are `Copy` and have trivial `Clone`,
the last property cannot be detected for now.

Signed-off-by: Petr Portnov <me@progrm-jarvis.ru>
@JarvisCraft JarvisCraft marked this pull request as ready for review November 20, 2023 15:29
@rust-log-analyzer

This comment has been minimized.

Signed-off-by: Petr Portnov <me@progrm-jarvis.ru>
Signed-off-by: Petr Portnov <me@progrm-jarvis.ru>
@thomcc
Copy link
Member

thomcc commented Nov 21, 2023

@bors r+

I'm not sure how useful this is but it seems harmless.

@bors
Copy link
Contributor

bors commented Nov 21, 2023

📌 Commit 72a8633 has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 21, 2023
@thomcc
Copy link
Member

thomcc commented Nov 21, 2023

❌ the type should have a trivial Clone implementation: since vec![t; n] is specified to use clone(), we can only use this "performance optimization" when we are guaranteed that clone() does nothing except for copying.

FYI we already do this kind of optimization elsewhere, where we assume that if a type has both Copy and Clone, that they're semantically equivalent.

That said, I'm not suggesting you should make that change, it seems hard to justify for ZSTs

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup of 7 pull requests

Successful merges:

 - rust-lang#117790 (CFI: Add missing use core::ffi::c_int)
 - rust-lang#118059 (Explicitly unset $CARGO for compiletest)
 - rust-lang#118081 (`rustc_ty_utils` cleanups)
 - rust-lang#118094 (feat: specialize `SpecFromElem` for `()`)
 - rust-lang#118097 (Update books)
 - rust-lang#118115 (Fix occurrences of old fn names in comment and tracing)
 - rust-lang#118121 (`rustc_hir` cleanups)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit bfc2675 into rust-lang:master Nov 21, 2023
11 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Nov 21, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup merge of rust-lang#118094 - JarvisCraft:SpecFromElem-for-empty-tuple, r=thomcc

feat: specialize `SpecFromElem` for `()`

# Description

This PR adds a specialization `SpecFromElem for ()` which allows to significantly reduce `vec![(), N]` time in debug builds (specifically, tests) turning it from observable $O(n)$ to $O(1)$.

# Observing the change

The problem this PR aims to fix explicitly is slow `vec![(), N]` on big `N`s which may appear in tests (see [Background section](#Background) for more details).

The minimal example to see the problem:

```rust
#![feature(test)]

extern crate test;

#[cfg(test)]
mod tests {
    const HUGE_SIZE: usize = i32::MAX as usize + 1;

    #[bench]
    fn bench_vec_literal(b: &mut test::Bencher) {
        b.iter(|| vec![(); HUGE_SIZE]);
    }

    #[bench]
    fn bench_vec_repeat(b: &mut test::Bencher) {
        b.iter(|| [(); 1].repeat(HUGE_SIZE));
    }
}
```
<details><summary>Output</summary>
<p>

```bash
cargo +nightly test -- --report-time -Zunstable-options
   Compiling huge-zst-vec-literal-bench v0.1.0 (/home/progrm_jarvis/RustroverProjects/huge-zst-vec-literal-bench)
    Finished test [unoptimized + debuginfo] target(s) in 0.31s
     Running unittests src/lib.rs (target/debug/deps/huge_zst_vec_literal_bench-e43b1ef287ba8b36)

running 2 tests
test tests::bench_vec_repeat  ... ok <0.000s>
test tests::bench_vec_literal ... ok <14.382s>

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 14.38s

   Doc-tests huge-zst-vec-literal-bench

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```
</p>
</details>

> [!IMPORTANT]
> This problem is only observable in Debug (unoptimized) builds, while Release (optimized) ones do not observe this problem. It still is worth fixing it, IMO, since the original scenario observes the problem in tests for which optimizations are disabled by default and it seems unreasonable to override this for the whole project while the problem is very local.

# Background

While working on a crate for a custom data format which has an `i32::MAX` limit on its list's sizes, I wrote the following test to ensure that this invariant is upheld:

```rust
#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize]).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize + 1]).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}
```

Soon I discovered that this takes $\approx 3--4s$ per assertion on my machine, almost all of which is spent on `vec![..]`.
While this would be logical for a non-ZST vector (which would require actual $O(n)$ allocation), here `()` was used intentionally considering that for ZSTs size-changing operations should anyway be $O(1)$ (at least from allocator perspective). Yet, this "overhead" is logical if we consider that in general case `clone()` (which is used by `Vec` literal) may have a non-trivial implementation and thus each element has to actually be visited (even if they occupy no space).

In my specific case, the following (contextual) equivalent solved the issue:

```rust
#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize)).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize + 1)).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}
```

which works since `repeat` explicitly uses `T: Copy` and so does not have to think about non-trivial `Clone`.

But it still may be counter-intuitive for users to observe such long time on the "canonical" vec literal thus the PR.

# Generic solution

This change is explicitly non-generic. Initially I considered it possible to implement in generically, but this would require the specialization to have the following type requirements:
- ✅ the type must be a ZST: easily done via
  ```rust
  if core::mem::size_of::<T>() == 0 {
    todo!("specialization")
  }
  ```
  or
  ```rust
  use core::mem::SizedTypeProperties;
  if T::IS_ZST {
    todo!("specialization")
  }
  ```
- :white_check_mark`: the type must implement `Copy`: implementable non-conflictable via a separate specialization:
  ```rust
  trait IsCopyZst: Sized {
    fn is_copy_zst() -> bool;
  }
  impl<T> IsCopyZst for T {
    fn is_copy_zst() -> bool {
        false
    }
  }
  impl<T: Copy> IsCopyZst for T {
    fn is_copy_zst() -> bool {
        Self::IS_ZST
    }
  }
  ```
- ❌ the type should have a trivial `Clone` implementation: since `vec![t; n]` is specified to use `clone()`, we can only use this "performance optimization" when we are guaranteed that `clone()` does nothing except for copying.

The latter is the real blocker for a generic fix since I am unaware of any way to get this information in a compiler-guaranteed way.

While there may be a fix for this (my friend `@CertainLach` has suggested a potential solution by an perma-unstable fn in `Clone` like `is_trivially_cloneable()` defaulting to `false` and only overridable by `rustc` on derive), this is surely out of this PRs scope.
@JarvisCraft JarvisCraft deleted the SpecFromElem-for-empty-tuple branch November 21, 2023 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants