
Implement Vec::from_elem specialization for all Copy types #41335

Closed
wants to merge 2 commits

Conversation

@ranma42 (Contributor) commented Apr 17, 2017

If the input element is zero, `Vec::from_elem` can just invoke
`calloc` for any `Copy` type.

If the input is non-zero, but its size is 1, it can allocate and then
`memset` the buffer.
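
A minimal sketch of the idea, written as a standalone free function rather than as the PR's actual `Vec::from_elem` specialization (the name `from_elem_copy` and the exact structure below are illustrative, not the PR's code):

```rust
use std::alloc::{alloc, alloc_zeroed, handle_alloc_error, Layout};
use std::{mem, ptr, slice};

/// Sketch of the proposed fast paths for `T: Copy` (illustrative only).
fn from_elem_copy<T: Copy>(elem: T, n: usize) -> Vec<T> {
    let size = mem::size_of::<T>();
    if n == 0 || size == 0 {
        return vec![elem; n];
    }
    // Look at the raw bytes of `elem`. Note: for padded types this also reads
    // padding bytes, which is exactly the concern raised later in this thread.
    let bytes = unsafe { slice::from_raw_parts(&elem as *const T as *const u8, size) };
    let layout = Layout::array::<T>(n).expect("capacity overflow");
    unsafe {
        if bytes.iter().all(|&b| b == 0) {
            // All-zero element: request zeroed memory directly (the `calloc` path).
            let p = alloc_zeroed(layout) as *mut T;
            if p.is_null() { handle_alloc_error(layout); }
            return Vec::from_raw_parts(p, n, n);
        }
        if size == 1 {
            // Non-zero single-byte element: allocate, then `memset` the buffer.
            let p = alloc(layout) as *mut T;
            if p.is_null() { handle_alloc_error(layout); }
            ptr::write_bytes(p as *mut u8, bytes[0], n);
            return Vec::from_raw_parts(p, n, n);
        }
    }
    // Everything else: fall back to cloning the element into each slot.
    vec![elem; n]
}
```

The PR itself dispatches via specialization inside `Vec::from_elem`, as the title says, rather than through a separate function.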

@rust-highfive (Collaborator)

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton added the T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue) label Apr 17, 2017
@@ -78,7 +78,6 @@ use core::intrinsics::{arith_offset, assume};
use core::iter::{FromIterator, FusedIterator, TrustedLen};
use core::mem;
#[cfg(not(test))]
use core::num::Float;
Contributor

This is why I think attributes should be on the same line with use/mod/extern crate items.

Contributor Author

Argh, sorry! It should now be fixed.

If the input element is zero, `Vec::from_elem` can just invoke
`calloc` for any `Copy` type.

If the input is non-zero, but its size is 1, it can allocate and then
`memset` the buffer.
@arielb1 (Contributor) commented Apr 18, 2017

Thanks for the PR @ranma42! @BurntSushi or some other reviewer will be looking at your PR shortly.

@arielb1 added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label, and added and then removed the O-wasm (Target: WASM (WebAssembly), http://webassembly.org/) label, Apr 18, 2017
_ => 0u8 == chunked_or(x),
}
}
}
Member

Can you write some comments explaining how `chunked_or` and `is_zero` work? Are there any pitfalls? What happens when the alignment of a type is bigger than 16?

@ranma42 (Contributor Author), Apr 19, 2017

I added some comments that should answer your questions, with the exception of "are there any pitfalls?".
There should be no correctness issues, but there is certainly a performance tradeoff: we now check the contents of the input and we pay for that.
We are making `vec![None; 1024*1024]` much faster, but `vec![Some(3); 1]` is going to be a little slower (if the compiler can see the literal value, the checks are optimized away, but in general that might not be possible).
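
For reference, a byte-wise sketch of what such a check could look like (illustrative only; the PR's `chunked_or` operates on larger chunks of up to 16 bytes at a time, as the review question above about alignment suggests, rather than byte by byte):

```rust
use std::{mem, slice};

/// OR together all bytes of `x`. Unsafe because for padded types this reads
/// uninitialized padding bytes (the pitfall debated later in this thread).
/// Byte-wise for clarity; the PR's version ORs chunks of up to 16 bytes.
unsafe fn chunked_or<T: Copy>(x: &T) -> u8 {
    let bytes = slice::from_raw_parts(x as *const T as *const u8, mem::size_of::<T>());
    bytes.iter().fold(0u8, |acc, &b| acc | b)
}

/// Safe-looking wrapper: `x` is represented by all-zero bytes iff the OR of its bytes is zero.
fn is_zero<T: Copy>(x: &T) -> bool {
    unsafe { 0u8 == chunked_or(x) }
}

fn main() {
    assert!(is_zero(&0u64));
    assert!(is_zero(&None::<&u8>)); // null-pointer niche: all-zero bytes
    assert!(!is_zero(&1u64));
}
```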

@BurntSushi (Member)

cc @rust-lang/libs

@BurntSushi (Member)

@aturon I haven't used specialization yet, so some extra eyes on that piece would be great.
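
For readers unfamiliar with the feature, a minimal standalone example of how specialization (nightly-only at the time) lets a blanket impl be overridden by a more specific one; the trait and method names here are made up for illustration:

```rust
#![feature(specialization)] // nightly-only feature gate

trait Describe {
    fn describe(&self) -> &'static str;
}

// Blanket impl; `default fn` marks the method as overridable by more
// specific impls.
impl<T> Describe for T {
    default fn describe(&self) -> &'static str {
        "generic path"
    }
}

// A more specific impl that specialization allows to coexist with (and
// override) the blanket one.
impl Describe for u8 {
    fn describe(&self) -> &'static str {
        "specialized u8 path"
    }
}

fn main() {
    assert_eq!(1u32.describe(), "generic path");
    assert_eq!(1u8.describe(), "specialized u8 path");
}
```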

Add an explanation of the safety requirements for `chunked_or` and how
`is_zero` wraps it to expose a safe interface.
@alexcrichton (Member)

I personally feel like this is too much magic and may be an example of specialization going too far. This may also not work for composite types due to padding. Is there a use case motivating this today?

Ideally I'd prefer to just wait for specialization to stabilize, in which case we can simply mark the implementation in libstd as overridable, so downstream crates can customize the implementation.

@ranma42 (Contributor Author) commented Apr 20, 2017

@alexcrichton Could you explain a little more what the issue with padding is? AFAICT, in the worst case the input object might have some padding initialized to a non-zero value, in which case it would simply not benefit from `calloc`.

From what I understand of the coherence rules and specialization, it would be impossible to customize the implementation for several types for which this would be useful, such as `Option`, tuples, and small arrays.

@aturon (Member) commented Apr 21, 2017

Do we have evidence of this making a performance difference?

@alexcrichton (Member)

@ranma42 ah yes, sorry, I mean what you're thinking. A "zero" type like `(0u8, 0u16)` is not guaranteed to receive this optimization, but it may sporadically do so. That means you could see the optimization in development, start to rely on it, and then have it mysteriously stop working in production.

@ranma42 (Contributor Author) commented Apr 21, 2017

@alexcrichton yes, that can be a problem, but on the bright side it would not affect the correctness of the program: the "issue" would be that under some conditions the program runs faster than you expect, instead of always being slow. Moreover, the same holds true for most optimisations (especially when combined). In some specific contexts (avoiding timing attacks in cryptography, meeting guaranteed deadlines in realtime systems) predictable timings are more desirable than fast programs, but I believe this is neither the general case nor something that LLVM has strong support for.

@aturon I re-ran the benchmark from #40409 (comment), adapted as follows:

```rust
#![feature(test)]

extern crate test;

use test::Bencher;

#[derive(Clone)]
struct Clonable<T>(T);

#[bench]
fn bench_data(b: &mut Bencher) {
    b.iter(|| vec![0u8; 1024 * 1024]);
}

#[bench]
fn bench_clone(b: &mut Bencher) {
    b.iter(|| vec![Clonable(0u8); 1024 * 1024]);
}

#[bench]
fn bench_copy(b: &mut Bencher) {
    b.iter(|| vec![(0u8,); 1024 * 1024]);
}

#[bench]
fn bench_nullptr(b: &mut Bencher) {
    b.iter(|| vec![std::ptr::null::<u8>(); 1024 * 1024]);
}

#[bench]
fn bench_none_ref(b: &mut Bencher) {
    b.iter(|| vec![None::<&u8>; 1024 * 1024]);
}

#[bench]
fn bench_padded(b: &mut Bencher) {
    b.iter(|| vec![(0u8,0u16); 1024 * 1024]);
}
```

Running it shows these results:

Before:

```text
test bench_clone    ... bench:     249,831 ns/iter (+/- 165)
test bench_copy     ... bench:     249,943 ns/iter (+/- 559)
test bench_data     ... bench:         956 ns/iter (+/- 5)
test bench_none_ref ... bench:     625,907 ns/iter (+/- 872)
test bench_nullptr  ... bench:     625,807 ns/iter (+/- 821)
test bench_padded   ... bench:     897,815 ns/iter (+/- 1,117)
```

After:

```text
test bench_clone    ... bench:     249,393 ns/iter (+/- 193)
test bench_copy     ... bench:         941 ns/iter (+/- 7)
test bench_data     ... bench:         940 ns/iter (+/- 6)
test bench_none_ref ... bench:         721 ns/iter (+/- 2)
test bench_nullptr  ... bench:         719 ns/iter (+/- 1)
test bench_padded   ... bench:         749 ns/iter (+/- 4)
```

@sfackler (Member) commented Apr 21, 2017

Doesn't reading padding bytes produce an undefined value?

@ranma42 (Contributor Author) commented Apr 21, 2017

@sfackler yes, that is what @alexcrichton mentioned: reading padding bytes can make `is_zero` return an undef value, which ultimately leaves the compiler free to choose between the `calloc` path and the slow one.

@sfackler (Member)

Is that the extent of it? Are you sure it's not going to poison other future computation?

@hanna-kruppe (Contributor) commented Apr 21, 2017

If padding bytes are merely undef, the current semantics are that the compiler can pick an arbitrary value for them, and possibly a different one every time they're read. This is very problematic in most contexts (e.g., if you use such a value as an index, you might see different values in the bounds check vs. in the indexing, breaking memory safety), but in this case I can't see a way for that to break anything (even if you get a different value during the actual copy, you're writing to padding bytes, so it shouldn't matter). That doesn't necessarily mean it's fine, though, and it makes me very uneasy to rely on such tricky minutiae!

However, undef may be slowly phased out in favor of a revised poison value, see http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf — for our purposes here, the gist is that reading uninitialized memory (which padding probably is) would indeed give you a poison value, which propagates through all computations and is UB to branch on. This means `is_zero`'s result would be poison if there are any padding bytes, and thus `if is_zero(x) { ... }` would trigger UB.

The same proposal adds a `freeze` instruction, which fixes an arbitrary value for a poison value (like the current undef), but it's far from clear to me by what rule rustc should be inserting `freeze` operations into `is_zero` (and, if it does, whether that would impact optimizations).

And of course, all this is only as far as LLVM is concerned; the unsafe code guidelines probably have, or will have, an opinion on reading padding bytes too.

tl;dr Reading padding bytes is a minefield.
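
To make the padding concrete, an illustrative layout check (not code from the PR; the exact layout of repr(Rust) tuples is not guaranteed by the language, but this is what rustc does in practice):

```rust
use std::mem;

fn main() {
    // (u8, u16) holds three bytes of data but its size must be a multiple of
    // its 2-byte alignment, so it occupies 4 bytes: one byte is padding that
    // is never written when the value is constructed.
    println!("size  = {}", mem::size_of::<(u8, u16)>());  // 4 in practice
    println!("align = {}", mem::align_of::<(u8, u16)>()); // 2 in practice
    // A byte-wise `is_zero`-style scan of `(0u8, 0u16)` therefore reads one
    // uninitialized byte, which is where the undef/poison discussion applies.
}
```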

@alexcrichton (Member)

@rkruppe so to clarify, you think that this PR leads to undefined behavior? If so that seems quite bad!

@hanna-kruppe (Contributor) commented Apr 21, 2017

@alexcrichton I don't think it currently leads to UB as far as LLVM is concerned. However,

  1. I'm not 100% sure.
  2. This may change in future LLVM versions.
  3. The unsafe code guidelines may differ.

@alexcrichton (Member)

Hm, ok. I could definitely see how LLVM could consider this to be reading undefined memory, because you're doing `u8` loads over a value that may have padding in the middle, meaning that at some point you're definitely capable of reading uninitialized memory and could start propagating undef and/or poison values.

In light of that I would not be in favor of merging this, but @ranma42, is there perhaps an alternative, safer method of achieving the same results?

@ranma42 (Contributor Author) commented Apr 21, 2017

@rkruppe I think the paper proposes a nice direction to improve the consistency of handling undef values across different passes in LLVM. AFAICT the same approach they propose for bitfields could apply just as well to any composite data structure, even if it includes padding.

@alexcrichton yes, as per #41335 (comment), that is expected. My understanding of the current LLVM semantics is that when a jump depends on an undef, LLVM is required to (non-deterministically) choose one of the branches. It looks like this might change in the future, but that would most likely be a breaking change (the removal of undef from LLVM would be breaking as well).

The safest route is certainly to close without merging. It is a pity that the handling of `Copy` types is not unified and that several types which could benefit from this are not covered, but it looks like Rust cannot express the low-level `is_zero` check in a satisfying way.

Part of the results of this PR could be achieved by explicitly implementing the trait for `*{const,mut} T` and `Option<&T>` (the two types which I would expect to gain the most from the optimization, as they can be used to create arrays of null pointers).
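
A rough sketch of that narrower alternative (the `IsZero` trait name and impls below are illustrative, not the code that eventually landed in std): cover only types whose all-zero representation is well defined and padding-free.

```rust
/// Illustrative trait: "is this value represented entirely by zero bytes?"
trait IsZero {
    fn is_zero(&self) -> bool;
}

impl<T> IsZero for *const T {
    fn is_zero(&self) -> bool {
        self.is_null()
    }
}

impl<T> IsZero for *mut T {
    fn is_zero(&self) -> bool {
        self.is_null()
    }
}

impl<'a, T> IsZero for Option<&'a T> {
    // `Option<&T>` uses the null-pointer niche, so `None` is all-zero bytes
    // and there is no padding to worry about.
    fn is_zero(&self) -> bool {
        self.is_none()
    }
}

fn main() {
    assert!(std::ptr::null::<u8>().is_zero());
    assert!(None::<&u8>.is_zero());
    assert!(!Some(&1u8).is_zero());
}
```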

@ranma42 closed this Apr 21, 2017
@hanna-kruppe (Contributor)

> AFAICT the same approach they propose for bitfields could apply just as well to any composite data structure, even if it includes padding.

Yes, but we most likely wouldn't want to do it for padding, at least not everywhere. It bloats IR, possibly inhibits optimizations, and is entirely pointless for all code except code that tries to be too smart for its own good about padding bytes.
