
Implement Vec::from_elem specialization for all Copy types #41335

Closed
wants to merge 2 commits

Conversation

@ranma42 (Contributor) commented Apr 17, 2017

If the input element is zero, `Vec::from_elem` can just invoke
`calloc` for any `Copy` type.

If the input is non-zero, but its size is 1, it can allocate and then
`memset` the buffer.
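
A minimal sketch of the idea, written as a standalone free function rather than as the PR's actual `Vec::from_elem` specialization (the name `from_elem_copy` and the exact structure below are illustrative, not the PR's code):

```rust
use std::alloc::{alloc, alloc_zeroed, handle_alloc_error, Layout};
use std::{mem, ptr, slice};

/// Sketch of the proposed fast paths for `T: Copy` (illustrative only).
fn from_elem_copy<T: Copy>(elem: T, n: usize) -> Vec<T> {
    let size = mem::size_of::<T>();
    if n == 0 || size == 0 {
        return vec![elem; n];
    }
    // Look at the raw bytes of `elem`. Note: for padded types this also reads
    // padding bytes, which is exactly the concern raised later in this thread.
    let bytes = unsafe { slice::from_raw_parts(&elem as *const T as *const u8, size) };
    let layout = Layout::array::<T>(n).expect("capacity overflow");
    unsafe {
        if bytes.iter().all(|&b| b == 0) {
            // All-zero element: request zeroed memory directly (the `calloc` path).
            let p = alloc_zeroed(layout) as *mut T;
            if p.is_null() { handle_alloc_error(layout); }
            return Vec::from_raw_parts(p, n, n);
        }
        if size == 1 {
            // Non-zero single-byte element: allocate, then `memset` the buffer.
            let p = alloc(layout) as *mut T;
            if p.is_null() { handle_alloc_error(layout); }
            ptr::write_bytes(p as *mut u8, bytes[0], n);
            return Vec::from_raw_parts(p, n, n);
        }
    }
    // Everything else: fall back to cloning the element into each slot.
    vec![elem; n]
}
```

The PR itself dispatches via specialization inside `Vec::from_elem`, as the title says, rather than through a separate function.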

@rust-highfive (Collaborator)

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton added the T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue) label Apr 17, 2017
@@ -78,7 +78,6 @@ use core::intrinsics::{arith_offset, assume};
use core::iter::{FromIterator, FusedIterator, TrustedLen};
use core::mem;
#[cfg(not(test))]
use core::num::Float;
Contributor

This is why I think attributes should be on the same line with use/mod/extern crate items.

Contributor Author

Argh, sorry! It should now be fixed.

If the input element is zero, `Vec::from_elem` can just invoke
`calloc` for any `Copy` type.

If the input is non-zero, but its size is 1, it can allocate and then
`memset` the buffer.
@arielb1 (Contributor) commented Apr 18, 2017

Thanks for the PR @ranma42! @BurntSushi or some other reviewer will be looking at your PR shortly.

@arielb1 added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label, and added and then removed the O-wasm (Target: WASM (WebAssembly), http://webassembly.org/) label, Apr 18, 2017
_ => 0u8 == chunked_or(x),
}
}
}
Member

Can you write some comments explaining how `chunked_or` and `is_zero` work? Are there any pitfalls? What happens when the alignment of a type is bigger than 16?

@ranma42 (Contributor Author), Apr 19, 2017

I added some comments that should answer your questions, with the exception of "are there any pitfalls?".
There should be no correctness issues, but there is certainly a performance tradeoff: we now check the contents of the input and we pay for that.
We are making `vec![None; 1024*1024]` much faster, but `vec![Some(3); 1]` is going to be a little slower (if the compiler can see the literal value, the checks are optimized away, but in general that might not be possible).
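
For reference, a byte-wise sketch of what such a check could look like (illustrative only; the PR's `chunked_or` operates on larger chunks of up to 16 bytes at a time, as the review question above about alignment suggests, rather than byte by byte):

```rust
use std::{mem, slice};

/// OR together all bytes of `x`. Unsafe because for padded types this reads
/// uninitialized padding bytes (the pitfall debated later in this thread).
/// Byte-wise for clarity; the PR's version ORs chunks of up to 16 bytes.
unsafe fn chunked_or<T: Copy>(x: &T) -> u8 {
    let bytes = slice::from_raw_parts(x as *const T as *const u8, mem::size_of::<T>());
    bytes.iter().fold(0u8, |acc, &b| acc | b)
}

/// Safe-looking wrapper: `x` is represented by all-zero bytes iff the OR of its bytes is zero.
fn is_zero<T: Copy>(x: &T) -> bool {
    unsafe { 0u8 == chunked_or(x) }
}

fn main() {
    assert!(is_zero(&0u64));
    assert!(is_zero(&None::<&u8>)); // null-pointer niche: all-zero bytes
    assert!(!is_zero(&1u64));
}
```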

@BurntSushi (Member)

cc @rust-lang/libs

@BurntSushi (Member)

@aturon I haven't used specialization yet, so some extra eyes on that piece would be great.
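
For readers unfamiliar with the feature, a minimal standalone example of how specialization (nightly-only at the time) lets a blanket impl be overridden by a more specific one; the trait and method names here are made up for illustration:

```rust
#![feature(specialization)] // nightly-only feature gate

trait Describe {
    fn describe(&self) -> &'static str;
}

// Blanket impl; `default fn` marks the method as overridable by more
// specific impls.
impl<T> Describe for T {
    default fn describe(&self) -> &'static str {
        "generic path"
    }
}

// A more specific impl that specialization allows to coexist with (and
// override) the blanket one.
impl Describe for u8 {
    fn describe(&self) -> &'static str {
        "specialized u8 path"
    }
}

fn main() {
    assert_eq!(1u32.describe(), "generic path");
    assert_eq!(1u8.describe(), "specialized u8 path");
}
```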

Add an explanation of the safety requirements for `chunked_or` and how
`is_zero` wraps it to expose a safe interface.
@alexcrichton (Member)

I personally feel like this is too much magic and may be an example of specialization going too far. This may also not work for composite types due to padding. Is there a use case motivating this today?

Ideally I'd prefer to just wait for specialization to stabilize, in which case we can simply mark the implementation in libstd as overridable, so downstream crates can customize the implementation.

@ranma42 (Contributor Author) commented Apr 20, 2017

@alexcrichton Could you explain a little more what the issue with padding is? AFAICT, in the worst case the input object might have some padding initialized to a non-zero value, in which case it would simply not benefit from `calloc`.

From what I understand of the coherence rules and specialization, it would be impossible to customize the implementation for several types for which this would be useful, such as `Option`, tuples, and small arrays.

@aturon (Member) commented Apr 21, 2017

Do we have evidence of this making a performance difference?

@alexcrichton (Member)

@ranma42 ah yes, sorry, I mean what you're thinking. A "zero" type like `(0u8, 0u16)` is not guaranteed to receive this optimization, but it may sporadically do so. That means you could see the optimization in development, start to rely on it, and then have it mysteriously stop working in production.

@ranma42 (Contributor Author) commented Apr 21, 2017

@alexcrichton yes, that can be a problem, but on the bright side it would not affect the correctness of the program: the "issue" would be that under some conditions the program runs faster than you expect, instead of always being slow. Moreover, the same holds true for most optimisations (especially when combined). In some specific contexts (avoiding timing attacks in cryptography, meeting guaranteed deadlines in realtime systems) predictable timings are more desirable than fast programs, but I believe this is neither the general case nor something that LLVM has strong support for.

@aturon I re-ran the benchmark from #40409 (comment), adapted as follows:

```rust
#![feature(test)]

extern crate test;

use test::Bencher;

#[derive(Clone)]
struct Clonable<T>(T);

#[bench]
fn bench_data(b: &mut Bencher) {
    b.iter(|| vec![0u8; 1024 * 1024]);
}

#[bench]
fn bench_clone(b: &mut Bencher) {
    b.iter(|| vec![Clonable(0u8); 1024 * 1024]);
}

#[bench]
fn bench_copy(b: &mut Bencher) {
    b.iter(|| vec![(0u8,); 1024 * 1024]);
}

#[bench]
fn bench_nullptr(b: &mut Bencher) {
    b.iter(|| vec![std::ptr::null::<u8>(); 1024 * 1024]);
}

#[bench]
fn bench_none_ref(b: &mut Bencher) {
    b.iter(|| vec![None::<&u8>; 1024 * 1024]);
}

#[bench]
fn bench_padded(b: &mut Bencher) {
    b.iter(|| vec![(0u8,0u16); 1024 * 1024]);
}
```

Running it shows these results:

Before:

```text
test bench_clone    ... bench:     249,831 ns/iter (+/- 165)
test bench_copy     ... bench:     249,943 ns/iter (+/- 559)
test bench_data     ... bench:         956 ns/iter (+/- 5)
test bench_none_ref ... bench:     625,907 ns/iter (+/- 872)
test bench_nullptr  ... bench:     625,807 ns/iter (+/- 821)
test bench_padded   ... bench:     897,815 ns/iter (+/- 1,117)
```

After:

```text
test bench_clone    ... bench:     249,393 ns/iter (+/- 193)
test bench_copy     ... bench:         941 ns/iter (+/- 7)
test bench_data     ... bench:         940 ns/iter (+/- 6)
test bench_none_ref ... bench:         721 ns/iter (+/- 2)
test bench_nullptr  ... bench:         719 ns/iter (+/- 1)
test bench_padded   ... bench:         749 ns/iter (+/- 4)
```

@sfackler (Member) commented Apr 21, 2017

Doesn't reading padding bytes produce an undefined value?

@ranma42 (Contributor Author) commented Apr 21, 2017

@sfackler yes, that is what @alexcrichton mentioned: reading padding bytes can make `is_zero` return an undef value, which ultimately leaves the compiler free to choose between the `calloc` path and the slow one.

@sfackler (Member)

Is that the extent of it? Are you sure it's not going to poison other future computation?

@hanna-kruppe (Contributor) commented Apr 21, 2017

If padding bytes are merely undef, the current semantics are that the compiler can pick an arbitrary value for them, and possibly a different one every time they're read. This is very problematic in most contexts (e.g., if you use such a value as an index, you might see different values in the bounds check vs. in the indexing, breaking memory safety), but in this case I can't see a way for that to break anything (even if you get a different value during the actual copy, you're writing to padding bytes, so it shouldn't matter). That doesn't necessarily mean it's fine, though, and it makes me very uneasy to rely on such tricky minutiae!

However, undef may be slowly phased out in favor of a revised poison value, see http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf — for our purposes here, the gist is that reading uninitialized memory (which padding probably is) would indeed give you a poison value, which propagates through all computations and is UB to branch on. This means `is_zero`'s result would be poison if there are any padding bytes, and thus `if is_zero(x) { ... }` would trigger UB.

The same proposal adds a `freeze` instruction, which fixes an arbitrary value for a poison value (like the current undef), but it's far from clear to me by what rule rustc should be inserting `freeze` operations into `is_zero` (and, if it does, whether that would impact optimizations).

And of course, all this is only as far as LLVM is concerned; the unsafe code guidelines probably have, or will have, an opinion on reading padding bytes too.

tl;dr Reading padding bytes is a minefield.
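
To make the padding concrete, an illustrative layout check (not code from the PR; the exact layout of repr(Rust) tuples is not guaranteed by the language, but this is what rustc does in practice):

```rust
use std::mem;

fn main() {
    // (u8, u16) holds three bytes of data but its size must be a multiple of
    // its 2-byte alignment, so it occupies 4 bytes: one byte is padding that
    // is never written when the value is constructed.
    println!("size  = {}", mem::size_of::<(u8, u16)>());  // 4 in practice
    println!("align = {}", mem::align_of::<(u8, u16)>()); // 2 in practice
    // A byte-wise `is_zero`-style scan of `(0u8, 0u16)` therefore reads one
    // uninitialized byte, which is where the undef/poison discussion applies.
}
```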

@alexcrichton (Member)

@rkruppe so to clarify, you think that this PR leads to undefined behavior? If so that seems quite bad!

@hanna-kruppe (Contributor) commented Apr 21, 2017

@alexcrichton I don't think it currently leads to UB as far as LLVM is concerned. However,

  1. I'm not 100% sure.
  2. This may change in future LLVM versions.
  3. The unsafe code guidelines may differ.

@alexcrichton (Member)

Hm, ok. I could definitely see how LLVM could consider this to be reading undefined memory, because you're doing `u8` loads over a value that may have padding in the middle, meaning that at some point you're definitely capable of reading uninitialized memory and could start propagating undef and/or poison values.

In light of that I would not be in favor of merging this, but @ranma42, is there perhaps an alternative, safer method of achieving the same results?

@ranma42 (Contributor Author) commented Apr 21, 2017

@rkruppe I think the paper proposes a nice direction to improve the consistency of handling undef values across different passes in LLVM. AFAICT the same approach they propose for bitfields could apply just as well to any composite data structure, even if it includes padding.

@alexcrichton yes, as per #41335 (comment), that is expected. My understanding of the current LLVM semantics is that when a jump depends on an undef, LLVM is required to (non-deterministically) choose one of the branches. It looks like this might change in the future, but that would most likely be a breaking change (the removal of undef from LLVM would be breaking as well).

The safest route is certainly to close without merging. It is a pity that the handling of `Copy` types is not unified and that several types which could benefit from this are not covered, but it looks like Rust cannot express the low-level `is_zero` check in a satisfying way.

Part of the results of this PR could be achieved by explicitly implementing the trait for `*{const,mut} T` and `Option<&T>` (the two types which I would expect to gain the most from the optimization, as they can be used to create arrays of null pointers).
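
A rough sketch of that narrower alternative (the `IsZero` trait name and impls below are illustrative, not the code that eventually landed in std): cover only types whose all-zero representation is well defined and padding-free.

```rust
/// Illustrative trait: "is this value represented entirely by zero bytes?"
trait IsZero {
    fn is_zero(&self) -> bool;
}

impl<T> IsZero for *const T {
    fn is_zero(&self) -> bool {
        self.is_null()
    }
}

impl<T> IsZero for *mut T {
    fn is_zero(&self) -> bool {
        self.is_null()
    }
}

impl<'a, T> IsZero for Option<&'a T> {
    // `Option<&T>` uses the null-pointer niche, so `None` is all-zero bytes
    // and there is no padding to worry about.
    fn is_zero(&self) -> bool {
        self.is_none()
    }
}

fn main() {
    assert!(std::ptr::null::<u8>().is_zero());
    assert!(None::<&u8>.is_zero());
    assert!(!Some(&1u8).is_zero());
}
```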

@ranma42 closed this Apr 21, 2017
@hanna-kruppe (Contributor)

> AFAICT the same approach they propose for bitfields could apply just as well to any composite data structure, even if it includes padding.

Yes, but we most likely wouldn't want to do it for padding, at least not everywhere. It bloats IR, possibly inhibits optimizations, and is entirely pointless for all code except code that tries to be too smart for its own good about padding bytes.
