Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize Vec::from_elem to use calloc #40409

Merged
merged 2 commits into from
Apr 16, 2017
Merged

Conversation

mbrubeck
Copy link
Contributor

@mbrubeck mbrubeck commented Mar 10, 2017

Fixes #38723. This specializes the implementation for u8 only, but it could be extended to other zeroable types if desired.

I haven't tested this extensively, but I did verify that it gives the expected performance boost for large vec![0; n] allocations with both alloc_system and jemalloc, on Linux. (I have not tested or even built the Windows code.)

@rust-highfive
Copy link
Collaborator

r? @BurntSushi

(rust_highfive has picked a reviewer for you, use r? to override)

@leonardo-m
Copy link

What's the performance for short (n <= 8)?

@mbrubeck
Copy link
Contributor Author

mbrubeck commented Mar 10, 2017

For the following benchmarks, using the default allocator (alloc_jemalloc) on Linux:

#[bench]
fn bench_big(b: &mut Bencher) {
    b.iter(|| vec![0u8; 1024 * 1024]);
}
#[bench]
fn bench_medium(b: &mut Bencher) {
    b.iter(|| vec![0u8; 1024]);
}
#[bench]
fn bench_small(b: &mut Bencher) {
    b.iter(|| vec![0u8; 8]);
}

Before this PR:

test bench_big    ... bench:     231,338 ns/iter (+/- 97,246)
test bench_medium ... bench:          34 ns/iter (+/- 1)
test bench_small  ... bench:          24 ns/iter (+/- 3)

After this PR:

test bench_big    ... bench:         853 ns/iter (+/- 47)
test bench_medium ... bench:          37 ns/iter (+/- 2)
test bench_small  ... bench:          25 ns/iter (+/- 1)

@clarfonthey
Copy link
Contributor

clarfonthey commented Mar 10, 2017

Is there a reason why this can't be specialised for all integer types if the value is zero?

@mbrubeck
Copy link
Contributor Author

Is there a reason why this can't be specialised for all integer types if the value is zero?

It can, and I'm working on a patch to do that. It could also be specialized for floats, and for some less-common types like Option<&T> and raw pointers; I'm not sure where the point of diminishing returns is.

@clarfonthey
Copy link
Contributor

I think that specialising for integer zeroes is definitely reasonable (because [0; n] is the most common way of initialising an array). I personally would like to see the others you mentioned too, although that's just my opinion. :P

@bluss bluss added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 11, 2017
@aturon aturon assigned sfackler and unassigned BurntSushi Mar 14, 2017
@solson
Copy link
Member

solson commented Mar 14, 2017

@whitequark made a good point on IRC: If we had a Pod (plain old data) trait, you could make this specialization for all T: Pod in a single impl. This would handle many of the types mentioned above but it would also handle arbitrary user-defined types that meet the Pod requirements.

Previous discussions here and here.

Copy link
Member

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(damn github review didn’t get posted)

} else {
let flags = align_to_flags(align);
unsafe {
let ptr = mallocx(size as size_t, flags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MALLOCX_ZERO flag instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sfackler
Copy link
Member

@solson how would that work here though? This only handles the case of an all-zero initial bit pattern.

} else {
let ptr = aligned_malloc(size, align);
if !ptr.is_null() {
ptr::write_bytes(ptr, 0, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback seems a bit unfortunate, but I guess there's no other option?

@solson
Copy link
Member

solson commented Mar 15, 2017

@sfackler My understanding is that for T: Pod, it's safe to use std::mem::zeroed::<T>(), so it's also safe to allocate an array of them with something like calloc (when the value is represented by the zero bit pattern).

EDIT: See this sentence from this issue description:

The trait can only be implemented by a subset of all types and identifies objects that are valid when they contain arbitrary bit patterns.

@clarfonthey
Copy link
Contributor

Also one thing to consider is that Zeroable is already part of libcore and could be useful here for this optimisation.

@bluss
Copy link
Member

bluss commented Mar 15, 2017

@solson the missing part of the explanation is how to identify that x is equivalent to (all bits) zero in vec![x; n] for T: Pod.

@mbrubeck
Copy link
Contributor Author

Added a patch to specialize vec![0; n] for other integer types.

@mbrubeck mbrubeck changed the title Specialize Vec::from_elem<u8> to use calloc or memset Specialize Vec::from_elem to use calloc Mar 15, 2017
@solson
Copy link
Member

solson commented Mar 15, 2017

@bluss One basic way would be to iterate over the bytes of x and check that they are all zero (which seems like something the optimizer could easily simplify). We could perhaps do something a bit more direct with [0; size_of::<T>()] in the future when size_of is a const fn.

@@ -92,6 +98,16 @@ mod imp {
}

#[no_mangle]
pub extern "C" fn __rust_allocate_zeroed(size: usize, align: usize) -> *mut u8 {
if align <= MIN_ALIGN {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super familiar with jemalloc - is calloc faster than unconditionally calling mallocx here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mallocx is parsing the options out from flags’ bitmask, whereas calloc sets them directly. Difference likely negligible though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Seems fine either way.

@sfackler
Copy link
Member

r=me other than one dumb question about jemalloc.

@sfackler
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Mar 15, 2017

📌 Commit 4961f6c has been approved by sfackler

@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Mar 16, 2017
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 17, 2017
Specialize Vec::from_elem to use calloc

Fixes rust-lang#38723.  This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired.

I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux.  (I have not tested or even built the Windows code.)
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 17, 2017
Specialize Vec::from_elem to use calloc

Fixes rust-lang#38723.  This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired.

I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux.  (I have not tested or even built the Windows code.)
@bors
Copy link
Contributor

bors commented Apr 16, 2017

⌛ Testing commit aad2062 with merge 966a32a...

@bors
Copy link
Contributor

bors commented Apr 16, 2017

💔 Test failed - status-travis

@Mark-Simulacrum
Copy link
Member

Timed out, presumably spurious. Retrying.

@bors retry

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 16, 2017
Specialize Vec::from_elem to use calloc

Fixes rust-lang#38723.  This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired.

I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux.  (I have not tested or even built the Windows code.)
@bors
Copy link
Contributor

bors commented Apr 16, 2017

⌛ Testing commit aad2062 with merge 31bbdc1...

@bors
Copy link
Contributor

bors commented Apr 16, 2017

💔 Test failed - status-appveyor

@Mark-Simulacrum
Copy link
Member

@bors retry

[1139/1629] Building CXX object lib/Target/ARM/CMakeFiles/LLVMARMCodeGen.dir/Thumb2ITBlockPass.cpp.obj
[1140/1629] Building CXX object lib/Target/ARM/CMakeFiles/LLVMARMCodeGen.dir/Thumb2SizeReduction.cpp.obj
[1141/1629] Building CXX object lib/Target/ARM/CMakeFiles/LLVMARMCodeGen.dir/ThumbRegisterInfo.cpp.obj
[1142/1629] Building CXX object lib/Target/ARM/CMakeFiles/LLVMARMCodeGen.dir/Thumb1InstrInfo.cpp.obj
[1143/1629] Building CXX object lib/Target/ARM/TargetInfo/CMakeFiles/LLVMARMInfo.dir/ARMTargetInfo.cpp.obj
[1144/1629] Linking CXX static library lib\libLLVMARMInfo.a
FAILED: lib/libLLVMARMInfo.a 
cmd.exe /C "cd . && "C:\Program Files (x86)\CMake\bin\cmake.exe" -E remove lib\libLLVMARMInfo.a && C:\projects\rust\mingw64\bin\ar.exe qc lib\libLLVMARMInfo.a  lib/Target/ARM/TargetInfo/CMakeFiles/LLVMARMInfo.dir/ARMTargetInfo.cpp.obj && C:\projects\rust\mingw64\bin\ranlib.exe lib\libLLVMARMInfo.a && cd ."
C:\projects\rust\mingw64\bin\ar.exe: unable to rename 'lib\libLLVMARMInfo.a'; reason: Permission denied
[1145/1629] Linking CXX static library lib\libLLVMARMCodeGen.a
ninja: build stopped: subcommand failed.
thread 'main' panicked at '
command did not execute successfully, got: exit code: 1
build script failed, must exit now', C:\Users\appveyor\.cargo\registry\src\github.com-1ecc6299db9ec823\cmake-0.1.22\src\lib.rs:617
note: Run with `RUST_BACKTRACE=1` for a backtrace.
	finished in 282.127
Build completed unsuccessfully in 0:04:43

@Mark-Simulacrum Mark-Simulacrum added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 16, 2017
@bors
Copy link
Contributor

bors commented Apr 16, 2017

⌛ Testing commit aad2062 with merge 7627e3d...

bors added a commit that referenced this pull request Apr 16, 2017
Specialize Vec::from_elem to use calloc

Fixes #38723.  This specializes the implementation for `u8` only, but it could be extended to other zeroable types if desired.

I haven't tested this extensively, but I did verify that it gives the expected performance boost for large `vec![0; n]` allocations with both alloc_system and jemalloc, on Linux.  (I have not tested or even built the Windows code.)
@bors
Copy link
Contributor

bors commented Apr 16, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 7627e3d to master...

@bors bors merged commit aad2062 into rust-lang:master Apr 16, 2017
tbu- added a commit to tbu-/steed that referenced this pull request Apr 19, 2017
This is fallout from rust-lang/rust#40409 which requires that all
allocators provide a `__rust_alloc_zeroed` function.

Fixes japaric#136.
anatol pushed a commit to anatol/steed that referenced this pull request May 3, 2017
This is fallout from rust-lang/rust#40409 which requires that all
allocators provide a `__rust_alloc_zeroed` function.

Fixes japaric#136.
kmindg added a commit to kmindg/jemallocator that referenced this pull request May 24, 2017
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
kmindg added a commit to kmindg/jemallocator that referenced this pull request May 24, 2017
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
kmindg added a commit to kmindg/jemallocator that referenced this pull request May 24, 2017
rust-lang/rust#40409 requires a __rust_alloc_zeroed function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.