Add missing inline annotations to Cell #48905

Closed · wants to merge 1 commit

Conversation

@Marwes (Contributor) commented Mar 10, 2018

We're seeing some odd performance problems when using incremental compilation where `Rc` pointers were actually slower than `Arc` pointers (the problem goes away when using non-incremental compilation). I haven't been able to build rustc locally to verify that this fixes it, but these missing inline annotations seem to be the only thing that could affect performance (to this extent).

```
test vector_push_back        ... bench:  11,668,015 ns/iter (+/- 772,861)
test vector_push_back_mut    ... bench:   1,423,771 ns/iter (+/- 22,011)
test vector_push_back_mut_rc ... bench:   1,181,765 ns/iter (+/- 123,724)
test vector_push_back_rc     ... bench:  17,141,746 ns/iter (+/- 203,048)
```
(Source and non-incremental benchmarks: orium/rpds#7 (comment))
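
For readers unfamiliar with the change: the PR only adds `#[inline]` attributes to a few `Cell` methods that lacked them. Below is a minimal sketch of the kind of annotation involved; the wrapper type and method bodies are illustrative stand-ins, not the actual diff against libcore.

```rust
use std::cell::UnsafeCell;

// Simplified stand-in for std's Cell<T>, restricted to Copy types.
struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    #[inline]
    fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    // #[inline] makes the body available for inlining into callers in other
    // crates/codegen units, even when it would not otherwise be exported.
    #[inline]
    fn get(&self) -> T {
        unsafe { *self.value.get() }
    }

    #[inline]
    fn set(&self, value: T) {
        unsafe { *self.value.get() = value }
    }
}

fn main() {
    let c = MyCell::new(1u32);
    c.set(c.get() + 1);
    assert_eq!(c.get(), 2);
}
```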
@rust-highfive (Collaborator)

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Mar 10, 2018
@kennytm (Member) commented Mar 10, 2018

@bors try

Trying to get a compiled libstd, and going to compare the benchmark result with the un-inlined version as wanted in #42716 (comment).

@bors (Contributor) commented Mar 10, 2018

⌛ Trying commit 58fced4 with merge 3a1fd61...

bors added a commit that referenced this pull request Mar 10, 2018
Add missing inline annotations to Cell

@bors (Contributor) commented Mar 10, 2018

☀️ Test successful - status-travis
State: approved= try=True

@kennytm (Member) commented Mar 10, 2018

Benchmark results based on Marwes/rpds@e482d5a; the first run is without `#[inline]`, the second one is with this PR.

```
$ cargo +87344aa59af2ebb868253228e2b558d701573dff bench vector
...
test vector_drop_last        ... bench:  10,546,095 ns/iter (+/- 35,342)
test vector_drop_last_mut    ... bench:     768,934 ns/iter (+/- 2,609)
test vector_get              ... bench:     322,871 ns/iter (+/- 621)
test vector_iterate          ... bench:     144,636 ns/iter (+/- 2,131)
test vector_push_back        ... bench:  11,965,770 ns/iter (+/- 100,901)
test vector_push_back_mut    ... bench:   1,576,195 ns/iter (+/- 41,488)
test vector_push_back_mut_rc ... bench:   1,371,017 ns/iter (+/- 42,805)
test vector_push_back_rc     ... bench:   7,474,717 ns/iter (+/- 120,604)
...

$ cargo +3a1fd611fc2ad3085a9298b46a85cd055724c45e bench vector
...
test vector_drop_last        ... bench:  10,771,358 ns/iter (+/- 19,155)
test vector_drop_last_mut    ... bench:     777,418 ns/iter (+/- 28,679)
test vector_get              ... bench:     322,735 ns/iter (+/- 713)
test vector_iterate          ... bench:     144,504 ns/iter (+/- 568)
test vector_push_back        ... bench:  12,188,718 ns/iter (+/- 51,151)
test vector_push_back_mut    ... bench:   1,567,152 ns/iter (+/- 34,553)
test vector_push_back_mut_rc ... bench:   1,367,240 ns/iter (+/- 39,629)
test vector_push_back_rc     ... bench:   7,471,407 ns/iter (+/- 32,327)
...
```

I cannot see any significant change in timing. Also, contrary to the OP's measurement, the `Rc` version is much faster than the `Arc` version.

You may install the toolchains yourself to check:

```
$ cargo install rustup-toolchain-install-master
$ rustup-toolchain-install-master 3a1fd611fc2ad3085a9298b46a85cd055724c45e 87344aa59af2ebb868253228e2b558d701573dff
$ cargo +87344aa59af2ebb868253228e2b558d701573dff bench vector
$ cargo +3a1fd611fc2ad3085a9298b46a85cd055724c45e bench vector
```

@Marwes (Contributor, Author) commented Mar 10, 2018

I'm getting an error when trying to download the artifact, unfortunately (I guess it does not include Windows binaries; the Linux subsystem does not work either, since I haven't managed to get OpenSSL installed in it...).

```
rustup-toolchain-install-master 3a1fd611fc2ad3085a9298b46a85cd055724c45e
downloading <https://s3-us-west-1.amazonaws.com/rust-lang-ci2/rustc-builds/3a1fd611fc2ad3085a9298b46a85cd055724c45e/rustc-nightly-x86_64-pc-windows-msvc.tar.xz>...
skipping 3a1fd611fc2ad3085a9298b46a85cd055724c45e due to failure:
Error { kind: ClientError(NotFound), url: Some("https://s3-us-west-1.amazonaws.com/rust-lang-ci2/rustc-builds/3a1fd611fc2ad3085a9298b46a85cd055724c45e/rustc-nightly-x86_64-pc-windows-msvc.tar.xz") }
```

@kennytm Did you run with CARGO_INCREMENTAL=1? The problem does not appear in normal release builds, since those do not use incremental compilation. While one wouldn't use incremental compilation in a release build (I had the variable set by default, but I have since removed it), I'd expect a similar slowdown to happen for debug builds.

If this slowdown is due to these missing inline annotations and that is acceptable, then I'd assume that we should remove the annotations from the other functions instead to keep things consistent?

@kennytm (Member) commented Mar 10, 2018

@Marwes Unfortunately, only the 64-bit Linux build is available before merging 😞 You can force a Linux build with:

```
rustup-toolchain-install-master --host x86_64-unknown-linux-gnu 3a1fd611fc2ad3085a9298b46a85cd055724c45e
```

I've retried using CARGO_INCREMENTAL=1; the results are:

```
$ CARGO_INCREMENTAL=1 cargo +87344aa59af2ebb868253228e2b558d701573dff bench vector
...
test vector_drop_last        ... bench:  13,697,880 ns/iter (+/- 136,696)
test vector_drop_last_mut    ... bench:   1,176,880 ns/iter (+/- 2,256)
test vector_get              ... bench:     493,647 ns/iter (+/- 2,101)
test vector_iterate          ... bench:     506,707 ns/iter (+/- 1,191)
test vector_push_back        ... bench:  15,053,624 ns/iter (+/- 35,672)
test vector_push_back_mut    ... bench:   2,035,044 ns/iter (+/- 42,841)
test vector_push_back_mut_rc ... bench:   1,768,721 ns/iter (+/- 43,902)
test vector_push_back_rc     ... bench:  20,100,645 ns/iter (+/- 34,784)
...

$ CARGO_INCREMENTAL=1 cargo +3a1fd611fc2ad3085a9298b46a85cd055724c45e bench vector
...
test vector_drop_last        ... bench:  13,644,513 ns/iter (+/- 30,618)
test vector_drop_last_mut    ... bench:   1,149,987 ns/iter (+/- 5,317)
test vector_get              ... bench:     511,327 ns/iter (+/- 1,883)
test vector_iterate          ... bench:     502,438 ns/iter (+/- 2,337)
test vector_push_back        ... bench:  15,010,549 ns/iter (+/- 51,013)
test vector_push_back_mut    ... bench:   1,995,922 ns/iter (+/- 44,019)
test vector_push_back_mut_rc ... bench:   1,763,790 ns/iter (+/- 44,449)
test vector_push_back_rc     ... bench:  18,610,890 ns/iter (+/- 45,462)
...
```

So there is a 7% improvement in the vector_push_back_rc test case, though it is still significantly slower than the `Arc` version 🤔.

@Marwes (Contributor, Author) commented Mar 10, 2018

:/ The only other real difference is that `Rc` uses `unwrap_or_else`, but I'd expect that to be trivially inlined:

```rust
self.inner().strong.set(self.strong().checked_add(1).unwrap_or_else(|| unsafe { abort() }));
```
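
For context, a self-contained sketch of that pattern (the helper name `increment_strong` is made up for the example): the overflow guard aborts instead of panicking, and the closure passed to `unwrap_or_else` is small enough that one would expect it to be inlined whenever optimizations are enabled.

```rust
use std::cell::Cell;
use std::process::abort;

// Hypothetical helper mirroring Rc's strong-count bump: checked_add guards
// against overflow, and the guard aborts the process rather than panicking.
fn increment_strong(count: &Cell<usize>) {
    count.set(count.get().checked_add(1).unwrap_or_else(|| abort()));
}

fn main() {
    let strong = Cell::new(1usize);
    increment_strong(&strong);
    assert_eq!(strong.get(), 2);
}
```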

@kennytm (Member) commented Mar 10, 2018

I've checked a debug build (opt-level=0) as well; no improvement is seen, because `#[inline]` won't take effect without optimization 🙃

```
$ CARGO_INCREMENTAL=1 cargo +87344aa59af2ebb868253228e2b558d701573dff bench vector
...
test vector_drop_last        ... bench:  77,054,997 ns/iter (+/- 1,630,919)
test vector_drop_last_mut    ... bench:   7,855,141 ns/iter (+/- 68,446)
test vector_get              ... bench:   3,356,728 ns/iter (+/- 22,170)
test vector_iterate          ... bench:   3,758,851 ns/iter (+/- 4,583)
test vector_push_back        ... bench:  82,371,195 ns/iter (+/- 686,300)
test vector_push_back_mut    ... bench:   8,274,384 ns/iter (+/- 50,500)
test vector_push_back_mut_rc ... bench:   6,840,017 ns/iter (+/- 53,296)
test vector_push_back_rc     ... bench: 138,269,887 ns/iter (+/- 150,359)
...

$ CARGO_INCREMENTAL=1 cargo +3a1fd611fc2ad3085a9298b46a85cd055724c45e bench vector
...
test vector_drop_last        ... bench:  77,315,873 ns/iter (+/- 597,236)
test vector_drop_last_mut    ... bench:   7,880,990 ns/iter (+/- 7,060)
test vector_get              ... bench:   3,341,212 ns/iter (+/- 4,199)
test vector_iterate          ... bench:   3,780,185 ns/iter (+/- 3,689)
test vector_push_back        ... bench:  81,823,875 ns/iter (+/- 318,751)
test vector_push_back_mut    ... bench:   8,294,160 ns/iter (+/- 51,599)
test vector_push_back_mut_rc ... bench:   6,813,080 ns/iter (+/- 53,938)
test vector_push_back_rc     ... bench: 137,634,856 ns/iter (+/- 788,074)
...
```

I'm now going to defer this to someone from the libs team to decide whether to merge or not.

r? @BurntSushi

Summary: 7% improvement in release mode with incremental. No improvement otherwise.

@rust-highfive assigned BurntSushi and unassigned kennytm on Mar 10, 2018
@BurntSushi (Member)

@alexcrichton has opinions on inline annotations, so I defer to him. :-)

r? @alexcrichton

@alexcrichton (Member)

Thanks for the PR! I think, though, that this PR may not be necessary, in the sense that incremental compilation in release mode is not at all tuned for performance. These methods are all inlined as necessary (due to them being generic), so the extra `#[inline]` here isn't needed to get LLVM to inline them in release mode.

Due to how codegen units are partitioned and how we do incremental compilation, though, you may not be able to inline these across codegen units in incremental mode. This will eventually be fixed if we implement incremental ThinLTO, but that has yet to be done at this point!
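
To illustrate the distinction being drawn (a rough sketch; the function names are made up for the example): a generic function is monomorphized in the crate that uses it, so LLVM always sees its body, whereas a non-generic function defined in another codegen unit needs `#[inline]` (or LTO) before it can be inlined across that boundary.

```rust
use std::ops::Add;

// Generic: instantiated wherever it is used, so it is inlinable
// without an explicit #[inline] attribute.
pub fn double_generic<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

// Non-generic: without #[inline], callers in other codegen units may
// only see an opaque symbol and cannot inline the body.
#[inline]
pub fn double_u32(x: u32) -> u32 {
    x + x
}

fn main() {
    assert_eq!(double_generic(21u32), 42);
    assert_eq!(double_u32(21), 42);
}
```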

In that sense I would be inclined to not merge this PR as extra #[inline] annotations can be harmful to code size and compile time, and otherwise it looks like this is only related to performance in incremental + release mode, right?

@Marwes (Contributor, Author) commented Mar 11, 2018

> In that sense I would be inclined to not merge this PR as extra #[inline] annotations can be harmful to code size and compile time, and otherwise it looks like this is only related to performance in incremental + release mode, right?

I didn't think to check incremental + debug, but it appears that the problem also occurs there:

```
running 8 tests
test vector_drop_last        ... bench:  72,176,998 ns/iter (+/- 5,394,864)
test vector_drop_last_mut    ... bench:   6,730,589 ns/iter (+/- 3,288,727)
test vector_get              ... bench:   2,451,287 ns/iter (+/- 322,803)
test vector_iterate          ... bench:   3,149,383 ns/iter (+/- 94,201)
test vector_push_back        ... bench:  75,140,253 ns/iter (+/- 5,813,070)
test vector_push_back_mut    ... bench:   7,294,028 ns/iter (+/- 439,593)
test vector_push_back_mut_rc ... bench:   6,153,137 ns/iter (+/- 240,858)
test vector_push_back_rc     ... bench: 115,874,988 ns/iter (+/- 6,698,258)
```

The inline annotations added here do not seem to help much, though, so it seems prudent to just close this (though I'd argue that the existing inline annotations should then be removed, since there is no reason to single out these 4 methods as the only ones without inline annotations).

@alexcrichton (Member)

Er, wait, is that running benchmarks in debug mode? If so, we've never optimized for that, so I wouldn't put much weight in those numbers!

@Marwes (Contributor, Author) commented Mar 11, 2018

The numbers in the first comment were release + incremental; the ones in #48905 (comment) were debug (opt-level = 0) + incremental. While I don't expect debug-mode code to be fast, I wanted to point out that it is rather unexpected that the `Rc` code is so much slower than the `Arc` code, given that the only reason to use `Rc` is that it is supposed to be faster than `Arc`.
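
(As a point of reference, a minimal sketch of that usual expectation, using nothing beyond the standard library: `Rc::clone` bumps a plain `Cell<usize>` counter, while `Arc::clone` performs an atomic increment, which is why `Rc` is normally the faster choice in single-threaded code.)

```rust
use std::rc::Rc;
use std::sync::Arc;

fn main() {
    let rc = Rc::new(vec![1, 2, 3]);
    let arc = Arc::new(vec![1, 2, 3]);

    // Both clones only copy a pointer and bump a reference count, but Rc's
    // count is a plain Cell<usize> while Arc's is an AtomicUsize.
    let rc2 = Rc::clone(&rc);
    let arc2 = Arc::clone(&arc);

    assert_eq!(Rc::strong_count(&rc2), 2);
    assert_eq!(Arc::strong_count(&arc2), 2);
}
```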

Still, that might be fine.

@alexcrichton (Member)

@Marwes would you be ok closing this or would you still prefer to land this?
