Use `triomphe::Arc` by default. #88

michaelsproul · 2023-10-31T00:03:18Z

Change the default Arc backend to triomphe, which results in speedups of up to 40% (per #85).

The default backend is changed to ArcTK, while retaining the ability for the user to swap out to ArcK if they prefer.

This PR does not currently include the ability to completely compile-out triomphe. If that were desired, we would have to expose a feature called triomphe which controls the backend, and causes std::sync::Arc to be used when the feature is disabled.

Note: this PR is a reopening of #86 against the new default branch main.

michaelsproul · 2023-10-31T00:03:32Z

Some more benchmarks, this time on x86_64 Linux: https://gist.github.com/michaelsproul/4462d18ce7075479b03517349ffd06e2

Unfortunately it looks like we can't just Pareto-improve by switching to triomphe. There are some substantial regressions in some of the x86_64 benchmarks, especially for RedBlackTreeMap. I made a spreadsheet of results here: https://docs.google.com/spreadsheets/d/1WsuOhA2DoKUCfoEE-1OAFPjMMUgo4jFpiZx8RVbrpH0/edit

One strategy would be to switch the backend only where the gains are universal, i.e. for List, Queue and HashTrieMap. Vector could go either way, and RedBlackTreeMap is a hard no IMO unless we gate by target-arch (or OS). I'm not entirely sure whether it's the ARM CPU or the OS making more of a difference, as I know the macOS allocator is substantially better with fragmentation than GNU malloc. In production I tend to use jemalloc, and will try to re-run the Linux benchmarks with jemalloc to see if this makes a difference.

My benchmarking process was:

git checkout master
cargo bench -- sync # baseline
git checkout triomphe
cargo bench -- sync # comparison

If you have any different hardware @orium it might also be good to see the results from that.

michaelsproul · 2023-10-31T10:28:59Z

CI is failing during vendoring. Blocked on an archery release.

orium · 2023-10-31T19:19:20Z

Blocked on an archery release.

I've just released archery v1.1.0.

michaelsproul · 2023-11-01T00:28:14Z

Updated! Thanks!

What do you think of our options re: architectures & defaults?

codecov · 2023-11-01T11:27:03Z

Codecov Report

Merging #88 (c933de8) into main (d3faad5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #88   +/-   ##
=======================================
  Coverage   96.29%   96.29%           
=======================================
  Files          11       11           
  Lines        1645     1645           
=======================================
  Hits         1584     1584           
  Misses         61       61

Files	Coverage Δ
src/list/mod.rs	`95.41% <ø> (ø)`
src/map/hash_trie_map/mod.rs	`96.06% <ø> (ø)`
src/map/red_black_tree_map/mod.rs	`95.80% <ø> (ø)`
src/queue/mod.rs	`100.00% <ø> (ø)`
src/set/hash_trie_set/mod.rs	`89.47% <ø> (ø)`
src/set/red_black_tree_set/mod.rs	`96.11% <ø> (ø)`
src/stack/mod.rs	`100.00% <ø> (ø)`
src/vector/mod.rs	`97.61% <ø> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

orium · 2023-11-01T12:02:55Z

I've run the benchmark in an intel x86_64 and I saw more regressions than in the spreadsheet you shared, but I ran it while doing other light work on my laptop so the results are not very reliable. I'll run the benchmarks properly soon and decide based on both our results.

orium · 2023-11-02T19:06:15Z

Results in an intel x86_64:

rpds hash trie map sync insert
                        change: [+2.6368% +3.4711% +4.3543%]
rpds hash trie map sync insert mut
                        change: [-2.0413% -0.8094% +0.2234%]
rpds hash trie map sync remove
                        change: [-2.0775% -1.0785% -0.0544%]
rpds hash trie map sync remove mut
                        change: [-7.8858% -6.7220% -5.4148%]
rpds hash trie map sync get
                        change: [-3.9807% -3.1931% -2.3541%]
rpds hash trie map sync iterate
                        change: [-1.1821% -0.3057% +0.5844%]
rpds list sync push front
                        change: [-20.446% -19.655% -18.806%]
rpds list sync push front mut
                        change: [-15.251% -14.533% -13.671%]
rpds list sync drop first
                        change: [-21.615% -20.416% -19.504%]
rpds list sync drop first mut
                        change: [-2.4113% -1.7169% -0.8925%]
rpds list sync reverse  
                        change: [-17.576% -17.022% -16.375%]
rpds list sync reverse mut
                        change: [-59.669% -59.168% -58.692%]
rpds list sync iterate  
                        change: [+2.7504% +3.4314% +4.0881%]
rpds queue sync enqueue 
                        change: [-15.407% -14.566% -13.534%]
rpds queue sync enqueue mut
                        change: [-18.247% -17.734% -17.248%]
rpds queue sync dequeue 
                        change: [-19.536% -18.902% -18.271%]
rpds queue sync dequeue mut
                        change: [-9.4082% -8.8096% -8.1704%]
rpds queue sync iterate 
                        change: [+3.1755% +3.9384% +4.7430%]
rpds red black tree map sync insert
                        change: [+1.4412% +2.4130% +3.4448%]
rpds red black tree map sync insert mut
                        change: [-21.915% -21.323% -20.706%]
rpds red black tree map sync remove
                        change: [-3.2492% -2.5512% -1.8362%]
rpds red black tree map sync remove mut
                        change: [-40.328% -39.836% -39.391%]
rpds red black tree map sync get
                        change: [-8.0047% -7.3228% -6.5182%]
rpds red black tree map sync iterate
                        change: [+1.4399% +2.2009% +3.0057%]
rpds vector sync push back
                        change: [+1.6571% +2.5189% +3.4616%]
rpds vector sync push back mut
                        change: [-6.4764% -5.5797% -4.6442%]
rpds vector sync drop last
                        change: [-0.6938% +0.2731% +1.1257%]
rpds vector sync drop last mut
                        change: [-32.543% -31.923% -31.408%]
rpds vector sync get    
                        change: [+1.5769% +3.2076% +4.3876%]
rpds vector sync iterate
                        change: [+3.7501% +4.7765% +5.8097%]

Results in arm (raspberry pi):

rpds hash trie map sync insert
                        change: [-4.7598% -4.1822% -3.5100%]
rpds hash trie map sync insert mut
                        change: [-17.672% -17.230% -16.825%]
rpds hash trie map sync remove
                        change: [-4.1593% -3.8020% -3.4559%]
rpds hash trie map sync remove mut
                        change: [-25.591% -25.305% -24.996%]
rpds hash trie map sync get
                        change: [-5.3317% -5.0032% -4.6927%]
rpds hash trie map sync iterate
                        change: [+22.415% +23.323% +24.237%]
rpds list sync push front
                        change: [-23.504% -23.140% -22.801%]
rpds list sync push front mut
                        change: [-24.341% -23.925% -23.566%]
rpds list sync drop first
                        change: [-19.712% -19.550% -19.398%]
rpds list sync drop first mut
                        change: [-20.398% -20.123% -19.885%]
rpds list sync reverse  
                        change: [-13.789% -13.756% -13.721%]
rpds list sync reverse mut
                        change: [-65.247% -65.097% -64.933%]
rpds list sync iterate  
                        change: [-46.215% -45.697% -45.193%]
rpds queue sync enqueue 
                        change: [-18.596% -18.243% -17.910%]
rpds queue sync enqueue mut
                        change: [-22.085% -21.689% -21.352%]
rpds queue sync dequeue 
                        change: [-16.732% -16.385% -16.059%]
rpds queue sync dequeue mut
                        change: [-26.592% -26.450% -26.302%]
rpds queue sync iterate 
                        change: [-27.569% -27.106% -26.634%]
rpds red black tree map sync insert
                        change: [-7.8592% -7.7074% -7.5590%]
rpds red black tree map sync insert mut
                        change: [-42.397% -42.258% -42.116%]
rpds red black tree map sync remove
                        change: [-6.2934% -6.1735% -6.0456%]
rpds red black tree map sync remove mut
                        change: [-47.404% -47.297% -47.186%]
rpds red black tree map sync get
                        change: [-16.655% -16.193% -15.749%]
rpds red black tree map sync iterate
                        change: [-30.892% -30.479% -30.042%]
rpds vector sync push back
                        change: [-1.2102% -1.0596% -0.9017%]
rpds vector sync push back mut
                        change: [-33.433% -33.304% -33.169%]
rpds vector sync drop last
                        change: [-4.2658% -4.1403% -4.0246%]
rpds vector sync drop last mut
                        change: [-41.569% -41.369% -41.185%]
rpds vector sync get    
                        change: [+0.1514% +0.2058% +0.2704%]
rpds vector sync iterate
                        change: [+26.207% +26.376% +26.529%]

In my results:

In intel x86_64 I see mostly improvements and the regressions are relatively small.
In arm there are significant regressions on iteration of vector and hash trie map.

In your results:

In amd x86_64 there are significant regressions in red black tree insert/remove the non-mutable method (which is know to be slower than the _mut() methods).
In arm M1 it is an improvement across the board.

My conclusion is still a win overall, even if there are some performance regressions. I think using triomphe by default for all data structures is probably the best tradeoff based on this benchmarks. What do you think?

I also reported this to triomphe because maybe there are optimizations in std::sync::Arc that triomphe is missing (Manishearth/triomphe#74).

Use triomphe::Arc by default.

d242257

This was referenced Oct 31, 2023

Add benchmarks using triomphe::Arc #85

Closed

Use triomphe::Arc by default. #86

Closed

Use archery 1.1.0

c933de8

orium mentioned this pull request Nov 2, 2023

Performance of triomphe::Arc is sometimes worse than std::sync::Arc Manishearth/triomphe#74

Open

orium merged commit 98045ce into orium:main Nov 5, 2023
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `triomphe::Arc` by default. #88

Use `triomphe::Arc` by default. #88

michaelsproul commented Oct 31, 2023 •

edited

Loading

michaelsproul commented Oct 31, 2023

michaelsproul commented Oct 31, 2023

orium commented Oct 31, 2023 •

edited

Loading

michaelsproul commented Nov 1, 2023

codecov bot commented Nov 1, 2023

orium commented Nov 1, 2023

orium commented Nov 2, 2023 •

edited

Loading

Use triomphe::Arc by default. #88

Use triomphe::Arc by default. #88

Conversation

michaelsproul commented Oct 31, 2023 • edited Loading

michaelsproul commented Oct 31, 2023

michaelsproul commented Oct 31, 2023

orium commented Oct 31, 2023 • edited Loading

michaelsproul commented Nov 1, 2023

codecov bot commented Nov 1, 2023

Codecov Report

orium commented Nov 1, 2023

orium commented Nov 2, 2023 • edited Loading

Use `triomphe::Arc` by default. #88

Use `triomphe::Arc` by default. #88

michaelsproul commented Oct 31, 2023 •

edited

Loading

orium commented Oct 31, 2023 •

edited

Loading

orium commented Nov 2, 2023 •

edited

Loading