-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize work-stealing deque #124
base: main
Are you sure you want to change the base?
Conversation
ff22c9f
to
1c98c25
Compare
1c98c25
to
77cd1b9
Compare
0d2e6ea
to
1cfc137
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
I haven't noticed any issue with theses changes. I will push a PR soon with more DSCheck tests for this queue though. The ones that we currently have where written for a way slower version of DSCheck and we should be able to do way more now :)
test/ws_deque/stm_ws_deque.ml
Outdated
(* | ||
WSDT_dom.neg_agree_test_par ~count | ||
~name:"STM Saturn_lockfree.Ws_deque test parallel, negative"; | ||
~name:"STM Saturn_lockfree.Ws_deque test parallel, negative"; *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say that this should either be removed or a note should be added to explain why this is a bad idea to launch such a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the test was causing problems at one point, which is why I commented it out. I was planning to take another look to understand the issue better. The work-stealing deque data structure is specifically designed to take advantage of the asymmetry between the (single) owner and the (multiple) thiefs. That negative test specifically uses the work-stealing deque in a manner in which it is not intended to work. So, I'm inclided to simply remove the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, with this change to the code in this PR
modified src_lockfree/ws_deque.ml
@@ -54,7 +54,7 @@ module M : S = struct
let create () =
let top = Atomic.make 0 |> Multicore_magic.copy_as_padded in
let bottom = Atomic.make 0 |> Multicore_magic.copy_as_padded in
- let tab = Array.make min_capacity (Obj.magic ()) in
+ let tab = Array.make min_capacity (ref (Obj.magic ())) in
{ top; bottom; tab } |> Multicore_magic.copy_as_padded
let realloc a t b sz new_sz =
the negative test passes.
The original code (before this PR) also has the same behavior:
saturn/src_lockfree/ws_deque.ml
Line 103 in 8937436
tab = Atomic.make (CArray.create min_size (Obj.magic ())); |
In other words, the circular array is initialized with
Obj.magic ()
values rather than ref (Obj.magic ())
values. I haven't analyzed why exactly the original code doesn't crash, but I suspect it is purely out of luck.
Even with the ref (Obj.magic ())
things are not guaranteed to work. If I change the test use float
s, i.e.
QCheck.make ~print:show_cmd
(Gen.oneof
[
- Gen.map (fun i -> Push i) int_gen;
+ Gen.map (fun i -> Push i) Gen.float;
Gen.return Pop;
(*Gen.return Steal;*)
(with corresponding changes to use float
instead of int
) the negative test will cause a segmentation fault even with ref (Obj.magic ())
. I would expect the original code to also exhibit a segmentation fault with this change.
The way I see it, the whole point of this particular data structure is to exploit the asymmetry of having one owner and multiple thiefs. If we would instead choose to make this data structure safe to use with multiple owners, then we would be implementing a fundamentally different data structure (i.e. a multi consumer, single producer deque). Implementing a data structure that doesn't cause a segmentation fault, but instead returns wrong result in case of misuse is not something I would recommend. So, personally I would just recommend embracing the fact that calling pop
in parallel from multiple domains is not safe and would remove the negative test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could also have a "safe" implementation that raises an issue in case of wrong use (a thief calling push or pop), but I am guessing this could have quite a cost in term of performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would expect the cost (of detecting misuse) to be relatively high in case it needs to be reliable. It would be great if misuse could be detected for free and an error produced reliably. In all likelyhood, detecting misuse reliably would be expensive and detecting misuse only partially while avoiding segmentation faults would also likely be expensive and could also lead to the program silently giving incorrect results, which I consider worse than crashing.
e409e1a
to
5bd372d
Compare
204ae86
to
6052cd0
Compare
I created a PR adding new benchmarks for the work-stealing deque #130 and I added a commit to this PR to optimize the work-stealing deque to avoid performance pitfalls on those benchmarks. Here is a run from before adding the commit:
Here is a run after the commit:
Performance on the SPMC style benchmarks improved significantly. The optimization in the last commit (caching the index used by thieves) is mentioned in the original paper that introduces the work-stealing deque (section 2.3 Avoid top accesses in pushBottom). |
d36ac07
to
8eb58ee
Compare
8eb58ee
to
fbcbd98
Compare
ca1f35e
to
cd21775
Compare
- Add padding to avoid false sharing - Use a GADT to express desired result type - Use various tweaks to improve performance - Remove negative test that uses the WS deque in an invalid unsafe way - Implement caching of the thief side index
cd21775
to
98f99bd
Compare
This PR optimizes the work-stealing deque.
The graphs on current-bench show the progress of optimizing the work-stealing deque quite nicely:
Here is a run of the benchmarks on my M3 Max before the optimizations:
Here is a run after the optimizations:
General approach: