
New AsyncMutex implementation #3562

Merged
merged 14 commits from revert-mutex-impl into typelevel:series/3.x on Apr 28, 2023

Conversation


@BalmungSan (Contributor) commented Apr 25, 2023

I was half-following the recent discussions about the Mutex issue, and in an internal conversation with Arman we both concluded that the current version of AsyncMutex is hard to follow, whereas the two previous versions were "easier" (they still require a lot of comments and some mental effort, but they are more straightforward IMHO). There is also the point that the previous versions always preserved fairness (FIFO semantics), unlike the current one.

Thus, considering that the main rationale for the implementation change was performance (especially on the happy path), I was curious and decided to re-run the benchmarks. Since the version Arman introduced in #3409 was heavily modified afterwards, I expected some changes.


For clarification, during the lifetime of this PR, we have had the following versions:

  • current is the version currently on series/3.x
  • previous is the final version presented on Optimize Mutex & AtomicCell #3346 (this version has been discarded, since a new test shows it has a serious bug that would allow multiple acquisitions of the Mutex)
  • deferred is the original version presented on Optimize Mutex & AtomicCell #3346 (this version only requires Concurrent, so it can also replace the Semaphore-based implementation)
  • new is the final version presented on this PR (it is based on deferred but takes advantage of Async)

Right now we (Arman & Luis) propose to use either deferred or new, since both pass all the current tests (including the FIFO-order one) and are, in our humble opinion, easier to understand.
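
For readers who did not follow the earlier PRs, the following is a minimal, hypothetical sketch of the Deferred-based lock-chain idea that both deferred and new build on. The names (LockChainMutex, chain, ourCell) are made up for illustration, and cancellation of waiters, which is the genuinely hard part the real implementations must handle, is deliberately omitted.

import cats.effect.{Concurrent, Deferred, Ref, Resource}
import cats.syntax.all._

// Hypothetical, simplified sketch; NOT the code in this PR.
// Each acquirer appends a fresh Deferred ("cell") at the tail of the chain and
// waits for the previous cell to be completed; releasing completes its own cell,
// which wakes exactly the next waiter, giving FIFO semantics.
final class LockChainMutex[F[_]] private (chain: Ref[F, Deferred[F, Unit]])(
    implicit F: Concurrent[F]) {

  def lock: Resource[F, Unit] =
    Resource
      .make {
        for {
          ourCell <- F.deferred[Unit]            // our cell in the chain
          prevCell <- chain.getAndSet(ourCell)   // append ourselves at the tail
          _ <- prevCell.get                      // wait for the previous owner to release
        } yield ourCell
      }(ourCell => ourCell.complete(()).void)    // release: wake the next waiter, if any
      .map(_ => ())
}

object LockChainMutex {
  def apply[F[_]](implicit F: Concurrent[F]): F[LockChainMutex[F]] =
    for {
      unlocked <- F.deferred[Unit]
      _ <- unlocked.complete(())                 // the chain starts already "released"
      chain <- F.ref(unlocked)
    } yield new LockChainMutex(chain)
}

Since a sketch like this only relies on Concurrent (Ref + Deferred), it could in principle also back the Semaphore-based implementation, which is the point made about deferred above. The real complexity, and the fairness behaviour discussed in this PR, comes from how a cancelled waiter's cell is dealt with.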


Benchmark results

Current

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MutexBenchmark.happyPathAsync | 10 | 1000 | thrpt | 20 | 219.723 | ± 0.334 | ops/s |
| MutexBenchmark.happyPathAsync | 50 | 1000 | thrpt | 20 | 44.040 | ± 0.563 | ops/s |
| MutexBenchmark.happyPathAsync | 100 | 1000 | thrpt | 20 | 22.485 | ± 0.152 | ops/s |
| MutexBenchmark.highContentionAsync | 10 | 1000 | thrpt | 20 | 31.041 | ± 1.161 | ops/s |
| MutexBenchmark.highContentionAsync | 50 | 1000 | thrpt | 20 | 9.442 | ± 0.129 | ops/s |
| MutexBenchmark.highContentionAsync | 100 | 1000 | thrpt | 20 | 5.477 | ± 0.131 | ops/s |
| MutexBenchmark.cancellationAsync | 10 | 1000 | thrpt | 20 | 21.953 | ± 0.809 | ops/s |
| MutexBenchmark.cancellationAsync | 50 | 1000 | thrpt | 20 | 7.899 | ± 0.202 | ops/s |
| MutexBenchmark.cancellationAsync | 100 | 1000 | thrpt | 20 | 4.844 | ± 0.034 | ops/s |

Previous / Deferred

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MutexBenchmark.happyPathAsync | 10 | 1000 | thrpt | 20 | 213.143 | ± 1.696 | ops/s |
| MutexBenchmark.happyPathAsync | 50 | 1000 | thrpt | 20 | 44.325 | ± 0.215 | ops/s |
| MutexBenchmark.happyPathAsync | 100 | 1000 | thrpt | 20 | 22.191 | ± 0.215 | ops/s |
| MutexBenchmark.highContentionAsync | 10 | 1000 | thrpt | 20 | 34.244 | ± 1.025 | ops/s |
| MutexBenchmark.highContentionAsync | 50 | 1000 | thrpt | 20 | 8.179 | ± 0.230 | ops/s |
| MutexBenchmark.highContentionAsync | 100 | 1000 | thrpt | 20 | 4.424 | ± 0.028 | ops/s |
| MutexBenchmark.cancellationAsync | 10 | 1000 | thrpt | 20 | 22.864 | ± 0.901 | ops/s |
| MutexBenchmark.cancellationAsync | 50 | 1000 | thrpt | 20 | 7.927 | ± 0.066 | ops/s |
| MutexBenchmark.cancellationAsync | 100 | 1000 | thrpt | 20 | 4.895 | ± 0.045 | ops/s |

New

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MutexBenchmark.happyPathAsync | 10 | 1000 | thrpt | 20 | 183.778 | ± 0.156 | ops/s |
| MutexBenchmark.happyPathAsync | 50 | 1000 | thrpt | 20 | 37.519 | ± 0.046 | ops/s |
| MutexBenchmark.happyPathAsync | 100 | 1000 | thrpt | 20 | 18.778 | ± 0.121 | ops/s |
| MutexBenchmark.highContentionAsync | 10 | 1000 | thrpt | 20 | 32.455 | ± 1.579 | ops/s |
| MutexBenchmark.highContentionAsync | 50 | 1000 | thrpt | 20 | 8.192 | ± 0.276 | ops/s |
| MutexBenchmark.highContentionAsync | 100 | 1000 | thrpt | 20 | 4.486 | ± 0.075 | ops/s |
| MutexBenchmark.cancellationAsync | 10 | 1000 | thrpt | 20 | 24.626 | ± 1.514 | ops/s |
| MutexBenchmark.cancellationAsync | 50 | 1000 | thrpt | 20 | 8.150 | ± 0.073 | ops/s |
| MutexBenchmark.cancellationAsync | 100 | 1000 | thrpt | 20 | 5.046 | ± 0.089 | ops/s |

For context, here are the results of the semaphore implementation for all runs.

Current

[info] MutexBenchmark.happyPathConcurrent             10          1000  thrpt   20  198.903 ± 0.638  ops/s
[info] MutexBenchmark.happyPathConcurrent             50          1000  thrpt   20   40.689 ± 0.348  ops/s
[info] MutexBenchmark.happyPathConcurrent            100          1000  thrpt   20   20.214 ± 0.116  ops/s
[info] MutexBenchmark.highContentionConcurrent        10          1000  thrpt   20   31.392 ± 1.198  ops/s
[info] MutexBenchmark.highContentionConcurrent        50          1000  thrpt   20    7.466 ± 0.156  ops/s
[info] MutexBenchmark.highContentionConcurrent       100          1000  thrpt   20    4.014 ± 0.067  ops/s
[info] MutexBenchmark.cancellationConcurrent          10          1000  thrpt   20   21.554 ± 0.651  ops/s
[info] MutexBenchmark.cancellationConcurrent          50          1000  thrpt   20    7.455 ± 0.049  ops/s
[info] MutexBenchmark.cancellationConcurrent         100          1000  thrpt   20    3.619 ± 0.056  ops/s

Previous / Deferred

[info] MutexBenchmark.happyPathConcurrent             10          1000  thrpt   20  198.903 ± 0.638  ops/s
[info] MutexBenchmark.happyPathConcurrent             50          1000  thrpt   20   40.689 ± 0.348  ops/s
[info] MutexBenchmark.happyPathConcurrent            100          1000  thrpt   20   20.214 ± 0.116  ops/s
[info] MutexBenchmark.highContentionConcurrent        10          1000  thrpt   20   31.392 ± 1.198  ops/s
[info] MutexBenchmark.highContentionConcurrent        50          1000  thrpt   20    7.466 ± 0.156  ops/s
[info] MutexBenchmark.highContentionConcurrent       100          1000  thrpt   20    4.014 ± 0.067  ops/s
[info] MutexBenchmark.cancellationConcurrent          10          1000  thrpt   20   21.554 ± 0.651  ops/s
[info] MutexBenchmark.cancellationConcurrent          50          1000  thrpt   20    7.455 ± 0.049  ops/s
[info] MutexBenchmark.cancellationConcurrent         100          1000  thrpt   20    3.619 ± 0.056  ops/s

New

[info] MutexBenchmark.happyPathConcurrent             10          1000  thrpt   20  199.503 ± 0.725  ops/s
[info] MutexBenchmark.happyPathConcurrent             50          1000  thrpt   20   40.464 ± 0.115  ops/s
[info] MutexBenchmark.happyPathConcurrent            100          1000  thrpt   20   20.236 ± 0.263  ops/s
[info] MutexBenchmark.highContentionConcurrent        10          1000  thrpt   20   32.125 ± 1.281  ops/s
[info] MutexBenchmark.highContentionConcurrent        50          1000  thrpt   20    7.467 ± 0.117  ops/s
[info] MutexBenchmark.highContentionConcurrent       100          1000  thrpt   20    3.934 ± 0.026  ops/s
[info] MutexBenchmark.cancellationConcurrent          10          1000  thrpt   20   21.359 ± 0.838  ops/s
[info] MutexBenchmark.cancellationConcurrent          50          1000  thrpt   20    7.438 ± 0.102  ops/s
[info] MutexBenchmark.cancellationConcurrent         100          1000  thrpt   20    3.432 ± 0.017  ops/s

Lies, Damn Lies, and Benchmarks

Comment on lines 172 to 195
"preserve waiters order (fifo) on a non-race cancellation" in ticked { implicit ticker =>
val numbers = List.range(1, 10)
val p = mutex.flatMap { m =>
IO.ref(List.empty[Int]).flatMap { ref =>
for {
f1 <- m.lock.allocated
(_, f1Release) = f1
f2 <- m.lock.use_.start
_ <- IO.sleep(1.millis)
t <- numbers.parTraverse_ { i =>
IO.sleep(i.millis) >>
m.lock.surround(ref.update(acc => i :: acc))
}.start
_ <- IO.sleep(100.millis)
_ <- f2.cancel
_ <- f1Release
_ <- t.join
r <- ref.get
} yield r.reverse
}
}

p must completeAs(numbers)
}
This test passes for this PR but fails with the current implementation on series/3.x, because waiter #1 moves to the back of the line, after #10. IMHO that is a pretty significant corruption of fairness, especially considering that there are no race conditions at all here: this test is fully ticked.

@durban marked this pull request as draft April 26, 2023 12:43
@durban (Contributor) commented Apr 26, 2023

@BalmungSan I've set this as "draft" due to your "do not merge" comment. I've also left another comment.

@BalmungSan (Contributor, Author) commented:

As expected, this PR version of Mutex fails the new test.
FWIW, the current version on series/3.x does pass the test! 🎉
cc @durban @armanbilge @djspiewak

@BalmungSan (Contributor, Author) commented:

So, the "original" Deferred-based lock-chain idea passes all the current tests.
I am pushing it as a milestone; I think I can optimize it a little bit, but I want to keep this version around just in case.
If we decide to keep this version, then, since it only needs Concurrent, we can simply replace the Semaphore-based one with it.

@BalmungSan marked this pull request as ready for review April 26, 2023 23:37
@BalmungSan requested a review from armanbilge April 26, 2023 23:37
@BalmungSan changed the title from "Revert AsyncMutex implementation" to "New AsyncMutex implementation" Apr 26, 2023
@BalmungSan (Contributor, Author) commented Apr 26, 2023

I just updated the PR description with the benchmark results for current, previous (deferred) & new, and it seems the performance of all the versions is very similar.

Thus, I think the best option is to just use the deferred one and remove the semaphore one.

cc: @durban @armanbilge @djspiewak

@djspiewak (Member) commented:

Yeah this is super-duper close. It definitely feels like it's best to pick the simplest one with the most reasonable semantics.

@BalmungSan (Contributor, Author) commented Apr 27, 2023

Okay, so I reverted and unified both the Async and Concurrent implementations to the original lock-chain idea.
I fleshed out the naming a bit, added a couple of comments, and made some minor improvements, like using eq and avoiding a wrapper allocation, plus a small optimization on release for the case where no one is waiting for the Mutex.
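
To give an idea of what such a release fast path can look like, here is a rough, hypothetical sketch (made-up names, specialised to IO for brevity; this is not the code merged here). The releasing fiber compares the cell it appended on acquire against the current tail of the chain using reference equality (eq); if they are the same object, nobody enqueued behind it, so the state can simply be reset to an already-completed "unlocked" sentinel, and completing the Deferred is only needed when a waiter raced in.

import cats.effect.{Deferred, IO, Ref}

// Hypothetical sketch of a release fast path; NOT the actual implementation.
// `tailRef` points at the last cell appended to the chain, `ourCell` is the cell
// the releasing fiber appended when it acquired the lock, and `unlockedCell` is
// an already-completed sentinel meaning "nobody holds or waits for the lock".
def release(
    tailRef: Ref[IO, Deferred[IO, Unit]],
    ourCell: Deferred[IO, Unit],
    unlockedCell: Deferred[IO, Unit]): IO[Unit] =
  tailRef.access.flatMap { case (tail, trySet) =>
    if (tail eq ourCell)                          // reference equality: nobody is queued after us
      trySet(unlockedCell).flatMap {
        case true  => IO.unit                     // fast path: state reset, nothing to complete
        case false => ourCell.complete(()).void   // a waiter raced in; wake it up
      }
    else ourCell.complete(()).void                // waiters exist; hand the lock to the next one
  }

Whether a check like this actually pays off is precisely what the benchmarks below are meant to show.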

Final version benchmark results

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MutexBenchmark.happyPathConcurrent | 10 | 1000 | thrpt | 20 | 213.946 | ± 1.116 | ops/s |
| MutexBenchmark.happyPathConcurrent | 50 | 1000 | thrpt | 20 | 44.099 | ± 0.356 | ops/s |
| MutexBenchmark.happyPathConcurrent | 100 | 1000 | thrpt | 20 | 21.624 | ± 0.089 | ops/s |
| MutexBenchmark.highContentionConcurrent | 10 | 1000 | thrpt | 20 | 33.514 | ± 1.624 | ops/s |
| MutexBenchmark.highContentionConcurrent | 50 | 1000 | thrpt | 20 | 7.655 | ± 0.363 | ops/s |
| MutexBenchmark.highContentionConcurrent | 100 | 1000 | thrpt | 20 | 4.203 | ± 0.024 | ops/s |
| MutexBenchmark.cancellationConcurrent | 10 | 1000 | thrpt | 20 | 22.853 | ± 1.031 | ops/s |
| MutexBenchmark.cancellationConcurrent | 50 | 1000 | thrpt | 20 | 7.895 | ± 0.176 | ops/s |
| MutexBenchmark.cancellationConcurrent | 100 | 1000 | thrpt | 20 | 4.997 | ± 0.080 | ops/s |

cc: @djspiewak @armanbilge @durban

@BalmungSan requested a review from durban April 27, 2023 17:43
@djspiewak merged commit 741f87a into typelevel:series/3.x Apr 28, 2023
@armanbilge linked an issue Apr 28, 2023 that may be closed by this pull request
@BalmungSan deleted the revert-mutex-impl branch April 28, 2023 16:14

Successfully merging this pull request may close these issues.

Async mutex corrupts FIFO on happy(ish) path