Cross platform weak bag implementation #2673

Merged 14 commits on Dec 20, 2021

@vasilmkd vasilmkd commented Dec 20, 2021

Completely replaces the WeakHashMap mechanism with a cross-platform weak bag: on each WorkerThread, in the fallback, and on JS.

A possible remedy for #2634.
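
For readers unfamiliar with the data structure, here is a minimal JVM-only sketch of what a weak bag can look like. It is not the code merged in this PR: `SimpleWeakBag`, `insert`, `toSet`, and `expunge` are hypothetical names chosen for illustration, and the sketch assumes single-threaded use (for example, one bag per worker thread).

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical illustration only; not the implementation from this PR.
// Not thread-safe: assumes a single owning thread.
final class SimpleWeakBag[A <: AnyRef] {
  // The GC enqueues a reference here after its referent has been collected.
  private val queue = new ReferenceQueue[A]()
  // WeakReference uses identity equality, so each insertion is a distinct entry.
  private val entries = mutable.Set.empty[WeakReference[A]]

  /** Adds a value and returns a handle that removes it again. */
  def insert(a: A): () => Unit = {
    expunge()
    val ref = new WeakReference[A](a, queue)
    entries += ref
    () => { entries -= ref; () }
  }

  /** Snapshot of the elements that are still alive. */
  def toSet: Set[A] =
    entries.iterator.flatMap(r => Option(r.get())).toSet

  /** Drops entries whose referents have already been collected. */
  private def expunge(): Unit = {
    var ref = queue.poll()
    while (ref ne null) {
      entries -= ref.asInstanceOf[WeakReference[A]]
      ref = queue.poll()
    }
  }
}
```

Unlike a `WeakHashMap`, a bag of this shape needs no hashing or equality on the elements themselves: an element is removed either explicitly through its handle or lazily once the garbage collector has cleared its reference.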

build.sbt Outdated
Comment on lines 454 to 455
("org.scala-js" %%% "scalajs-weakreferences" % JsWeakReferencesVersion)
.cross(CrossVersion.for3Use2_13)

Some interesting reading about this: scala-js/scala-js-weakreferences#7 (comment)

@vasilmkd

"io.vasilev" %% "cats-effect" % "3.3-87-1d1f0ff"

vasilmkd commented Dec 20, 2021

Benchmark results:

Baseline on series/3.3.x with tracing off (this case is unaffected by this PR):

benchmarks/Jmh/run -wi 10 -i 10 -f 2 -t 1 -prof gc --jvmArgs -Dcats.effect.tracing.mode=none --jvmArgs -Dcats.effect.tracing.exceptions.enhanced=false ParallelBenchmark.par
Benchmark                                                       (cpuTokens)  (size)   Mode  Cnt        Score       Error   Units
ParallelBenchmark.parTraverse                                         10000    1000  thrpt   20      316.579 ±     3.402   ops/s
ParallelBenchmark.parTraverse:·gc.alloc.rate                          10000    1000  thrpt   20     1417.315 ±    15.392  MB/sec
ParallelBenchmark.parTraverse:·gc.alloc.rate.norm                     10000    1000  thrpt   20  4929586.675 ±   579.390    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space                 10000    1000  thrpt   20     1429.037 ±    22.488  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space.norm            10000    1000  thrpt   20  4970221.334 ± 47058.032    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space             10000    1000  thrpt   20        0.222 ±     0.022  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space.norm        10000    1000  thrpt   20      771.322 ±    75.550    B/op
ParallelBenchmark.parTraverse:·gc.count                               10000    1000  thrpt   20      919.000              counts
ParallelBenchmark.parTraverse:·gc.time                                10000    1000  thrpt   20     1325.000                  ms

Baseline on series/3.3.x with tracing on:

benchmarks/Jmh/run -wi 10 -i 10 -f 2 -t 1 -prof gc --jvmArgs -Dcats.effect.tracing.mode=cached --jvmArgs -Dcats.effect.tracing.exceptions.enhanced=true ParallelBenchmark.par
Benchmark                                                       (cpuTokens)  (size)   Mode  Cnt        Score        Error   Units
ParallelBenchmark.parTraverse                                         10000    1000  thrpt   20      204.719 ±     19.250   ops/s
ParallelBenchmark.parTraverse:·gc.alloc.rate                          10000    1000  thrpt   20     1044.091 ±     97.878  MB/sec
ParallelBenchmark.parTraverse:·gc.alloc.rate.norm                     10000    1000  thrpt   20  5616091.869 ±   2648.642    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space                 10000    1000  thrpt   20     1052.138 ±    110.314  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space.norm            10000    1000  thrpt   20  5665606.945 ± 358765.601    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space             10000    1000  thrpt   20        0.814 ±      0.502  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space.norm        10000    1000  thrpt   20     4325.508 ±   2692.273    B/op
ParallelBenchmark.parTraverse:·gc.count                               10000    1000  thrpt   20       92.000               counts
ParallelBenchmark.parTraverse:·gc.time                                10000    1000  thrpt   20    17783.000                   ms

This PR with tracing on:

benchmarks/Jmh/run -wi 10 -i 10 -f 2 -t 1 -prof gc --jvmArgs -Dcats.effect.tracing.mode=cached --jvmArgs -Dcats.effect.tracing.exceptions.enhanced=true ParallelBenchmark.par
Benchmark                                                       (cpuTokens)  (size)   Mode  Cnt        Score       Error   Units
ParallelBenchmark.parTraverse                                         10000    1000  thrpt   20      278.251 ±     1.785   ops/s
ParallelBenchmark.parTraverse:·gc.alloc.rate                          10000    1000  thrpt   20     1391.663 ±     9.091  MB/sec
ParallelBenchmark.parTraverse:·gc.alloc.rate.norm                     10000    1000  thrpt   20  5507256.392 ±   575.319    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space                 10000    1000  thrpt   20     1407.085 ±    15.138  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space.norm            10000    1000  thrpt   20  5568406.214 ± 57756.867    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space             10000    1000  thrpt   20        0.246 ±     0.020  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space.norm        10000    1000  thrpt   20      972.881 ±    79.171    B/op
ParallelBenchmark.parTraverse:·gc.count                               10000    1000  thrpt   20      812.000              counts
ParallelBenchmark.parTraverse:·gc.time                                10000    1000  thrpt   20     1386.000                  ms

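With tracing on, this PR improves performance significantly: throughput rises from about 205 to about 278 ops/s (roughly 36%), and GC time drops from 17783 ms to 1386 ms, close to the 1325 ms measured without tracing.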

@vasilmkd vasilmkd marked this pull request as ready for review December 20, 2021 16:49
@vasilmkd

On the advice of @armanbilge, this PR shades https://github.com/scala-js/scala-js-weakreferences by copying the source code (~100 LOC, not including license headers). This is due to the existence of https://github.com/scala-js/scala-js-fake-weakreferences, which is implemented in terms of strong references and would wreak havoc if selected over the real weak reference implementation.

vasilmkd commented Dec 20, 2021

For reference, these results are from series/3.2.x and the 3.2.9 release:

With tracing:

benchmarks/Jmh/run -wi 10 -i 10 -f 2 -t 1 -prof gc --jvmArgs -Dcats.effect.tracing.mode=cached --jvmArgs -Dcats.effect.tracing.exceptions.enhanced=true ParallelBenchmark.par
Benchmark                                                       (cpuTokens)  (size)   Mode  Cnt        Score       Error   Units
ParallelBenchmark.parTraverse                                         10000    1000  thrpt   20      290.942 ±     2.351   ops/s
ParallelBenchmark.parTraverse:·gc.alloc.rate                          10000    1000  thrpt   20     1521.536 ±    12.438  MB/sec
ParallelBenchmark.parTraverse:·gc.alloc.rate.norm                     10000    1000  thrpt   20  5758396.046 ±   580.724    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space                 10000    1000  thrpt   20     1536.991 ±    20.210  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space.norm            10000    1000  thrpt   20  5816806.929 ± 53534.035    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space             10000    1000  thrpt   20        0.303 ±     0.024  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space.norm        10000    1000  thrpt   20     1147.693 ±    89.317    B/op
ParallelBenchmark.parTraverse:·gc.count                               10000    1000  thrpt   20      887.000              counts
ParallelBenchmark.parTraverse:·gc.time                                10000    1000  thrpt   20     1316.000                  ms

Without tracing:

Benchmark                                                       (cpuTokens)  (size)   Mode  Cnt        Score       Error   Units
ParallelBenchmark.parTraverse                                         10000    1000  thrpt   20      295.814 ±     2.993   ops/s
ParallelBenchmark.parTraverse:·gc.alloc.rate                          10000    1000  thrpt   20     1559.392 ±    10.000  MB/sec
ParallelBenchmark.parTraverse:·gc.alloc.rate.norm                     10000    1000  thrpt   20  5804947.416 ± 40176.019    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space                 10000    1000  thrpt   20     1575.089 ±    15.339  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Eden_Space.norm            10000    1000  thrpt   20  5863488.584 ± 67068.118    B/op
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space             10000    1000  thrpt   20        0.318 ±     0.025  MB/sec
ParallelBenchmark.parTraverse:·gc.churn.G1_Survivor_Space.norm        10000    1000  thrpt   20     1181.857 ±    90.029    B/op
ParallelBenchmark.parTraverse:·gc.count                               10000    1000  thrpt   20      909.000              counts
ParallelBenchmark.parTraverse:·gc.time                                10000    1000  thrpt   20     1338.000                  ms
