Significant performance improvements, new scheduler #107

computablee · 2023-11-06T16:42:33Z

Which issue are you addressing?

Significant performance improvements, new work-stealing scheduler.

How have you addressed the issue?

This PR adds the WorkStealingScheduler class, which provides a work-stealing schedule for parallel for loops. Much of the existing scheduling code has also been heavily optimized, including a 40% improvement in one benchmark for static scheduling with chunk_size=1. Collapsed loops were improved as well by incorporating division-by-multiplication; more testing is required there.
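For readers unfamiliar with the technique, here is a minimal sketch of work stealing over an iteration range. It is not the WorkStealingScheduler added in this PR: the class name `StealingRanges`, the lock-per-worker design, and the steal-half policy are all assumptions made for illustration.

```csharp
using System;

// Hypothetical illustration only; not DotMP's WorkStealingScheduler.
public sealed class StealingRanges
{
    private readonly long[] _next;    // next unclaimed iteration of each worker
    private readonly long[] _end;     // exclusive end of each worker's range
    private readonly object[] _locks; // one lock per worker's range

    // Splits [start, end) into one contiguous range per worker.
    public StealingRanges(long start, long end, int workers)
    {
        _next = new long[workers];
        _end = new long[workers];
        _locks = new object[workers];
        long total = end - start;
        for (int w = 0; w < workers; w++)
        {
            _next[w] = start + total * w / workers;
            _end[w] = start + total * (w + 1) / workers;
            _locks[w] = new object();
        }
    }

    // Claims up to chunkSize iterations for worker w as [from, to).
    // Returns false once no work could be found in any range.
    public bool TryTake(int w, long chunkSize, out long from, out long to)
    {
        if (TryTakeLocal(w, chunkSize, out from, out to)) return true;

        // Local range is empty: visit the other workers and try to steal.
        for (int i = 1; i < _next.Length; i++)
        {
            int victim = (w + i) % _next.Length;
            long stolenFrom = 0, stolenTo = 0;
            bool stole = false;
            lock (_locks[victim])
            {
                long remaining = _end[victim] - _next[victim];
                if (remaining > 0)
                {
                    // Take the upper half so the victim keeps working from the front.
                    stolenTo = _end[victim];
                    stolenFrom = stolenTo - (remaining + 1) / 2;
                    _end[victim] = stolenFrom;
                    stole = true;
                }
            }
            if (!stole) continue;

            // Install the stolen range as this worker's own, then claim from it.
            lock (_locks[w])
            {
                _next[w] = stolenFrom;
                _end[w] = stolenTo;
            }
            if (TryTakeLocal(w, chunkSize, out from, out to)) return true;
        }
        return false;
    }

    // Claims a chunk from worker w's own range, if any iterations remain.
    private bool TryTakeLocal(int w, long chunkSize, out long from, out long to)
    {
        lock (_locks[w])
        {
            from = _next[w];
            to = Math.Min(from + chunkSize, _end[w]);
            if (from >= to) return false;
            _next[w] = to;
            return true;
        }
    }
}
```

Each worker thread would call `TryTake` with its own index in a loop until it returns false, running iterations `from` through `to - 1` after every successful call. The sketch never holds two locks at once, so it cannot deadlock, but a production scheduler would likely avoid locks on the fast path.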

How have you tested your patch?

Unit tests have been written where necessary, and all unit tests pass.


codecov · 2023-11-06T16:48:17Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e140dc3) 99.12% compared to head (ad2c447) 99.21%.
Report is 7 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   99.12%   99.21%   +0.09%     
==========================================
  Files          12       12              
  Lines        1137     1271     +134     
  Branches      113      132      +19     
==========================================
+ Hits         1127     1261     +134     
  Misses          5        5              
  Partials        5        5

Files	Coverage Δ
DotMP/Parallel.cs	`98.97% <100.00%> (+<0.01%)`	⬆️
DotMP/Schedule.cs	`96.66% <100.00%> (+4.35%)`	⬆️
DotMP/WorkShare.cs	`99.00% <100.00%> (-0.08%)`	⬇️
DotMP/Wrappers.cs	`100.00% <100.00%> (ø)`


computablee · 2023-11-06T17:30:58Z

After further performance testing of the changes to collapsed loops, it appears that performance may be worse for typical use cases. Holding off on merging this until more data can be collected.

computablee · 2023-11-07T14:40:57Z

Division-by-multiplication was removed from collapsed loops and replaced with manual iteration of the loop indices. This avoids expensive per-iteration operations such as division, modulo, and multiplication. The performance improvements from this are dramatic: well over 2x across the board for different approaches to loops.
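To make the difference concrete, here is a minimal sketch of both strategies for a two-level collapsed loop. It is not the code from this PR: the class `CollapsedLoop`, its method names, and the parameter layout are hypothetical, and the only assumption carried over is that a chunk is a contiguous range `[kStart, kEnd)` of flattened iterations.

```csharp
using System;

// Hypothetical illustration only; not DotMP's chunk executor.
public static class CollapsedLoop
{
    // Baseline: recover (i, j) from the flat index k with one division
    // and one modulo on every iteration.
    public static void RunDivMod(long kStart, long kEnd, int iStart,
                                 int jStart, int jEnd, Action<int, int> body)
    {
        int width = jEnd - jStart; // iterations of the inner loop
        for (long k = kStart; k < kEnd; k++)
        {
            int i = iStart + (int)(k / width);
            int j = jStart + (int)(k % width);
            body(i, j);
        }
    }

    // Manual iteration: divide once to find where the chunk starts, then
    // advance j and carry into i with only an increment and a comparison.
    public static void RunManual(long kStart, long kEnd, int iStart,
                                 int jStart, int jEnd, Action<int, int> body)
    {
        if (kStart >= kEnd) return;
        int width = jEnd - jStart;
        int i = iStart + (int)(kStart / width);
        int j = jStart + (int)(kStart % width);
        for (long k = kStart; k < kEnd; k++)
        {
            body(i, j);
            if (++j == jEnd)
            {
                j = jStart; // inner loop wrapped; move to the next outer index
                i++;
            }
        }
    }
}
```

Both methods visit the same `(i, j)` pairs in the same order. The second pays for the division and modulo once per chunk instead of once per iteration.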

Collapse(3) was also optimized, although it remains untested. I would be very surprised if the performance gains were anything less than 3x. Collapse(4) and Collapse(n) remain unoptimized due to code complexity. There should be a writeup covering the performance "do"s and "don't"s of the library. Collapse(4) or higher is definitely a "don't" for lightweight loops because of the extreme overhead of calculating indices.

Optimizing high-dimension collapsed loops shouldn't be too difficult if I get requests for it. It would certainly be far easier with this approach than with prior iterations of the collapsed chunk executor. I'm opening an issue and will do this later.
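As a rough idea of what that could look like, here is a sketch that generalizes manual iteration to any number of dimensions. It is one possible shape, not a design taken from this PR or from DotMP: the class `CollapseN` and both method names are hypothetical. `Decompose` shows the per-dimension division and modulo that make index calculation costly, and `Advance` replaces it with an odometer-style increment.

```csharp
using System;

// Hypothetical illustration only; not part of DotMP.
public static class CollapseN
{
    // Recovers every loop index from a flat iteration number.
    // Costs one division and one modulo per dimension.
    public static void Decompose(long flat, int[] starts, int[] extents, int[] indices)
    {
        for (int d = extents.Length - 1; d >= 0; d--)
        {
            indices[d] = starts[d] + (int)(flat % extents[d]);
            flat /= extents[d];
        }
    }

    // Moves indices to the next iteration in place: increment the innermost
    // index and carry outward whenever a dimension wraps around.
    public static void Advance(int[] starts, int[] extents, int[] indices)
    {
        for (int d = extents.Length - 1; d >= 0; d--)
        {
            if (++indices[d] < starts[d] + extents[d]) return;
            indices[d] = starts[d]; // wrapped; carry into the next outer dimension
        }
    }
}
```

A chunk executor built this way would call `Decompose` once at the start of a chunk and `Advance` after each iteration, so most iterations touch only the innermost index.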

computablee added 11 commits November 6, 2023 03:32

add test for workstealing, use Assert.Equal in *_should_produce_correct_results

565890e

base implementation for workstealing, sans the stealing part

3ec9e92

add new test to ensure that workstealing properly load balances

82950b3

add new workstealing scheduler

647af91

integrate workstealing scheduler into official support

18570ab

optimizations and cleanup

9d7cfdf

removed thr class, refactoring

8884a5a

minor changes to improve testing comprehensiveness

f89442b

implement division-by-multiplication for collapsed loops

1c808ac

update benchmark

826c322

add comment

a4105ef

computablee added 2 commits November 6, 2023 11:10

jscpd ignore whole file

9f2e2b5

add test for workstealing runtime schedule

bcf42af

computablee added 3 commits November 7, 2023 08:22

fix new potential bounds issue with updated chunk execution code

fabccd8

remove dividesharp project-- unnecessary

fbd0482

enormous performance gains in collapsed loops (5x in some cases)

ad2c447

computablee merged commit c5c28a9 into main Nov 7, 2023
17 checks passed

computablee mentioned this pull request Nov 7, 2023

[PERFORMANCE] Optimize Collapse(4) and higher. #108

Closed
