In certain contexts it may be possible to use stackless threads and avoid some of the thread creation/context switching overheads associated with stackful threads. We should identify where it's safe to use stackless threads and test if it has any impact. I'd expect this to help with strong scaling and small block sizes.
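As a rough illustration of what "safe" could mean here: a stackless task runs to completion on the worker's own stack, so it cannot be suspended partway through to let other work run. The sketch below is a toy model using only the C++ standard library, not the runtime's scheduler or API. `RunToCompletionQueue`, `doubled_sum_with_continuation`, and `consumer_runs_before_value_is_ready` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <utility>

// Toy single-worker scheduler (hypothetical, for illustration only).
// Every task runs to completion on the caller's stack and can never be
// suspended, which is the property assumed here for a stackless task.
class RunToCompletionQueue {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Runs tasks in FIFO order until the queue is empty; returns how many ran.
    int run() {
        int executed = 0;
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // may post further tasks, but cannot yield
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Continuation style: the consumer is posted only after the producer has its
// value, so no task ever needs to wait. This pattern suits stackless tasks.
int doubled_sum_with_continuation(int a, int b) {
    RunToCompletionQueue queue;
    int result = 0;
    queue.post([&queue, &result, a, b] {
        const int produced = a + b;                                  // producer
        queue.post([&result, produced] { result = produced * 2; });  // consumer
    });
    queue.run();
    return result;
}

// Blocking style: the consumer is scheduled before the producer. It returns
// true if the consumer finds the value missing. A blocking wait at that point
// would stall the only worker, because the producer could never run.
bool consumer_runs_before_value_is_ready() {
    RunToCompletionQueue queue;
    std::optional<int> value;
    bool would_block = false;
    queue.post([&] { would_block = !value.has_value(); });  // consumer
    queue.post([&] { value = 42; });                        // producer
    queue.run();
    return would_block;
}
```

Under these assumptions, tasks that only compute and hand their result to a continuation are natural candidates for stackless threads, while tasks that may wait on another task's result are not.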
I tried this out on hohgant with the mc backend, and there is a small but measurable improvement from using stackless threads. The effect depends on the algorithm and configuration: at best it gives a ~20% speedup, at worst the performance is unchanged. band_to_tridiag seems to benefit the most; the full pipeline benefits a little.

The main downside right now is that the tridiagonal solver sometimes seems to hang with stackless threads. All other miniapps, including the full pipelines, seem to work fine with stackless threads. This means we can't just enable stackless threads everywhere right away. I suspect that removing the last uses of futures may help with the remaining problems (they use condition variables internally for waiting, which may be bad on a stackless thread), and I would revisit this after that. If that doesn't help, a bit of debugging will likely uncover what is blocking the tridiagonal solver. As an intermediate solution, we can also opt in only the few algorithms that benefit most from stackless threads.

A few plots to see the effect:
This can be closed since #1037 was merged. This is also a reminder that stackless threads are opt-in, so when adding new algorithms or refactoring existing ones, please consider whether a given task could benefit from using a stackless thread.