
tune CUDA kernels automatically #206

Merged: 4 commits into JuliaGPU:master, Feb 8, 2021
Conversation

simeonschaub (Contributor)

This is still quite rough around the edges, but I am putting it up for feedback. It automatically splits the threads over the leading dimensions of the ndrange for better performance when the first dimension is small.
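A minimal sketch of the splitting idea, using a hypothetical plain-Julia helper for illustration (this is not the implementation in this PR): given a thread budget and the kernel's ndrange, fill each leading dimension in turn so that a small first dimension does not leave most of the workgroup idle.

```julia
# Hypothetical illustration, not the code in this PR: distribute a thread
# budget over the leading dimensions of an ndrange, filling each in turn.
function split_workgroupsize(max_threads::Int, ndrange::Dims)
    groupsize = ones(Int, length(ndrange))
    for (i, n) in enumerate(ndrange)
        groupsize[i] = min(n, max_threads)                 # take what fits along this dimension
        max_threads = max(1, max_threads ÷ groupsize[i])   # remaining thread budget
    end
    return Tuple(groupsize)
end

split_workgroupsize(256, (3, 1000))  # (3, 85) instead of using only 3 of the 256 threads
```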

@simeonschaub (Contributor, Author)

Trying this out on my real example, it seems like this adds quite a lot of overhead, so the optimal thread size should probably be determined at kernel creation time, not every time it is invoked.

@simeonschaub (Contributor, Author)

It turns out that this overhead came from having to switch from a static to a dynamic workgroupsize, so I am thinking it might be better to have a separate API for determining the optimal workgroupsize for a kernel.
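For context, a rough sketch of the static vs. dynamic distinction, assuming the KernelAbstractions API as it stood around this PR (the CUDA backend still lived in `src/backends/cuda.jl`): the workgroupsize is either fixed when the kernel object is constructed or chosen at launch time, and the latter is where the tuning, and the launch overhead mentioned above, happens.

```julia
using KernelAbstractions, CUDA  # API as of early 2021; later versions changed these names

@kernel function mul2!(A)
    I = @index(Global)
    A[I] = 2 * A[I]
end

A = CUDA.ones(3, 1000)

# Static workgroupsize: fixed when the kernel object is created.
kernel_static = mul2!(CUDADevice(), 256)

# Dynamic workgroupsize: chosen at each launch, which is where this PR's
# automatic tuning can kick in.
kernel_dynamic = mul2!(CUDADevice())

# Both are launched the same way; the call returns an event to wait on.
wait(kernel_dynamic(A; ndrange=size(A)))
```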

@simeonschaub (Contributor, Author)

Well, that was embarrassing... It turns out the weird performance issues I was seeing came from accidentally capping the CUDA memory limit way too low. 🤦 With that fixed, this does actually give a nice speedup, even if I change to a dynamic workgroupsize.

Two review threads on src/backends/cuda.jl (outdated, resolved)
@vchuravy (Member)

vchuravy commented Feb 8, 2021

bors try

bors bot added a commit that referenced this pull request on Feb 8, 2021 (Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>)
@simeonschaub changed the title from "RFC: tune CUDA kernels automatically" to "tune CUDA kernels automatically" on Feb 8, 2021
@simeonschaub marked this pull request as ready for review on February 8, 2021, 20:17
@simeonschaub (Contributor, Author)

simeonschaub commented Feb 8, 2021

Ok, I think I am fairly happy with this now. If you agree with the way this works, it should be good to go from my side.

@vchuravy (Member)

vchuravy commented Feb 8, 2021

bors r+

bors bot merged commit 8d50887 into JuliaGPU:master on Feb 8, 2021
@simeonschaub deleted the sds/autotune branch on February 8, 2021, 21:55
@simeonschaub (Contributor, Author)

Mind also tagging a release with this?

simeonschaub added a commit to simeonschaub/Tullio.jl that referenced this pull request Feb 8, 2021
JuliaGPU/KernelAbstractions.jl#206 added the ability to automatically tune the workgroupsize of CUDA kernels. This PR stops hardcoding a default workgroupsize and lets KernelAbstractions handle that. This does change the workgroupsize from being statically sized to being dynamically sized, but in my testing, even with fairly small workgroupsizes, that didn't really make a difference.
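A quick way to check that claim, building on the hypothetical `kernel_static`/`kernel_dynamic` sketch earlier in this thread (illustrative only, not Tullio's actual benchmark):

```julia
using BenchmarkTools  # reuses kernel_static, kernel_dynamic, and A from the sketch above

# Compare launch plus execution time of the statically and dynamically sized kernels;
# per the commit message above, the difference was negligible even for small workgroupsizes.
@btime wait(kernel_static($A; ndrange=size($A)))
@btime wait(kernel_dynamic($A; ndrange=size($A)))
```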
@vchuravy (Member)

vchuravy commented Feb 9, 2021

Done JuliaRegistries/General#29752
