This repository has been archived by the owner on Sep 28, 2024. It is now read-only.

support GPU #7

Merged
merged 6 commits into from
Aug 11, 2021

Conversation

foldfelis
Contributor

@foldfelis foldfelis commented Aug 11, 2021

This is awkward...

  1. FFTW.fft and FFTW.ifft work on CUDA only when dims=1.
  2. According to the documentation, NNlib.batched_adjoint has the same behavior as both batched_transpose and PermutedDimsArray, but none of them works on CUDA. See this and this.
  3. I use @tullio to construct CUDA kernels for the batched transpose and the einsum that does the calculation.
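A minimal sketch of point 3, assuming the two operations needed are a batched transpose and a batched contraction (the function names `batched_transpose3` and `batched_contract` are placeholders, not this PR's actual code). On the CPU, `@tullio` generates plain loops; with CUDA.jl and a KernelAbstractions backend loaded, the same expressions compile to GPU kernels when given a `CuArray`:

```julia
using Tullio  # add CUDA + KernelAbstractions for GPU execution

# Batched transpose: swap the first two dimensions of a 3-array,
# leaving the batch dimension (last) untouched.
batched_transpose3(x) = @tullio y[j, i, b] := x[i, j, b]

# Einsum-style batched matrix multiplication: contract over j
# independently for every batch index n.
batched_contract(a, b) = @tullio c[i, k, n] := a[i, j, n] * b[j, k, n]
```

The same index notation thus covers both the CPU and GPU paths, avoiding the `batched_adjoint` / `PermutedDimsArray` calls that fail on CUDA.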

@foldfelis foldfelis added the enhancement New feature or request label Aug 11, 2021
@foldfelis foldfelis self-assigned this Aug 11, 2021
@foldfelis foldfelis merged commit 609e6dd into master Aug 11, 2021
@foldfelis foldfelis deleted the gpu branch August 11, 2021 18:45