This repository has been archived by the owner on Sep 28, 2024. It is now read-only.

support GPU #7

Merged
merged 6 commits into from
Aug 11, 2021

Conversation

foldfelis
Contributor

@foldfelis foldfelis commented Aug 11, 2021

This is awkward...

  1. FFTW.fft and FFTW.ifft work on CUDA only when dims=1.
  2. According to the documentation, NNlib.batched_adjoint has the same behavior as both batched_transpose and PermutedDimsArray, but none of them works on CUDA. See this and this.
  3. I use @tullio to construct CUDA kernels for the batched transpose and the einsum that does the calculation.
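A minimal sketch of point 3, assuming the two operations needed are a batched transpose and a batched contraction (the function names `batched_transpose3` and `batched_contract` are placeholders, not this PR's actual code). On the CPU, `@tullio` generates plain loops; with CUDA.jl and a KernelAbstractions backend loaded, the same expressions compile to GPU kernels when given a `CuArray`:

```julia
using Tullio  # add CUDA + KernelAbstractions for GPU execution

# Batched transpose: swap the first two dimensions of a 3-array,
# leaving the batch dimension (last) untouched.
batched_transpose3(x) = @tullio y[j, i, b] := x[i, j, b]

# Einsum-style batched matrix multiplication: contract over j
# independently for every batch index n.
batched_contract(a, b) = @tullio c[i, k, n] := a[i, j, n] * b[j, k, n]
```

The same index notation thus covers both the CPU and GPU paths, avoiding the `batched_adjoint` / `PermutedDimsArray` calls that fail on CUDA.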

@foldfelis foldfelis added the enhancement New feature or request label Aug 11, 2021
@foldfelis foldfelis self-assigned this Aug 11, 2021
@foldfelis foldfelis merged commit 609e6dd into master Aug 11, 2021
@foldfelis foldfelis deleted the gpu branch August 11, 2021 18:45