
how to make CUDA functionalities an extension #2265

Closed
CarloLucibello opened this issue Jun 14, 2023 · 13 comments
@CarloLucibello
Member

CarloLucibello commented Jun 14, 2023

Now that NNlibCUDA is an extension in NNlib with a weak dependence on CUDA and cuDNN (see FluxML/NNlib.jl#492), we have to decide how to move forward here.

The CUDA code we currently have here is quite lean (https://github.com/FluxML/Flux.jl/blob/master/src/cuda/cudnn.jl), and it could be reduced to nothing by moving all the functional implementations of normalization to NNlib.

What really matters is that both CUDA.jl and cuDNN.jl are loaded, in order to activate all the CUDA functionality in NNlib.

I think we basically have 2 options:

  1. We tell users to explicitly do using Flux, CUDA, cuDNN to unlock CUDA functionality. We will have an extension FluxCUDAcuDNNExt here, with CUDA and cuDNN as weak dependencies.
    Notice that the extension is also loaded with using Flux, cuDNN, since cuDNN loads CUDA.
  2. We create a dummy package FluxCUDA with no code inside, but with CUDA and cuDNN as dependencies.
    Here we will have an extension FluxFluxCUDAExt, with FluxCUDA as a weak dependency. Users will have to do using Flux, FluxCUDA.
I lean more towards option ~~2~~ 1. Thoughts?
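To make option 1 concrete, the Project.toml entries could look roughly like the sketch below. This is only an illustration of the weak-dependency mechanism, not taken from the actual Flux.jl Project.toml; the UUIDs are left as placeholders.

```toml
# Hypothetical Project.toml entries for option 1 (UUIDs left as placeholders).
[weakdeps]
CUDA = "<CUDA.jl UUID>"
cuDNN = "<cuDNN.jl UUID>"

[extensions]
# FluxCUDAcuDNNExt is loaded automatically once Flux, CUDA and cuDNN
# are all present in the same session.
FluxCUDAcuDNNExt = ["CUDA", "cuDNN"]
```

With entries like these, using Flux, CUDA, cuDNN (or just using Flux, cuDNN, since cuDNN loads CUDA) would trigger the extension, matching the user-facing story described in option 1.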

@ToucheSir
Member

Copying comment from #2268:

I'm pretty ambivalent on whether 1) or 2) would be better, but 2) might let us keep compat with older Julia versions if we made it load NNlibCUDA and/or the Flux CUDA code on <1.9.

RE:

Here we will have an extension FluxFluxCUDAExt, with FluxCUDA as a weak dependency.

I don't believe this is required. Since it's a proper package and not an extension, we could move the existing Flux CUDA code into it. Low-tech, but it would work. Whenever we figure out how to reduce the import-time issue, we can always convert that code into a package extension in Flux itself and deprecate FluxCUDA.

Alternatively, we could still use a FluxCUDAExt and FluxCUDA would trigger that. However, that would require us to have a (frozen, backport only) copy of the Flux CUDA code in FluxCUDA itself to support Julia <1.9.
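As a sketch of what such a FluxCUDA trigger package itself could contain (hypothetical code; no such package exists yet):

```julia
# Hypothetical FluxCUDA.jl: an (almost) empty trigger package for option 2.
# Loading it pulls in the strong dependencies, which in turn activates the
# CUDA code paths in Flux and NNlib.
module FluxCUDA

using CUDA, cuDNN   # strong dependencies declared in FluxCUDA's Project.toml

# In the variant described above, the existing Flux CUDA code (e.g. the
# cudnn.jl wrappers) could also be moved here directly to support Julia < 1.9.

end # module
```

Users would then simply write using Flux, FluxCUDA.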

@CarloLucibello
Member Author

CarloLucibello commented Jun 17, 2023

Since it's a proper package and not an extension, we could move the existing Flux CUDA code into it.

Yes, we can do that, but I'd rather have all of the Flux code in a single repo; it would make changes involving one or multiple backends, and maintenance in general, much easier.

However, that would require us to have a (frozen, backport only) copy of the Flux CUDA code in FluxCUDA itself to support Julia <1.9.

I'd rather drop < 1.9; supporting old Julia versions is not a good use of our time. We can do it when it doesn't take much effort, but with Julia 1.9 everything changes, and we want to reap the benefits without being burdened by the need to support previous releases.

@darsnack
Member

Before we drop support for <1.9, we should count bug fixes over the last year that would rise to the level of a backport (i.e. if our Julia compat had increased, these are fixes we would want to backport).

Personally, I think dropping 1.6 support should be considered carefully. It would not be a great situation if the foremost ML package did not support the LTS and the last LTS-compatible release had major bugs.

@darsnack
Member

Also my vote is for Option 2 regardless of where we end up putting the CUDA-related code.

@CarloLucibello
Member Author

If we go with FluxCUDA, should we also have FluxAMDGPU and FluxMetal? Those backends are currently activated by the single imports using AMDGPU and using Metal.

@ToucheSir
Member

I think those are less important because they don't support the LTS. Unlike with CUDA.jl, people are going to want to use a very recent version of Julia with AMDGPU or Metal.

@CarloLucibello
Member Author

CarloLucibello commented Jun 17, 2023

I thought a bit more about this and I think we can orthogonalize the two discussions:

a) Whether to go with option 1) or 2)

b) Whether we support 1.6 only through backports, or keep the Julia compat at v1.6 in future versions, meaning that Flux should behave consistently across different Julia versions.

In order to keep the 1.6 compat, independently of the outcome of a), we need to do the following:

  • put the CUDA extension in NNlib under @require for VERSION < v"1.9" and relax the Julia compat there back to 1.6
  • do the same here in Flux

NNlib already depends on Requires.jl, although I was hoping to get rid of it (FluxML/NNlib.jl#494).
So I guess this option is not so bad: only a few extra lines of Requires code to keep the compatibility.
Should we do it?
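For concreteness, the Requires.jl fallback discussed above would follow the usual @require pattern, roughly as below. This is a sketch and not existing Flux or NNlib code: triggering on CUDA alone is a simplification (the real version would also need to account for cuDNN), and the include path is an illustrative placeholder.

```julia
# Sketch of a Requires.jl fallback for Julia < 1.9; on 1.9+ the built-in
# package-extension mechanism would be used instead.
@static if !isdefined(Base, :get_extension)   # true on Julia < 1.9
    using Requires
end

function __init__()
    @static if !isdefined(Base, :get_extension)
        # CUDA.jl's UUID; the include path below is an illustrative placeholder.
        @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
            include("../ext/FluxCUDAcuDNNExt/FluxCUDAcuDNNExt.jl")
        end
    end
end
```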

@ToucheSir
Member

ToucheSir commented Jun 17, 2023

put the CUDA extension in NNlib under @require for VERSION < v1.9 and relax the julia compat there again to 1.6

That's what I tried in FluxML/NNlib.jl#445, but as noted in that thread, the additional latency from giving up precompilation was way too extreme. Requires would probably work for the Flux functionality though, given that it is much smaller (NNlibCUDA is massive in comparison).

@CarloLucibello
Member Author

That's what I tried in FluxML/NNlib.jl#445, but as noted in that thread the additional latency from giving up precompilation was way too extreme.

So we should use Requires in Flux but not in NNlib? In that case, on older Julia versions we would have to point Flux to NNlibCUDA, but I would strongly oppose this: we would be using a deprecated package or not, depending on the Julia version.

@ToucheSir
Member

This is precisely why I've noted on multiple occasions how careful we need to be about this transition! There are so many options that either don't work technically or that people do not like, so taking some time to really nail down the steps required is a good idea.

@CarloLucibello
Member Author

We have to drop 1.6; I don't see any other option. We can explicitly state, though, that we guarantee backports to the LTS for critical bugs.

@CarloLucibello
Member Author

What about options 1) and 2)? I think I lean more towards option 1) now, for symmetry with the other extensions, and because @mcabbott's argument along the lines of "people should not need to know about an obscure package" (cuDNN) could be said to apply to FluxCUDA as well, so it may be better to be transparent.
But maybe having a FluxCUDA package is more future-proof: it could avoid breaking changes in case different loading schemes are needed in the future.

@CarloLucibello
Member Author

done in #2268
