
Option for filtering by CUDNN_DETERMINISTIC in cudnnConvolutionAlgoPerfChoose #938

Open
ToucheSir opened this issue May 30, 2021 · 9 comments
Labels: cuda libraries, enhancement, good first issue

Comments

@ToucheSir
Contributor

Is your feature request related to a problem? Please describe.

Following from https://discourse.julialang.org/t/flux-reproducibility-of-gpu-experiments/62092, there is currently no way for users of e.g. NNlibCUDA to ensure that convolution operations only use deterministic algorithms.

Describe the solution you'd like

Something along the lines of
https://github.com/pytorch/pytorch/blob/6c70cbedb6102da08fe91186d40a41b50991681d/aten/src/ATen/native/cudnn/Conv_v7.cpp#L219-L252. Whether this would need to be plumbed through higher-level functions, set as a global option or exposed through a context manager is left for debate.
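
A minimal sketch of what such a filter could look like on the Julia side, assuming the perf structs returned by cuDNN's algorithm search are available (the determinism field and the CUDNN_DETERMINISTIC constant come from the cuDNN C API's cudnnConvolutionFwdAlgoPerf_t and cudnnDeterminism_t; the helper name and keyword here are hypothetical, not an existing CUDA.jl API):

# Hypothetical helper, not an existing CUDA.jl function: given cuDNN perf
# results (sorted fastest-first), optionally keep only algorithms that cuDNN
# reports as deterministic before picking the fastest one.
function choose_algo(perf_results; deterministic::Bool=false)
    if deterministic
        perf_results = filter(p -> p.determinism == CUDNN_DETERMINISTIC, perf_results)
        isempty(perf_results) && error("no deterministic convolution algorithm available")
    end
    return first(perf_results).algo
end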

Describe alternatives you've considered

The only solution right now seems to be pirating cudnnConvolutionForwardAD so that it doesn't use cudnnConvolutionFwdAlgoPerf?

Additional context

https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#reproducibility states:

...the following routines do not guarantee reproducibility because they use atomic operations:

  • cudnnConvolutionBackwardFilter when CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 or CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3 is used
  • cudnnConvolutionBackwardData when CUDNN_CONVOLUTION_BWD_DATA_ALGO_0 is used

So I have no clue if/how often these show up in practice in cudnnConvolution*AlgoPerf, but I assume they must if users are seeing non-deterministic results?
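
For reference, the perf structs returned by cuDNN's search routines already report this: each entry carries both the algorithm enum and a determinism flag. A rough way to probe the question above (constant and field names follow the cuDNN C API; how the perf results are obtained is left out of this sketch):

# Sketch only: check whether a backward-filter perf result is one of the
# atomics-based algorithms quoted above, or is flagged non-deterministic by
# cuDNN itself via its determinism field.
const ATOMIC_BWD_FILTER_ALGOS = (CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0,
                                 CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3)

is_nondeterministic(perf) =
    perf.algo in ATOMIC_BWD_FILTER_ALGOS ||
    perf.determinism == CUDNN_NON_DETERMINISTIC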

@ToucheSir added the enhancement label May 30, 2021
@maleadt added the cuda libraries and good first issue labels Jun 1, 2021
@ericphanson

I saw this has a "good first issue" tag; any hints on how to approach it?

@maleadt
Member

maleadt commented May 17, 2022

I haven't put much thought into it; a good first step would be to have a good look at what other libraries/frameworks do (although there are a couple of references in the OP already). We also already have a math_mode, so maybe the deterministic mode should piggy-back on the pedantic math mode: https://github.com/JuliaGPU/CUDA.jl/blob/056a5269e41d3f447e957b4058bc424683bbbd09/lib/cudnn/CUDNN.jl#L43-L55=. If that doesn't work, a separate determinism TLS entry seems like the best way to go (similar to how math_mode is implemented).
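
For illustration, a determinism flag stored next to math_mode could be as simple as the sketch below; a plain Ref stands in for the task-local state CUDA.jl actually uses for math_mode, and none of these names exist in the package today:

# Illustrative only: a process-wide flag that the cuDNN convolution wrappers
# would consult when selecting an algorithm.
const DETERMINISTIC_ALGORITHMS = Ref(false)

# Request (or release) deterministic cuDNN convolution algorithms.
deterministic!(flag::Bool=true) = (DETERMINISTIC_ALGORITHMS[] = flag; nothing)
deterministic() = DETERMINISTIC_ALGORITHMS[]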

@marcpabst

marcpabst commented Sep 10, 2022

This is still an issue that seems easily fixable once there is agreement on how to implement a deterministic option/flag. I wonder, however, whether

  • the deterministic behavior should maybe be the default? I find it super unintuitive that I can use CUDA.seed! and still get non-reproducible results (maybe CUDA.seed! should implicitly set the deterministic flag?),
  • setting the deterministic flag should invalidate the cache (I guess?), and
  • there are situations where no deterministic algorithm is available.

@maleadt
Member

maleadt commented Sep 10, 2022

CUDA.seed! seeds the RNG, which is unrelated to CUDNN, so I don't see why we should couple those.

@marcpabst

Yeah, I agree. It's just that, coming from a scientific computing language, I kinda expect reproducibility by default. At the very least, there should be a clear warning in the docs (though that's probably better suited to Flux.jl, I guess).

@ToucheSir
Contributor Author

Yeah, I agree. It's just that, coming from a scientific computing language, I kinda expect reproducibility by default.

The missing context here is that deep learning in general has a much looser notion of "reproducibility" than most other scientific computing domains. This is due to many reasons and could be an entire discussion on its own, but suffice it to say that the slight deviations caused by allowing non-deterministic algorithms are not enough to make most deep learning code stop using them. Indeed, none of PyTorch/TF/JAX filters for deterministic-only algorithms by default, and Flux is no exception.

@maleadt
Member

maleadt commented Sep 11, 2022

It's just that, coming from a scientific computing language, I kinda expect reproducibility by default.

We've had this discussion in the past already. Using Julia doesn't guarantee anything; it all depends on the specific domain. In the case of GPU computing, sacrificing some accuracy is the norm simply because of how the devices work (threads executing in a non-deterministic order, resulting in inaccuracies due to floating-point semantics). We are not going to degrade performance by default when none of the other software packages in this specific domain (GPU programming, deep learning, etc.) do so.

@ToucheSir
Contributor Author

Back to the main issue: the big unanswered question is how the layering works with so many libraries in the stack. Flux probably shouldn't be involved, because we'd like it to stay accelerator agnostic.

So that leaves CUDA, NNlib and NNlibCUDA. I think a global flag in CUDA.jl might be awkward because, unlike MATH_MODE, algorithm filtering doesn't neatly map to a single cu*Set*Mode call. If cudnnConvolutionForward! and friends could take either an algorithm override or a callback to filter algorithms, NNlibCUDA could handle most of this logic. Then NNlib(CUDA) can choose to add global functions to toggle deterministic conv algorithms (i.e. flip a Ref value and check that in conv!) at its leisure, as in the rough sketch below.
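
To make that concrete, here is a hypothetical sketch of the NNlib-level toggle and the filter it could pass down; none of these names exist in NNlib or NNlibCUDA, and the algo_filter keyword on cudnnConvolutionForward! is assumed, not real:

# Hypothetical NNlib-side toggle: a Ref that conv! checks before dispatching.
const DETERMINISTIC_CONV = Ref(false)
deterministic_conv!(flag::Bool=true) = (DETERMINISTIC_CONV[] = flag; nothing)

# NNlibCUDA's conv! method would then forward a filter like this:
#   cudnnConvolutionForward!(y, w, x;
#       algo_filter = perf -> !DETERMINISTIC_CONV[] ||
#                             perf.determinism == CUDNN_DETERMINISTIC)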

@renatobellotti

I noticed that sparse matrix-vector multiplications are also not reproducible: the same product yields a different result each time it is executed:

using CUDA
using Random
using SparseArrays  # sprand for sparse matrices lives here

Random.seed!(546784)

# random sparse matrix and dense vector, moved to the GPU
A = cu(sprand(Float64, 1000, 1000, 0.6))
x = cu(rand(Float64, 1000))

# the same product, computed twice
y1 = A * x
y2 = A * x

# nonzero: the two runs do not agree bit-for-bit
maximum(abs.(y2 - y1))
