Option for filtering by CUDNN_DETERMINISTIC in cudnnConvolutionAlgoPerfChoose
#938
Comments
I saw this has a "good first issue" tag; any hints as to how to do this?
I haven't put much thought into it; a good first step would be to have a good look at what other libraries/frameworks do (although there are a couple of references in the OP already). We also already have a …
This is still an issue that seems to be easily fixable once there is agreement on how to implement a …
Yeah, I agree. It's just that from a scientific computing language, I kinda expect reproducibility by default. At the very least, there should be a clear warning in the docs (but more suited for …)
The missing context here is that deep learning in general has a much looser notion of "reproducibility" than most other scientific computing domains. This is due to many reasons and could be an entire discussion on its own, but suffice it to say that the slight deviations caused by allowing non-deterministic algorithms are not enough to make most deep learning code stop using them. Indeed, none of PyTorch/TF/JAX filter to only deterministic algorithms by default, and Flux is no exception.
We've had this discussion in the past already. Using Julia doesn't guarantee anything; it all depends on the specific domain. In the case of GPU computing, sacrificing some accuracy is the norm just because of how the devices work (threads executing in a non-deterministic fashion, resulting in inaccuracies due to floating-point semantics). We are not going to degrade performance if none of the other software packages in this specific domain (of GPU programming, deep learning, etc.) behave like that.
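To make the "floating-point semantics" point concrete: floating-point addition is not associative, so a parallel reduction whose threads combine partial sums in a different order can round to a different result. A minimal, library-free illustration (plain Python, not CUDA-specific):

```python
# Floating-point addition is not associative: the grouping (and hence
# the order in which parallel threads combine partial sums) changes
# the rounded result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one accumulation order
right = a + (b + c)  # another accumulation order

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

On a GPU, atomic additions commit in whatever order the hardware schedules the threads, so the effective grouping (and therefore the result) can differ between two runs of the same kernel.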
Back to the main issue, the main unanswered question is how the layering works here with so many libraries in the stack. Flux probably shouldn't be involved because we'd like it to be accelerator-agnostic. So that leaves CUDA, NNlib and NNlibCUDA. I think a global flag in CUDA.jl might be awkward because, unlike MATH_MODE, algorithm filtering doesn't neatly map to a single …
I noticed that sparse matrix-vector multiplications are also not reproducible. The matrix-vector multiplications yield a different result whenever they are executed:

```julia
using CUDA
using SparseArrays  # for sprand
using Random

Random.seed!(546784)
A = cu(sprand(Float64, 1000, 1000, 0.6))
x = cu(rand(Float64, 1000))
y1 = A * x
y2 = A * x
maximum(abs.(y2 - y1))
```
Is your feature request related to a problem? Please describe.
Following from https://discourse.julialang.org/t/flux-reproducibility-of-gpu-experiments/62092, there is no way for users of e.g. NNlibCUDA to ensure that convolution operations only use deterministic algorithms.
Describe the solution you'd like
Something along the lines of
https://github.com/pytorch/pytorch/blob/6c70cbedb6102da08fe91186d40a41b50991681d/aten/src/ATen/native/cudnn/Conv_v7.cpp#L219-L252. Whether this would need to be plumbed through higher-level functions, set as a global option or exposed through a context manager is left for debate.
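The PyTorch code linked above essentially filters the list of benchmarked algorithms down to deterministic ones before picking the fastest. A minimal, library-free sketch of that selection logic (the `AlgoPerf` class and its field names here are made up for illustration; cuDNN's actual `cudnnConvolutionFwdAlgoPerf_t` struct exposes a `determinism` field alongside the measured time):

```python
from dataclasses import dataclass

@dataclass
class AlgoPerf:
    # Stand-in for cuDNN's algo-perf struct; field names are illustrative.
    algo: str
    time_ms: float
    deterministic: bool

def choose_algo(perfs, require_deterministic):
    """Pick the fastest algorithm, optionally restricted to deterministic ones."""
    if require_deterministic:
        perfs = [p for p in perfs if p.deterministic]
    if not perfs:
        raise RuntimeError("no algorithm satisfies the determinism constraint")
    return min(perfs, key=lambda p: p.time_ms)

perfs = [
    AlgoPerf("implicit_gemm", 1.2, True),
    AlgoPerf("winograd_nonfused", 0.8, False),
    AlgoPerf("fft_tiling", 1.0, True),
]
print(choose_algo(perfs, require_deterministic=False).algo)  # winograd_nonfused
print(choose_algo(perfs, require_deterministic=True).algo)   # fft_tiling
```

Note the trade-off the example makes visible: requiring determinism can exclude the fastest algorithm, which is why no framework enables the filter by default.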
Describe alternatives you've considered
The only solution now seems to be pirating cudnnConvolutionForwardAD such that it doesn't use cudnnConvolutionFwdAlgoPerf?

Additional context
https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#reproducibility states:
So I have no clue if/how often these show up in practice in cudnnConvolution*AlgoPerf, but I assume they must if users are seeing non-deterministic results?