This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 package has libcublas10 version 10.2 #1143

Closed
craigcitro opened this issue Dec 2, 2019 · 14 comments

Comments

@craigcitro

I believe this is a bug:

[~] $ docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
10.1-cudnn7-devel-ubuntu18.04: Pulling from nvidia/cuda
7ddbc47eeb70: Already exists
c1bbdc448b72: Already exists
8c3b70e39044: Already exists
45d437916d57: Already exists
d8f1569ddae6: Already exists
85386706b020: Already exists
ee9b457b77d0: Already exists
8f6f72d62d47: Already exists
b50dcded52ed: Already exists
04b4269fbb2a: Already exists
Digest: sha256:03981dbd27dd4def33d09ab499dbc6c9ae7254b0d462b8118d90508c82c7a382
Status: Downloaded newer image for nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
docker.io/nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
[~] $ docker run -it --rm --entrypoint bash nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
root@8a24fe4ebd33:/# dpkg -l | grep libcublas
ii  libcublas-dev                 10.2.2.89-1                       amd64        CUBLAS native dev links, headers
ii  libcublas10                   10.2.2.89-1                       amd64        CUBLAS native runtime libraries

In particular, I think this was accidentally introduced here:
https://gitlab.com/nvidia/container-images/cuda/commit/5ef87657fa5b62b614ba6a829473e33c7f5257e1
based on git blame for the Dockerfile:
https://gitlab.com/nvidia/container-images/cuda/blame/master/dist/ubuntu18.04/10.1/devel/cudnn7/Dockerfile

Happily, if you run apt update, you can still install a libcublas version built for CUDA 10.1, so this is recoverable. I still think it is probably a mistake and should be fixed.
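
To check for this in a script rather than by eye, the installed cuBLAS packages can be pulled out of the dpkg -l listing. This is only a sketch: the helper name list_cublas is made up for illustration, and it assumes the usual dpkg -l column layout (status, package, version) seen in the transcript above.

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints
# "<package> <version>" for every installed libcublas package.
list_cublas() {
  awk '$1 == "ii" && $2 ~ /^libcublas/ { print $2, $3 }'
}

# Intended use inside the container (not executed here):
#   dpkg -l | list_cublas
```

On the affected image this would be expected to report 10.2.2.89-1 for both packages, matching the listing above.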

@RenaudWasTaken
Contributor

I forwarded this to the maintainer of the cuda image. Thanks for reporting it!

@demizer

demizer commented Dec 3, 2019

Hello! Thanks for taking the time to report an issue. However, this is not a bug. libcublas is an "independent library", and its version can be updated separately from the CUDA version.

libcublas is installed as a dependency of cuda-libraries-10-1:

$ apt show cuda-libraries-10-1
Package: cuda-libraries-10-1
Version: 10.1.243-1
Priority: optional
Section: multiverse/devel
Source: cuda
Maintainer: cudatools <cudatools@nvidia.com>
Installed-Size: 25.6 kB
Depends: cuda-nvrtc-10-1 (>= 10.1.243), cuda-nvgraph-10-1 (>= 10.1.243), cuda-nvjpeg-10-1 (>= 10.1.243), cuda-cusolver-10-1 (>= 10.1.243), libcublas10 (>= 10.2.1.243), cuda-cufft-10-1 (>= 10.1.243), cuda-curand-10-1 (>= 10.1.243), cuda-cusparse-10-1 (>= 10.1.243), cuda-npp-10-1 (>= 10.1.243), cuda-cudart-10-1 (>= 10.1.243), cuda-license-10-1 (>= 10.1.243)

Note the last four digits of the version 10.2.1.243. These correspond to the CUDA version 10.1 Update 2 (10.1.243).

For comparison, CUDA 10.2 uses libcublas10 10.2.2.89:

$ apt show cuda-libraries-10-2
Package: cuda-libraries-10-2
Version: 10.2.89-1
Status: install ok installed
Priority: optional
Section: multiverse/devel
Source: cuda
Maintainer: cudatools <cudatools@nvidia.com>
Installed-Size: 25.6 kB
Depends: cuda-nvrtc-10-2 (>= 10.2.89), cuda-nvgraph-10-2 (>= 10.2.89), cuda-nvjpeg-10-2 (>= 10.2.89), cuda-cusolver-10-2 (>= 10.2.89), libcublas10 (>= 10.2.2.89), cuda-cufft-10-2 (>= 10.2.89), cuda-curand-10-2 (>= 10.2.89), cuda-cusparse-10-2 (>= 10.2.89), cuda-npp-10-2 (>= 10.2.89), cuda-cudart-10-2 (>= 10.2.89), cuda-license-10-2 (>= 10.2.89)
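
The constraint that matters here can be read straight out of the Depends field. A minimal sketch, assuming the comma-separated Depends layout seen in these listings; the function name cublas_min_version is hypothetical, and apt-cache show is used because it prints the same Depends field.

```shell
# Hypothetical helper: reads package metadata on stdin and prints the
# minimum libcublas10 version named in the Depends field, if any.
cublas_min_version() {
  grep '^Depends:' |
    tr ',' '\n' |
    sed -n 's/^.*libcublas10 (>= \([^)]*\)).*$/\1/p'
}

# Intended use (not executed here):
#   apt-cache show cuda-libraries-10-1 | cublas_min_version
```

Note that the field only states a lower bound, so any newer libcublas10 in the repository also satisfies it.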

Thanks!

@cliffwoolley
Collaborator

@demizer - Please note from the OP:

docker run -it --rm --entrypoint bash nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
root@8a24fe4ebd33:/# dpkg -l | grep libcublas
ii  libcublas-dev                 10.2.2.89-1                       amd64        CUBLAS native dev links, headers
ii  libcublas10                   10.2.2.89-1                       amd64        CUBLAS native runtime libraries

@craigcitro
Author

@demizer @cliffwoolley Is there a doc/webpage that would help me understand the compatibility between the various library versions? (Is the answer "apt deps are the truth"?)

In particular, is there some substring of the libcublas10 version number that corresponds to the cuda version number? (I would have guessed that libcublas10 10.x.y would be compatible with 10.x, or maybe 10.y, but based on what you're saying, maybe neither is the case?)

I'm barking up this tree because we (Google Colab) were seeing runtime failures with the tensorflow-gpu package we build, but only once the docker container we use included the git commit I mentioned at the top. I'm trying to figure out which of these applies:

  • this was an unexpected upstream change that will get fixed, and we don't have to worry about it
  • this is working as intended (WAI) upstream, in which case we need to be more careful about which library versions we include in our containers

I tried out a few versions in this notebook, and I get an error trying to multiply matrices with libcublas10=10.2.2.89-1, but not with any other version.

@samskalicky

samskalicky commented Dec 4, 2019

This problem also affects MXNet. MXNet fails when using libcublas10 version 10.2.2.89-1 together with cuda-cudart-10-1 version 10.1.243-1, with this error:

  what():  [02:24:40] /home/ubuntu/pip_build/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:107: Check failed: err == CUBLAS_STATUS_SUCCESS (7 vs. 0) : Destory cublas handle failed

We found that we had to use CUBLAS 10.2.1.243-1 to get it to work.

The NVIDIA docs list the versions that must be used together:
https://docs.nvidia.com/deeplearning/frameworks/mxnet-release-notes/rel_19-08.html#rel_19-08

@ptrendx

@larroy

larroy commented Dec 4, 2019

Why does cublas have 10.2 in its version number? It is super confusing with respect to CUDA 10.1 and CUDA 10.2...

@craigcitro
Author

A shorter version of my comment above: should cuda-libraries-10-1 have a libcublas10 version restriction that looks like

>=10.2.1.243,<10.2.2.0

instead of just the first half, as it has now?
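
To make the proposed bound concrete, here is a rough sketch of what it would accept and reject. It is not how apt evaluates dependencies; the helper names ver_lt and in_proposed_range are invented for illustration, and the comparison only handles dotted numeric versions with an optional Debian revision suffix.

```shell
# Hypothetical helper: succeeds if dotted numeric version $1 is strictly
# lower than $2 (missing components count as 0).
ver_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi < yi) exit 0
      if (xi > yi) exit 1
    }
    exit 1   # equal versions are not "less than"
  }'
}

# Hypothetical helper: succeeds if $1 satisfies >=10.2.1.243,<10.2.2.0.
in_proposed_range() {
  v=${1%%-*}   # strip the Debian revision, e.g. "-1"
  ! ver_lt "$v" 10.2.1.243 && ver_lt "$v" 10.2.2.0
}

for candidate in 10.2.1.243-1 10.2.2.89-1; do
  if in_proposed_range "$candidate"; then
    echo "$candidate: inside the proposed range"
  else
    echo "$candidate: outside the proposed range"
  fi
done
```

With such a bound, 10.2.1.243-1 would be accepted and 10.2.2.89-1 rejected, which is the behavior the question is asking for.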

@maleadt
Copy link

maleadt commented Dec 5, 2019

Also running into this from the Julia world, where our CI uses these images on various systems. If running in the nvidia/cuda:10.1-devel-ubuntu18.04 image on a system that has a CUDA 10.1-compatible driver (e.g. 418.74), we get CUBLAS_STATUS_NOT_INITIALIZED by just calling cublasCreate_v2. With a more recent driver, such as 440.36 with support for CUDA 10.2, this call works as expected.
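
A quick way to see which side of this a host falls on is to compare its driver version against the minimum each CUDA release needs. The sketch below is illustrative only: the thresholds (418.39 for CUDA 10.1, 440.33 for CUDA 10.2 on Linux) are assumed from NVIDIA's CUDA release notes and are worth double-checking there, and the helper name max_cuda_for_driver is hypothetical.

```shell
# Hypothetical helper: prints the newest CUDA 10.x release that a given
# Linux driver version is assumed to support (thresholds are assumptions).
max_cuda_for_driver() {
  awk -v d="$1" 'BEGIN {
    split(d, p, ".")
    major = p[1] + 0; minor = p[2] + 0
    if (major > 440 || (major == 440 && minor >= 33)) print "10.2"
    else if (major > 418 || (major == 418 && minor >= 39)) print "10.1"
    else print "older"
  }'
}

# Intended use on a GPU host (not executed here):
#   max_cuda_for_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```

Under these assumptions, 418.74 maps to CUDA 10.1 and 440.36 to CUDA 10.2, consistent with the failure only appearing on the older driver.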

@demizer

demizer commented Dec 5, 2019

I will work with my team today to get the correct version nailed down and push out an update. Thanks!

@larroy

larroy commented Dec 5, 2019

Here's how we fixed it in MXNet, in case somebody needs a quick hotfix:

Downgrading cublas seems to work.

apache/mxnet@edb583b#diff-2e7ef4cd776397d19edfa6aadd3e747eR25
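
For readers who cannot follow the link, a downgrade along these lines is one way to do it. This is a sketch, not necessarily the exact MXNet change: the libcublas10 version 10.2.1.243-1 comes from this thread, while using the same version for libcublas-dev and holding the packages afterwards are assumptions. The helper names are hypothetical.

```shell
# Version reported in this thread as working with CUDA 10.1 Update 2.
CUBLAS_VERSION="10.2.1.243-1"

# Hypothetical helper: builds the "pkg=version" arguments for apt-get.
cublas_pins() {
  for pkg in libcublas10 libcublas-dev; do
    printf '%s=%s\n' "$pkg" "$1"
  done
}

# Hypothetical helper: downgrade cuBLAS and hold it at that version.
# Meant to run as root inside the image; defined here but not invoked.
downgrade_cublas() {
  # The command substitution is left unquoted on purpose so that each
  # "pkg=version" becomes its own argument.
  apt-get update &&
    apt-get install -y --allow-downgrades $(cublas_pins "$CUBLAS_VERSION") &&
    apt-mark hold libcublas10 libcublas-dev
}
```

Holding the packages keeps a later apt-get upgrade from pulling 10.2.2.89-1 back in.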

@larroy

larroy commented Dec 5, 2019

Slightly off-topic, but related to the Ubuntu NVIDIA packages:

@demizer, if you could also forward a request to create a metapackage for the nvidia-driver package which depends on the latest kernel version for Ubuntu, that would be wonderful, as it seems that the driver package name constantly changes. Then nvidia-driver would depend on nvidia-driver-440, and the CUDA versions would depend on nvidia-driver.

@demizer

demizer commented Dec 5, 2019

Images with pinned cublas versions have been pushed out!

@larroy

larroy commented Dec 6, 2019

Thanks @demizer. Is it possible to pin the container versions in our systems?
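
One general way to pin an image is to reference it by digest instead of by tag, since a tag such as 10.1-cudnn7-devel-ubuntu18.04 can be re-pushed with different contents. A minimal sketch, using the digest that the follow-up comment in this thread reports for the fixed image; the helper name image_by_digest is hypothetical.

```shell
# Digest reported in this thread for the fixed
# nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 image.
FIXED_DIGEST="sha256:557de4ba2cb674029ffb602bed8f748d44d59bb7db9daa746ea72a102406d3ec"

# Hypothetical helper: builds a digest-pinned image reference.
image_by_digest() {
  printf '%s@%s\n' "$1" "$2"
}

# Intended use (not executed here):
#   docker pull "$(image_by_digest nvidia/cuda "$FIXED_DIGEST")"
# The same repository@digest form can be used in a Dockerfile FROM line.
```

A digest-pinned reference keeps resolving to the same image even if the tag is updated again later.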

@craigcitro
Author

Awesome, thanks @demizer -- I confirmed that we're up and running again:

$ docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
10.1-cudnn7-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:557de4ba2cb674029ffb602bed8f748d44d59bb7db9daa746ea72a102406d3ec
Status: Image is up to date for nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
docker.io/nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
$ docker run -it --rm --entrypoint bash nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
root@c3f6749e5218:/# dpkg -l | grep libcublas10
ii  libcublas10                   10.2.1.243-1                      amd64        CUBLAS native runtime libraries
root@c3f6749e5218:/#

One remaining question, though: is there an easy recipe for matching libcublas versions and cuda-libraries versions? The examples above all suggest "drop the second digit" works, but it'd be good to confirm that's something we can depend on in the future.
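
Spelled out, the "drop the second digit" observation looks like the sketch below. It only encodes the pattern seen in the two examples in this thread (10.2.1.243 and 10.2.2.89); nothing here confirms that it holds for other releases. The function name cublas_to_cuda is hypothetical.

```shell
# Hypothetical helper: maps a libcublas10 package version to the CUDA
# version it appears to pair with, by dropping the second component.
cublas_to_cuda() {
  v=${1%%-*}   # strip the Debian revision, e.g. "-1"
  echo "$v" | awk -F. '{ print $1 "." $3 "." $4 }'
}

cublas_to_cuda 10.2.1.243-1   # prints 10.1.243
cublas_to_cuda 10.2.2.89-1    # prints 10.2.89
```

The results line up with the cuda-libraries-10-1 (10.1.243-1) and cuda-libraries-10-2 (10.2.89-1) package versions quoted earlier in the thread.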
