nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 package has libcublas10 version 10.2 #1143
Comments
I forwarded to the maintainer of the cuda image, thanks for reporting this! |
Hello! Thanks for taking the time to report an issue. However, this is not a bug. libcublas is an "independent library", and its version can be updated separately from the CUDA version. libcublas is installed as a dependency of
Note the last four digits of the version. For example, 10.2 uses
Thanks! |
@demizer - Please note from the OP:
|
@demizer @cliffwoolley Is there a doc/webpage that would help me understand the compatibility between the various library versions? (Is the answer "apt deps are the truth"?) In particular, is there some substring of the libcublas10 version number that corresponds to the cuda version number? (I would have guessed that libcublas10 10.x.y would be compatible with 10.x, or maybe 10.y, but based on what you're saying, maybe neither is the case?) I'm barking up this tree because we (Google Colab) were seeing runtime failures with the tensorflow-gpu package we build, but only once the docker container we use included the git commit I mentioned at the top. I'm trying to figure out whether this is:
I tried out a few versions in this notebook, and I get an error trying to multiply matrices with |
This problem is affecting MXNet. MXNet fails when using libcublas10 version 10.2.2.89-1 with cuda-cudart-10-1 version 10.1.243-1, with this error:
We found that we had to use cuBLAS 10.2.1.243-1 to get it to work. The NVIDIA docs say these are the versions that must be used together: |
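The version pairs quoted in this thread suggest a rough heuristic: the last dotted component of the libcublas10 version (243 or 89) lines up with the build number of the CUDA runtime package. The sketch below only encodes that observation. The function names are hypothetical, and the rule is inferred from the versions mentioned here, not from a documented NVIDIA compatibility guarantee.

```python
def build_number(deb_version: str) -> str:
    """Return the last dotted component of a Debian version string,
    ignoring a trailing '-N' package revision if present."""
    upstream = deb_version.rsplit("-", 1)[0] if "-" in deb_version else deb_version
    return upstream.split(".")[-1]


def cublas_matches_cudart(cublas_version: str, cudart_version: str) -> bool:
    """Heuristic (assumption from this thread): the two packages belong
    together when their trailing build numbers are equal."""
    return build_number(cublas_version) == build_number(cudart_version)


# Version pairs reported in this thread.
pairs = [
    ("10.2.1.243-1", "10.1.243-1"),  # reported to work
    ("10.2.2.89-1", "10.1.243-1"),   # reported to fail
]
for cublas, cudart in pairs:
    print(cublas, cudart, cublas_matches_cudart(cublas, cudart))
```

With the versions above, the first pair compares 243 with 243 and the second compares 89 with 243, which agrees with the working and failing combinations reported here.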
Why does cublas have 10.2 in the version number? It is super confusing with respect to cuda 10.1 and cuda 10.2... |
A shorter version of my comment above: should
instead of just the first half, as it has now? |
Also running into this from the Julia world, where our CI uses these images on various systems. If running in the |
I will work with my team today to get the correct version nailed down and push out an update. Thanks! |
Here's how we fixed it in MXNet, in case somebody needs a quick hotfix: downgrading cublas seems to work. apache/mxnet@edb583b#diff-2e7ef4cd776397d19edfa6aadd3e747eR25 |
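For readers who want to script a similar downgrade, here is a minimal sketch that builds an apt-get command pinning libcublas10 to the version reported to work earlier in this thread. It is not the contents of the MXNet commit. The helper name is hypothetical, and the command is only printed, not executed.

```python
import shlex
from typing import List


def pinned_install_command(package: str, version: str) -> List[str]:
    """Build an apt-get invocation that installs exactly `version` of `package`.
    --allow-downgrades lets apt-get replace a newer installed version."""
    return ["apt-get", "install", "-y", "--allow-downgrades", f"{package}={version}"]


# Version taken from the MXNet report above; adjust for your own image.
cmd = pinned_install_command("libcublas10", "10.2.1.243-1")
print(shlex.join(cmd))
```

Building the command as an argument list keeps it easy to pass to `subprocess.run` later without shell quoting problems.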
Slightly off-topic, but related to the Ubuntu NVIDIA packages: @demizer, if you could also forward a request to create a metapackage for the nvidia-driver package that depends on the latest kernel version for Ubuntu, that would be wonderful, since the driver package name seems to change constantly. That way nvidia-driver would depend on nvidia-driver-440, and the cuda versions would depend on nvidia-driver. |
Images with pinned cublas versions have been pushed out! |
Thanks @demizer . Is it possible to pin the container versions in our systems? |
Awesome, thanks @demizer -- I confirmed that we're up and running again:
One remaining question, though: is there an easy recipe for matching |
I believe this is a bug:
In particular, I think this was accidentally introduced here:
https://gitlab.com/nvidia/container-images/cuda/commit/5ef87657fa5b62b614ba6a829473e33c7f5257e1
based on git blame for the Dockerfile:
https://gitlab.com/nvidia/container-images/cuda/blame/master/dist/ubuntu18.04/10.1/devel/cudnn7/Dockerfile
Happily, if you apt update, you can still install a 10.1 version for libcublas, which makes it recoverable, but I think this is probably still a mistake and should be fixed.