jaxlib incompatibility with a fresh bss install #296

Closed
jmichel80 opened this issue May 31, 2024 · 6 comments

@jmichel80
Contributor

jmichel80 commented May 31, 2024

Describe the bug
It is currently not possible to use BSS.FreeEnergy.Relative.analyse() after a fresh install of biosimspace.

To Reproduce
The issue appears after installing somd2 from scratch, following the instructions at https://github.com/OpenBioSim/somd2/blob/main/README.md

Attempting to run an MBAR analysis on a somd2 output folder then gives the following:

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ ipython
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.24.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import BioSimSpace as BSS

INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.

In [2]: pmf1, overlap1 = BSS.FreeEnergy.Relative.analyse("output")
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda': Unable to use CUDA because of the following issues with CUDA components:
Outdated CUDA installation found.
Version JAX was built against: 11080
Minimum supported: 12010
Installed version: 11080
The local installation version must be no lower than 12010.
--------------------------------------------------
Outdated cuBLAS installation found.
Version JAX was built against: 111103
Minimum supported: 120100
Installed version: 111103
The local installation version must be no lower than 120100.
--------------------------------------------------
Outdated cuSPARSE installation found.
Version JAX was built against: 11705
Minimum supported: 12100
Installed version: 11705
The local installation version must be no lower than 12100.
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:jax._src.xla_bridge:CUDA backend failed to initialize: Unable to use CUDA because of the following issues with CUDA components:
Outdated CUDA installation found.
Version JAX was built against: 11080
Minimum supported: 12010
Installed version: 11080
The local installation version must be no lower than 12010.
--------------------------------------------------
Outdated cuBLAS installation found.
Version JAX was built against: 111103
Minimum supported: 120100
Installed version: 111103
The local installation version must be no lower than 120100.
--------------------------------------------------
Outdated cuSPARSE installation found.
Version JAX was built against: 11705
Minimum supported: 12100
Installed version: 11705
The local installation version must be no lower than 12100..(Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
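
For readers unfamiliar with the integer codes in the log, here is a minimal sketch of how the CUDA runtime values can be read. It assumes the usual CUDA encoding of major * 1000 + minor * 10; the function name decode_cuda_version is hypothetical, and the cuBLAS and cuSPARSE codes use different encodings that this sketch does not cover.

```python
def decode_cuda_version(code):
    """Decode a CUDA version integer, assuming major * 1000 + minor * 10."""
    major = code // 1000
    minor = (code % 1000) // 10
    return major, minor


# Values copied from the "Outdated CUDA installation found." block of the log.
installed = decode_cuda_version(11080)  # (11, 8)
minimum = decode_cuda_version(12010)    # (12, 1)

# Tuples compare element by element, so this is True for the log above.
print(installed < minimum)
```

Read this way, the log says that CUDA 11.8 is installed while the jax in the environment asks for at least CUDA 12.1.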

Here are the versions of the dependencies that may have triggered this error

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep alchemlyb
alchemlyb                 2.3.0              pyhd8ed1ab_0    conda-forge
(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep openmm
openmm                    8.1.1           py310h43b6314_1    conda-forge
(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep pym
pymbar                    4.0.3                hff52083_1    conda-forge
pymbar-core               4.0.3           py310h1f7b6fc_1    conda-forge
pymsmt                    22.0                     pypi_0    pypi

and the version of jaxlib installed

(somd2) director@neogodzilla:~/OBS/somd2/reproducibility-report/somd2/32~26_Free$ mamba list | grep jax
jax                       0.4.27             pyhd8ed1ab_0    conda-forge
jaxlib                    0.4.23          cuda118py310hd0f2884_202    conda-forge
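
The same version information can be gathered from inside Python, which may be convenient when attaching details to a report. This is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the package list is simply the set of names that appear in the mamba output above.

```python
from importlib import metadata

# Distribution names taken from the `mamba list` output in this report.
PACKAGES = ("alchemlyb", "pymbar", "jax", "jaxlib", "openmm")


def installed_versions(names=PACKAGES):
    """Return {name: version string, or None if the package is not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:10s} {version or 'not installed'}")
```

Note that importlib.metadata reports version numbers only. The conda build string (for example the cuda118 part of the jaxlib entry) is not visible this way, so mamba list is still needed to see which CUDA variant was pulled in.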

(please complete the following information):
Issue tested on Linux Ubuntu 22.04 LTS, with python 3.12, 3.11 and 3.10

  • Version of BioSimSpace: 2024.2.0.dev+14.g598a6743
  • I confirm that I have checked this bug still exists in the latest released version of BioSimSpace: yes
@jmichel80 jmichel80 added the bug Something isn't working label May 31, 2024
@jmichel80
Contributor Author

pinning pymbar to 4.0.2 solves the issue

@lohedges
Contributor

Good stuff. I'll see if there's a report at their GitHub page when I'm back.

@lohedges
Contributor

Reminds me of this (#207), where the solution was to also use pymbar 4.0.2. (They messed up a build a while back which causes an incorrect jaxlib to be pulled in, which has never been fixed properly.)

@fjclark
Contributor

fjclark commented May 31, 2024

What worked for me, in case it's helpful in future (also working from a fresh BSS install, with python 3.12.3):

I think this is because jax 0.4.26 dropped support for CUDA 11 (jax-ml/jax#18032 (comment)). To avoid upgrading CUDA, I fixed this error by downgrading jax (mamba install "jax<0.4.26"). I then got an error from the XLA compiler, XlaRuntimeError: INTERNAL: XLA requires ptxas version 11.8 or higher, which was fixed by installing cuda-nvcc with mamba install -c nvidia "cuda-nvcc=11.8". pymbar 4.0.3 then works for me. I noticed William logged a ptxas pymbar issue last year, which is still open: choderalab/pymbar#498.
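
As a rough illustration of the version constraint described in this comment, the sketch below checks whether a given jax version string falls below 0.4.26. The cutoff is taken from the comment above rather than verified independently, and the helper names parse_version and jax_supports_cuda11 are hypothetical.

```python
import re

# Assumption from the comment above: jax 0.4.26 is the first release
# that no longer supports CUDA 11.
FIRST_RELEASE_WITHOUT_CUDA11 = (0, 4, 26)


def parse_version(text):
    """Turn '0.4.27' into (0, 4, 27), ignoring any trailing suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def jax_supports_cuda11(jax_version):
    """True if this jax version predates the assumed CUDA 11 cutoff."""
    return parse_version(jax_version) < FIRST_RELEASE_WITHOUT_CUDA11


# Versions from this thread: 0.4.27 was installed in the original report.
print(jax_supports_cuda11("0.4.27"))  # False
print(jax_supports_cuda11("0.4.25"))  # True
```

Under that assumption, the jax 0.4.27 from the original report sits on the wrong side of the cutoff for a CUDA 11.8 jaxlib build, which matches the downgrade that worked here.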

@lohedges
Contributor

It will be nice when alchemlyb makes jax optional (I think pymbar have split things out now). It really makes no sense for this to bork our install when the jax stuff isn't even needed.

@lohedges
Contributor

lohedges commented Aug 5, 2024

Closing as this doesn't appear to be causing issues at present. I believe the problem hasn't fundamentally been resolved, but hopefully we are now working with versions of alchemlyb and pymbar that don't trigger the environment resolution issues. At least we have this documented and can re-open if it does raise its head again in future.

@lohedges lohedges closed this as completed Aug 5, 2024