Pip installation fails, CUTLASS not found #473

Closed
AbdBarho opened this issue Oct 9, 2022 · 21 comments · Fixed by #523

@AbdBarho
Contributor

AbdBarho commented Oct 9, 2022

🐛 Bug

pip installation fails in a Docker container: CUTLASS is not found because git submodule update --init --recursive was not executed.

To Reproduce

Dockerfile

FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
RUN pip install xformers

then

docker build .

Error Trace

#1 [internal] load build definition from Dockerfile
#1 sha256:bc3772a9760c6470030d3506e7afa0b9caa2a77f63376fe30fc296a334d5c980
#1 transferring dockerfile: 116B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:5b674e66e988c8852edbf605c0d0921ac6eed40841cd55d9112e0d92242091a1
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
#3 sha256:409f78a4f3551ef4b6d7a4b064ff72bb54f0677d599351b4d0dcdff08b926834
#3 DONE 0.8s

#4 [1/2] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75
#4 sha256:2e3e89abd93f2e7b42b070196f0e6be4ce38a2d360c98232440e1d90189bdb02
#4 CACHED

#5 [2/2] RUN pip install xformers
#5 sha256:ef3133015f56a22d509f2aa1ef730afdcaa2591838105ba332650ff73ceb9ff9
#5 1.012 Collecting xformers
#5 1.313   Downloading xformers-0.0.13.tar.gz (292 kB)
#5 1.429      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 2.6 MB/s eta 0:00:00
#5 1.534   Preparing metadata (setup.py): started
#5 2.952   Preparing metadata (setup.py): finished with status 'error'
#5 2.961   error: subprocess-exited-with-error
#5 2.961
#5 2.961   × python setup.py egg_info did not run successfully.
#5 2.961   │ exit code: 1
#5 2.961   ╰─> [8 lines of output]
#5 2.961       Traceback (most recent call last):
#5 2.961         File "<string>", line 36, in <module>
#5 2.961         File "<pip-setuptools-caller>", line 34, in <module>
#5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 239, in <module>
#5 2.961           ext_modules=get_extensions(),
#5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 158, in get_extensions
#5 2.961           "CUTLASS submodule not found. Did you forget "
#5 2.961       RuntimeError: CUTLASS submodule not found. Did you forget to run `git submodule update --init --recursive` ?
#5 2.961       [end of output]
#5 2.961
#5 2.961   note: This error originates from a subprocess, and is likely not a problem with pip.
#5 2.965 error: metadata-generation-failed
#5 2.965
#5 2.965 × Encountered error while generating package metadata.
#5 2.965 ╰─> See above for output.
#5 2.965
#5 2.965 note: This is an issue with the package mentioned above, not pip.
#5 2.965 hint: See above for details.
#5 ERROR: executor failed running [/bin/sh -c pip install xformers]: exit code: 1
------
 > [2/2] RUN pip install xformers:
------
executor failed running [/bin/sh -c pip install xformers]: exit code: 1

Expected behavior

The installation should work.

Environment

Collected inside the container, running Docker on Windows.

PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A 

OS: Ubuntu 18.04.6 LTS (x86_64) 
GCC version: Could not collect  
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060
Nvidia driver version: 517.48
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.12.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl-service               2.4.0            py37h7f8727e_0
[conda] mkl_fft                   1.3.1            py37hd3c417c_0
[conda] mkl_random                1.2.2            py37h51133e4_0
[conda] numpy                     1.21.5           py37he7a7128_2
[conda] numpy-base                1.21.5           py37hf524024_2
[conda] pytorch                   1.12.1          py3.7_cuda11.3_cudnn8.3.2_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchtext                 0.13.1                     py37    pytorch
[conda] torchvision               0.13.1               py37_cu113    pytorch

Additional context

I don't think this problem has anything to do with the OS/Python/PyTorch/CUDA/nvcc versions; the setup.py seems to be tailored for a local, manual install and fails in this context.
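To illustrate the failure mode, here is a minimal sketch of the kind of check that produces the error in the trace above. The function name check_cutlass and the third_party/cutlass/include path are assumptions made for illustration; only the error message is copied from the trace. An empty directory stands in for a source archive that was packaged without its git submodules.

```python
import os
import tempfile


def check_cutlass(source_root: str) -> str:
    """Return the CUTLASS include directory, or raise if the submodule is missing."""
    # Assumed layout: the submodule is checked out under third_party/cutlass.
    cutlass_include = os.path.join(source_root, "third_party", "cutlass", "include")
    if not os.path.isdir(cutlass_include):
        raise RuntimeError(
            "CUTLASS submodule not found. Did you forget to run "
            "`git submodule update --init --recursive` ?"
        )
    return cutlass_include


if __name__ == "__main__":
    # An empty directory mimics an sdist that does not contain the submodule sources.
    with tempfile.TemporaryDirectory() as sdist_root:
        try:
            check_cutlass(sdist_root)
        except RuntimeError as err:
            print("build would stop here:", err)
```

Because the check runs while pip is still preparing metadata, there is no opportunity to run the git command it suggests when installing from a downloaded archive.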

@AbdBarho
Contributor Author

AbdBarho commented Oct 9, 2022

I just saw #471; it seems that the pip build is not up to date?

@r3nor

r3nor commented Oct 9, 2022

I am facing the exact same issue

@GrennKren

GrennKren commented Oct 9, 2022

I managed to solve it by compiling it myself. You can also try a precompiled version; I saw two places on GitHub offering one. I built my own because nothing else worked for me.

The way I did it was like this.
(It is recommended to install it within a virtual environment)

First, check which PyTorch version is compatible with your CUDA version. If yours is not, install or reinstall it.
You can look it up on this site. (Also make sure PyTorch is >= 1.12.0)
https://pytorch.org/get-started/locally/
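A quick way to check the installed version against the 1.12.0 minimum is sketched below. The helper names version_tuple and meets_minimum are made up for this example; torch.__version__ and torch.version.cuda are the attributes PyTorch exposes for its own version and the CUDA version it was built with.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version: str, minimum: str = "1.12.0") -> bool:
    """True if `version` is at least `minimum`, ignoring suffixes such as '+cu113'."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        print("meets minimum:", meets_minimum(torch.__version__))
```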

Then install other packages.

$ pip install pyre-extensions==0.0.23
$ pip install numpy

Finally, just git clone this repository and compile it:


$ git clone https://github.com/facebookresearch/xformers/
$ cd xformers
$ git submodule update --init --recursive
$ pip install --verbose --no-deps -e .

The reason I use --no-deps and install the required packages first is that I don't want pip to replace the PyTorch we already have installed with a version that is incompatible with our CUDA.

Once that is done, you can also build it into a wheel file so you don't need to recompile it in the future.

# still in xformers location
$ python setup.py bdist_wheel --universal
# the output file is in xformers/dist directory.

@trufty

trufty commented Oct 9, 2022

> $ git install --verbose --no-deps -e .

@GrennKren I think you meant pip install here?
But yes, this does seem to bypass the CUTLASS issue, for NVIDIA Docker builds at least.

@slix

slix commented Oct 9, 2022

A possible workaround is:

pip install git+https://github.com/facebookresearch/xformers.git@v0.0.13#egg=xformers

pip does the submodule update process in this case since it knows it's pulling from a git repo.

@Lordxan

Lordxan commented Oct 10, 2022

> A possible workaround is:
>
> pip install git+https://github.com/facebookresearch/xformers.git@v0.0.13#egg=xformers
>
> pip does the submodule update process in this case since it knows it's pulling from a git repo.

In my case it still doesn't work; it just says that it can't find the module xformers.

@fmassa
Contributor

fmassa commented Oct 10, 2022

Hi,

We have recently added a conda package for xformers, which can be found at https://anaconda.org/xformers/xformers

Maybe you could try installing from there?

Also, if installing from source I would recommend performing the installation via

FORCE_CUDA=1 pip install git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

This way you are sure that you are pulling the required submodules.
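For scripted setups, the same command can be assembled programmatically. The sketch below only builds the argument list and environment; the helper name pip_git_install is invented for this example, and actually running the install needs network access and a working CUDA toolchain.

```python
import os
import sys


def pip_git_install(ref: str = "main", force_cuda: bool = True):
    """Build the argv and environment for installing xformers straight from git."""
    url = f"git+https://github.com/facebookresearch/xformers.git@{ref}#egg=xformers"
    # Using `python -m pip` ties the install to the interpreter running this script.
    argv = [sys.executable, "-m", "pip", "install", url]
    env = dict(os.environ)
    if force_cuda:
        env["FORCE_CUDA"] = "1"
    return argv, env


if __name__ == "__main__":
    argv, env = pip_git_install()
    print("FORCE_CUDA=" + env.get("FORCE_CUDA", ""), " ".join(argv))
    # To actually run it: subprocess.run(argv, env=env, check=True)
```

Passing ref="v0.0.13" gives the pinned variant suggested earlier in the thread.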

@AbdBarho
Contributor Author

AbdBarho commented Oct 11, 2022

Thank you all, it works.

However, my problem is not with installing xformers in general; my problem is installing xformers with pip from PyPI specifically. If the package on PyPI is deprecated, it should be marked as such; otherwise, it should install correctly.

@fmassa
Contributor

fmassa commented Oct 12, 2022

@AbdBarho yes, the package from pip is not up-to-date, we need to do something about it

@fmassa
Contributor

fmassa commented Oct 12, 2022

@blefaudeux now that we have conda packages for xformers which ship with the precompiled binaries, what do you think about marking the pypi package from xformers as deprecated?

@blefaudeux
Contributor

> @blefaudeux now that we have conda packages for xformers which ship with the precompiled binaries, what do you think about marking the pypi package from xformers as deprecated?

hmm as you see fit, but it's a very common way for people to get to xformers (48k downloads or something, 5k/week) so that's a lot of people who will be let down. Not everyone in the python ecosystem is using conda actually, and it's not so discoverable (pip install xformers will return something, conda install xformers will not). So it's a little hard for me to see that as a benefit really, but I certainly understand the maintenance burden

@AbdBarho
Contributor Author

@blefaudeux I totally agree, I don't want it gone either, pip is so ubiquitous it makes sense to have it.

Is there any way I can contribute?

@blefaudeux
Contributor

> @blefaudeux I totally agree, I don't want it gone either, pip is so ubiquitous it makes sense to have it.
>
> Is there any way I can contribute?

Basically, xformers would need many builds depending on the CUDA and Python versions, a bit like the attached screenshot for PyTorch. Right now there's only one wheel on PyPI; if it does not match, the source install does not work, since the CUTLASS submodule became a requirement (in that case I think we should upload another pip package without the CUDA source, and warn very loudly that this installation will not have memory-efficient attention).

Fairseq has GitHub Actions which automatically produce a wheel for each release. I'm guessing that something similar for xformers (covering a couple of typical CUDA versions, probably the same as PyTorch) would be best, or just reusing the PyTorch CD infra, actually? (That might be overcomplicated because of the many other accelerators being supported.) If you have skills in that world, I think the xformers team would gladly accept a PR :)

In short, xformers is lacking a CD system and CD expertise in general; that's my now-distant understanding.

Screenshot from 2022-10-15 22-26-12

@AbdBarho
Contributor Author

Ok, I will see what I can do, just to make sure I have the correct requirements:

  • pytorch >= 1.12
  • python 3.7 - 3.10
  • cuda 10.2, 11.3, 11.6
  • linux (glibc only) / windows

sounds good?
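As a rough illustration of how many wheels this implies, the targets above can be expanded into a build matrix. This is only a sketch: the dictionary keys are arbitrary, the PyTorch version is left out as a further dimension, and whether every combination can actually be built is not established in this thread.

```python
from itertools import product

# Targets taken from the list above.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
CUDA_VERSIONS = ["10.2", "11.3", "11.6"]
PLATFORMS = ["linux", "windows"]


def build_matrix():
    """Return one entry per (python, cuda, os) combination."""
    return [
        {"python": py, "cuda": cuda, "os": plat}
        for py, cuda, plat in product(PYTHON_VERSIONS, CUDA_VERSIONS, PLATFORMS)
    ]


if __name__ == "__main__":
    matrix = build_matrix()
    print(len(matrix), "wheels per release")
    for entry in matrix[:3]:
        print(entry)
```

Dropping one CUDA version removes a third of the entries, which is one reason to start with a smaller set.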

@blefaudeux
Contributor

> Ok, I will see what I can do, just to make sure I have the correct requirements:
>
>   • pytorch >= 1.12
>   • python 3.7 - 3.10
>   • cuda 10.2, 11.3, 11.6
>   • linux (glibc only) / windows
>
> sounds good?

@fmassa and @danthe3rd would know more about the current targets, but from a distance this looks great ! I would possibly simplify a little to begin with if issues arise, for instance cuda 10.2 is pretty old by now and probably ok not to support it if that's too much work. Your call in any case but this would be an awesome contribution

@C43H66N12O12S2

I believe you could extract a single .whl from conda packages using conda-press.

@zmrubin

zmrubin commented Nov 5, 2022

Just wanted to post a quick thanks, as I also found this helpful.

@brucethemoose

I don't think this issue is solved? We may have Python wheels now, but apparently not one for my environment, hence pip tries to build and fails in the same place.

pip install git+https://github.com/facebookresearch/xformers.git@v0.0.13#egg=xformers as mentioned might work... if I can figure out the compilation errors.

@danthe3rd
Contributor

Yes indeed. We now have builds for pip, but they need to be downloaded manually from the latest GitHub Actions run.
The next step is to get them uploaded; the tracking task is #533.

@brucethemoose

brucethemoose commented Dec 6, 2022

In the meantime (for anyone else running into this thread from search) building the latest git commit works for me, but only if I force compilation with gcc 11 instead of 12.
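The exact command is not given here. One common way to select a compiler for a Python extension build is to set the CC and CXX environment variables before invoking pip, sketched below. The binary names gcc-11 and g++-11 follow the Debian/Ubuntu naming convention and are an assumption, as is whether the xformers build (including nvcc's host compiler) honours these variables in a given setup.

```python
import os


def gcc_env(major: int, base=None) -> dict:
    """Return a copy of the environment with CC/CXX pointing at a given gcc major version."""
    env = dict(os.environ if base is None else base)
    # Assumed binary names; adjust to whatever your distribution installs.
    env["CC"] = f"gcc-{major}"
    env["CXX"] = f"g++-{major}"
    return env


if __name__ == "__main__":
    env = gcc_env(11)
    print("CC=" + env["CC"], "CXX=" + env["CXX"])
    # To use it for a build: subprocess.run([...pip install command...], env=env, check=True)
```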

@deJQK

deJQK commented Jun 15, 2023

> In the meantime (for anyone else running into this thread from search) building the latest git commit works for me, but only if I force compilation with gcc 11 instead of 12.

Hi @brucethemoose, I ran into a similar issue when trying to install xformers with pip. It says something like

error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

How do I compile the git commit with gcc 11? Could you please share the command and the steps? Thanks!
