Skip to content

Commit

Permalink
Move environment-mac.yaml to Python 3.9 and patch dream.py for Macs.
Browse files Browse the repository at this point in the history
I'm using stable-diffusion on a 2022 Macbook M2 Air with 24 GB unified memory.
I see this taking about 2.0s/it.

I've moved many deps from pip to conda-forge, to take advantage of the
precompiled binaries. Some notes for Mac users, since I've seen a lot of
confusion about this:

One doesn't need the `apple` channel to run this on a Mac-- that's only
used by `tensorflow-deps`, required for running tensorflow-metal. For
that, I have an example environment.yml here:

https://developer.apple.com/forums/thread/711792?answerId=723276022#723276022

However, the `CONDA_ENV=osx-arm64` environment variable *is* needed to
ensure that you do not run any Intel-specific packages such as `mkl`,
which will fail with [cryptic errors](CompVis/stable-diffusion#25 (comment))
on the ARM architecture and cause the environment to break.

I've also added a comment in the env file about 3.10 not working yet.
When it becomes possible to update, those commands run on an osx-arm64
machine should work to determine the new version set.

Here's what a successful run of dream.py should look like:

```
$ python scripts/dream.py --full_precision                                                                                                           SIGABRT(6) ↵  08:42:59
* Initializing, be patient...

Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
model loaded in 6.12s

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> "an astronaut riding a horse"
Generating:   0%|                                                                                                                                                                         | 0/1 [00:00<?, ?it/s]/Users/corajr/Documents/lstein/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1662016319283/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  placeholder_idx = torch.where(
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:37<00:00,  1.95s/it]
Generating: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:38<00:00, 98.55s/it]
Usage stats:
   1 image(s) generated in 98.60s
   Max VRAM used for this generation: 0.00G
Outputs:
outputs/img-samples/000001.1525943180.png: "an astronaut riding a horse" -s50 -W512 -H512 -C7.5 -Ak_lms -F -S1525943180
```
  • Loading branch information
corajr committed Sep 1, 2022
1 parent 7011960 commit 9156597
Show file tree
Hide file tree
Showing 3 changed files with 69 additions and 55 deletions.
51 changes: 20 additions & 31 deletions README-Mac-MPS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,7 @@ issue](https://github.com/CompVis/stable-diffusion/issues/25), and generally on

You have to have macOS 12.3 Monterey or later. Anything earlier than that won't work.

BTW, I haven't tested any of this on Intel Macs but I have read that one person
got it to work.
Tested on a 2022 Macbook M2 Air with 10-core gpu 24 GB unified memory.

How to:

Expand All @@ -22,17 +21,16 @@ git clone https://github.com/lstein/stable-diffusion.git
cd stable-diffusion
mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt
PATH_TO_CKPT="$HOME/Documents/stable-diffusion-v-1-4-original" # or wherever yours is.
ln -s "$PATH_TO_CKPT/sd-v1-4.ckpt" models/ldm/stable-diffusion-v1/model.ckpt
conda env create -f environment-mac.yaml
CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
conda activate ldm
python scripts/preload_models.py
python scripts/orig_scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
python scripts/dream.py --full_precision # half-precision requires autocast and won't work
```

We have not gotten lstein's dream.py to work yet.

After you follow all the instructions and run txt2img.py you might get several errors. Here's the errors I've seen and found solutions for.

### Is it slow?
Expand Down Expand Up @@ -94,10 +92,6 @@ get quick feedback.

python ./scripts/txt2img.py --prompt "ocean" --ddim_steps 5 --n_samples 1 --n_iter 1

### MAC: torch._C' has no attribute '_cuda_resetPeakMemoryStats' #234

We haven't fixed gotten dream.py to work on Mac yet.

### OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'...

python scripts/preload_models.py
Expand All @@ -108,7 +102,7 @@ Example error.

```
...
NotImplementedError: The operator 'aten::index.Tensor' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on [https://github.com/pytorch/pytorch/issues/77764](https://github.com/pytorch/pytorch/issues/77764). As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
```

The lstein branch includes this fix in [environment-mac.yaml](https://github.com/lstein/stable-diffusion/blob/main/environment-mac.yaml).
Expand Down Expand Up @@ -137,27 +131,18 @@ still working on it.

OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.

There are several things you can do. First, you could use something
besides Anaconda like miniforge. I read a lot of things online telling
people to use something else, but I am stuck with Anaconda for other
reasons.

Or you can try this.

export KMP_DUPLICATE_LIB_OK=True

Or this (which takes forever on my computer and didn't work anyway).
You are likely using an Intel package by mistake. Be sure to run conda with
the environment variable `CONDA_SUBDIR=osx-arm64`, like so:

conda install nomkl
`CONDA_SUBDIR=osx-arm64 conda install ...`

This error happens with Anaconda on Macs, and
[nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
is supposed to fix the issue (it isn't a module but a fix of some
sort). [There's more
suggestions](https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial),
like uninstalling tensorflow and reinstalling. I haven't tried them.
This error happens with Anaconda on Macs when the Intel-only `mkl` is pulled in by
a dependency. [nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
is a metapackage designed to prevent this, by making it impossible to install
`mkl`, but if your environment is already broken it may not work.

Since I switched to miniforge I haven't seen the error.
Do *not* use `os.environ['KMP_DUPLICATE_LIB_OK']='True'` or equivalents as this
masks the underlying issue of using Intel packages.

### Not enough memory.

Expand Down Expand Up @@ -226,4 +211,8 @@ What? Intel? On an Apple Silicon?
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

This was actually the issue that I couldn't solve until I switched to miniforge.
This is due to the Intel `mkl` package getting picked up when you try to install
something that depends on it-- Rosetta can translate some Intel instructions but
not the specialized ones here. To avoid this, make sure to use the environment
variable `CONDA_SUBDIR=osx-arm64`, which restricts the Conda environment to only
use ARM packages, and use `nomkl` as described above.
68 changes: 46 additions & 22 deletions environment-mac.yaml
Original file line number Diff line number Diff line change
@@ -1,33 +1,57 @@
name: ldm
channels:
- apple
- conda-forge
- pytorch-nightly
- defaults
- conda-forge
dependencies:
- python=3.10.4
- pip=22.1.2
- python==3.9.13
- pip==22.2.2

# pytorch-nightly, left unpinned
- pytorch
- torchmetrics
- torchvision
- numpy=1.23.1

# I suggest to keep the other deps sorted for convenience.
# If you wish to upgrade to 3.10, try to run this:
#
# ```shell
# CONDA_CMD=conda
# sed -E 's/python==3.9.13/python==3.10.5/;s/ldm/ldm-3.10/;21,99s/- ([^=]+)==.+/- \1/' environment-mac.yaml > /tmp/environment-mac-updated.yml
# CONDA_SUBDIR=osx-arm64 $CONDA_CMD env create -f /tmp/environment-mac-updated.yml && $CONDA_CMD list -n ldm-3.10 | awk ' {print " - " $1 "==" $2;} '
# ```
#
# Unfortunately, as of 2022-08-31, this fails at the pip stage.
- albumentations==1.2.1
- coloredlogs==15.0.1
- einops==0.4.1
- grpcio==1.46.4
- humanfriendly
- imageio-ffmpeg==0.4.7
- imageio==2.21.2
- imgaug==0.4.0
- kornia==0.6.7
- mpmath==1.2.1
- nomkl
- numpy==1.23.2
- omegaconf==2.1.1
- onnx==1.12.0
- onnxruntime==1.12.1
- opencv==4.6.0
- pudb==2022.1
- pytorch-lightning==1.6.5
- scipy==1.9.1
- streamlit==1.12.2
- sympy==1.10.1
- tensorboard==2.9.0
- transformers==4.21.2
- pip:
- albumentations==0.4.6
- opencv-python==4.6.0.66
- pudb==2019.2
- imageio==2.9.0
- imageio-ffmpeg==0.4.2
- pytorch-lightning==1.4.2
- omegaconf==2.1.1
- test-tube>=0.7.5
- streamlit==1.12.0
- pillow==9.2.0
- einops==0.3.0
- torch-fidelity==0.3.0
- transformers==4.19.2
- torchmetrics==0.6.0
- kornia==0.6.0
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- invisible-watermark
- test-tube
- tokenizers
- torch-fidelity
- -e git+https://github.com/huggingface/diffusers.git@v0.2.4#egg=diffusers
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
- -e .
variables:
Expand Down
5 changes: 3 additions & 2 deletions ldm/simplet2i.py
Original file line number Diff line number Diff line change
Expand Up @@ -272,14 +272,15 @@ def process_image(image,seed):
if not(width == self.width and height == self.height):
width, height, _ = self._resolution_check(width, height, log=True)

scope = autocast if self.precision == 'autocast' else nullcontext
scope = autocast if self.precision == 'autocast' and torch.cuda.is_available() else nullcontext

if sampler_name and (sampler_name != self.sampler_name):
self.sampler_name = sampler_name
self._set_sampler()

tic = time.time()
torch.cuda.torch.cuda.reset_peak_memory_stats()
if torch.cuda.is_available():
torch.cuda.torch.cuda.reset_peak_memory_stats()
results = list()

try:
Expand Down

3 comments on commit 9156597

@magnusviri
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corajr why did you you move from python 3.10.4 to 3.9.13? It doesn't work on Mac anymore. See #302 (comment)

@corajr
Copy link
Contributor Author

@corajr corajr commented on 9156597 Sep 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@magnusviri conda-forge has many of the desired packages which were being installed via pip previously, but the 3.10 environment with transformers in it is not solvable due to the absence of the tokenizers lib on 3.10. Hence I moved to 3.9 where all deps required were available in binary form, speeding up the initial installation and removing the requirement to have Rust installed.

However, with the move to include Birch-san's k-diffusion lib with 3.10-specific syntax, the merged environment-mac.yaml file stopped working. Hence my new PR to fix the environment by switching back to 3.10.

@magnusviri
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@corajr Can you add what you know about 3.9 vs 3.10 in this discussion?

Please sign in to comment.