
creating a layer with Docker/docker-compose #633

Merged: 27 commits from loeken:dockerize into oobabooga:main, Apr 7, 2023
Conversation

@loeken (Contributor) commented Mar 29, 2023

Cleaned-up version of #547.

@oobabooga (Owner)

1. Do you know what the Ubuntu equivalents of this are?

yay -S docker docker-compose buildkit nvidia-container-runtime
sudo systemctl restart docker # required by nvidia-container-runtime

2. Have you tested the Docker image with 4-bit and 8-bit models? It seems like the current bitsandbytes 8-bit setup is not included: "undefined symbol: cget_col_row_stats / 8-bit not working / libsbitsandbytes_cpu.so not found" #400 (comment)

@loeken (Contributor, Author) commented Mar 29, 2023

1.) As far as I know, only docker and docker-compose are in the Ubuntu/Debian repos. BuildKit has been part of Docker itself since version 18.09 (it is a separate package on Manjaro).
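
For reference, a rough Ubuntu 22.04 equivalent would be something like this (a sketch only; package names are from the Ubuntu and NVIDIA repos, and the README in this PR is authoritative):

sudo apt update
sudo apt install -y docker.io docker-compose   # BuildKit ships inside docker.io
# nvidia-container-runtime comes from NVIDIA's repository, not Ubuntu's:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -sL https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -sL https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-container-runtime
sudo systemctl restart docker   # required by nvidia-container-runtime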

2.) Due to RAM limits I have so far only tested 4-bit. I followed that link and noticed that I'm seeing the same errors. I've now updated the Dockerfile to use the nvidia/cuda image in the second stage too, which provides the required files.
While 8-bit still crashes with OOM for me, I no longer see those errors, so I expect this change solves the problem.
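
The relevant change, roughly (a sketch, not the literal file; the exact image tag is whatever the PR pins):

# builder stage: compile the GPTQ-for-LLaMa kernels against the CUDA toolkit
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS builder
# ... build steps ...

# second stage: basing this on the nvidia/cuda image too means libcudart.so
# and friends are present for bitsandbytes at runtime
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
# ... install the webui and copy the built wheels from the builder ...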

Logs from the 4-bit/8-bit runs: https://gist.github.com/loeken/a9c2141adee41a6181c9bbde509a75fe

@oobabooga (Owner)

I need better instructions on how to run this before I can merge. The average Linux user uses Ubuntu, so the README must contain all the necessary steps to get this running on Ubuntu.

@loeken (Contributor, Author) commented Mar 31, 2023

Alright, I'll test on Ubuntu 22.04 and get back to you with instructions.

@loeken (Contributor, Author) commented Apr 1, 2023

@oobabooga I've added a docs/README_docker.md with instructions for Ubuntu.

@deece (Contributor) commented Apr 1, 2023

The following lines in the Dockerfile would clobber any work-in-progress intended for testing:

ARG WEBUI_SHA=HEAD
RUN git reset --hard ${WEBUI_SHA}

@loeken (Contributor, Author) commented Apr 1, 2023

Valid point @deece, the branches/SHAs have changed around a lot lately. I've removed the hardcoded values in the Dockerfile, and since the GPTQ ref switched from being a SHA to being the cuda branch, I've renamed the args to _VERSION.

I'll leave WEBUI_VERSION at HEAD for now, but I think this should be changed to a tag/release. Right now everybody installs from the latest version of this repo.

@oobabooga if you are fine with switching to a tag/release flow, I can provide a GitHub Action to build a Docker image.

@deece (Contributor) commented Apr 1, 2023

The cuda branch of GPTQ introduces some breaking changes that need to be addressed. In the meantime, this is the commit before the breaking change: GPTQ_SHA=608f3ba71e40596c75f8864d73506eaf57323c6e
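
In the builder stage that would look something like this (a sketch; the upstream repo URL is an assumption, and the arg name follows this thread):

ARG GPTQ_VERSION=608f3ba71e40596c75f8864d73506eaf57323c6e
RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa /build && \
    cd /build && git checkout ${GPTQ_VERSION}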

@deece (Contributor) commented Apr 1, 2023

I'd suggest that the git reset line be changed to something like:
RUN test -n "${WEBUI_VERSION}" && git reset --hard "${WEBUI_VERSION}"

This would allow uncommitted changes to be tested in the Docker environment.

@deece (Contributor) commented Apr 1, 2023

The args in docker-compose.yml and the Dockerfile don't align, e.g. GPTQ_SHA vs GPTQ_VERSION.

@loeken (Contributor, Author) commented Apr 1, 2023

@deece as for the breaking change, what are you referring to? I can build GPTQ from both cuda and 608f3ba71e40596c75f8864d73506eaf57323c6e.

@rklasen commented Apr 1, 2023

The Docker image builds without problems. I've put llama-7b-4bit.safetensors in the models directory, but I'm still getting errors about a private HF repo:

text-generation-webui-text-generation-webui-1  | /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
text-generation-webui-text-generation-webui-1  |   warn(msg)
text-generation-webui-text-generation-webui-1  | /usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
text-generation-webui-text-generation-webui-1  |   warn(msg)
text-generation-webui-text-generation-webui-1  | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
text-generation-webui-text-generation-webui-1  | CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
text-generation-webui-text-generation-webui-1  | CUDA SETUP: Highest compute capability among GPUs detected: 8.6
text-generation-webui-text-generation-webui-1  | CUDA SETUP: Detected CUDA version 118
text-generation-webui-text-generation-webui-1  | CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
text-generation-webui-text-generation-webui-1  | Loading llama-7b-4bit...
text-generation-webui-text-generation-webui-1  | Found models/llama-7b-4bit.safetensors
text-generation-webui-text-generation-webui-1  | Traceback (most recent call last):
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
text-generation-webui-text-generation-webui-1  |     response.raise_for_status()
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
text-generation-webui-text-generation-webui-1  |     raise HTTPError(http_error_msg, response=self)
text-generation-webui-text-generation-webui-1  | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/models/llama-7b-4bit/resolve/main/config.json
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | The above exception was the direct cause of the following exception:
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | Traceback (most recent call last):
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 409, in cached_file
text-generation-webui-text-generation-webui-1  |     resolved_file = hf_hub_download(
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
text-generation-webui-text-generation-webui-1  |     return fn(*args, **kwargs)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1160, in hf_hub_download
text-generation-webui-text-generation-webui-1  |     metadata = get_hf_file_metadata(
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
text-generation-webui-text-generation-webui-1  |     return fn(*args, **kwargs)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1501, in get_hf_file_metadata
text-generation-webui-text-generation-webui-1  |     hf_raise_for_status(r)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
text-generation-webui-text-generation-webui-1  |     raise RepositoryNotFoundError(message, response) from e
text-generation-webui-text-generation-webui-1  | huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64283e05-294e106c21f09d494bf5bb13)
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | Repository Not Found for url: https://huggingface.co/models/llama-7b-4bit/resolve/main/config.json.
text-generation-webui-text-generation-webui-1  | Please make sure you specified the correct `repo_id` and `repo_type`.
text-generation-webui-text-generation-webui-1  | If you are trying to access a private or gated repo, make sure you are authenticated.
text-generation-webui-text-generation-webui-1  | Invalid username or password.
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | During handling of the above exception, another exception occurred:
text-generation-webui-text-generation-webui-1  | 
text-generation-webui-text-generation-webui-1  | Traceback (most recent call last):
text-generation-webui-text-generation-webui-1  |   File "/app/server.py", line 275, in <module>
text-generation-webui-text-generation-webui-1  |     shared.model, shared.tokenizer = load_model(shared.model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/models.py", line 102, in load_model
text-generation-webui-text-generation-webui-1  |     model = load_quantized(model_name)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 114, in load_quantized
text-generation-webui-text-generation-webui-1  |     model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
text-generation-webui-text-generation-webui-1  |   File "/app/modules/GPTQ_loader.py", line 19, in _load_quant
text-generation-webui-text-generation-webui-1  |     config = AutoConfig.from_pretrained(model)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 908, in from_pretrained
text-generation-webui-text-generation-webui-1  |     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 573, in get_config_dict
text-generation-webui-text-generation-webui-1  |     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
text-generation-webui-text-generation-webui-1  |     resolved_config_file = cached_file(
text-generation-webui-text-generation-webui-1  |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 424, in cached_file
text-generation-webui-text-generation-webui-1  |     raise EnvironmentError(
text-generation-webui-text-generation-webui-1  | OSError: models/llama-7b-4bit is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
text-generation-webui-text-generation-webui-1  | If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
text-generation-webui-text-generation-webui-1 exited with code 1

I thought once the models are present, everything should be offline?

The config.json and tokenizer_config.json are in the models dir as well.

@loeken (Contributor, Author) commented Apr 1, 2023

@deece I had only tested the build, which passed earlier. I've now started the container and I'm seeing the errors too; the GPTQ hash you provided does solve it. Probably better to have pinned versions anyway.

@loeken (Contributor, Author) commented Apr 1, 2023

@rklasen are you using the "updated" models?

#530 (comment)
#530 (comment)

Compare the checksums in those comments with your files.

@deece (Contributor) commented Apr 2, 2023

Apologies, the WEBUI line I provided earlier fails the build if WEBUI_VERSION is not set; here's a version that will always exit with 0:

RUN test -n "${WEBUI_VERSION}" && git reset --hard ${WEBUI_VERSION} || echo "Using provided webui source"
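
Usage sketch, assuming WEBUI_VERSION defaults to empty (the tag below is hypothetical):

docker-compose build --build-arg WEBUI_VERSION=v1.0   # pin to a tag/SHA
docker-compose build                                  # keep the COPY'd local source as-is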

@rklasen commented Apr 2, 2023

@loeken thanks for the hashes. I've got:

ed8ec9c9f0ebb83210157ad0e3c5148760a4e9fd2acfb02cf00f8f2054d2743b  models/llama-7b-4bit-128g.safetensors
09841a1c4895e1da3b05c1bdbfb8271c6d43812661e4348c862ff2ab1e6ff5b3  models/llama-7b-4bit.safetensors

So at least those seem to be correct. I've just checked out the latest changes of this PR after @deece's latest commit; still the same error.

What's confusing me is that the model files are at least being found:

text-generation-webui-text-generation-webui-1  | Loading llama-7b-4bit...
text-generation-webui-text-generation-webui-1  | Found models/llama-7b-4bit.safetensors

But there still seems to be some reference to https://huggingface.co/models/llama-7b-4bit/resolve/main/config.json. That URL exists, but requires login. Do I need to make an HF account?

@deece (Contributor) commented Apr 2, 2023

> @loeken thanks for the hashes. I've got:
>
> ...
>
> But there still seems to be some reference to https://huggingface.co/models/llama-7b-4bit/resolve/main/config.json. That URL exists, but requires login. Do I need to make an HF account?

You have to have the following files too:

$ find models
models
models/llama-30b
models/llama-30b/config.json
models/llama-30b/pytorch_model.bin.index.json
models/llama-30b/generation_config.json
models/llama-30b-4bit.pt
models/place-your-models-here.txt

I'd actually prefer it if the 4-bit checkpoints were alongside the matching tokenizer files.
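
For the 7B case above, the analogous layout would presumably be (file names inferred from the logs; the config/tokenizer files come from the original repo):

models/llama-7b/config.json
models/llama-7b/tokenizer_config.json
models/llama-7b/tokenizer.model
models/llama-7b-4bit.safetensors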

@garrettsutula

Works great for me, thanks for doing this! IMO it should be merged!

@deece (Contributor) commented Apr 6, 2023

The Dockerfile provided in #174 only gives you an interactive environment to play with; it doesn't really provide a service that could be started at boot, for example. Models would be stored within the Docker image rather than in volumes, so the many GB of models would be lost if the image were rebuilt.

It's also missing many of the prerequisites and workarounds required for various features.

As someone with some Docker experience, I've reviewed this and I'm happy to +1 this PR. I don't think there is anything here that cannot be justified.

@deece (Contributor) commented Apr 6, 2023

@loeken To get past the sentencepiece issue, I had to add this:

RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && pip3 install sentencepiece protobuf==3.20.1

@loeken (Contributor, Author) commented Apr 6, 2023

@oobabooga I have a theory about your error: what docker-compose --version are you on?

From the docker-compose changelog, it seems GPU support was added in version 1.28.0. Can you confirm you are running >= 1.28.0?

1.28.0 (2021-01-20): https://docs.docker.com/compose/release-notes/#1280

Features: Added support for NVIDIA GPUs through device requests.

So your version of docker-compose does not know how to handle this part of the docker-compose.yml:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

@loeken (Contributor, Author) commented Apr 6, 2023

@deece yeah, it's not at a pinned version in https://github.com/oobabooga/text-generation-webui/blob/main/requirements.txt. We shouldn't "hot fix" this in this PR, but rather open a separate PR pinning all dependencies in that file to specific versions, so it's not just solved for Docker.
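
Such a pinning PR could, for example, start from a freeze of a known-good environment:

pip3 freeze > requirements.txt   # capture exact versions from an environment where everything works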

@loeken (Contributor, Author) commented Apr 6, 2023

If you installed the docker-compose-plugin, you might have ended up with a newer version; you can try a `docker compose up` (without the "-").

Tested in a focal-based Ubuntu Docker container with your repo:
root@d4eb8c9bf89b:/# docker compose version
Docker Compose version v2.17.2
root@d4eb8c9bf89b:/# docker-compose --version
docker-compose version 1.25.0, build unknown

@oobabooga (Owner)

@loeken indeed, I have an older version

docker-compose version 1.25.0, build unknown

Let me see if I can find a way to install the newer version.

@loeken (Contributor, Author) commented Apr 6, 2023

Try "docker compose up" first; you might already have a newer version installed via the plugin.

@oobabooga (Owner)

It was easier to just install the plugin manually with:

DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
curl -SL https://github.com/docker/compose/releases/download/v2.17.2/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
export PATH="$HOME/.docker/cli-plugins:$PATH"

Now I'm trying to build the image.

@oobabooga (Owner)

It worked perfectly. This was pretty neat =)

@loeken Some questions/comments before I merge:

1. Is the .dockerignore necessary? Why are these files/folders in particular excluded from being copied, and not others like characters?

/loras
/models
.env
Dockerfile

2. I'll merge the documentation with the PR so that the contribution is accounted for, and afterwards move it to the wiki.
3. For the one-click installer, I have been using my fork of GPTQ-for-LLaMa; I have reproduced this here and removed the GPTQ commit parameter.

For reference, these commands were useful to me for launching an interactive shell inside the container:

docker-compose up -d
docker-compose exec text-generation-webui bash
docker-compose kill text-generation-webui
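
(With the compose v2 plugin discussed above, the same commands drop the hyphen:)

docker compose up -d
docker compose exec text-generation-webui bash
docker compose kill text-generation-webui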

@deece (Contributor) commented Apr 7, 2023

@oobabooga Those directories are brought in as volumes instead (so they don't take extra space in the image). The two files aren't used within the image, and updating them would invalidate the image unnecessarily.

@loeken (Contributor, Author) commented Apr 7, 2023

You can also use

docker exec -it text-generation-webui_text-generation-webui_1 bash

to get an interactive terminal after starting up via docker-compose up.

@oobabooga (Owner)

I see @deece @loeken. Wouldn't it make sense to add all of the mounted folders to the .dockerignore?

      - ./characters:/app/characters
      - ./extensions:/app/extensions
      - ./loras:/app/loras
      - ./models:/app/models
      - ./presets:/app/presets
      - ./prompts:/app/prompts
      - ./softprompts:/app/softprompts
      - ./training:/app/training

@deece (Contributor) commented Apr 7, 2023

@loeken here's a patch that reorders the COPYs so that prerequisites aren't invalidated in the cache just because a source file has been updated.

diff --git a/Dockerfile b/Dockerfile
index 334f5a1..44e70c0 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -33,8 +33,7 @@ RUN apt-get update && \
     rm -rf /var/lib/apt/lists/*
 
 RUN --mount=type=cache,target=/root/.cache/pip pip3 install virtualenv
-
-COPY . /app/
+RUN mkdir /app
 
 WORKDIR /app
 
@@ -44,21 +43,29 @@ RUN test -n "${WEBUI_VERSION}" && git reset --hard ${WEBUI_VERSION} || echo "Usi
 RUN virtualenv /app/venv
 RUN . /app/venv/bin/activate && \
     pip3 install --upgrade pip setuptools && \
-    pip3 install torch torchvision torchaudio && \
-    pip3 install -r requirements.txt
+    pip3 install torch torchvision torchaudio
 
 COPY --from=builder /build /app/repositories/GPTQ-for-LLaMa
 RUN . /app/venv/bin/activate && \
     pip3 install /app/repositories/GPTQ-for-LLaMa/*.whl
 
-ENV CLI_ARGS=""
-
+COPY extensions/api/requirements.txt /app/extensions/api/requirements.txt
+COPY extensions/elevenlabs_tts/requirements.txt /app/extensions/elevenlabs_tts/requirements.txt
+COPY extensions/google_translate/requirements.txt /app/extensions/google_translate/requirements.txt
+COPY extensions/silero_tts/requirements.txt /app/extensions/silero_tts/requirements.txt
+COPY extensions/whisper_stt/requirements.txt /app/extensions/whisper_stt/requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/api && pip3 install -r requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/elevenlabs_tts && pip3 install -r requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/google_translate && pip3 install -r requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/silero_tts && pip3 install -r requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip . /app/venv/bin/activate && cd extensions/whisper_stt && pip3 install -r requirements.txt
 
+COPY requirements.txt /app/requirements.txt
+RUN . /app/venv/bin/activate && \
+    pip3 install -r requirements.txt
+
 RUN cp /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so /app/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
 
+COPY . /app/
+ENV CLI_ARGS=""
 CMD . /app/venv/bin/activate && python3 server.py ${CLI_ARGS}
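
Note that the --mount=type=cache lines need BuildKit; with classic docker-compose you can enable it per invocation, e.g.:

COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build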

@loeken (Contributor, Author) commented Apr 7, 2023

If you want to be able to edit the files in these folders while the container is running, then I would add them to the .dockerignore (but also create matching volume: entries in the docker-compose.yml). I basically excluded the loras/models folders because the files in there are big, and I wanted to keep them out of the image itself to reduce size/build time.

@loeken (Contributor, Author) commented Apr 7, 2023

I haven't edited files in the other folders much, but it might be a good idea if others do.

@loeken (Contributor, Author) commented Apr 7, 2023

Want me to add those in as volumes and add them to the .dockerignore too?

@oobabooga (Owner)

> If you want to be able to edit the files in these folders while the container is running, then I would add them to the .dockerignore (but also create matching volume: entries in the docker-compose.yml). I basically excluded the loras/models folders because the files in there are big, and I wanted to keep them out of the image itself to reduce size/build time.

I don't get it. Aren't those folders already mounted when the container is started, so that I can, for instance, drop a new yaml character into my characters folder and it will appear inside the container?

@loeken (Contributor, Author) commented Apr 7, 2023

Oh sorry, sleepy brain. Yes, we already have those folders mapped as volumes in the docker-compose.yml; ignore my last statements on this. Adding those other files to the .dockerignore won't change much. We only need the entries for the folders containing large files, so they don't get added to the image.

@oobabooga merged commit 08b9d1b into oobabooga:main on Apr 7, 2023
@oobabooga (Owner)

Thanks for the PR, @loeken! And for the patience in convincing me of the relevance of docker compose. Also thanks @deece for the feedback.

I will work on improving the documentation in the coming days. If you want to make any further changes to the docker compose files, feel free to submit a new PR.

Just one more question: is it possible to get this running on WSL?

@deece (Contributor) commented Apr 7, 2023

@oobabooga I don't see any reason why it shouldn't work on WSL.

@loeken (Contributor, Author) commented Apr 7, 2023

@oobabooga yeah, it does work on Windows.

I've also raised a separate issue, #874, on introducing dependency pinning, which is related to this. Pinning all versions would improve stability and the experience for users here.

@loeken deleted the dockerize branch on April 7, 2023 16:52
Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request Apr 17, 2023