Fix for MPS support on Apple Silicon #393
Does it not work if you install PyTorch with
as recommended here? https://pytorch.org/get-started/locally/
I get this error with your conda instructions:
Thanks for the information and the PR. That was really helpful!
I will provide my full instructions in a sec.
I have Python 3.10.10 installed.
Note that you will get this error:
Ignore it and run:
I have linked this thread in the README for macOS users to refer to.
It's a little fussy to set up properly, but it runs amazingly well once it's functional. I'm using the Pyg-6b model on an M2 Mac mini with 64 GB of RAM. The response times with the default settings are pretty quick, as you can see below:
Two other important notes: if you're using the interface from a different Mac with the --listen option, Safari will error out for some reason; Chrome works fine in that case. Also, the --no-stream option fails with a CUDA error, so stay away from it for now.
@oobabooga would the instructions from https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model also work for MPS / Apple Silicon? General thinking:
So I guess my question comes down to: is the Hugging Face transformers library in principle all that is needed, assuming transformers plus the most recent PyTorch with MPS 64-bit support (coming in macOS 13.3, next week)? Or is there a hard dependency on CUDA, since:
In other words: is this step a convenience for installing CUDA-related dependencies, or does the integration of https://github.com/qwopqwop200/GPTQ-for-LLaMa into oobabooga/text-generation-webui have the same dependency on
Many thanks in advance!
Yes, there is this dependency as far as I understand. GPTQ-for-LLaMa uses a custom CUDA kernel that must be compiled with
The only low-level requirements for this project in the end are
Ah, yes: "custom CUDA kernel". I think this is the crucial part here, which makes it non-portable to AMD or Apple/MPS. Thank you!
I am not sure where I made a mistake. I simply followed the instructions given by @WojtekKowaluk, but my Mac always shows 100% CPU activity and basically no GPU activity. I have a 14" M2 Pro MacBook.
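A quick way to narrow down a "100% CPU, no GPU" symptom is to ask PyTorch which backend it can actually see. The sketch below is only a diagnostic under stated assumptions: `pick_device` is a hypothetical helper name, not something from this repository, and it only reports what the installed PyTorch build exposes (it does not show which device the web UI ends up using).

```python
import importlib.util


def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports.

    Hypothetical helper for diagnosis only; not part of text-generation-webui.
    """
    # If PyTorch is not installed at all, nothing can run on the GPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # torch.backends.mps exists in PyTorch builds with Metal support (1.12+).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```

If this prints `cpu` on an Apple Silicon Mac, the installed PyTorch build or macOS version is the first thing to check before looking at the web UI itself.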
Using the latest from on an M1 Max I get this issue:
I have not found instructions on how to compile it for macOS. EDIT: This Stable Diffusion project had a similar issue, d8ahazard/sd_dreambooth_extension#1103, and they seem to have worked around it.
Any plans on fixing this, or should Mac users look elsewhere? Thanks!!
@dogjamboree Fixing what exactly? It works well on an M1/M2 Mac using both GPU and CPU cores. Anything that requires CUDA-specific calls will not work on a Mac. If you can give more details on what isn't working for you, maybe someone can provide more specific help.
Not everything is supported on Mac yet; there is work to do on lower-level tools and libraries, e.g.
@cannin There is no workaround. For now, bitsandbytes is an NVIDIA-specific optimisation that lets you run in 8-bit and 4-bit mode; without it you have to run in 16-bit mode, which is already supported. The other thing you can do is run 4-bit/8-bit mode on CPU only, which is already provided by the built-in llama.cpp support.
What is the correct way to run it? It runs really slowly on my M1 Mac.
@GaidamakUA Yes, for me it is also slower than standalone llama.cpp; I was trying to figure out why. OK, I have figured it out: we need to set the number of threads correctly for Apple Silicon. Basically, you should set it to the number of performance cores (P-cores) on your CPU to get the best performance. Use
M1/M2:
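To find the P-core count without looking it up per chip, one option is to ask macOS directly. This is a minimal sketch, assuming the `hw.perflevel0.physicalcpu` sysctl key that recent macOS versions expose on Apple Silicon; `parse_core_count` and `performance_core_count` are hypothetical helper names, and on any other system the code falls back to the total CPU count.

```python
import os
import subprocess


def parse_core_count(text: str) -> int:
    """Parse the single integer that `sysctl -n` prints."""
    return int(text.strip())


def performance_core_count() -> int:
    """Return the number of performance cores, or all cores as a fallback."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
            capture_output=True,
            text=True,
            check=True,
        )
        return parse_core_count(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Not macOS on Apple Silicon, or the key is unavailable:
        # fall back to the total number of logical CPUs.
        return os.cpu_count() or 1


print(performance_core_count())
```

The printed number is the value to use for the thread count, for example via a `--threads` option if your version of server.py exposes one (an assumption here, since the exact command is missing from the comment above).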
Thank you @WojtekKowaluk for your help. I followed your instructions, but I'm getting the following error trying to load the
I installed it just now as follows (I use
⇒ git clone git@github.com:oobabooga/text-generation-webui.git
# ..snip..
⇒ cd text-generation-webui
# ..snip..
⇒ pyenv install miniconda3-latest
# ..snip..
⇒ pyenv local miniconda3-latest
# ..snip..
⇒ conda create -n textgen python=3.10.9
# ..snip..
⇒ conda activate textgen
# ..snip..
⇒ pyenv local miniconda3-latest/envs/textgen
# ..snip..
⇒ pip3 install torch torchvision torchaudio
# ..snip..
⇒ pip install -r requirements.txt
# ..snip..
Then downloaded the following model:
⇒ python download-model.py anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
# ..snip..
But then when I try to run it, I get the following error (
⇒ python server.py --auto-devices --chat --wbits 4 --groupsize 128 --model anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Loading anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g...
Traceback (most recent call last):
File "/Users/devalias/dev/AI/text-generation-webui/server.py", line 302, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/Users/devalias/dev/AI/text-generation-webui/modules/models.py", line 100, in load_model
from modules.GPTQ_loader import load_quantized
File "/Users/devalias/dev/AI/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload' Googling for that module led me to the following: So I tried installing the requirements for that into the same conda environment, but just ended up with another error (
Edit: Ok, it seems you can just install
Note that I also re-setup my
⇒ python --version
Python 3.9.16
Though that now just leads me into this rabbit hole of issues :(
We can see that
When loading a quantized model that sets
And we can see the logic it uses to try and find the relevant model file to load here (which has some special handling for
So we can work around that by making a symlink of the model file (or renaming it), as follows:
⇒ cd models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
# ..snip..
⇒ ln -s gpt-x-alpaca-13b-native-4bit-128g.pt anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g-4bit.pt
Then we can re-run
⇒ cd ../..
# ..snip..
⇒ python server.py --auto-devices --chat --wbits 4 --groupsize 128 --model anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
..snip..
Loading anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g...
Found models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g-4bit.pt
Loading model ...
Done.
Traceback (most recent call last):
File "/Users/devalias/dev/AI/text-generation-webui/server.py", line 302, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/Users/devalias/dev/AI/text-generation-webui/modules/models.py", line 102, in load_model
model = load_quantized(model_name)
File "/Users/devalias/dev/AI/text-generation-webui/modules/GPTQ_loader.py", line 153, in load_quantized
model = model.to(torch.device('cuda:0'))
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1888, in to
return super().to(*args, **kwargs)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Seems it doesn't like running with
⇒ python server.py --cpu --chat --wbits 4 --groupsize 128 --model anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
..snip..
Loading anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g...
Found models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g-4bit.pt
Loading model ...
Done.
Loaded the model in 7.46 seconds.
Loading the extension "gallery"... Ok.
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Success!
Edit2: Almost success.. it was enough to get the webui running, but it still fails when trying to generate a prompt with a
|
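The symlink workaround above can also be scripted. This is only a sketch: the `<model name>-<wbits>bit.pt` naming pattern is inferred from the `ln -s` command and the "Found models/..." log line in that comment, not from the loader's source, and `link_model_file` is a hypothetical helper name.

```python
import os
from pathlib import Path


def link_model_file(model_dir: str, existing_name: str, model_name: str, wbits: int = 4) -> Path:
    """Create <model_name>-<wbits>bit<suffix> in model_dir as a symlink to existing_name.

    The naming pattern is an assumption taken from the log output in this thread.
    """
    directory = Path(model_dir)
    target = directory / existing_name
    if not target.is_file():
        raise FileNotFoundError(f"model file not found: {target}")

    link = directory / f"{model_name}-{wbits}bit{target.suffix}"
    if link.is_symlink() or link.exists():
        return link  # something is already there, leave it alone

    # Relative target, like running `ln -s <file> <link>` from inside the model directory.
    os.symlink(existing_name, link)
    return link
```

For the model in that comment, the call would be `link_model_file("models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g", "gpt-x-alpaca-13b-native-4bit-128g.pt", "anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g")`.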
@mozzipa For the For the |
@0xdevalias, thanks for your advice.
The following error occurs after typing a prompt under
While I have another 13b model, I have downloaded I'd like to try the other way, following @WojtekKowaluk's advice. But I was not able to find any difference from what I did. I got the same error as before.
|
I decided to write a summary, because this thread has become very chaotic:
Thank you, 0xdevalias, for listing the related issues; this is very useful. |
It probably cannot be used on an M1 with miniconda.. |
It seems --auto-devices does not work with the model directly. It errors out on this, since Apple silicon does not support CUDA. This will be extremely slow
Still slow, even slower than sd-web-ui with the sd2.1 model.
Tested on an M2 Max with 96 GB. I'm not sure why it's so slow. I'll try cpp next. Has anyone tested the speed? How fast can it be on a Mac? How can it be optimized? @WojtekKowaluk, any suggestions? |
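To compare speeds across setups, it helps to report a number rather than "slow". A minimal sketch of a tokens-per-second measurement follows; `tokens_per_second` is a hypothetical helper, and `generate` stands for whatever callable produces the tokens in your setup (it is not a function from this repository).

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run generate() once and return len(result) divided by the elapsed seconds."""
    start = clock()
    tokens = generate()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(tokens) / elapsed
```

Running the same prompt and token count with `--cpu` and with the GPU path gives two numbers that can be compared directly.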
In my case, it was solved with the steps below.
|
mozzipa, would you provide your complete setup for the M1 Max? |
In my case I was on Python 3.11, and backing down to 3.10 made the app run after the nightly torch install. |
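A small guard along these lines could flag the interpreter mismatch before anything else fails. It is only a sketch: the version set reflects reports in this thread (3.10 worked, 3.11 did not with that nightly torch build), not an official support matrix, and `KNOWN_GOOD` and `interpreter_reported_working` are hypothetical names.

```python
import sys

# Assumption: (major, minor) pairs that posters in this thread reported as working.
KNOWN_GOOD = {(3, 10)}


def interpreter_reported_working(version_info=sys.version_info, known_good=KNOWN_GOOD) -> bool:
    """Return True if the interpreter's (major, minor) version is in known_good."""
    return (version_info[0], version_info[1]) in known_good


if __name__ == "__main__":
    if not interpreter_reported_working():
        print(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} was not reported "
            "as working in this thread; consider an environment with 3.10."
        )
```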
I'm running this on an M1 MacBook Air. I've downloaded a WizardLM model and ran this command: python3 server.py --cpu --chat --wbits 4 --groupsize 128 --model WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors Then this is the error message I get: Traceback (most recent call last): During handling of the above exception, another exception occurred: Traceback (most recent call last): |
Did you manage to fix your issue? |
You can follow this instruction to get GPU acceleration with llama.cpp:
I get following results with MacBook Pro M1:
So it seems there is no performance improvement for Q4_0 at this time, but over 3x improvement for Q6_K. |
Prompt Engineering has compiled most of the issues above into a video. |
I'll give it a try! Thank you for the link. |
I see this issue when loading an AutoGPTQ model:
|
Using this over the past few months, I've been impressed with what it does, but it definitely does not use the M1/M2 GPU and unified memory. I think I may have tracked the problem down and am working on a fix once and for all. I'm keeping track of my progress and hope to push to the repository in the next couple of days. The problem seems to be that tensor processing falls back to the CPU: the device gets changed (I think) to CUDA, and when CUDA is not found on my ARM64 M2, it falls back to the CPU rather than to MPS. Stress testing of my Python modules validated that I can use 96% of my M2's GPU, and I learned way more than I ever wanted to about the ARM64 M2 Max and unified memory. So far I have completely rebuilt all of the Python modules and stress tested them with complete success, so I know it's not my modules or environment. I built my own OpenBLAS and LAPACK and have instructions for that, but I think everything can be solved more easily than by picking apart module builds and dependencies. I am completely new to contributing to open source on GitHub, so I'd appreciate it if someone would reach out as a contact in case I have questions about the process; I don't want to screw the repository up making my changes. Fingers crossed, I'll have MPS working with all this soon and fixes uploaded with instructions to follow. Cheers! |
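The fallback order described above (CUDA, then MPS, then CPU) can be sketched as follows. This is not the repository's code or the fix that comment refers to; `pick_device_name` and `detect_device_name` are hypothetical helpers, and only `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, and only then the CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query PyTorch for the available backends; report "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch builds, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device_name(torch.cuda.is_available(), mps_ok)
```

A call such as `model.to(torch.device(detect_device_name()))` would then avoid the hard-coded `cuda:0` seen in the traceback earlier in the thread, assuming the model's layers are supported on the chosen backend.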
Since the main installation README.md for oobabooga on Mac links to this thread, which is quite long and mostly several months old, and refers to developer builds of a macOS version that has now been released, it would be really nice if someone could give a concise, up-to-date set of instructions summarizing how to enable MPS when installing oobabooga on Mac. Or, even nicer, they could add it to the main installation README.md instead of linking to this thread. |
I tried to do it as in the link above (video https://youtu.be/btmVhRuoLkc), but I'm still struggling. I started to try my luck with lm-sys/FastChat. Edit: And for my use case, FastChat works as intended. |
https://github.com/unixwzrd/oobabooga-macOS UPDATE: Created a repository for oobabooga-macOS with my raw text notes. I will add more to it soon, but figured I'd publish my notes now, as they might help someone. UPDATE: Made it through a lot of this and created a very comprehensive guide to the tools you will need, using Conda and venvs, and have everything working now for LLaMA models. The GPU is processing; I can actually see it when monitoring performance. There's still something hitting the CPU, and Gradio chews up CPU and GPU in Firefox (like the orange "working" bar, about 30% of the GPU when visible); there's got to be a better way to do some of this stuff. I discovered several other modules which need extra attention along the way, and I did have a little trouble with the request module after rebasing and merging my local working copy of oobabooga to current. Not sure why, but tinkering with it got it working. Fun fact: on a 30B with no quantization and 3k context, I'm getting about 4-6 tokens per second. I'm sure there is some tuning I could do, but this is pretty good and very usable.
I've been sifting through the codebase here and turned up inconsistencies in how Torch devices are handled, and I'm trying to get a handle on that. I have got llama.cpp working standalone and built for MPS, but for some reason I am having difficulty with the llama_cpp_python module, which on a simple level is just a Python wrapper around llama.cpp (actually, it's a dylib, so you can have its functionality in Python). The Python module seems to have refused to build with Metal support the last time I installed it, so either something changed or I botched the installation.
|
MacBook Pro (Retina, 15-inch, Late 2013)
Traceback (most recent call last):
Traceback (most recent call last): Looking for a solution, thanks. |
Getting this error when running: python server.py --thread 4 Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions. I am trying to run model Llama-2-7B-Chat-GGML |
Is this approach still valid even though torch is now at 2.0.1 stable? |
The latest command for PyTorch includes the following note, suggesting that we might no longer require adherence to this previously suggested approach. If feasible, could you please confirm this, @WojtekKowaluk ? Your confirmation would be greatly appreciated. Thank you!
|
@clearsitedesigns @manikantapothuganti: While PyTorch 2.0.1 has some cumsum op support, you will get this error (it is a different error than with 2.0.0):
|
Hmm. Tried this and am getting this error? GGML_ASSERT: /private/var/folders/hl/v2rnh0fx3hjcctc41tzk7rkw0000gn/T/pip-install-mdqd4bv2/llama-cpp-python_dff92b844a124189bb952ea0fbc93386/vendor/llama.cpp/ggml-metal.m:738: false && "not implemented" |
FYI, I have created a new general thread for discussing Mac/Metal setup: #3760 It is now pinned at the top of the Issues section. |
First you need to install macOS Ventura 13.3 Beta (or later); it does not work on 13.2.
Then you have to install the torch dev version; it does not work on 2.0.0.
pip install -U --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
Then it should work with changes from this PR.
Tested with facebook/opt-2.7b model.
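The prerequisites above can be checked before launching. This is a rough sketch, not part of the PR: the 13.3 minimum comes from the description above, the helper names are hypothetical, and on non-macOS systems `platform.mac_ver()` returns an empty version, which the check treats as "not new enough".

```python
import platform

# Assumption taken from the PR description: macOS 13.3 or later is required.
MIN_MACOS = (13, 3)


def parse_version(text: str) -> tuple:
    """Turn "13.3.1" into (13, 3, 1); non-numeric parts are dropped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def macos_new_enough(version: str, minimum: tuple = MIN_MACOS) -> bool:
    """Compare a macOS version string against the minimum (major, minor)."""
    parsed = parse_version(version)
    if not parsed:
        return False
    # Pad short versions such as "13" with zeros before comparing.
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded[: len(minimum)] >= minimum


def check_environment() -> dict:
    """Collect the facts the PR description depends on."""
    report = {"macos_ok": macos_new_enough(platform.mac_ver()[0])}
    try:
        import torch
    except ImportError:
        report["torch_version"] = None
        return report
    report["torch_version"] = torch.__version__
    mps = getattr(torch.backends, "mps", None)
    report["mps_built"] = bool(mps is not None and mps.is_built())
    report["mps_available"] = bool(mps is not None and mps.is_available())
    return report


if __name__ == "__main__":
    print(check_environment())
```

If `mps_built` is true but `mps_available` is false, the installed torch has MPS support compiled in but the OS or hardware cannot use it, which matches the 13.2 situation described above.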