#1775 and fix: Transformer backend error on CUDA #1774 (#1823)
* fixes #1775 and #1774
Add BitsAndBytes quantization and fix embedding on CUDA devices
* Manage 4-bit and 8-bit quantization
Manage different BitsAndBytes options with the quantization: parameter in YAML
* fix compilation errors in non-CUDA environments
…for Openvino and CUDA (#1892)
* fixes #1775 and #1774
Add BitsAndBytes quantization and fix embedding on CUDA devices
* Manage 4-bit and 8-bit quantization
Manage different BitsAndBytes options with the quantization: parameter in YAML
* fix compilation errors in non-CUDA environments
* OpenVINO draft
First draft of OpenVINO integration in transformer backend
* first working implementation
* Streaming working
* Small fix for regression on CUDA and XPU
* use pip version of optimum[openvino]
* Update backend/python/transformers/transformers_server.py
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
LocalAI version:
quay.io/go-skynet/local-ai:master-cublas-cuda12-ffmpeg
Environment, CPU architecture, OS, and Version:
Windows 11 Docker 25.03 with wsl2 backend
Kernel Version: 5.15.133.1-microsoft-standard-WSL2
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.62GiB
GPU NVidia 3060Ti 8GB VRAM
Describe the bug
Running intfloat/multilingual-e5-base with the transformers backend and cuda: true fails with the following error in the logs:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
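This error typically means the model weights were moved to the GPU while the tokenized inputs stayed on the CPU, so the embedding lookup (`index_select`) receives tensors on two devices. Below is a minimal sketch of the usual remedy, not necessarily the fix applied in LocalAI: move every input tensor to the model's device before the forward pass. The helper name `move_to_device` is hypothetical; it relies only on the `.to(device)` method that PyTorch tensors provide.

```python
from typing import Any, Dict, Mapping


def move_to_device(batch: Mapping[str, Any], device: Any) -> Dict[str, Any]:
    """Return a copy of `batch` with every tensor-like value moved to `device`.

    Any value exposing a `.to()` method (e.g. a torch.Tensor) is moved;
    all other values are passed through unchanged.
    """
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in batch.items()
    }


# Assumed usage with PyTorch and Hugging Face transformers (not run here):
#   inputs = tokenizer(text, return_tensors="pt")    # tensors start on the CPU
#   inputs = move_to_device(inputs, model.device)     # now on the model's device
#   outputs = model(**inputs)
```

Reading the target from `model.device` rather than hard-coding `"cuda"` keeps the same code path working on CPU-only hosts.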
To Reproduce
Request embedding from AnythingLLM with the following embedding configuration
Expected behavior
The embedding is generated without error.
Logs
Additional context
I've implemented a fix locally and opened this issue to track it.