Add c4ai-command-r-v01 Support #5762
Comments
Waiting for the llama-cpp-python bump to 0.2.57, I think.
It just updated to a new version. It looks like llama.cpp was updated too, but it still doesn't work. Strange... KoboldCpp also won't load the model.
You can run it with Ollama, check this.
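For anyone going the Ollama route, a minimal sketch using the official `ollama` Python client is below. It assumes a running Ollama server and that the 35B model is available in the Ollama library under the `command-r` tag; adjust the tag if yours differs.

```python
import ollama  # pip install ollama; talks to a locally running Ollama server

# "command-r" is assumed to be the library tag for the 35B model;
# check `ollama list` / the Ollama library if your tag differs.
response = ollama.chat(
    model="command-r",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["message"]["content"])
```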
Someone posted tips on how to run GGUF in oobabooga (I haven't tried this personally, so I don't know if it works): https://old.reddit.com/r/LocalLLaMA/comments/1bpfx92/commandr_on_textgenerationwebui/ If you have 24 GB or more of VRAM, you can run exl2. On a 4090 with Win11, I'm able to run the 3.0bpw exl2 quant; a maximum context length of 7168 fits into VRAM with the 4-bit cache (see the sketch after this comment). You need to manually update to exllamav2 0.0.16 (https://github.com/turboderp/exllamav2/releases).
The quants are here:
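A minimal sketch of that kind of setup using the exllamav2 Python API directly (not text-generation-webui's loader). The model path is hypothetical, and the 4-bit cache class (`ExLlamaV2Cache_Q4`) is only available in recent exllamav2 releases, so verify against your installed version.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/c4ai-command-r-v01-exl2-3.0bpw"  # hypothetical path
config.prepare()
config.max_seq_len = 7168  # the context that reportedly fits in 24 GB VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # 4-bit KV cache; class name may vary by version
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.min_p = 0.05

print(generator.generate_simple("Hello, how are you?", settings, 200))
```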
I can run it on my AMD machines but not on Intel. All have 64 GB RAM + a 12 GB Nvidia card. UPDATE: I reinstalled Ollama. It works for me now.
They have now also released a larger, 104B-parameter model: C4AI Command R+.
c4ai-command-r-v01 now loads in Ooba.
Unfortunately, I get a segmentation fault every time with the new llama-cpp-python...
Yeah, same here, and not just on Command-R, so I've reverted llama-cpp-python for now. It should be fixed according to abetlen, but I guess it isn't after all.
I get a segfault on exllamav2[_hf] too when loading the command-r-plus exl2. Other exl2 models work fine. (Edit: both on the dev branch.) So that points to a shared dependency of the command-r architecture (I still know nothing about architectures). Edit 2: building exllamav2 from upstream works quite well. A hint of repetition with default instruct settings on Divine Intellect, perhaps. I'll try GGUF next. Edit 3: Got GGUF working as well. Built llama-cpp-python with the latest changes from ggerganov/llama.cpp#6491.
c4ai-command-r-plus works if you bump exllamav2 up to 0.0.18. That may also fix support for c4ai-command-r-v01. Notably, I've not been able to get either model working inside text-generation-webui with regular transformers. Whilst it will load, it only outputs gibberish for me (repeated words). I'm running
Chat template:
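For reference, the prompt format published on the Command R model card looks roughly like the following, written here as a Python format string. This is reproduced from memory; verify against the chat template in the model's tokenizer_config.json.

```python
# Command R prompt format as documented on the model card (verify against the
# tokenizer_config.json shipped with the model).
prompt = (
    "<BOS_TOKEN>"
    "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system_prompt}<|END_OF_TURN_TOKEN|>"
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{user_message}<|END_OF_TURN_TOKEN|>"
    "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)
```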
Update: c4ai-command-r-v01 does not work due to a tokenizer config clash:
Also, I suspect there is something funky going on with the context memory for c4ai-command-r-plus in text-generation-webui. I was only able to achieve a context of 9k with the 5.0bpw exl2 quant.
@divine-taco Same here: it repeats gibberish using transformers 8-bit and 4-bit, and I have tried a lot of different settings and parameter changes. I can load it via transformers, but it will only output gibberish.
Command R+ support was just added to llama.cpp: ggerganov/llama.cpp#6491
Has anyone got the 35B to run with oobabooga on a Mac?
PSA: the dev branch now has a fix for this. For the gibberish, make sure to use the (now default) min_p preset.
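For context, min_p sampling keeps only the tokens whose probability is at least `min_p` times the probability of the most likely token, which tends to cut off the long gibberish tail. A minimal, illustrative numpy sketch (not text-generation-webui's actual implementation):

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Zero out tokens whose probability is below min_p * max(probs), then renormalize."""
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

# Example: with min_p=0.1, tokens below 10% of the top token's probability are dropped.
probs = np.array([0.50, 0.30, 0.15, 0.04, 0.01])
print(min_p_filter(probs, min_p=0.1))  # the last two tokens are filtered out
```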
Hmm, I guess the fix was just for the 35B version, not the Plus version? I grabbed the dev version and tried it out, with no change to the output for the c4ai-command-r-plus model.
@RandomInternetPreson Sorry, I was probably too eager. I only retested the llama.cpp quants (exl2 was already working fine, especially since the min_p update). You are using the full model quantized on-the-fly with bitsandbytes, right? I'll try to reproduce the issue, but I'm not sure if I have enough memory. (Edit: I have only been testing Command R+; I haven't gotten around to the 35B model nor the new Mistral models yet.)
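For reference, loading the full model quantized on the fly with bitsandbytes via plain transformers looks roughly like the sketch below. This is the generic transformers API, not a claim about what text-generation-webui does internally, and the model IDs are those published on the Hugging Face Hub.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/c4ai-command-r-plus"  # or the 35B: "CohereForAI/c4ai-command-r-v01"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # on-the-fly 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```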
@randoentity No problem, I figured out my issue: #5838 (comment). Perhaps it will help others encountering this in the future.
It's a little annoying that I still can't run the SOTA 35B model in the most popular webUI.
I don't think it is an issue with textgen; it's an issue with transformers. You can update to the dev version like I did and see if that fixes your issue.
I'm running it just fine. Let me know if you need any help. The only thing I can't figure out is how to increase rope_freq_base to 8,000,000 in the GUI. It still runs fine, though.
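If it helps, the GGUF can also be loaded directly through llama-cpp-python, where `rope_freq_base` is a constructor argument. A sketch under the assumption that 8,000,000 is the value from the model's config (a GGUF usually carries this in its metadata already, so setting it manually may be redundant); the file path is hypothetical.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/c4ai-command-r-v01-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,                # keep the context modest; the KV cache for this model is large
    n_gpu_layers=-1,           # offload all layers that fit onto the GPU
    rope_freq_base=8_000_000,  # value reported for the model; normally read from GGUF metadata
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```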
On my Mac, I recently pulled the latest oobabooga version and tried running this model. This is the only model that made the entire laptop freeze, and I had to forcefully restart it. Is it working for other Mac users? I tried the 35B. I could run 70B models on this laptop with the same CLI arguments, but maybe this model requires different CLI flags or something?
I couldn't get it running on Linux with a 7900 XTX; I tried both transformers and llama.cpp.
I have it running on Linux with a 4090, using llama.cpp through ooba. Good luck with AMD, though.
Yeah, I can't get it to run for some reason (even on the dev branch).
I can answer any specific questions you might have.
Are you using GGUF or exllama?
GGUF
I reinstalled the webui and still get the same error. I downloaded a new GGUF and the result is the same.
I asked llama-3-70b-instruct and it basically said it's a common, generic error. It suggested trying to run it on the CPU, or asked whether I have enough memory.
Even with yesterday's version (snapshot-2024-04-28), it still doesn't work for me.
I was getting the same error, and at least for me, lowering the context length before loading the model solved it.
Command-R doesn't have GQA, so the KV cache footprint is much larger than for other models. I personally go OOM and crash at around 5k context, which is kind of absurd considering Miqu will give me the full 32k using a quant of similar size.
Command-R 35B takes up more than 64 GB of RAM with a context of only 32K. So just set it to ~2K and increase it according to your needs and capabilities.
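A rough back-of-the-envelope for why: with no grouped-query attention, every attention head keeps its own K and V. Using the commonly reported Command-R 35B shape (40 layers, 64 heads, head dim 128; verify against the model's config.json), an fp16 cache costs about 1.25 MiB per token, so 32K of context is on the order of 40 GiB for the cache alone, before any model weights.

```python
# Back-of-the-envelope KV cache size for Command-R 35B (no GQA).
# Architecture numbers are the commonly reported ones; verify against config.json.
n_layers = 40
n_kv_heads = 64      # equal to n_heads, since there is no grouped-query attention
head_dim = 128
bytes_per_value = 2  # fp16

def kv_cache_bytes(n_ctx: int) -> int:
    # Factor of 2 for keys plus values.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_ctx

for ctx in (2_048, 8_192, 32_768):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
# ~2.5 GiB at 2K, ~10 GiB at 8K, ~40 GiB at 32K.
```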
Why can I still not run command-r in the webui even though its support was added to the main llama.cpp branch?
original model
GGUF model
log