Releases: Nexesenex/croco.cpp

Croco.Cpp_FrankenFork_v1.80002_b4229

08 Dec 19:36

New IQ_K quants by Ikawrakow are available for inference on Cuda.

  • IQ2_K, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
    Few models, if any, are quantized with them and shared on HF so far.
    But it's one step ahead.
    IK's newer quants are a bit harder for me to implement: I can't use the .c files of Llama.CPP and have to integrate IK's work (in C++) directly, so it'll take a bit longer. I'm basically learning as I go.

It works when launched from Python. I'm compiling an .exe for Pascal, Turing, and beyond right now.

Edit: I can't make a working .exe right now. I'll see what's up later.

What you can try if you don't know a better way:
Download the source, put the dll in the repository folder, install the requirements with Install requirements.bat, then launch with Croco.Cpp_python_launch.bat.

Non-Cuda users, use the previous version. No IQ_K quants there yet, though.

I attached a compiled version of IK_LLAMA_CPP, with some edits of mine. Credits go to Ikawrakow.

Croco.Cpp_FrankenFork_v1.80001_b4229

04 Dec 18:14

The usual, plus :

  • Q6_0 quants supported, including the MMQ mode in Cuda (thanks Ikawrakow)
  • KV cache (Flash attention) mode K q6_0 / V q5_0 warmly recommended: very close to q8_0/q5_1 in quality, and vastly superior to the previous best compromise, q5_1/q5_0 (thanks Ikawrakow)
  • Image generation works again (Cuda and Vulkan tested); it was broken in previous Croco versions.
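
For a rough idea of what these KV cache pairs cost in memory, here is a minimal Python sketch. The bits-per-weight figures follow from the ggml block layouts (32 values plus an fp16 scale, and an fp16 minimum for q5_1); the model dimensions (32 layers, 8 KV heads, head size 128) and the function name are assumptions chosen for illustration, not values taken from Croco.Cpp. It says nothing about quality, only about size.

```python
# Bits per stored value for each cache type, derived from block layouts
# (for example q6_0: 32 values * 6 bits + 16-bit scale = 208 bits / 32 = 6.5).
BPW = {"f16": 16.0, "q8_0": 8.5, "q6_0": 6.5, "q5_1": 6.0, "q5_0": 5.5}


def kv_bytes_per_token(k_type, v_type, n_layers=32, n_kv_heads=8, head_dim=128):
    """Approximate KV cache bytes per token for a hypothetical model shape."""
    elems = n_layers * n_kv_heads * head_dim  # values stored for K (same for V)
    return elems * (BPW[k_type] + BPW[v_type]) / 8


if __name__ == "__main__":
    for k, v in [("f16", "f16"), ("q8_0", "q5_1"), ("q6_0", "q5_0"), ("q5_1", "q5_0")]:
        kib = kv_bytes_per_token(k, v) / 1024
        print(f"K {k:5s} / V {v:5s}: {kib:6.1f} KiB per token")
```

Under these assumed dimensions, K q6_0 / V q5_0 takes 12 bits per K/V value pair, which sits between q8_0/q5_1 (14.5) and q5_1/q5_0 (11.5).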

Full Changelog: v1.78003_b4067...v1.80001_b4229

Most credits go to Concedo, for KoboldCPP, and to the LlamaCPP team.

Croco.Cpp_FrankenFork_v1.78003_b4067

13 Nov 09:20

Test release, for the adventurous minds.

Croco.Cpp_FrankenFork_v1.77009_b3972

24 Oct 22:20

Contextshift no longer depends on Noshift; that parameter is gone.
Just tick the button in the GUI like before, or try --contextshift in the CLI.

Full Changelog: v1.77008_b3962...v1.77009_b3962

Croco.Cpp_FrankenFork_v1.77008_b3972

24 Oct 20:57

Lazy release with Concedo's merge ordeal included.

Cuda backend problems with the non-FA K cache are fixed, but beware of certain non-FA K / FA KV quants with Qwen 2.5 models (at least the 1.5b); see the GUI's KV slider for details.

Full Changelog: v1.77005_b3962...v1.77008_b3962

Croco.Cpp_FrankenFork_v1.77006_b3972

24 Oct 16:46

Test release for Cuda bugfix.

Needs more testing; feedback is appreciated. Please include:

-> Your CPU, GPU, RAM, OS, and the model tested.

If you're motivated, here's my checklist, from which you can pick:

-> In full Cuda offload : does it work?
-> In full Cuda offload with lowvram argument : does it work?
-> In full Cuda offload with mmq argument : does it work?
-> In full Cuda offload with mmq AND lowvram argument : does it work?
-> In partial offload : does it work?
-> In cuda mode but without layers on the GPU : does it work?

-> Each with KV quant mode 0 (FA, non FA), 1, 9, 14 (FA), 16 and 17 (No FA).
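
To keep track of which combinations have been tried, the list above can be expanded into a flat checklist. This is a minimal Python sketch, not part of Croco.Cpp. It assumes one reading of the KV line: mode 0 tested with and without FA, modes 1, 9 and 14 with FA, modes 16 and 17 without FA. All names in it are made up for illustration.

```python
from itertools import product

# The six offload scenarios listed above.
OFFLOAD_CASES = [
    "full Cuda offload",
    "full Cuda offload + lowvram",
    "full Cuda offload + mmq",
    "full Cuda offload + mmq + lowvram",
    "partial offload",
    "Cuda mode, no layers on the GPU",
]

# (KV quant mode, flash attention on?) under the reading described above.
KV_MODES = [
    (0, True), (0, False),
    (1, True), (9, True), (14, True),
    (16, False), (17, False),
]


def build_checklist():
    """Return one line per (offload scenario, KV quant mode) combination."""
    return [
        f"{case} | KV quant mode {mode} | {'FA' if fa else 'no FA'}"
        for case, (mode, fa) in product(OFFLOAD_CASES, KV_MODES)
    ]


if __name__ == "__main__":
    for number, line in enumerate(build_checklist(), start=1):
        print(f"[ ] {number:02d}. {line} : does it work?")
```

Nobody is expected to run all of them; picking a handful of lines and reporting the result with your hardware details is already useful.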

Turing, Ampere, and Ada supported for now, Pascal and Maxwell to come later.

Croco.Cpp_FrankenFork_v1.77005_b3962

23 Oct 07:37
Pre-release

Bugfix for 1.77004, with :

  • K q8_0 V F16 FA quant dropped.
  • Non FA Quants dropped for now, only K q6_0 V F16 works (and it's the best anyway).
  • Fixed the logic that passes the quant, FA, and no-shift settings.
  • KLite updated to 182.

Croco.Cpp_FrankenFork_v1.77004_b3962

23 Oct 00:28
Pre-release

New lazy release with :

  • @ikawrakow's recent work on KV quants integrated: new KV quant IQ4_NL to replace Q4_0 (-1% PPL when used for both K and V), and Q6_0, which gets close to Q8_0 in the pairing K Q6_0 / V Q5_0.
  • Use the GUI to discover the new modes.

  • Some of Ikawrakow's work on Cuda (on top of some of his work for CPU inference).
  • Cuda Graph caching PR of Agray3.
  • Some bugfixes (aka the bugs created by yours truly). ^^
  • Note: Llava users, be careful, it might not work or might simply crash.

Second release : with the help fixed.

I'll make a longer readme when motivated.

Full Changelog: v1.76005_b3906...v1.77004_b3962

Croco.Cpp_FrankenFork_v1.77002_b3934

20 Oct 04:15
Pre-release
KVQ27 (iq4_nl) doesn't work; I'm leaving it for further testing.
I mapped each deleted KV quant to the closest remaining quant of equal or lower bpw, so existing config files keep working as they are until a stable KVQ cocktail of quants is chosen.
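
The "closest equal or lower bpw" rule can be sketched as below. This is an illustrative Python version, not the code used in Croco.Cpp: the function name, the table, and the lists of supported types in the usage note are assumptions. The bits-per-weight values come from the ggml block layouts.

```python
# Bits per weight of some cache types (32-value blocks plus fp16 scale,
# and an fp16 minimum for the _1 variants).
BITS_PER_WEIGHT = {
    "f16": 16.0, "q8_0": 8.5, "q6_0": 6.5, "q5_1": 6.0,
    "q5_0": 5.5, "q4_1": 5.0, "q4_0": 4.5, "iq4_nl": 4.5,
}


def closest_fallback(removed, supported):
    """Pick the supported type with the highest bpw not exceeding `removed`."""
    limit = BITS_PER_WEIGHT[removed]
    candidates = [t for t in supported if BITS_PER_WEIGHT[t] <= limit]
    if not candidates:
        raise ValueError(f"no supported type at or below {limit} bpw")
    # max() keeps the first entry on ties, so list order breaks ties.
    return max(candidates, key=lambda t: BITS_PER_WEIGHT[t])


if __name__ == "__main__":
    print(closest_fallback("q8_0", ["f16", "q6_0", "q5_0"]))
    print(closest_fallback("q4_1", ["q6_0", "iq4_nl"]))
```

With these hypothetical lists, an old config asking for q8_0 would land on q6_0, and one asking for q4_1 would land on iq4_nl, so memory use never goes above what the old setting required.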

Croco.Cpp_FrankenFork_v1.76007_b3917

14 Oct 22:29