We really need this! More info about QTIP: https://arxiv.org/pdf/2406.11235
-
This was already asked in the topic about SOTA quants before you posted it, but I guess the extra visibility can't hurt. I'm curious whether this is truly easy to integrate and whether it can surpass the current quants without any downsides.
-
What is the architecture for quantization in llama.cpp? I need to find the documentation. :D I'm curious whether we can bring this in and still apply (rather than ignore) importance matrices.
-
"We introduce QTIP, a new LLM quantization algorithm that uses trellis-coded quantization and incoherence processing to achieve a state-of-the-art combination of speed and quantization quality."
https://www.reddit.com/r/LocalLLaMA/comments/1ggwrx6/new_quantization_method_qtip_quantization_with/
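To make "incoherence processing" a bit more concrete, here is a minimal NumPy sketch of the randomized Hadamard transform idea that QuIP# uses and, as far as I understand, QTIP builds on. It flips the signs of a weight vector at random and then rotates it with an orthonormal Walsh-Hadamard transform, so a single large outlier gets spread evenly across all entries, which is friendlier to a quantizer. The function names (`fwht`, `incoherence_transform`, `inverse_incoherence_transform`) are hypothetical, and this only rotates along one axis; as I understand the papers, they apply such transforms on both sides of each weight matrix.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.
    The length must be a power of two. Applying it twice returns the input."""
    x = np.array(x, dtype=np.float64, copy=True)
    n = x.shape[-1]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly: within each block of size 2h, combine entry i with i + h.
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a = y[..., 0, :].copy()
        b = y[..., 1, :].copy()
        y[..., 0, :] = a + b
        y[..., 1, :] = a - b
        x = y.reshape(*x.shape[:-1], n)
        h *= 2
    return x / np.sqrt(n)

def incoherence_transform(w, signs):
    # Random sign flips (a diagonal +/-1 matrix), then the orthonormal Hadamard rotation.
    return fwht(w * signs)

def inverse_incoherence_transform(w_rot, signs):
    # The normalized Hadamard matrix is its own inverse, and sign flips undo themselves.
    return fwht(w_rot) * signs

rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=64)
w = np.zeros(64)
w[3] = 10.0                                   # a single outlier weight
w_rot = incoherence_transform(w, signs)
print(np.abs(w).max(), np.abs(w_rot).max())   # prints 10.0 1.25
w_back = inverse_incoherence_transform(w_rot, signs)
```

Because the rotation is orthonormal, quantization error measured on `w_rot` has the same L2 norm after rotating back, while the outlier no longer dominates the range the quantizer has to cover.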
Quote:
"It should be pretty easy to integrate QTIP into llama.cpp. QTIP replaces the vector quantizer in QuIP# with a trellis quantizer. Llama.cpp's vector quantizer is based off of QuIP#'s E8P vector quantizer, so it should be straightforward to swap QTIP's trellis quantizer in instead."
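For anyone wondering what "a trellis quantizer" means in practice, here is a rough sketch of a bitshift trellis, which is my understanding of the structure QTIP uses. Each state is an L-bit integer, each step shifts in k new bits, and the current state indexes a reconstruction value. Encoding picks the state sequence that minimizes squared error with the Viterbi algorithm; storage is L bits for the first state plus k bits per subsequent weight, so about k bits per weight. This is a simplified illustration, not QTIP's implementation: the random Gaussian lookup codebook is a stand-in for the paper's codes, it quantizes one scalar per step, it ignores the paper's tail-biting handling, and all function names are hypothetical.

```python
import numpy as np

def viterbi_bitshift_tcq(x, codebook, L, k):
    """Return the state path through an L-bit bitshift trellis that minimizes
    sum((x[t] - codebook[state_t])**2).
    Transition rule: next = ((prev << k) | b) & (2**L - 1), b a k-bit value."""
    n_states = 1 << L
    states = np.arange(n_states)
    T = len(x)
    cost = (x[0] - codebook) ** 2              # any state may start the path
    back = np.zeros((T, n_states), dtype=np.int64)
    for t in range(1, T):
        # Predecessors of state s have s's top (L-k) bits as their low bits,
        # and any value in their own top k bits.
        preds = (states >> k)[:, None] | (np.arange(1 << k)[None, :] << (L - k))
        pred_cost = cost[preds]                # shape (n_states, 2**k)
        best = np.argmin(pred_cost, axis=1)
        back[t] = preds[states, best]
        cost = pred_cost[states, best] + (x[t] - codebook) ** 2
    path = np.empty(T, dtype=np.int64)
    path[-1] = np.argmin(cost)
    for t in range(T - 1, 0, -1):              # trace back the cheapest path
        path[t - 1] = back[t, path[t]]
    return path

def encode_bits(path, L, k):
    # The first state costs L bits; every later step stores only its k new low bits.
    return [int(path[0])] + [int(p) & ((1 << k) - 1) for p in path[1:]]

def decode(bits, codebook, L, k):
    # Rebuild the states by shifting bits back in, then look up the values.
    mask = (1 << L) - 1
    s = bits[0]
    out = [codebook[s]]
    for b in bits[1:]:
        s = ((s << k) | b) & mask
        out.append(codebook[s])
    return np.array(out)

rng = np.random.default_rng(0)
L, k = 8, 2
codebook = rng.standard_normal(1 << L)         # hypothetical random Gaussian codebook
x = rng.standard_normal(256)                   # stand-in for (incoherence-processed) weights
path = viterbi_bitshift_tcq(x, codebook, L, k)
x_hat = codebook[path]
print(f"~{k} bits/weight, MSE = {np.mean((x - x_hat) ** 2):.3f}")
```

The point of the trellis is that the decoder only needs the previous state and k fresh bits per weight, while the encoder still gets to search over an effectively much larger codebook; the Viterbi search costs O(T · 2^L · 2^k) here.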