weight only int4 is slower than cutlass int4 #362

zhoutianzi666 · 2024-03-19T05:01:25Z

https://github.com/ModelTC/lightllm/blob/main/lightllm/common/basemodel/triton_kernel/dequantize_gemm_int4.py

The algorithm in the file above implements weight-only int4, but it runs at only 50% of the speed of CUTLASS int4. How can this be resolved?
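For readers unfamiliar with the term, here is a minimal pure-Python sketch of what "weight-only int4" means: weights are stored as 4-bit integers packed two per byte, and they are dequantized back to floating point just before the multiply. The packing order (low nibble first), the per-column scale and zero point, and all function names here are assumptions made for illustration. They are not taken from the lightllm Triton kernel or from CUTLASS, which may use a different layout and group-wise parameters.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    return bytes(
        (values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
        for i in range(0, len(values), 2)
    )


def unpack_int4(packed):
    """Inverse of pack_int4: return a list of ints in 0..15."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)  # low nibble was stored first
        out.append(byte >> 4)   # high nibble second
    return out


def dequant_matvec(x, packed_cols, scales, zeros):
    """Compute y[n] = sum_k x[k] * (q[k][n] - zeros[n]) * scales[n].

    packed_cols[n] holds column n of the quantized weight, packed along K.
    A single scale/zero per column is an assumption of this sketch.
    """
    y = []
    for col, scale, zero in zip(packed_cols, scales, zeros):
        q = unpack_int4(col)
        # Dequantize each weight on the fly; activations stay in float.
        y.append(sum(xk * (qk - zero) * scale for xk, qk in zip(x, q)))
    return y


x = [1.0, 2.0, 3.0, 4.0]
packed_cols = [pack_int4([8, 9, 10, 11]), pack_int4([0, 15, 0, 15])]
scales = [0.5, 0.25]
zeros = [8, 8]
y = dequant_matvec(x, packed_cols, scales, zeros)
print(y)
```

The extra unpack and dequantize work on every weight is the part a fused GPU kernel has to hide behind memory loads, so how well that step is fused is one place where two implementations can differ in speed.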

hiworldwzj · 2024-03-19T06:06:14Z

You can compile your own operator, expose it through a pybind interface, and speed up inference by modifying the source code so that it replaces the implementation currently used during inference.
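A rough sketch of the replacement step described above, assuming the compiled operator is importable as a Python module. The module name `my_cutlass_int4`, its `gemm` attribute, and the helper names are hypothetical. The fallback is a plain Python matmul standing in for the existing kernel, not the real lightllm code path.

```python
import importlib


def reference_int4_gemm(x, w):
    """Stand-in for the existing (slower) kernel: a plain Python matmul."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def load_int4_gemm(module_name="my_cutlass_int4", attr="gemm"):
    """Return the compiled op if it can be imported, else the fallback.

    `my_cutlass_int4` is a hypothetical pybind11 extension name.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return reference_int4_gemm
    return getattr(module, attr, reference_int4_gemm)


# Resolve the implementation once, then call it wherever the old kernel was used.
int4_gemm = load_int4_gemm()
result = int4_gemm([[1, 2]], [[3, 4], [5, 6]])
print(result)
```

Resolving the function once at import time keeps the call sites unchanged, so the custom operator can be swapped in or out without touching the model code again.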
