Q4_2 quantization with rmse-optimized scale and quants #1062
Conversation
For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012
For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.
Quantization is slow (~90 seconds on my Mac for 7B) because it is not multi-threaded as in PR #896.
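A minimal sketch of the idea (the block size `QK`, the candidate-scale grid, and the helper names `rmse_optimal_scale` / `nearest_i` are illustrative assumptions, not the exact code in this PR): instead of taking the scale as max|x|/7, sweep a small grid of candidate scales around it and keep the one with the lowest squared reconstruction error.

```c
#include <math.h>
#include <stdint.h>

#define QK 16  // assumed block size for illustration

static int nearest_i(float v) {
    return (int)roundf(v);
}

// Quantize one block of QK floats to signed 4-bit values in [-8, 7],
// choosing the scale that minimizes the squared reconstruction error.
static float rmse_optimal_scale(const float * x, int8_t * q) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        if (fabsf(x[i]) > amax) amax = fabsf(x[i]);
    }
    if (amax == 0.0f) {
        for (int i = 0; i < QK; ++i) q[i] = 0;
        return 0.0f;
    }

    float best_err   = INFINITY;
    float best_scale = amax / 7.0f;

    // Sweep candidate scales around the naive max-based one (assumed grid).
    for (int is = -8; is <= 8; ++is) {
        const float scale  = amax / (7.0f + 0.1f*is);
        const float iscale = 1.0f/scale;
        float err = 0.0f;
        for (int i = 0; i < QK; ++i) {
            int qi = nearest_i(x[i]*iscale);
            if (qi < -8) qi = -8;
            if (qi >  7) qi =  7;
            const float d = x[i] - scale*qi;
            err += d*d;
        }
        if (err < best_err) {
            best_err   = err;
            best_scale = scale;
        }
    }

    // Re-quantize with the winning scale.
    const float iscale = 1.0f/best_scale;
    for (int i = 0; i < QK; ++i) {
        int qi = nearest_i(x[i]*iscale);
        if (qi < -8) qi = -8;
        if (qi >  7) qi =  7;
        q[i] = (int8_t)qi;
    }
    return best_scale;
}
```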
Not sure why this makes them fail
@@ -1123,12 +1124,94 @@ static void quantize_row_q4_2_reference(const float * restrict x, block_q4_2 * r
    }
}

static inline int nearest_int(float fval) {
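For reference, a portable sketch of what a rounding helper with this signature could look like (the body here is an assumption; the actual PR may use a faster bit-manipulation trick):

```c
#include <math.h>

// Round a float to the nearest integer (portable sketch, not the PR's code).
static inline int nearest_int(float fval) {
    return (int)lroundf(fval);
}
```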
This `inline` does not do anything here; the `static` is all you need.
Hm, actually, after looking at cppreference, I am not sure that C and C++ are the same here.
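A small sketch of the case at hand, as I understand it (illustrative file, not a standards quote): with `static`, the function already has internal linkage in both C and C++, and `inline` is then little more than an optimization hint; the languages really diverge for non-static `inline` functions defined in headers, which is not the situation here.

```c
// One translation unit (my understanding, not a standards citation).

static int add_static(int a, int b) {
    return a + b;   // internal linkage, one copy per translation unit
}

static inline int add_static_inline(int a, int b) {
    return a + b;   // same linkage; 'inline' is only a hint here
}

int main(void) {
    return add_static(1, 2) + add_static_inline(3, 4);
}
```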