quantized int8 inference
To support int8 model deployment on mobile devices, we provide universal post-training quantization tools that can convert a float32 model to an int8 model.
Example with MobileNet: only three steps are needed, namely optimizing the model, creating the calibration table file, and quantizing the model.
./ncnnoptimize mobilenet.param mobilenet.bin mobilenet-opt.param mobilenet-opt.bin 0
We suggest using the validation dataset for calibration, with more than 5000 images.
Some ImageNet sample images are available here: https://github.com/nihui/imagenet-sample-images
find images/ -type f > imagelist.txt
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist.txt mobilenet.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl
- mean and norm are the values you passed to Mat::substract_mean_normalize() (see the sketch after this list)
- shape is the blob shape of your model, [w,h] or [w,h,c]
* if w and h are both given, the image will be resized to exactly that size.
* if w and h are both zero or negative, the image will not be resized.
* if only h is zero or negative, the image's width will be scaled to w, keeping the aspect ratio.
* if only w is zero or negative, the image's height will be scaled to h, keeping the aspect ratio.
- pixel is the pixel format of your model; image pixels will be converted to this format before Extractor::input()
- thread is the CPU thread count that could be used for parallel inference
- method is the post-training quantization algorithm; kl and aciq are currently supported
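For reference, here is a minimal C++ sketch showing how the mean, norm, shape and pixel options above correspond to the preprocessing you already do in your own inference code. The helper name preprocess_bgr and the raw BGR pixel buffer are assumptions for illustration only.

```cpp
#include "mat.h" // ncnn

// Hypothetical helper: 'pixels' is a packed BGR image of size img_w x img_h.
ncnn::Mat preprocess_bgr(const unsigned char* pixels, int img_w, int img_h)
{
    // pixel=BGR, shape=[224,224,3]: convert the pixel buffer and resize to 224x224
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(pixels, ncnn::Mat::PIXEL_BGR,
                                                 img_w, img_h, 224, 224);

    // mean=[104,117,123] and norm=[0.017,0.017,0.017] are exactly the values
    // passed to ncnn2table above
    const float mean_vals[3] = {104.f, 117.f, 123.f};
    const float norm_vals[3] = {0.017f, 0.017f, 0.017f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    return in;
}
```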
If your model has multiple input nodes, you can supply multiple list files and multiple sets of the other parameters, separated by commas:
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist-bgr.txt,imagelist-depth.txt mobilenet.table mean=[104,117,123],[128] norm=[0.017,0.017,0.017],[0.0078125] shape=[224,224,3],[224,224,1] pixel=BGR,GRAY thread=8 method=kl
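At inference time each list file corresponds to one input blob. A minimal sketch, where the input blob names "bgr" and "depth" and the output blob name "out" are hypothetical placeholders for the names in your own param file:

```cpp
// 'net' is an ncnn::Net loaded with the quantized multi-input model;
// in_bgr and in_depth are ncnn::Mat inputs preprocessed with the same
// mean/norm/pixel settings that were passed to ncnn2table for each list file
ncnn::Extractor ex = net.create_extractor();
ex.input("bgr", in_bgr);     // first input node (hypothetical blob name)
ex.input("depth", in_depth); // second input node (hypothetical blob name)

ncnn::Mat out;
ex.extract("out", out);      // hypothetical output blob name
```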
Finally, quantize the optimized model with the generated calibration table:
./ncnn2int8 mobilenet-opt.param mobilenet-opt.bin mobilenet-int8.param mobilenet-int8.bin mobilenet.table
If you don’t need static quantization, ncnn supports RNN/LSTM/GRU dynamic quantization. In this case, you can omit the table file.
./ncnn2int8 rnn-model.param rnn-model.bin rnn-model-int8.param rnn-model-int8.bin
The ncnn library will use int8 inference automatically; nothing needs to change in your code:
ncnn::Net mobilenet;
mobilenet.load_param("mobilenet-int8.param");
mobilenet.load_model("mobilenet-int8.bin");
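For completeness, a minimal sketch of running a forward pass on the quantized model. The blob names "data" and "prob" are assumptions based on the common Caffe MobileNet layout; substitute the names from your own param file.

```cpp
// 'in' is a 224x224x3 ncnn::Mat preprocessed as shown earlier
ncnn::Extractor ex = mobilenet.create_extractor();
ex.input("data", in);    // assumed input blob name

ncnn::Mat out;
ex.extract("prob", out); // assumed output blob name; int8 kernels are used internally
```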
If you want a particular layer to keep running in float32, comment out that layer's weight scale line in the table file before quantizing your model; that layer will then do float32 inference:
conv1_param_0 156.639840536
#conv1_param_0 156.639840536