[Documentation] Difficulty using trt_int8_use_native_calibration_table option in ONNX Runtime #22059
Labels
documentation, stale
Hello,
I'm trying to figure out how to use the TensorRT execution provider's trt_int8_use_native_calibration_table option, but I can't find any examples or documentation that explain how to do so.
I generated a TensorRT INT8 calibration table using the IInt8EntropyCalibrator2 class provided by the TensorRT library. The output file is created using the write_calibration_cache method from this class.
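For context, my calibrator follows roughly the standard pattern for subclassing IInt8EntropyCalibrator2 in Python; the batch shapes, the data source, and the file name in this sketch are placeholders rather than my exact code:

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Streams calibration batches to TensorRT and writes the calibration cache."""

    def __init__(self, calibration_batches, cache_file="model.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        # calibration_batches: list of float32 arrays, e.g. shape (N, C, H, W)
        self.batches = iter(calibration_batches)
        self.batch_size = calibration_batches[0].shape[0]
        self.device_input = cuda.mem_alloc(calibration_batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # no more data: calibration is done
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse an existing cache so later builds skip calibration.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # This call produces model.cache.
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is attached to the TensorRT builder config (`config.set_flag(trt.BuilderFlag.INT8)` and `config.int8_calibrator = ...`) before building the engine, which is what produces model.cache.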
Here is the content of the generated cache (model.cache):
I then try to use this cache in ONNX Runtime as follows:
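Roughly, the session is set up like this, with the TensorRT execution provider first in the provider list and the INT8 options enabled; the model path, input name/shape, and cache locations are placeholders:

```python
import numpy as np
import onnxruntime as ort

# TensorRT execution provider options; paths and values are illustrative.
trt_options = {
    "device_id": 0,
    "trt_fp16_enable": True,                        # FP16 fallback for layers without INT8 support
    "trt_int8_enable": True,                        # INT8 must be enabled explicitly
    "trt_int8_use_native_calibration_table": True,  # use the TensorRT-generated cache
    "trt_int8_calibration_table_name": "model.cache",
    "trt_engine_cache_enable": True,
    "trt_engine_cache_path": "./trt_cache",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
    ],
)

# Placeholder input; the real model uses its own input name and shape.
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {"input": dummy_input})
```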
When I run the code, the inference speed and outputs are identical to the FP16 run, so it seems the INT8 calibration table is either not being picked up or not being applied correctly.
Could you provide guidance or examples on how to properly use the trt_int8_use_native_calibration_table option? Is there any additional configuration needed to ensure that INT8 optimization is applied in this scenario?
I'm currently using this method because the ONNX Runtime quantization tools take many hours to calibrate an INT8 model, while TensorRT calibration only takes minutes (see the linked issues). I also noticed that TensorRT created a repository for model optimization, so I tried it out; however, I found that under the hood it calls the ONNX Runtime quantization tools. I'm now trying to figure out the most efficient way to generate an INT8 model quickly and use it in ONNX Runtime. Thank you.
Page / URL
No response