add GPU quantization support

pommedeterresautee released this 08 Dec 22:46

support INT-8 GPU quantization
add a tutorial to perform quantization end to end
add QDQRoberta model
switch to ONNX opset 13
refactor TensorRT engine creation
fix bugs
add auth token support (for private Hugging Face repos)
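
For readers new to the term, the quantize/dequantize (QDQ) arithmetic behind INT-8 quantization can be sketched as below. This is a minimal, library-free illustration and not the code shipped in this release: the function names (`symmetric_scale`, `quantize`, `dequantize`) and the sample weights are hypothetical, and it assumes symmetric per-tensor quantization with a zero point of 0.

```python
# Illustrative sketch only: plain-Python INT-8 quantize/dequantize arithmetic.
# All names here are hypothetical and not part of this project's API.

def symmetric_scale(values):
    """Pick a scale so the largest magnitude maps to 127 (symmetric range)."""
    max_abs = max(abs(v) for v in values)
    # Fall back to 1.0 to avoid a zero scale when every value is 0.
    return max_abs / 127 if max_abs else 1.0

def quantize(x, scale, zero_point=0):
    """Round a float to the nearest integer step, then clamp to [-128, 127]."""
    q = round(x / scale) + zero_point  # Python rounds halves to even
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point=0):
    """Map an INT-8 value back to an approximate float."""
    return (q - zero_point) * scale

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale = symmetric_scale(weights)
quantized = [quantize(w, scale) for w in weights]
restored = [dequantize(q, scale) for q in quantized]

# Largest absolute error introduced by the round trip.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(quantized, max_error)
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`), which is the precision traded for 8-bit storage and faster integer kernels.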

What's Changed

Update triton by @pommedeterresautee in #11
fix README.md by @pommedeterresautee in #13
Fix install errors by @sam-writer in #20
Add auth token by @sam-writer in #19
Support GPU INT-8 quantization by @pommedeterresautee in #15

New Contributors

@sam-writer made their first contribution in #20

Full Changelog: v0.1.1...v0.2.0

Contributors

pommedeterresautee and sam-writer
