Intel® Neural Compressor v3.0 Release
- Highlights
- Features
- Improvements
- Examples
- Bug Fixes
- Documentations
- Validated Configurations
Highlights
- FP8 quantization and INT4 model loading support on Intel® Gaudi® AI accelerator (FP8 flow sketched after this list)
- Framework extension API for quantization, mixed precision, and benchmarking
- Accuracy-aware FP16 mixed precision support on Intel® Xeon® 6 Processors
- Performance optimizations and usability improvements on client-side quantization
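For illustration, here is a minimal sketch of the Gaudi FP8 flow through the new PyTorch framework extension API. The entry points (`FP8Config`, `prepare`, `finalize_calibration`, `convert`) follow the v3.0 documentation; the toy module, the `E4M3` setting, and the calibration loop are illustrative assumptions, and the `hpu` device requires the Gaudi software stack.

```python
# Minimal FP8 sketch for Gaudi; assumes the Gaudi SW stack is installed so
# the "hpu" device exists. Toy model and settings are illustrative only.
import torch
from neural_compressor.torch.quantization import (
    FP8Config,
    convert,
    finalize_calibration,
    prepare,
)

model = torch.nn.Linear(16, 16).to("hpu")  # stand-in for a real model
config = FP8Config(fp8_config="E4M3")      # assumed kwarg; check release docs

model = prepare(model, config)             # insert measurement observers
with torch.no_grad():                      # tiny ad-hoc calibration pass
    model(torch.randn(4, 16).to("hpu"))
finalize_calibration(model)                # persist the measured scales
model = convert(model)                     # swap in FP8-enabled modules
```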
Features
- [Quantization] Support FP8 quantization on Gaudi (95197d)
- [Quantization] Support INC and Hugging Face model loading on framework extension API for PyTorch (0eced1, bacc16)
- [Quantization] Support Weight-only Quantization on framework extension API for PyTorch (34f0a9, de43d8, 4a4509, 1a4509, a3a065, 1386ac, a0dee9, 503d9e, 84d705, 099b7a, e3c736, e87c95, 2694bb, ec49a2, e7b4b6, a9bf79, ac717b, 915018, 8447d7, dc9328)
- [Quantization] Support static and dynamic quantization in PT2E path (7a4715, 43c358, 30b36b, 1f58f0, 02958d)
- [Quantization] Support SmoothQuant and static quantization in IPEX path with framework extension API (53e6ee, 72fbce, eaa3a5, 95e67e, 855c10, 9c6102, 5dafe5, a5e5f5, 191383, 776645)
- [Quantization] Support Layer-wise Quantization for RTN/GPTQ on framework extension API for PyTorch (649e6b)
- [Quantization] Support Post Training Quantization on framework extension API for TensorFlow (6c27c1, e22c61, f21afb, 3882e9, 2627d3)
- [Quantization] Support Post Training Quantization on Keras3 (f67e86, 047560)
- [Quantization] Support Weight-only Quantization on Gaudi2 (4b9b44, 14868c, 0a3d4b)
- [Quantization] Improve performance and usability of the quantization procedure on the client side (16a7b1)
- [Quantization] Support auto-device detection on framework extension API for PyTorch (368ba5, 4b9b44, e81a2d, 0a3d4b, 534300, 2a86ae)
- [Quantization] Support Microscaling (MX) Quant for PyTorch (4a24a6, 455f1e)
- [Quantization] Enable cross-device Half-Quadratic Quantization (HQQ) support for LLMs (db6164, 07f940)
- [Quantization] Support FP8 cast Weight-only Quantization (57ed61)
- [Mixed-Precision] Support FP16 mixed-precision on framework extension autotune API for PyTorch (2e1cdc)
- [Mixed-Precision] Support mixed `INT8` with `FP16` in PT2E path (fa961e)
- [AutoTune] Support accuracy-aware tuning on framework extension API (e97659, 7b8aec, 5a0374, a4675c, 3a254e, ac47d9, b8d98e, fb6142, fa8e66, d22df5, 09eb5d, c6a8fa); see the sketch after this list
- [Benchmarking] Implement `incbench` command for ease-of-use benchmarking (2fc725)
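As referenced in the AutoTune item above, a hedged sketch of the PyTorch framework extension API, combining one-shot RTN weight-only quantization with accuracy-aware tuning; the toy model and `eval_fn` are placeholders, and the keyword names follow the v3.0 documentation. For benchmarking, the new `incbench` command is invoked from the CLI (e.g. `incbench main.py`).

```python
# Sketch of the 3.x PyTorch extension API: one-shot RTN weight-only
# quantization plus accuracy-aware autotune. Model/eval_fn are placeholders.
import copy

import torch
from neural_compressor.torch.quantization import (
    RTNConfig,
    TuningConfig,
    autotune,
    convert,
    prepare,
)

fp_model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# One-shot 4-bit RTN: this algorithm needs no calibration data.
q_model = convert(prepare(copy.deepcopy(fp_model), RTNConfig(bits=4)))


def eval_fn(model) -> float:
    """Placeholder metric; return the real accuracy of `model` here."""
    return 1.0


# Accuracy-aware tuning: expand use_sym into candidate configs and keep the
# first quantized model whose eval_fn score meets the tuning criterion.
tune_config = TuningConfig(config_set=[RTNConfig(use_sym=[True, False])])
best_model = autotune(
    model=copy.deepcopy(fp_model), tune_config=tune_config, eval_fn=eval_fn
)
```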
Improvements
- [Quantization] Integrate AutoRound v0.3 (bfa27e, fd9685)
- [Quantization] Support `auto_host2device` on RTN and GPTQ (f75ff4)
- [Quantization] Support `transformers.Conv1D` WOQ quantization (b6237c)
- [Quantization] Support `quant_lm_head` argument in all WOQ configs (4ae2e8)
- [Quantization] Update fp4_e2m1 mapping list to fit neural_speed and qbits inference (5fde50)
- [Quantization] Enhance load_empty_model import (29471d)
- [Common] Add common logger to the quantization process (1cb844, 482f87, 83bc77, f50baf)
- [Common] Enhance `set_local` for operator type (a58638); see the sketch after this list
- [Common] Port more helper classes from 2.x (3b150d)
- [Common] Refine base config for 3.x API (be42d0, efea08)
- [Export] Migrate the export feature from deprecated code to 2.x and 3.x (794b27)
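A small sketch of the `set_local` override noted above: a global weight-only config with a per-operator exception, mirroring the `quant_lm_head` pattern; the operator name and dtype values are illustrative.

```python
# Sketch: global 4-bit RTN config with a local full-precision override,
# keeping the (illustrative) lm_head operator unquantized.
from neural_compressor.torch.quantization import RTNConfig

quant_config = RTNConfig(bits=4)  # global: 4-bit weight-only for matched ops
quant_config.set_local("lm_head", RTNConfig(dtype="fp32"))  # per-op override
```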
Examples
- Add save/load for PT2E example (0e724a)
- Add IPEX XPU example for framework extension API (6e1b1d)
- Enable TensorFlow yolov5 example for framework extension API (19024b)
- Update example for framework extension IPEX SmoothQuant (b35ff8)
- Add SDXL model example for framework extension API (000946)
- Add PyTorch mixed precision example (e106de, 9077b3)
- Add CV and LLM examples for PT2E quantization path (b401b0)
- Add Recommendation System examples for IPEX path (e470f6)
- Add TensorFlow examples for framework extension API (fb8577, 922b24)
- Add PyTorch Microscaling (MX) Quant examples (6733da)
- Add PyTorch SmoothQuant LLM examples for the new framework extension API (137fa3)
- Add PyTorch GPTQ/RTN example for framework extension API (813d93)
- Add double quant example (ccd0c9)
Bug Fixes
- Fix ITREX qbits nf4/int8 training core dump issue (190e6b)
- Fix unused package imports (437c8e)
- Remove Gelu Fusion for TensorFlow New API (5592ac)
- Fix GPTQ layer match issue (90fb43)
- Fix static quant regression issue on IPEX path (70a1d5)
- Fix config expansion with empty options (6b2738)
- Fix act_observer for IPEX SmoothQuant and static quantization (263450)
- Automatically set `return_dict=False` for GraphTrace (53e7df)
- Fix slow WOQ Linear packing issue (da1ada, daa143)
- Fix dtype of unpacked tensor (29fdec)
- Fix WeightOnlyLinear bits type when dtype="intx" (19ff13)
- Fix several issues for SmoothQuant and static quantization (7120dd)
- Fix IPEX examples failing with `evaluate` (e82674)
- Fix HQQ issue for group size of -1 (8dac9f)
- Fix bug in GPTQ g_idx (4f893c)
- Fix tune_cfg issue for static quant (ba1650)
- Add non-str `op_name` match workaround for IPEX (911ccd)
- Fix OPT GPTQ double quant example config (62aa85)
- Fix GPTQ accuracy issue in framework extension API example (c701ea)
- Fix bf16 symbolic_trace bug (3fe2fd)
- Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (b99aba)
Documentations
- Update new API installation document (50eb6f, ff3740)
- Add new architecture diagram (d56075, 2c3556)
- Add new workflow diagram (96538c)
- Update documentation for framework extension API (0a5423)
- Add documents for framework extension API for PyTorch (ecffc2)
- Add documents for framework extension API for TensorFlow (4dbf71)
- Add documents for autotune API (853dc7, de3e94)
- Update for API 3.0 online doc (81a076, 87f02c, efcb29)
- Add docstring for API modules (aa42e5, 5767ae, 296c5d, 1ebf69, 0c52e1, b78794, 6b3020, 28578b)
- Add doc for client usage (d254d5, 305838)
- Remove 1.x API documents (705672, d32046)
- Add readme for framework extension API examples (385da7)
- Add version mapping between INC and Gaudi SW Stack (acd8f4)
Validated Configurations
- CentOS 8.4 & Ubuntu 22.04 & Windows 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- PyTorch/IPEX 2.1, 2.2, 2.3
- TensorFlow 2.14, 2.15, 2.16
- ONNX Runtime 1.16, 1.17, 1.18