# Compressed Tensors v0.5.0

## What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to `compressed-tensors` by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1, license and packaging updates by @bfineran in #37
- Dynamic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating `compressed-tensors-nightly` by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatibility by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove `tests/quantization` by @dbogunowicz in #99
- Allow creating compressor when `trust_remote_code=True` by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix] Fix Name Mangling Issue in `compressed_tensors.utils` by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- [Cherry Pick] don't set quantization data on reload (#123) by @Satrat in #125
## New Contributors
- @mgoin made their first contribution in #36
- @dsikka made their first contribution in #44
- @rahul-tuli made their first contribution in #48
- @eldarkurtic made their first contribution in #70
- @dhuangnm made their first contribution in #89
- @markurtz made their first contribution in #106
**Full Changelog**: https://github.com/neuralmagic/compressed-tensors/commits/0.5.0