# Compressed Tensors v0.5.0

## What's Changed
- Add simple GHA workflow to run tests by @dbogunowicz in #2
- Define BaseModels for Quantization by @Satrat in #3
- Quantization refactor by @horheynm in #5
- Apply quantization config implementation by @bfineran in #4
- decorate fake quant with torch.no_grad by @bfineran in #8
- fix observer bugs by @bfineran in #9
- [lifecycle] docstrings + ux update to work with torch.apply by @bfineran in #11
- Fix Device Mismatch by @Satrat in #12
- Serialize Config from Model by @Satrat in #7
- [Observers] pull shared logic into a helper function by @bfineran in #13
- Rename the repo to `compressed-tensors` by @dbogunowicz in #14
- fix style post rename PR by @bfineran in #25
- Quantization Examples and Correctness Fixes by @Satrat in #26
- Fix failing GHA by @dbogunowicz in #29
- Pretrained Model Reload + SparseGPT Support by @Satrat in #31
- [Release 0.3.0] Basic Readme and user-facing pathways by @dbogunowicz in #30
- Quantization Fixes by @Satrat in #35
- Final details for package by @mgoin in #36
- bump version to 0.3.1, license and packaging updates by @bfineran in #37
- Dynamic Quantization by @bfineran in #15
- [Release 0.3.2] Additional patches to enable compatibility with SparseML, UX changes by @Satrat in #43
- Update target match conditions; make public by @dsikka in #44
- [Lifecycle][Tests] Feature Branch by @horheynm in #38
- [Observers] group size + channel wise + per token by @horheynm in #32
- [BugFix] Update code to be compatible with py38 by @rahul-tuli in #48
- [Fix] Fix the messed-up test structure by @dbogunowicz in #49
- Bump the version before the release by @dbogunowicz in #50
- Compressed lifecycle implementation (INT8 only) by @bfineran in #33
- group size speedups + fixes by @bfineran in #51
- Group and Channelwise Compression Support by @Satrat in #52
- Int4 Packed Compressor by @Satrat in #47
- Fix for auto device map quantization by @Satrat in #54
- Enable generating `compressed-tensors-nightly` by @dbogunowicz in #53
- [BugFix][Again] Update code to be compatible with py38 by @dbogunowicz in #56
- Fix per_token slowdown by @Satrat in #57
- [GPTQ Modifier UX] Add default scheme by @rahul-tuli in #61
- fix group size min max tracking by adding tensor ids by @bfineran in #60
- Support for aliased scheme settings in quant config by @bfineran in #40
- Remove Symmetric Zero Point in Compressed Outputs by @Satrat in #59
- Misc Fixes by @Satrat in #55
- Fix for Symmetric Zero Point Reloading by @Satrat in #64
- Additional Symmetric ZP Fix by @Satrat in #65
- Make ZP int8 instead of int64 by @Satrat in #67
- Add a function to check if a string is a preset scheme by @rahul-tuli in #66
- Rename Packed Weights by @Satrat in #63
- Fixed Grouped Quantization Reload by @Satrat in #68
- Fix incorrect loading of dtype by @eldarkurtic in #70
- Fix Python 3.8 Compatibility by @Satrat in #71
- Update nightly build to run at 6pm by @dsikka in #72
- Update time for the runner by @dsikka in #74
- Fixes to enable FSDP one-shot by @dbogunowicz in #58
- Update Compression Config for HfQuantizer Compatibility by @Satrat in #73
- Remove version restriction on transformers by @mgoin in #76
- remove pydantic version cap by @bfineran in #80
- reduce appropriate dim by @horheynm in #75
- Marlin24 Compressor by @Satrat in #77
- Fix GPTQ Aliases by @Satrat in #81
- initial fixes for compatibility with HFQuantizer by @bfineran in #79
- bump version to 0.4.0 by @bfineran in #83
- import is_release from version.py by @horheynm in #85
- Add release build workflow by @dhuangnm in #89
- Assert correct device when dequantizing (like we do for quantizing) by @dbogunowicz in #90
- update default symmetry to True on presets by @bfineran in #92
- Fp8 Quantization Support by @Satrat in #62
- default W4A16 alias to use group_size=128 by @bfineran in #94
- [compressor] Add packed int8 support by @dsikka in #91
- Fix Decompress kwargs by @Satrat in #100
- [Quant KV Cache] Implementation by @dbogunowicz in #86
- Fix Transient Tests by @Satrat in #101
- Speed Up Packed Compression by @Satrat in #103
- [Fix] remove `tests/quantization` by @dbogunowicz in #99
- Allow creating compressor when `trust_remote_code=True` by @dbogunowicz in #104
- Update Quantization Presets by @Satrat in #105
- [MOE] Add a set of functionalities to support quantization of MOE models by @dbogunowicz in #46
- [BugFix] Fix Name Mangling Issue in `compressed_tensors.utils` by @rahul-tuli in #102
- Update Quantization Scheme Standards for better readability by @markurtz in #106
- quantization lifecycle - disable forward pass override + helper for weight quant param updates by @bfineran in #111
- Add FP8 Dynamic Scheme for Latest Llama3.1 Meta Models and Fix W4A8 Representation by @markurtz in #114
- Model Offloading Support by @Satrat in #113
- Fix Test to Account for Model Change by @Satrat in #116
- Make publish workflow manually triggerable by @rahul-tuli in #117
- bump version to 0.5.0 by @bfineran in #119
- [Cherry Pick] don't set quantization data on reload (#123) by @Satrat in #125
## New Contributors
- @mgoin made their first contribution in #36
- @dsikka made their first contribution in #44
- @rahul-tuli made their first contribution in #48
- @eldarkurtic made their first contribution in #70
- @dhuangnm made their first contribution in #89
- @markurtz made their first contribution in #106
**Full Changelog**: https://github.com/neuralmagic/compressed-tensors/commits/0.5.0