[Feature Branch] Quant modifier UX #2263

rahul-tuli · 2024-05-02T15:31:00Z

Quantization Modifier UX Update

Description

This PR refactors the quantization modifiers to improve the user experience and simplify the system architecture. It is based on ~~the sa/quant_mod_refactor~~ main branch; all subsequent changes will be merged into it as smaller, bite-sized PRs. Key updates include:

- Decoupling Wanda and SparseGPT: Split Wanda and SparseGPT #2266
- Split SparseGPT and GPTQ modifiers #2272
  - Decoupling SparseGPT and GPTQ
  - Removing quantization features from SparseGPT
  - Adding quantization features to GPTQ
  - Preserve base sparsity in GPTQ: Preserve sparsity GPTQ #2281
  - Preserve sparsity mask in SparseGPT: Preserve sparsity SPARSEGPT #2282
  - [GPTQ Modifier UX] Update tests to use GPTQModifier for obcq style quantization #2294
- GPTQ UX config groups support #2273
- [GPTQ Modifier UX] Add default scheme compressed-tensors#61
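To make the "preserve base sparsity" items above concrete, here is a minimal, library-free sketch of the idea: if a weight vector is already sparser than some threshold, remember where its zeros are and re-apply that mask after quantization. All names here (`quantize_preserving_sparsity`, `toy_quantize`, the `threshold` default) are hypothetical and chosen for illustration; this is not the GPTQModifier implementation.

```python
from typing import Callable, List


def sparsity(weights: List[float]) -> float:
    """Return the fraction of entries that are exactly zero."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)


def quantize_preserving_sparsity(
    weights: List[float],
    quantize_fn: Callable[[List[float]], List[float]],
    threshold: float = 0.05,  # assumed cutoff, for illustration only
) -> List[float]:
    """Quantize weights, re-zeroing pruned entries if the input was sparse enough."""
    preserve = sparsity(weights) > threshold
    # Record which entries are non-zero before quantization touches them.
    keep_mask = [w != 0.0 for w in weights] if preserve else None
    quantized = quantize_fn(weights)
    if keep_mask is None:
        return quantized
    # Force previously pruned positions back to zero.
    return [q if keep else 0.0 for q, keep in zip(quantized, keep_mask)]


def toy_quantize(weights: List[float], step: float = 0.25, drift: float = 0.2) -> List[float]:
    """Hypothetical stand-in for a quantizer whose weight updates can disturb zeros."""
    return [round((w + drift) / step) * step for w in weights]
```

Calling `quantize_preserving_sparsity([0.0, 0.5, 0.0, -0.75], toy_quantize)` keeps the two pruned positions at zero, whereas raising `threshold` to `1.0` lets the toy quantizer overwrite them.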

Reference Documentation

For more detailed information about the changes and their impact, please refer to the documentation here.

* Update OBCQ
* Extract GPTQ Modifier
* Update test recipes
* Add config_groups support to GPTQModifier
* mask_structure preservation test (#2284)
  * test
  * Preserve weight sparsity if greater than threshold
  * Add argument to preserve sparsity mask in SPARSEGPT
  * fix case when mask is none
  * Add test to check mask_structure: the initial mask structure should be preserved between consecutive runs; added test to check this
  * Update tensor_follows_mask_structure to check for at least n zeros

  Co-authored-by: Sara Adkins <sara@neuralmagic.com>
* PR comments

Co-authored-by: Sara Adkins <sara@neuralmagic.com>
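The "check for at least n zeros" item in the commit notes above can be illustrated with a small standalone check: for an `n:m` mask such as `2:4`, every consecutive group of `m` values must contain at least `n` zeros. The function name `follows_mask_structure` and its flat-sequence input are assumptions made for this sketch; it is not the `tensor_follows_mask_structure` helper touched by the PR.

```python
from typing import Sequence


def follows_mask_structure(values: Sequence[float], mask: str = "2:4") -> bool:
    """Return True if every group of m values holds at least n zeros (mask is "n:m")."""
    n, m = (int(part) for part in mask.split(":"))
    if len(values) % m != 0:
        raise ValueError(f"length {len(values)} is not a multiple of group size {m}")
    for start in range(0, len(values), m):
        group = values[start:start + m]
        zeros = sum(1 for v in group if v == 0)
        # "At least n" rather than "exactly n": a group that is sparser
        # than required still satisfies the structure.
        if zeros < n:
            return False
    return True
```

Using "at least" rather than "exactly" means a tensor that becomes sparser between consecutive runs still passes, which matches the intent of checking that an initial mask structure is preserved.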

Satrat

LGTM once the tests pass

rahul-tuli changed the base branch from main to sa/quant_mod_refactor May 2, 2024 15:31

Base automatically changed from sa/quant_mod_refactor to main May 6, 2024 20:02

rahul-tuli force-pushed the quant-modifier-ux branch 3 times, most recently from dfb3d7f to a55f50c Compare May 9, 2024 14:45

rahul-tuli requested review from Satrat, bfineran, dsikka, horheynm and dbogunowicz May 13, 2024 14:05

rahul-tuli marked this pull request as ready for review May 13, 2024 14:16

rahul-tuli changed the title ~~[WIP][Feature Branch] Quant modifier UX~~ [Feature Branch] Quant modifier UX May 13, 2024

rahul-tuli self-assigned this May 13, 2024

rahul-tuli force-pushed the quant-modifier-ux branch from 5b88ad3 to 4230fb7 Compare May 17, 2024 16:07

Split WandaPruningModifier and SparseGPTModifier

d4d85ff

* Make sparsegpt not inherit from wanda modifier
* Decouple SparseGPTModifierPyTorch from WandaPruningModifier
* Fix docstrings

rahul-tuli force-pushed the quant-modifier-ux branch from 4230fb7 to d4d85ff Compare May 20, 2024 13:09

rahul-tuli and others added 6 commits May 20, 2024 14:56

Split SparseGPT and GPTQ modifiers (#2272)

5dd9985

* Update OBCQ
* Extract GPTQ Modifier

[GPTQ Modifier UX] Update tests to use GPTQModifier for obcq style quantization (#2294)

c695567

* Update OBCQ
* Extract GPTQ Modifier
* Update test recipes

Fix default case

227cf8e

Update test to use new vLLMQuantizationModifier

876e5ae

Style

a7f4eef

rahul-tuli mentioned this pull request May 21, 2024

[GPTQ Modifier UX] Add default scheme neuralmagic/compressed-tensors#61

Merged

Satrat approved these changes May 21, 2024

Merge branch 'main' into quant-modifier-ux

c062a6b

bfineran approved these changes May 22, 2024

bfineran merged commit c24e97f into main May 22, 2024
14 of 17 checks passed

bfineran deleted the quant-modifier-ux branch May 22, 2024 18:48
