# NV-ModelOPT INT4 quantization (#1135)

## Conversation
@microsoft-github-policy-service agree company="Nvidia"
/azp run

Commenter does not have sufficient privileges for PR 1135 in repo microsoft/Olive

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Thanks for the contribution! Will merge once the CI passes.
```diff
@@ -99,6 +100,14 @@ This workflow performs BERT optimization on GPU with CUDA/TensorRT. It performs
 2. TensorRT: `TensorrtExecutionProvider`
    - *PyTorch Model -> Onnx Model -> ONNX Runtime performance tuning with trt_fp16_enable*
    Config file: [bert_trt_gpu.json](bert_trt_gpu.json)
+
+### BERT optimization with TensorRT-Model-Optimizer on CPU/GPU
```
@jambayk should I add something like the below about deployment?

Updated:

> Users can deploy the quantized ONNX model using TensorRT 10.x but that is not supported in ORT right now, stay tuned!

or

> Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x
The comment about deployment would probably concern ORT rather than Olive, since Olive only produces the model; it then gets deployed using ORT, TensorRT, or some other engine. I will let @EmmaNingMS comment on this.
Vote for this version if ORT support is in the plan: "Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x"

Does ORT with the TRT EP support TensorRT-Model-Optimizer quantized models? It is expected that ORT-TRT has capabilities equivalent to the TRT engine. If not, what is the gap for ORT-TRT to run TensorRT-Model-Optimizer quantized models, and will Nvidia teams support closing it?
@EmmaNingMS I will ask my team about the roadmap for this support. Thanks.
Thanks!
## Describe your changes

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to update [example documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md) in a follow-up PR.

## Tests
- Unit test: `pytest test/unit_test/passes/onnx/test_nvmo_quantization.py`
- Example: `python -m olive.workflows.run --config bert_nvmo_ptq.json`

## (Optional) Issue link
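For readers browsing this PR, here is a minimal sketch of what the `bert_nvmo_ptq.json` workflow referenced above might boil down to when driven from Python. The pass type `NvModelOptQuantization` and the config keys shown are illustrative assumptions, not confirmed by this PR; the actual schema is defined by the pass added here and by the example JSON config in the repo.

```python
# Hypothetical sketch of the NV-ModelOPT INT4 PTQ workflow driven from Python.
# The pass type and config keys below are illustrative assumptions, not taken
# from this PR; see bert_nvmo_ptq.json in the Olive examples for the real schema.
from olive.workflows import run as olive_run

workflow = {
    "input_model": {
        "type": "ONNXModel",
        "config": {"model_path": "models/bert-base-uncased.onnx"},  # assumed path
    },
    "passes": {
        "nvmo_int4": {
            "type": "NvModelOptQuantization",  # assumed pass name
            "config": {"precision": "int4"},   # assumed key: INT4 weight-only PTQ
        }
    },
    "engine": {"output_dir": "models/bert_nvmo_int4"},
}

# Equivalent in spirit to: python -m olive.workflows.run --config bert_nvmo_ptq.json
olive_run(workflow)
```

Running the JSON config through `python -m olive.workflows.run`, as listed under Tests, is the route this PR actually exercises.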