# NV-ModelOPT INT4 quantization (#1135)

## Conversation
@microsoft-github-policy-service agree company="Nvidia"
/azp run

Commenter does not have sufficient privileges for PR 1135 in repo microsoft/Olive

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Thanks for the contribution! Will merge once the CI passes.
```diff
@@ -99,6 +100,14 @@ This workflow performs BERT optimization on GPU with CUDA/TensorRT. It performs
 2. TensorRT: `TensorrtExecutionProvider`
    - *PyTorch Model -> Onnx Model -> ONNX Runtime performance tuning with trt_fp16_enable*
    Config file: [bert_trt_gpu.json](bert_trt_gpu.json)
+
+### BERT optimization with TensorRT-Model-Optimizer on CPU/GPU
```
@jambayk should I add something like the below about deployment?

Updated:

> Users can deploy the quantized ONNX model using TensorRT 10.x but that is not supported in ORT right now, stay tuned!

or

> Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x
The comment about deployment would probably concern ORT rather than Olive, since Olive only produces the model; it then gets deployed using ORT, TensorRT, or some other engine. I will let @EmmaNingMS comment on this.
Vote for this version if ORT support is in the plan: "Deployment support for TensorRT-Model-Optimizer quantized models is coming soon in ORT, in the meantime try TensorRT 10.x"

Does ORT with the TRT EP support TensorRT-Model-Optimizer quantized models? It is expected that ORT-TRT has capabilities equivalent to the TRT engine. If not, what is the gap for ORT-TRT to run TensorRT-Model-Optimizer quantized models, and will Nvidia teams support closing it?
@EmmaNingMS I will ask my team about the roadmap for this support. Thanks.
Thanks!
## Describe your changes

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to update [example documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md) in a follow-up PR.

## Tests
- Unit test: `pytest test/unit_test/passes/onnx/test_nvmo_quantization.py`
- Example: `python -m olive.workflows.run --config bert_nvmo_ptq.json`

## (Optional) Issue link
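For readers browsing this PR, here is a minimal sketch of what the `bert_nvmo_ptq.json` workflow referenced above might boil down to when driven from Python. The pass type `NvModelOptQuantization` and the config keys shown are illustrative assumptions, not confirmed by this PR; the actual schema is defined by the pass added here and by the example JSON config in the repo.

```python
# Hypothetical sketch of the NV-ModelOPT INT4 PTQ workflow driven from Python.
# The pass type and config keys below are illustrative assumptions, not taken
# from this PR; see bert_nvmo_ptq.json in the Olive examples for the real schema.
from olive.workflows import run as olive_run

workflow = {
    "input_model": {
        "type": "ONNXModel",
        "config": {"model_path": "models/bert-base-uncased.onnx"},  # assumed path
    },
    "passes": {
        "nvmo_int4": {
            "type": "NvModelOptQuantization",  # assumed pass name
            "config": {"precision": "int4"},   # assumed key: INT4 weight-only PTQ
        }
    },
    "engine": {"output_dir": "models/bert_nvmo_int4"},
}

# Equivalent in spirit to: python -m olive.workflows.run --config bert_nvmo_ptq.json
olive_run(workflow)
```

Running the JSON config through `python -m olive.workflows.run`, as listed under Tests, is the route this PR actually exercises.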