Weird Results After Exporting to TensorRT FP16 #104

Open
Gabriellm2003 opened this issue Oct 26, 2023 · 8 comments

@Gabriellm2003

I trained a model with a custom dataset using the PyTorch code from this repository. The training went well, and the Torch model worked as expected. After this test, I tried to export the model to ONNX. Again, everything went well, and the model worked as expected. Lastly, I tried to export the model to TensorRT. I exported two models, one using FP16 precision and the second using FP32 precision. There were no error logs during the export procedure.

When I tested the models, the FP32 one generated the same results as the ONNX model, while the FP16 one generated very distinct results compared to them. I noticed that the results from the FP16 model contained multiple (quite a few) bounding boxes for the same object. I found these differences between the models quite strange, considering that I did this procedure on a variety of different models, and the impact on the results was minimal. I suppose those differences could be removed by applying non-maximum suppression (NMS), but I didn't want to do that.
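(For reference, if one did want to suppress the duplicate boxes as a stopgap, a post-hoc class-agnostic NMS pass with torchvision would look roughly like the sketch below; the IoU threshold is illustrative.)

import torch
from torchvision.ops import nms

# Illustrative stopgap only: class-agnostic NMS over the FP16 engine's raw detections.
# boxes: (N, 4) in xyxy format, scores: (N,); the 0.7 IoU threshold is arbitrary.
def suppress_duplicates(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.7):
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]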

Does anyone know what might be causing this? Or at least how to fix it?

About some of the configurations that I used:

ONNX:

Exported using opset=16
onnx==1.14.0
onnxruntime==1.15.1
onnxsim==0.4.33
torch==2.0.1
torchvision==0.15.2

TensorRT:
I used the container nvcr.io/nvidia/tensorrt:23.01-py3, which includes:
TensorRT==8.5.2.2
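
A quick way to sanity-check the exported ONNX model on its own is to run it through onnxruntime with dummy inputs; a minimal sketch follows (the input names and shapes are assumptions and may need adjusting to the export script):

import numpy as np
import onnxruntime as ort

# Minimal ONNX sanity check with onnxruntime (versions as listed above).
# Assumed input names: "images" and "orig_target_sizes"; verify with sess.get_inputs().
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])

images = np.random.rand(1, 3, 640, 640).astype(np.float32)
orig_target_sizes = np.array([[640, 640]], dtype=np.int64)

outputs = sess.run(None, {"images": images, "orig_target_sizes": orig_target_sizes})
for meta, out in zip(sess.get_outputs(), outputs):
    print(meta.name, out.shape, out.dtype)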

@lyuwenyu
Owner

I converted the ONNX model to a TensorRT engine and then used this code to do inference; it works fine for me.

You can also try third-party tools to check your model first; see some resources in discussion #95

@Gabriellm2003
Author

Gabriellm2003 commented Oct 27, 2023

I ran some additional tests. Among the checkpoints saved during the same training run, the issue of multiple detections with the FP16 TensorRT model does not occur for the checkpoints generated after the initial epochs.
I am curious whether this might be related to torch.amp.
Upon checking, I found that the parameters I used for it during training are:

use_amp: False

scaler:
  type: GradScaler
  enabled: True

I wonder if this problem would be solved if I activate the use_amp parameter.
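
If the suspicion is that the trained weights themselves (rather than the export) are the problem, one quick check is to scan the checkpoint for values that fall outside the FP16 range; a sketch follows (the checkpoint path and the "model" key are assumptions, not from the repo):

import torch

# Diagnostic sketch: flag weights that cannot be represented in FP16 (max ~65504).
# "checkpoint.pth" and the "model" key are assumptions -- adapt to your checkpoint layout.
FP16_MAX = 65504.0

ckpt = torch.load("checkpoint.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and torch.is_floating_point(tensor):
        amax = tensor.abs().max().item()
        if amax > FP16_MAX:
            print(f"{name}: max |w| = {amax:.3e} overflows FP16")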

@lyuwenyu
Owner

use_amp only affects the training process.

@Gabriellm2003
Author

Yes. In that case, I wonder if this problem would be solved if I activated use_amp and re-trained the model.

@lyuwenyu
Owner

I don't think it would.

@Gabriellm2003
Author

Sorry for the delay.
I double-checked the logs from the TensorRT (trtexec) conversion and found these warnings.

[W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[W] Dynamic dimensions required for input: orig_target_sizes, but no shapes were provided. Automatically overriding shape to: 1x2
[W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[W] [TRT] Check verbose logs for the list of affected weights.
[W] [TRT] - 1 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
[W] [TRT] - 208 weights are affected by this issue: Detected subnormal FP16 values.
[W] [TRT] - 51 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[W] [TRT] - 6 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
[W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.

The warnings are probably related to the problem that I am facing.
Do you have any idea how I can solve this issue?
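
A common workaround for FP16 overflow/underflow warnings like these is to build the engine in FP16 but pin the numerically sensitive layers to FP32. Below is a minimal sketch with the TensorRT Python API (TensorRT 8.5 assumed; the layer-name filter is purely illustrative, you would pick names from the verbose trtexec log):

import tensorrt as trt

# Hedged sketch: build an FP16 engine but keep selected layers in FP32, a common
# workaround when trtexec reports FP16 overflow/underflow on specific weights.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make TensorRT respect the per-layer precisions set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Illustrative filter: keep layers whose names match a suspect pattern in FP32.
    if "suspect_layer_name" in layer.name:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16_mixed.engine", "wb") as f:
    f.write(engine_bytes)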

@shreejalt

I converted the ONNX model to a TensorRT engine and then used this code to do inference; it works fine for me.

You can also try third-party tools to check your model first; see some resources in discussion #95

How can we do inference using the TensorRT model?
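
For reference, a rough sketch of driving a serialized engine directly with the TensorRT 8.5 Python API and pycuda is below (the repo's own deploy code may differ; this assumes static shapes, e.g. a batch-1 engine, and the binding names are whatever the export produced):

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

# Load a serialized engine and allocate one host/device buffer per binding.
logger = trt.Logger(trt.Logger.WARNING)
with open("model_fp32.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

host_bufs, dev_bufs, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    shape = tuple(context.get_binding_shape(i))  # static shapes assumed
    host_bufs[name] = np.zeros(shape, dtype=dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    bindings.append(int(dev_bufs[name]))

def infer(feed: dict):
    """feed maps input binding names (e.g. 'images', 'orig_target_sizes') to numpy arrays."""
    for name, arr in feed.items():
        np.copyto(host_bufs[name], arr)
        cuda.memcpy_htod_async(dev_bufs[name], host_bufs[name], stream)
    context.execute_async_v2(bindings, stream.handle)
    outputs = {}
    for i in range(engine.num_bindings):
        if not engine.binding_is_input(i):
            name = engine.get_binding_name(i)
            cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
            outputs[name] = host_bufs[name]
    stream.synchronize()
    return outputs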

@shreejalt

@lyuwenyu
How do we run inference with the TRTInferer? Do we pass the image in directly, or do we need to preprocess it before passing the blob?

Also, if preprocessing is needed, could you share the exact code we should use before passing the image?
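
For reference, a rough preprocessing sketch is below. It is not taken from the repo's deploy code: the 640x640 size, the plain to-tensor scaling, and the (width, height) order for orig_target_sizes are assumptions, so check the config's eval transforms and the postprocessor before relying on it.

import torch
import torchvision.transforms as T
from PIL import Image

# Rough preprocessing sketch (assumptions noted above): resize to a square input,
# scale pixels to [0, 1], add a batch dimension, and keep the original size around
# so the model can rescale boxes back to image coordinates.
def preprocess(image_path: str, size: int = 640):
    im = Image.open(image_path).convert("RGB")
    orig_w, orig_h = im.size
    transform = T.Compose([T.Resize((size, size)), T.ToTensor()])
    blob = transform(im).unsqueeze(0)                     # 1 x 3 x size x size
    orig_target_sizes = torch.tensor([[orig_w, orig_h]])  # (w, h) order is an assumption -- verify
    return blob, orig_target_sizes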
