tensorrt_yolo sample yolov5 model throw error on inference #1647

Closed
3 tasks done
HaoruXue opened this issue Aug 22, 2022 · 6 comments · Fixed by #1749
Labels: component:perception, type:bug, type:documentation

Comments

@HaoruXue
Contributor

HaoruXue commented Aug 22, 2022

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I'm convinced that this is not my fault but a bug.

Description

The tensorrt_yolo package links to several YOLOv5 ONNX models. I converted the yolov5l model to a .engine file and ran it, but the node throws an error immediately:

[INFO] [1661150089.510299859] [tensorrt_yolo]: Found /home/haoru/autoware_interface/install/autoware_perception_launch/share/autoware_perception_launch/config/model/yolov5l.engine
[INFO] [1661150091.209141479] [tensorrt_yolo]: Inference engine prepared.
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaErrorIllegalAddress (700)@/home/haoru/autoware_interface/src/universe/autoware.universe/perception/tensorrt_yolo/lib/src/trt_yolo.cpp#L304: an illegal memory access was encountered

compute-sanitizer reports that the illegal memory access comes from the enqueueV2 call on line 304. A quick search suggests this is a known, older issue with YOLOv5 and AutoShape.

The issue goes away when I download pre-trained models from PyTorch Hub and convert them to TensorRT using the scripts provided by the ultralytics repo:

python export.py --weights yolov5s.pt --include engine

Expected behavior

CUDA should not throw an error.

Actual behavior

CUDA throws an illegal memory access error.

Steps to reproduce

  1. Convert the yolov5l model linked in tensorrt_yolo to a TensorRT engine: trtexec --onnx=yolov5l.onnx --saveEngine=yolov5l.engine
  2. Run the tensorrt_yolo node.
  3. The node throws the exception shown in the Description.
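
For anyone scripting the reproduction, step 1 can be sketched in Python as below. This only assembles the same trtexec command quoted above; the helper name `trtexec_command` is made up for illustration, and the engine path is assumed to sit next to the ONNX file. Note that this is the path that triggers the crash, not a recommended conversion method.

```python
import shlex
from pathlib import Path
from typing import List


def trtexec_command(onnx_path: str) -> List[str]:
    """Build the trtexec argument list from step 1 (hypothetical helper)."""
    onnx = Path(onnx_path)
    # Assumption: the engine is written next to the ONNX file with a .engine suffix.
    engine = onnx.with_suffix(".engine")
    return ["trtexec", f"--onnx={onnx}", f"--saveEngine={engine}"]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without TensorRT installed.
    print(shlex.join(trtexec_command("yolov5l.onnx")))
```

To actually run the conversion, the list can be passed to `subprocess.run(..., check=True)` on a machine where trtexec is on the PATH.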

Versions

  • OS: Ubuntu 20.04
  • TensorRT Version: 8.4.1-1+cuda11.6

Possible causes

I'm not a TensorRT expert, but here are a couple of potential causes:

  1. The model file was generated before the fix in AutoShape Usage ultralytics/yolov5#7128
  2. Both the .engine and .onnx files must be generated at the same time, using the method described in AutoShape Usage ultralytics/yolov5#7128:
python export.py --weights yolov5s.pt --include engine

It would be great if someone could explain where the linked model comes from, and update it if necessary.

Also, I'm not sure whether what I'm doing with the linked model is the right way to run inference. It would be great if more documentation on model conversion could be linked.

Additional context

No response

@HaoruXue HaoruXue added type:bug Software flaws or errors. type:documentation Creating or refining documentation. (auto-assigned) labels Aug 22, 2022
@BonoloAWF

@HaoruXue there was previously a discussion about prompting the user to download the necessary files required by any ML models or inference frameworks. A similar solution could be provided for this bug to prevent the CUDA error. Check autowarefoundation/autoware#2508

@mitsudome-r
Member

I think the .engine file will be created automatically if you specify the ONNX file in the launch file.
@wep21 do you know how to solve this issue?

@HaoruXue
Contributor Author

@mitsudome-r I tested without converting the ONNX model in advance, and now it works. It may be worth documenting that the package downloads the model directly and converts the ONNX file on first launch.
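
The "convert on first launch" behaviour described here can be pictured roughly as follows. This is a minimal sketch, not the actual tensorrt_yolo code: `ensure_engine` and `fake_build` are hypothetical names, and storing the engine next to the ONNX file with a .engine suffix is an assumption based on the paths in the log above.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_engine(onnx_path: Path, build: Callable[[Path, Path], None]) -> Path:
    """Return the engine path, building it from the ONNX file only if it is missing."""
    # Assumption: the engine lives next to the ONNX file with a .engine suffix.
    engine_path = onnx_path.with_suffix(".engine")
    if not engine_path.exists():
        build(onnx_path, engine_path)
    return engine_path


def fake_build(onnx_path: Path, engine_path: Path) -> None:
    # Stand-in for the real ONNX-to-TensorRT conversion (hypothetical).
    engine_path.write_bytes(b"placeholder engine")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        onnx = Path(tmp) / "yolov5l.onnx"
        onnx.write_bytes(b"placeholder onnx")
        print(ensure_engine(onnx, fake_build).name)  # first call builds the engine
        print(ensure_engine(onnx, fake_build).name)  # second call reuses it
```

Under this pattern, a pre-existing engine that was built some other way is picked up as-is, which would be consistent with the crash only appearing when the engine was converted in advance.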

@Shin-kyoto Shin-kyoto added the component:perception Advanced sensor data processing and environment understanding. (auto-assigned) label Aug 25, 2022
@HaoruXue
Contributor Author

After discussing with Mitsudome-san, I'll submit a PR for documentation changes in tensorrt_yolo.

@wep21
Contributor

wep21 commented Aug 27, 2022

@HaoruXue @mitsudome-r
I applied some patches to the official YOLOv5 ONNX model so that it fits the current tensorrt_yolo implementation. Do you need a converter script?

@HaoruXue
Contributor Author

@wep21 if my understanding is correct, you currently need to run the node once to convert the ONNX model to a TensorRT engine. For the sake of deployment, are there alternative ways to make this happen at an earlier stage? Maybe by running a converter script in the build process?
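
As a rough idea of what a build-time step could check, the sketch below lists the ONNX files in a model directory whose engine is missing or older than the ONNX file. `stale_models` is a hypothetical helper, and the .engine-next-to-.onnx layout is an assumption. The conversion itself is deliberately left out: since a trtexec-built engine crashed in this issue, it would need to go through the same conversion path the node uses.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def stale_models(model_dir: Path) -> List[Path]:
    """Return ONNX files whose .engine is missing or older than the ONNX file."""
    stale = []
    for onnx in sorted(model_dir.glob("*.onnx")):
        # Assumption: each engine sits next to its ONNX file with a .engine suffix.
        engine = onnx.with_suffix(".engine")
        if not engine.exists() or engine.stat().st_mtime < onnx.stat().st_mtime:
            stale.append(onnx)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        onnx = model_dir / "yolov5l.onnx"
        engine = model_dir / "yolov5l.engine"
        onnx.write_bytes(b"placeholder onnx")
        engine.write_bytes(b"placeholder engine")
        os.utime(engine, (0, 0))  # make the engine look older than the ONNX file
        print([p.name for p in stale_models(model_dir)])
```

A build or install step could call this and convert only the listed models, so the first launch of the node no longer pays the conversion cost.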
