`tensorrt_yolo` sample yolov5 model throw error on inference #1647

HaoruXue · 2022-08-22T07:46:53Z

Checklist

I've read the contribution guidelines.
I've searched other issues and no duplicate issues were found.
I'm convinced that this is not my fault but a bug.

Description

The tensorrt_yolo package links to a couple of YOLOv5 ONNX models. I converted the yolov5l model to .engine and ran it, but the node throws an error immediately:

[INFO] [1661150089.510299859] [tensorrt_yolo]: Found /home/haoru/autoware_interface/install/autoware_perception_launch/share/autoware_perception_launch/config/model/yolov5l.engine
[INFO] [1661150091.209141479] [tensorrt_yolo]: Inference engine prepared.
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaErrorIllegalAddress (700)@/home/haoru/autoware_interface/src/universe/autoware.universe/perception/tensorrt_yolo/lib/src/trt_yolo.cpp#L304: an illegal memory access was encountered
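For triaging logs like the one above, the error text can be split into its parts. This is a minimal sketch that assumes the format seen in this log (`<name> (<code>)@<file>#L<line>: <message>`); the pattern and the helper name `parse_cuda_error` are illustrative and not part of tensorrt_yolo.

```python
import re

# Assumed format, taken from the log above:
#   <cudaErrorName> (<code>)@<source file>#L<line>: <message>
CUDA_ERROR_RE = re.compile(
    r"(?P<name>cuda\w+) \((?P<code>\d+)\)@(?P<file>.+?)#L(?P<line>\d+): (?P<message>.+)"
)


def parse_cuda_error(text):
    """Return the error fields as a dict, or None if the text does not match."""
    match = CUDA_ERROR_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared as integers.
    fields["code"] = int(fields["code"])
    fields["line"] = int(fields["line"])
    return fields


sample = (
    "  what():  cudaErrorIllegalAddress (700)@/home/haoru/autoware_interface/src/universe/"
    "autoware.universe/perception/tensorrt_yolo/lib/src/trt_yolo.cpp#L304: "
    "an illegal memory access was encountered"
)
info = parse_cuda_error(sample)
```

Here `info["line"]` points at the same line 304 of trt_yolo.cpp that the log reports.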

compute-sanitizer reports that the illegal memory access comes from the enqueueV2 call on line 304. A quick search suggests this is a known older issue with YOLOv5 and AutoShape.

The issue goes away when I download pre-trained models from PyTorch Hub and convert them to TensorRT using the scripts provided by the ultralytics repo:

python export.py --weights yolov5s.pt --include engine

Expected behavior

CUDA should not throw an error.

Actual behavior

CUDA throws an illegal memory access error.

Steps to reproduce

Convert the yolov5l model linked in tensorrt_yolo to TensorRT: trtexec --onnx=yolov5l.onnx --saveEngine=yolov5l.engine
Run the tensorrt_yolo node.
Observe the exception thrown at inference.
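The conversion step in the reproduction can be scripted. This is a sketch only: the `--onnx` and `--saveEngine` flags are the ones used in the step above, while the helper names and the choice to write the engine next to the ONNX file are assumptions. Note that in this report an engine built this way is what triggers the crash, so the script reproduces the problem rather than fixing it.

```python
import shutil
import subprocess
from pathlib import Path


def build_trtexec_command(onnx_path):
    """Build the trtexec argv that writes an engine next to the ONNX file."""
    onnx = Path(onnx_path)
    engine = onnx.with_suffix(".engine")  # assumed output location
    return ["trtexec", f"--onnx={onnx}", f"--saveEngine={engine}"]


def convert(onnx_path):
    """Run trtexec if it is on PATH; return the command that was (or would be) run."""
    command = build_trtexec_command(onnx_path)
    if shutil.which("trtexec") is not None:
        subprocess.run(command, check=True)
    return command


# Only build the command here; call convert() explicitly to run the conversion.
command = build_trtexec_command("yolov5l.onnx")
```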

Versions

OS: Ubuntu 20.04
TensorRT Version: 8.4.1-1+cuda11.6

Possible causes

I'm not a pro in TensorRT but here are a couple of potential causes:

The model file was generated before the fix in AutoShape Usage ultralytics/yolov5#7128
Both .engine and .onnx must be generated at the same time using the method described in AutoShape Usage ultralytics/yolov5#7128:

python export.py --weights yolov5s.pt --include engine

It would be great if someone can explain where the linked model comes from, and update it if necessary.

Also, I'm not sure whether what I'm doing with the linked model is the right way to run inference. It would be great if more documentation on the model conversion could be linked.

Additional context

No response


BonoloAWF · 2022-08-22T15:07:06Z

@HaoruXue there was previously a discussion about prompting the user to download the necessary files required by any ML models or inference frameworks. A similar solution could be provided for this bug to prevent the CUDA error. Check autowarefoundation/autoware#2508

mitsudome-r · 2022-08-23T15:59:24Z

I think the .engine file will be created automatically if you specify the ONNX file in the launch file.
@wep21 do you know how to solve this issue?

HaoruXue · 2022-08-24T03:31:09Z

@mitsudome-r I tested without converting the ONNX model in advance, and now it works. It may be worth documenting that the package downloads the model directly and converts the ONNX file on first launch.
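To get back to that working state after a manual conversion, the hand-built engine can be removed so the node builds its own on the next launch. This is a minimal sketch, assuming the engine sits next to the ONNX file with the same name and an .engine suffix (as the log path in the description suggests); both helper names are hypothetical and not part of tensorrt_yolo.

```python
from pathlib import Path


def engine_path_for(onnx_path):
    """Assumed convention: the engine sits next to the ONNX file with an .engine suffix."""
    return Path(onnx_path).with_suffix(".engine")


def remove_preconverted_engine(onnx_path):
    """Delete a hand-converted engine, if any, so the node can build its own.

    Returns True when a file was removed, False when there was nothing to remove.
    """
    engine = engine_path_for(onnx_path)
    if engine.is_file():
        engine.unlink()
        return True
    return False
```

Calling `remove_preconverted_engine("config/model/yolov5l.onnx")` (the path is an example) leaves the ONNX file untouched and only removes the matching engine.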

HaoruXue · 2022-08-27T04:18:23Z

After discussing with Mitsudome-san I'll submit a PR for documentation changes in tensorrt_yolo.

wep21 · 2022-08-27T05:34:53Z

@HaoruXue @mitsudome-r
I applied some patches to the official YOLOv5 ONNX model so that it fits the current tensorrt_yolo implementation. Do you need a converter script?

HaoruXue · 2022-08-30T00:18:22Z

@wep21 if my understanding is correct, you currently need to run the node once to convert the ONNX model to a TensorRT engine. For the sake of deployment, are there alternative ways to make this happen at an earlier stage? Maybe by running a converter script in the build process?

HaoruXue added type:bug Software flaws or errors. type:documentation Creating or refining documentation. (auto-assigned) labels Aug 22, 2022

Shin-kyoto added the component:perception Advanced sensor data processing and environment understanding. (auto-assigned) label Aug 25, 2022

HaoruXue mentioned this issue Sep 1, 2022

feat(tensorrt_yolo): update Readme about model conversion #1749

Merged

4 tasks

mitsudome-r self-assigned this Sep 13, 2022

xmfcx closed this as completed in #1749 Sep 20, 2022
