models/export.py setting model.train() may be changed to false in onnx #3346

ChaofWang · 2021-05-26T09:17:42Z

🐛 Bug

I think the intent is to skip grid construction in the Detect layer by calling model.train() when exporting the model to ONNX with --train. But inside "torch.onnx.export", model.training seems to be reset to model.training=False by default.

In fact, the 'training' parameter of "torch.onnx.export" can be set to "training=torch.onnx.TrainingMode.TRAINING", so that the model is exported in train mode instead of relying on model.train(). But this doesn't seem like a good approach to recommend. I also found that this part was correct before version 4.0, but after version 5.0 the "export" attribute was removed from Detect.
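The behaviour described above can be illustrated without PyTorch. The sketch below is a simplified, pure-Python stand-in and not the exporter's actual code: `TinyModule` and `forced_export_mode` are hypothetical names, and it assumes (as the report observes) that the exporter applies its own `training` setting for the duration of the export, regardless of an earlier model.train() call.

```python
from contextlib import contextmanager


class TinyModule:
    """Hypothetical stand-in for torch.nn.Module: only tracks the training flag."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self


@contextmanager
def forced_export_mode(model, training=False):
    """Force `model` into the requested mode while an export is running."""
    was_training = model.training
    model.train(training)
    try:
        yield model
    finally:
        model.train(was_training)  # restore the caller's mode afterwards


model = TinyModule().train()  # caller asks for train mode, as --train does
with forced_export_mode(model) as m:
    mode_seen_by_tracer = m.training  # the caller's train() was overridden
mode_after_export = model.training  # the original mode is back
```

Under this assumption the Detect layer is traced with self.training equal to False, so the inference branch (and the TracerWarning) still appears even though model.train() was called beforehand.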

To Reproduce (REQUIRED)

Input:

python models/export.py --weights yolov5s.pt --img 640 --batch 1 --train

Output:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:

Expected behavior

This part should be skipped.

yolov5/models/yolo.py

Lines 50 to 62 in aad99b6

    
    if not self.training:  # inference
        if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
            self.grid[i] = self._make_grid(nx, ny).to(x[i].device)
        y = x[i].sigmoid()
        if self.inplace:
            y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
            y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
        else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
            xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
            wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
            y = torch.cat((xy, wh, y[..., 4:]), -1)
        z.append(y.view(bs, -1, self.no))
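To make clear what the inference branch adds on top of the raw outputs, here is a scalar worked example of the xy/wh formulas from the snippet above. It is only a sketch: `decode_cell` is a hypothetical helper, plain floats replace tensors, and `grid_xy`, `anchor_wh` and `stride` stand in for self.grid[i], self.anchor_grid[i] and self.stride[i].

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw_xy, raw_wh, grid_xy, anchor_wh, stride):
    """Apply the inference-branch xy/wh formulas to a single grid cell."""
    # xy: (sigmoid * 2 - 0.5 + grid offset) * stride
    xy = tuple((sigmoid(r) * 2.0 - 0.5 + g) * stride for r, g in zip(raw_xy, grid_xy))
    # wh: (sigmoid * 2) ** 2 * anchor size
    wh = tuple((sigmoid(r) * 2.0) ** 2 * a for r, a in zip(raw_wh, anchor_wh))
    return xy, wh


# Raw outputs of 0 give sigmoid = 0.5, i.e. the cell centre and the bare anchor size.
xy, wh = decode_cell((0.0, 0.0), (0.0, 0.0), grid_xy=(3, 4), anchor_wh=(10.0, 13.0), stride=8)
# xy == (28.0, 36.0), wh == (10.0, 13.0)
```

In train mode none of this runs and Detect returns the raw x[i] tensors, which is why an export with --train is expected to contain no grid construction.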

Environment


OS: Ubuntu18.04
GPU: 3080
PyTorch 1.8.0+cu111
onnx 1.9.0


github-actions · 2021-05-26T09:18:24Z

👋 Hello @ChaofWang, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher · 2021-05-26T10:17:39Z

@ChaofWang hi, thanks for the bug report! You are correct, ONNX export is not operating in train mode for some reason. It seems the ONNX exporter is forcing it back into eval mode.

glenn-jocher · 2021-05-26T10:28:50Z

@ChaofWang I've implemented a solution per your recommendation by adding two arguments to the onnx export function in export.py L100:

training=torch.onnx.TrainingMode.TRAINING if opt.train else torch.onnx.TrainingMode.EVAL,
do_constant_folding=not opt.train,
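As a rough illustration of how these two arguments follow from the --train flag, the sketch below builds them as a keyword dictionary. `export_mode_kwargs` is a hypothetical helper, not part of export.py, and `StandInTrainingMode` is a stand-in enum used only so the example runs without PyTorch installed.

```python
from enum import Enum


class StandInTrainingMode(Enum):
    """Stand-in for torch.onnx.TrainingMode so the sketch runs without PyTorch."""
    EVAL = 0
    PRESERVE = 1
    TRAINING = 2


def export_mode_kwargs(train, training_mode=StandInTrainingMode):
    """Build the two mode-related keyword arguments for torch.onnx.export."""
    return {
        "training": training_mode.TRAINING if train else training_mode.EVAL,
        # Constant folding is switched off when exporting in train mode.
        "do_constant_folding": not train,
    }


train_kwargs = export_mode_kwargs(train=True)
eval_kwargs = export_mode_kwargs(train=False)
```

With PyTorch available, the real enum would be passed instead, for example export_mode_kwargs(opt.train, torch.onnx.TrainingMode), and the result unpacked into the existing torch.onnx.export call with `**`.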

Results are here:

If this solution works for you please submit a PR with this update, thank you!

glenn-jocher · 2021-05-26T10:30:22Z

TODO: ONNX export in .train() mode fix

ChaofWang · 2021-05-27T03:00:38Z

@glenn-jocher hi, this solution works for me. I have submitted a PR for this update.

glenn-jocher · 2021-05-27T12:10:50Z

@ChaofWang good news 😃! Your original issue may now be fixed ✅ in PR #3362. To receive this update:

Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Notebooks – View updated notebooks
Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

ChaofWang added the bug Something isn't working label May 26, 2021

glenn-jocher added the TODO label May 26, 2021

ChaofWang mentioned this issue May 27, 2021

ONNX export in .train() mode fix #3362

Merged

glenn-jocher linked a pull request May 27, 2021 that will close this issue

ONNX export in .train() mode fix #3362

Merged

glenn-jocher removed the TODO label May 27, 2021

glenn-jocher closed this as completed May 27, 2021
