Add features to the YOLO model from the latest YOLO variants #817

Merged
merged 103 commits into from
May 29, 2023
Changes from 56 commits
9adb664
Added features from latest YOLO versions
senarvi Jan 20, 2022
476adb9
Fixed ONNX export
senarvi Jan 25, 2022
8d70ca1
meshgrid() call made future-proof by using the indexing argument
senarvi Jan 27, 2022
35a98ba
torch.jit.script fails with a lambda function
senarvi Jan 27, 2022
a91793c
YOLOV4Tiny, YOLOV5, and YOLOX network architectures in plain PyTorch
senarvi Mar 7, 2022
b237099
Improvements to YOLO
senarvi Mar 28, 2022
82f4de1
YOLO output layer name includes the number of outputs
senarvi Mar 29, 2022
8a201a0
Complete type hints
senarvi Apr 14, 2022
2737fa4
Updated CHANGELOG.
senarvi Apr 14, 2022
26987eb
Torchvision import made conditional
senarvi Apr 14, 2022
09bce80
Use expand() instead of broadcast_to() for backward compatibility
senarvi Apr 28, 2022
471107f
Merge branch 'master' into yolo-update
senarvi Apr 28, 2022
a80536e
Use pytorch_lightning.utilities.distributed if pytorch_lightning.util…
senarvi Apr 28, 2022
b1b8db3
YOLOV4P6 network architecture
senarvi Jun 8, 2022
f499b76
Merge branch 'master' into yolo-update
senarvi Jun 8, 2022
5661dba
Fixed document generation, when MeanAveragePrecision is not available
senarvi Jun 8, 2022
0ad5867
Use arxiv URL to avoid a too long line
senarvi Jun 8, 2022
84a949f
Use torch.div() instead of //
senarvi Jun 20, 2022
cf1646c
remove under_review decorators
redleaf-kim Aug 2, 2022
fecf88c
add yolo cfg with giou & update related test function
redleaf-kim Aug 2, 2022
b5abc8f
add serveral yolo config & layers function test
redleaf-kim Aug 2, 2022
198ebc1
Merge branch 'Lightning-AI:master' into yolo_review
redleaf-kim Aug 2, 2022
08f17f7
remove unused import & variable
redleaf-kim Aug 3, 2022
db2601a
add type hints
redleaf-kim Aug 9, 2022
fe38bb7
remove and merge duplicated test
redleaf-kim Aug 9, 2022
7da9d4a
improve readability
redleaf-kim Aug 9, 2022
8c3ed4e
Merge remote-tracking branch 'origin/yolo_review' into yolo_review
redleaf-kim Aug 9, 2022
8f69419
Merge branch 'master' into yolo_review
otaj Aug 12, 2022
63c7eef
Use distance_box_iou(), complete_box_iou() and the corresponding loss…
senarvi Aug 19, 2022
f6af0a4
Merge branch 'master' into yolo-update
senarvi Aug 19, 2022
b25a864
Merge branch 'master' into yolo_review
otaj Sep 15, 2022
9ff86ab
Merge branch 'master' into yolo_review
otaj Sep 16, 2022
17fab64
add catch_warning fixture
Sep 19, 2022
353f119
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 19, 2022
a3445ac
fix pytest error; indexing argument will be required to pass in upcom…
redleaf-kim Sep 19, 2022
189346c
fix pytest catch_warnings; MisconfigurationException error
redleaf-kim Sep 19, 2022
a1d97b6
fix pytest error
redleaf-kim Sep 19, 2022
d5b5fb9
Merge remote-tracking branch 'origin/yolo_review' into yolo_review
redleaf-kim Sep 19, 2022
0b4eca4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 19, 2022
b52ab5b
Merge branch 'master' into yolo_review
Borda Sep 19, 2022
eb9930e
Fix most obvious CI failings
Sep 19, 2022
fdf38fb
fix test with a missing warning
Sep 19, 2022
28813ac
Refactoring
senarvi Sep 21, 2022
9db3f9d
Merge branch 'master' into yolo-update
senarvi Sep 21, 2022
a42cdec
resolve accidentally introduced errors
Sep 21, 2022
d0c68ed
Merge branch 'master' into yolo-update
senarvi Oct 6, 2022
1d324e5
infer() returns the model to the previous mode
senarvi Oct 6, 2022
374a3ec
CLI YOLO application uses the YOLOv4 architecture, if a Darknet confi…
senarvi Oct 6, 2022
737ec64
Minor documentation improvements
senarvi Oct 6, 2022
d534cfa
add catch_warnings
Oct 11, 2022
57c9baf
Merge branch 'master' into yolo_review
Oct 11, 2022
390578d
Merge branch 'yolo-update' into yolo_review
senarvi Oct 12, 2022
7bd8d34
Fixed a typo
senarvi Oct 12, 2022
ddc4a46
Fixed unit tests and added catch_warnings to all tests.
senarvi Oct 12, 2022
0b6a6d0
Merge branch 'yolo-update' of github.com:groke-technologies/pytorch-l…
senarvi Oct 12, 2022
37a3f7c
Added a README and documentation for YOLO
senarvi Oct 14, 2022
de69cb2
YOLO tests use giou loss, which is available in Torchvision 0.12
senarvi Oct 15, 2022
ff2a521
Fixed type annotation
senarvi Oct 15, 2022
a040ffd
Removed unused import
senarvi Oct 15, 2022
2bc8a31
Check typing for YOLO
senarvi Oct 15, 2022
cc21337
Fixed hyperlinks
senarvi Oct 17, 2022
91ed315
Merge branch 'master' into yolo-update
senarvi Oct 23, 2022
4083bd2
Fixed mypy errors
senarvi Oct 23, 2022
8457fee
Removed iou and giou metrics and losses, as these are provided by Tor…
senarvi Oct 24, 2022
06e9399
Merge branch 'master' into yolo-update
senarvi Oct 28, 2022
207cfc1
Merge branch 'master' into yolo-update
senarvi Oct 31, 2022
cfc3a4a
Merge branch 'master' into yolo-update
senarvi Nov 3, 2022
568f390
Merge branch 'master' into yolo-update
senarvi Nov 7, 2022
8d4a3b4
Fixed by mdformat
senarvi Nov 7, 2022
d57055a
Avoid using a lambda function
senarvi Nov 7, 2022
9e0a33f
Avoid local functions
senarvi Nov 7, 2022
fa8ad41
Avoid lambda functions
senarvi Nov 7, 2022
54f1eb0
Avoid a lambda function
senarvi Nov 7, 2022
997b6a8
Merge branch 'master' into yolo-update
senarvi Nov 17, 2022
b95caf2
Use sync_dist=True and don't fail if there are no step outputs
senarvi Dec 15, 2022
b50ee2a
Merge branch 'master' into yolo-update
senarvi Jan 16, 2023
e4cb505
Added documentation
senarvi Jan 23, 2023
606acb1
Fixed an off-by-one bug when reading YOLOv4 backbone depths
senarvi Feb 1, 2023
d840f5a
Merge branch 'master' into yolo-update
senarvi Feb 1, 2023
b95f026
YOLOv7 network with deep supervision
senarvi Feb 24, 2023
4bc4198
Merge branch 'master' into yolo-update
senarvi Feb 24, 2023
3e3bad5
Fixed a too long line
senarvi Feb 24, 2023
d9d64ea
Avoid using "input" as a variable name
senarvi Feb 24, 2023
49d9709
Fixed type annotations
senarvi Feb 24, 2023
8e4afc9
SimOTA uses also size ratio for "center prior" filtering
senarvi Mar 16, 2023
a9c72e3
Fixed docstrings
senarvi Mar 17, 2023
cc57b60
Fixed LRScheduler import for PyTorch 2.0
senarvi Mar 17, 2023
e6f6cc1
Added support for label smoothing
senarvi Apr 3, 2023
2e55236
Added unit tests for YOLOv7 and box_size_ratio()
senarvi Apr 4, 2023
c509285
Speeded up YOLO unit tests (NMS) considerably by using a higher confi…
senarvi Apr 5, 2023
5fa9d56
Merge branch 'master' into yolo-update
senarvi Apr 5, 2023
614f7e6
detection_boxes is now called detections in MeanAveragePrecision
senarvi Apr 5, 2023
2242736
Use giou in YOLO tests to allow them to pass also with older versions…
senarvi Apr 6, 2023
4807f3c
Use double underscores in links in the docstring to avoid duplicate n…
senarvi Apr 6, 2023
64a3e47
Check that targets are given in training mode
senarvi Apr 6, 2023
26c9a7c
Fixed docstring formatting
senarvi Apr 8, 2023
c891ca4
Merge branch 'master' into yolo-update
senarvi May 19, 2023
e119ee9
Merge branch 'master' into yolo-update
senarvi May 23, 2023
98e4c2e
Add
senarvi May 23, 2023
bf6295b
Removed files that popped up back in the merge
senarvi May 23, 2023
9042812
Code formatting fixed by ruff
senarvi May 24, 2023
a399bed
Work around a problem with mypy and if-else ternary operator
senarvi May 24, 2023
c976147
Fixed bcefunc assignment so that both ruff and mypy are happy
senarvi May 25, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Improved YOLO model includes YOLOv4, YOLOv5, and YOLOX networks and training algorithms ([#817](https://github.com/PyTorchLightning/pytorch-lightning-bolts/pull/817))


### Deprecated

24 changes: 21 additions & 3 deletions pl_bolts/models/detection/__init__.py
@@ -1,13 +1,31 @@
 from pl_bolts.models.detection import components
 from pl_bolts.models.detection.faster_rcnn import FasterRCNN
 from pl_bolts.models.detection.retinanet import RetinaNet
-from pl_bolts.models.detection.yolo.yolo_config import YOLOConfiguration
+from pl_bolts.models.detection.yolo.darknet_network import DarknetNetwork
+from pl_bolts.models.detection.yolo.torch_networks import (
+    YOLOV4Backbone,
+    YOLOV4Network,
+    YOLOV4P6Network,
+    YOLOV4TinyBackbone,
+    YOLOV4TinyNetwork,
+    YOLOV5Backbone,
+    YOLOV5Network,
+    YOLOXNetwork,
+)
 from pl_bolts.models.detection.yolo.yolo_module import YOLO

 __all__ = [
     "components",
     "FasterRCNN",
-    "YOLOConfiguration",
-    "YOLO",
     "RetinaNet",
+    "DarknetNetwork",
+    "YOLOV4Backbone",
+    "YOLOV4Network",
+    "YOLOV4P6Network",
+    "YOLOV4TinyBackbone",
+    "YOLOV4TinyNetwork",
+    "YOLOV5Backbone",
+    "YOLOV5Network",
+    "YOLOXNetwork",
+    "YOLO",
 ]
63 changes: 63 additions & 0 deletions pl_bolts/models/detection/yolo/README.md
@@ -0,0 +1,63 @@
# YOLO

The YOLO model has evolved considerably since the original publication in 2016. The original source code was written in C, using a framework called [Darknet](https://github.com/pjreddie/darknet). The final revision by the original author was called YOLOv3 and is described in an [arXiv paper](https://arxiv.org/abs/1804.02767). Since then, other authors have written implementations that improve different aspects of the model or the training procedure. The [YOLOv4 implementation](https://github.com/AlexeyAB/darknet) was still based on Darknet, whereas [YOLOv5](https://github.com/ultralytics/yolov5) was written using PyTorch. Most other implementations are based on these.

This PyTorch Lightning implementation combines features from some of the notable YOLO implementations. The most important papers are:

- *YOLOv3*: [Joseph Redmon and Ali Farhadi](https://arxiv.org/abs/1804.02767)
- *YOLOv4*: [Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao](https://arxiv.org/abs/2004.10934)
- *Scaled-YOLOv4*: [Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao](https://arxiv.org/abs/2011.08036)
- *YOLOX*: [Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun](https://arxiv.org/abs/2107.08430)


## Network Architecture

Any network can be used with YOLO detection heads as long as it produces feature maps with the correct number of features. Typically the network consists of a CNN backbone combined with a [Feature Pyramid Network](https://arxiv.org/abs/1612.03144) or a [Path Aggregation Network](https://arxiv.org/abs/1803.01534). Backbone layers reduce the size of the feature map and the network may contain multiple detection heads that operate at different resolutions.

The user can write the network architecture in PyTorch, or construct a computational graph based on a Darknet configuration file using the [`DarknetNetwork`](https://github.com/Lightning-AI/lightning-bolts/tree/master/pl_bolts/models/detection/yolo/darknet_network.py) class. The network object is passed to the YOLO constructor in the `network` argument. `DarknetNetwork` is also able to read weights from a Darknet model file.

There are several network architectures included in the [`torch_networks`](https://github.com/Lightning-AI/lightning-bolts/tree/master/pl_bolts/models/detection/yolo/torch_networks.py) module (YOLOv4, YOLOv5, YOLOX). Larger and smaller variants of these models can be created by varying the `width` and `depth` arguments.


## Anchors

A detection head can try to detect objects at each of the anchor points that are spaced evenly across the image in a grid. The size of the grid is determined by the width and height of the feature map. There can be a number of anchors (typically three) per grid cell. The number of features predicted per grid cell has to be `(5 + num_classes) * anchors_per_cell`.
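
As a quick check of the formula above, the following sketch computes how many features a detection head has to predict per grid cell. The function name `features_per_cell` is hypothetical and not part of the library; the constant 5 is taken to cover the four box coordinates and the confidence.

```python
def features_per_cell(num_classes: int, anchors_per_cell: int = 3) -> int:
    """Return the number of features predicted per grid cell."""
    # Every anchor predicts 4 box coordinates, 1 confidence value,
    # and one probability per class.
    return (5 + num_classes) * anchors_per_cell


print(features_per_cell(80))  # 255
print(features_per_cell(20))  # 75
```

A network whose final feature maps have a different number of channels cannot be used with the detection heads without an extra layer that maps to this size.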

The width and the height of a bounding box are predicted relative to a prior shape. `anchors_per_cell` prior shapes per detection head are defined in the network configuration. That is, if the network uses three detection heads, and each head detects three bounding boxes per grid cell, nine prior shapes need to be defined. They are defined in the Darknet configuration file or provided to the network class constructor. The default values have been obtained by clustering bounding box shapes in the COCO dataset. Note that if you use a different image size, you probably want to scale the prior shapes too.
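
One way to scale the prior shapes to a new image size is to multiply each width and height by the ratio of the new size to the old size. The sketch below assumes that the prior shapes are given as `(width, height)` pairs in pixels; the helper `scale_prior_shapes` and the example values are hypothetical and are not the library defaults.

```python
from typing import List, Tuple

Shape = Tuple[float, float]  # (width, height) in pixels


def scale_prior_shapes(
    prior_shapes: List[Shape], old_size: Tuple[int, int], new_size: Tuple[int, int]
) -> List[Shape]:
    """Scale prior shapes defined for `old_size` images to `new_size` images.

    Both sizes are (width, height) tuples.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    return [(w * new_w / old_w, h * new_h / old_h) for w, h in prior_shapes]


# Hypothetical prior shapes defined for 100x100 images, scaled to 200x400.
print(scale_prior_shapes([(10, 20), (30, 40)], (100, 100), (200, 400)))
# [(20.0, 80.0), (60.0, 160.0)]
```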

With the exception of the SimOTA matching algorithm, the prior shapes are also used for matching the ground-truth targets to anchors during training. In this case targets are matched only to anchors from the closest grid cell, and the prior shapes determine which anchors from that cell the target is matched to. The losses are computed between the target boxes and the predictions that correspond to their matched anchors. Different matching rules have been implemented:

- *maxiou*: The original matching rule that matches a target to the prior shape that gives the highest IoU.
- *iou*: Matches a target to an anchor, if the IoU between the target and the prior shape is above a threshold. Multiple anchors may be matched to the same target, and the loss will be computed from a number of pairs that is generally not the same as the number of ground-truth boxes.
- *size*: Calculates the ratios of the target box width and height to the prior width and height. If both the width and the height are close enough to those of the prior shape, matches the target to the anchor.
- *simota*: The SimOTA matching algorithm from YOLOX. Targets can be matched not only to anchors from the closest grid cell, but to any anchors that are inside the target bounding box. The matching algorithm is based on Optimal Transport and uses the training loss between the target and the predictions as the cost. That is, matching is based not on the prior shapes but on the predictions that correspond to the anchors.
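
The three shape-based rules can be sketched in plain Python as follows. This is a simplified illustration, not the library code: the function names and threshold values are hypothetical, the shapes are compared as if the boxes shared the same corner, and SimOTA is left out because it depends on the model predictions.

```python
from typing import List, Tuple

Shape = Tuple[float, float]  # (width, height)


def shape_iou(a: Shape, b: Shape) -> float:
    """IoU of two boxes that share the same top-left corner."""
    intersection = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - intersection
    return intersection / union


def match_maxiou(target: Shape, priors: List[Shape]) -> List[int]:
    """Match the target to the single prior shape with the highest IoU."""
    ious = [shape_iou(target, prior) for prior in priors]
    return [ious.index(max(ious))]


def match_iou(target: Shape, priors: List[Shape], threshold: float) -> List[int]:
    """Match the target to every prior shape whose IoU exceeds the threshold."""
    return [i for i, prior in enumerate(priors) if shape_iou(target, prior) > threshold]


def match_size(target: Shape, priors: List[Shape], max_ratio: float) -> List[int]:
    """Match the target to priors whose width and height ratios are both small."""
    matched = []
    for i, (prior_w, prior_h) in enumerate(priors):
        # Use the larger of ratio and inverse ratio, so that the rule is symmetric.
        w_ratio = max(target[0] / prior_w, prior_w / target[0])
        h_ratio = max(target[1] / prior_h, prior_h / target[1])
        if w_ratio < max_ratio and h_ratio < max_ratio:
            matched.append(i)
    return matched


priors = [(10.0, 10.0), (20.0, 20.0), (40.0, 40.0)]
target = (18.0, 22.0)
print(match_maxiou(target, priors))     # [1]
print(match_iou(target, priors, 0.25))  # [0, 1]
print(match_size(target, priors, 2.0))  # [1]
```

Note how the *iou* rule can return more than one anchor for a single target, which is why the number of matched pairs generally differs from the number of ground-truth boxes.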


## Input Data

The model input is expected to be a list of images. Each image is a tensor with shape `[channels, height, width]`. The images from a single batch will be stacked into a single tensor, so the sizes have to match. Different batches can have different image sizes. The feature pyramid network introduces another constraint on the image size: the width and the height have to be divisible by the ratio in which the network downsamples the input.

During training, the model expects both the image tensors and a list of targets. Each target is a dictionary containing the following tensors:

- *boxes*: `(x1, y1, x2, y2)` coordinates of the ground-truth boxes in a matrix with shape `[N, 4]`.
- *labels*: Either integer class labels in a vector of size `N` or a class mask for each ground-truth box in a boolean matrix with shape `[N, classes]`.
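
The relationship between the two label formats can be illustrated with plain Python lists standing in for tensors. This is only a sketch: in actual use the values would be `torch.Tensor` objects, and the helper `labels_to_class_mask` is a hypothetical name, not a library function.

```python
from typing import List


def labels_to_class_mask(labels: List[int], num_classes: int) -> List[List[bool]]:
    """Convert N integer class labels to an [N, num_classes] boolean mask."""
    return [[class_id == label for class_id in range(num_classes)] for label in labels]


# One target dictionary for an image with two ground-truth boxes.
target = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 100.0]],  # (x1, y1, x2, y2)
    "labels": [2, 0],  # integer class labels, one per box
}

mask = labels_to_class_mask(target["labels"], num_classes=3)
print(mask)  # [[False, False, True], [True, False, False]]
```

The mask form also allows a box to belong to several classes at once, which the integer form cannot express.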


## Training

The command line application demonstrates how to train a YOLO model using PyTorch Lightning. The first step is to create a network, either from a Darknet configuration file, or using one of the included PyTorch networks. The network is passed to the YOLO model constructor.

The data module needs to resize the data to a suitable size, in addition to applying any augmenting transforms. For example, the YOLOv4 network requires that the width and the height are multiples of 32.
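
A data pipeline could pick a valid size by rounding each dimension up to the next multiple of the downsampling ratio. The helper below is a hypothetical sketch; the default of 32 matches the YOLOv4 example above, and other networks may need a different value.

```python
def round_up_to_multiple(size: int, multiple: int = 32) -> int:
    """Round `size` up to the nearest multiple of `multiple`."""
    return ((size + multiple - 1) // multiple) * multiple


print(round_up_to_multiple(600))  # 608
print(round_up_to_multiple(640))  # 640
print(round_up_to_multiple(417))  # 448
```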


## Inference

During inference, the model requires only the input images. The `forward()` method receives a mini-batch of images in a tensor with shape `[N, channels, height, width]`.

Every detection head predicts a bounding box at every anchor point. `forward()` returns the predictions from all detection heads in a tensor with shape `[N, anchors, classes + 5]`, where `anchors` is the total number of anchors in all detection heads. The predictions are `x1`, `y1`, `x2`, `y2`, confidence, and the probability for each class. The coordinates are scaled to the input image size.
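
To get a feeling for the size of the `anchors` dimension, the sketch below counts the anchors of a network with three detection heads. The strides 8, 16, and 32 and the three anchors per cell are assumptions that are typical of YOLO networks, not values read from the library; `total_anchors` is a hypothetical helper.

```python
from typing import Sequence


def total_anchors(
    image_width: int,
    image_height: int,
    strides: Sequence[int] = (8, 16, 32),
    anchors_per_cell: int = 3,
) -> int:
    """Count the anchors over all detection heads."""
    total = 0
    for stride in strides:
        # Each head operates on a grid that is `stride` times smaller than the image.
        grid_width = image_width // stride
        grid_height = image_height // stride
        total += grid_width * grid_height * anchors_per_cell
    return total


print(total_anchors(416, 416))  # 10647
print(total_anchors(608, 608))  # 22743
```

With 80 classes and a 416x416 input, the output of `forward()` would under these assumptions have the shape `[N, 10647, 85]`.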

The `infer()` method filters and processes the predictions. A class-specific score is obtained by multiplying the class probability with the detection confidence. Only detections with a high enough score are kept. YOLO does not use `softmax` to normalize the class probabilities, but each probability is normalized individually using `sigmoid`. Consequently, one object can be assigned to multiple categories. If more than one class has a score that is above the confidence threshold, the prediction is split into multiple detections, one per class. Then the detections are filtered using non-maximum suppression. The processed output is returned in a dictionary containing the following tensors:

- *boxes*: a matrix of predicted bounding box `(x1, y1, x2, y2)` coordinates in image space
- *scores*: a vector of detection confidences
- *labels*: a vector of predicted class labels
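
The post-processing steps described above can be sketched in plain Python for a single image. This is an illustration under stated assumptions rather than the library implementation: the function names and threshold values are hypothetical, lists stand in for tensors, and non-maximum suppression is assumed to be applied separately for each class.

```python
from typing import Dict, List


def box_iou(a: List[float], b: List[float]) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def postprocess(
    predictions: List[List[float]], conf_threshold: float = 0.2, nms_threshold: float = 0.45
) -> Dict[str, list]:
    """Filter rows of [x1, y1, x2, y2, confidence, class probabilities...]."""
    detections = []
    for row in predictions:
        box, confidence, class_probs = row[:4], row[4], row[5:]
        # A prediction yields one detection per class whose score is high enough.
        for label, prob in enumerate(class_probs):
            score = prob * confidence
            if score > conf_threshold:
                detections.append((score, label, box))

    # Greedy NMS: visit detections from highest to lowest score and drop those
    # that overlap too much with an already kept detection of the same class.
    detections.sort(key=lambda detection: detection[0], reverse=True)
    kept = []
    for score, label, box in detections:
        if all(l != label or box_iou(box, b) <= nms_threshold for _, l, b in kept):
            kept.append((score, label, box))

    return {
        "boxes": [box for _, _, box in kept],
        "scores": [score for score, _, _ in kept],
        "labels": [label for _, label, _ in kept],
    }


predictions = [
    [0.0, 0.0, 10.0, 10.0, 0.9, 0.8, 0.1],
    [1.0, 1.0, 11.0, 11.0, 0.8, 0.7, 0.6],
    [50.0, 50.0, 60.0, 60.0, 0.5, 0.1, 0.9],
]
result = postprocess(predictions)
print(result["labels"])  # [0, 1, 1]
```

In this example the second prediction exceeds the threshold for both classes and is split into two detections; its class 0 detection is then suppressed by the overlapping first prediction, while its class 1 detection survives.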