
bump version to 3.3.0 #11347

Merged: 23 commits, Jan 5, 2024

Changes from all commits
- 1fb2ac3 Fix docstring for dimension of targets argument (#11058) (AwePhD, Oct 18, 2023)
- 88dfe48 Fix Grounding DINO nan when class tokens exceed 256 (#11066) (Divadi, Oct 20, 2023)
- 223dc5d Fix broken links in README.md (#11060) (aaronzs, Oct 20, 2023)
- 0eb502f Fixed the issue where the directory level of the youtubevis2coco… (#1… (LRJKD, Oct 26, 2023)
- ee6d03d Fixed the issue where type and by_epoch in loop_cfg exist simult… (#1… (LRJKD, Oct 26, 2023)
- 627e00c Support ODinW and evaluate (#11105) (hhaAndroid, Nov 6, 2023)
- 47063a1 [MMSIG#357] Add new configs for panoptic_fpn (#11109) (Crescent-Saturn, Nov 6, 2023)
- 4a516c3 [Feature] Add optional score threshold option to coco_error_analysis.… (guyleaf, Nov 7, 2023)
- 51f8aee Support LVIS chunked evaluation and image chunked inference of GLIP (… (hhaAndroid, Nov 9, 2023)
- 5a02a0a Replace partially weighted download links with OpenXLab for the Faste… (keyhsw, Nov 14, 2023)
- 24bb129 add odinw configs and evaluation results of GLIP (#11175) (Cycyes, Nov 15, 2023)
- ee2e542 Add GroundingDINO on ODinW results, and support caption prompt of Gro… (Cycyes, Nov 21, 2023)
- dfffb99 MMGroundingDINO - a replicable and more comprehensive GroundingDINO (#1… (hhaAndroid, Dec 18, 2023)
- 63713c9 finetune MM-GDINO on ov_coco and ov_lvis (#11304) (xushilin1, Dec 22, 2023)
- d2b238e [Feature] Add RTMDet Swin / ConvNeXt (#11259) (okotaku, Dec 22, 2023)
- c5f6ea5 Fix bug in `convert_coco_format` (#11251) (ImJaewooChoi, Dec 22, 2023)
- 63a4bb8 Fix CO-DETR load_from url in config (#11220) (returnL, Dec 22, 2023)
- e5f9f35 Update README and refine MM-GDINO (#11298) (hhaAndroid, Dec 26, 2023)
- b98f372 Fixed mask shape after Albu postprocess (#11280) (ilcopione, Dec 27, 2023)
- 46b10b1 update English version of md (#11336) (Cycyes, Jan 3, 2024)
- aeb4647 Fix one of the CO-DETR config files (#11325) (adnan-mujagic, Jan 3, 2024)
- 10ae0b3 replace '.jpg' instead of 'jpg' to guarantee replacing file ending (#… (R-Fehler, Jan 4, 2024)
- 436d488 Bump version to 3.3.0 (#11338) (hhaAndroid, Jan 5, 2024)
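One small fix above (commit 10ae0b3) is worth spelling out: replacing the bare substring `jpg` can corrupt filenames that merely contain those letters elsewhere, so the fix anchors on the `.jpg` ending instead. A minimal illustration, where the helper names are ours and not the actual code from the PR:

```python
def to_png(filename: str) -> str:
    """Swap a .jpg extension for .png by matching the dotted suffix.

    Anchoring on '.jpg' (as commit 10ae0b3 does for file endings)
    avoids touching other occurrences of the letters 'jpg'.
    """
    return filename.replace(".jpg", ".png")


def to_png_buggy(filename: str) -> str:
    """The naive version: rewrites every occurrence of 'jpg'."""
    return filename.replace("jpg", "png")


print(to_png("jpg_dataset/img_001.jpg"))        # jpg_dataset/img_001.png
print(to_png_buggy("jpg_dataset/img_001.jpg"))  # png_dataset/img_001.png (directory name corrupted)
```

The buggy variant silently rewrites the directory prefix as well as the extension, which is exactly the class of bug the commit guards against.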
44 changes: 5 additions & 39 deletions README.md
@@ -103,50 +103,16 @@ Apart from MMDetection, we also released [MMEngine](https://github.com/open-mmla

### Highlight

**v3.2.0** was released on 12/10/2023:
**v3.3.0** was released on 5/1/2024:

**1. Detection Transformer SOTA Model Collection**
(1) Supported four updated and stronger SOTA Transformer models: [DDQ](configs/ddq/README.md), [CO-DETR](projects/CO-DETR/README.md), [AlignDETR](projects/AlignDETR/README.md), and [H-DINO](projects/HDINO/README.md).
(2) Based on CO-DETR, MMDet released a model with a COCO performance of 64.1 mAP.
(3) Algorithms such as DINO support `AMP/Checkpoint/FrozenBN`, which can effectively reduce memory usage.
**[MM-Grounding-DINO: An Open and Comprehensive Pipeline for Unified Object Grounding and Detection](https://arxiv.org/abs/2401.02361)**

**2. [Comprehensive Performance Comparison between CNN and Transformer](projects/RF100-Benchmark/README.md)**
RF100 is a collection of 100 real-world datasets spanning 7 domains. It can be used to assess the performance differences between Transformer models such as DINO and CNN-based algorithms under different scenarios and data volumes. Users can utilize this benchmark to quickly evaluate the robustness of their algorithms in various scenarios.
Grounding DINO is a grounding pre-training model that unifies 2D open-vocabulary object detection and phrase grounding and has wide applications; however, its training code has not been open sourced. We therefore propose MM-Grounding-DINO, which not only serves as an open-source reproduction of Grounding DINO but also achieves significant performance improvements by rebuilding the data types and exploring different dataset combinations and initialization strategies. Moreover, we evaluate it along multiple dimensions, including OOD, REC, Phrase Grounding, OVD, and Fine-tune, to fully probe the strengths and weaknesses of grounding pre-training, hoping to provide inspiration for future work.

<div align=center>
<img src="https://github.com/open-mmlab/mmdetection/assets/17425982/86420903-36a8-410d-9251-4304b9704f7d"/>
</div>

**3. Support for [GLIP](configs/glip/README.md) and [Grounding DINO](configs/grounding_dino/README.md) fine-tuning, the only algorithm library that supports Grounding DINO fine-tuning**
The Grounding DINO implementation in MMDet is the only one that supports fine-tuning, and it scores one point higher on COCO than the official version; GLIP likewise outperforms the official release.
We also provide a detailed walkthrough for training and evaluating Grounding DINO on custom datasets. Everyone is welcome to give it a try.

| Model | Backbone | Style | COCO mAP | Official COCO mAP |
| :----------------: | :------: | :-------: | :--------: | :---------------: |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 |
| Grounding DINO-T | Swin-T | Finetune | 58.1(+0.9) | 57.2 |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | |
| Grounding DINO-R50 | R50 | Scratch | 48.9(+0.8) | 48.1 |

**4. Support for the open-vocabulary detection algorithm [Detic](projects/Detic_new/README.md) and multi-dataset joint training.**
**5. Training detection models using [FSDP and DeepSpeed](projects/example_largemodel/README.md).**

| ID | AMP | GC of Backbone | GC of Encoder | FSDP | Peak Mem (GB) | Iter Time (s) |
| :-: | :-: | :------------: | :-----------: | :--: | :-----------: | :-----------: |
| 1 | | | | | 49 (A100) | 0.9 |
| 2 | √ | | | | 39 (A100) | 1.2 |
| 3 | | √ | | | 33 (A100) | 1.1 |
| 4 | √ | √ | | | 25 (A100) | 1.3 |
| 5 | | √ | √ | | 18 | 2.2 |
| 6 | √ | √ | √ | | 13 | 1.6 |
| 7 | | √ | √ | √ | 14 | 2.9 |
| 8 | √ | √ | √ | √ | 8.5 | 2.4 |

**6. Support for the [V3Det](configs/v3det/README.md) dataset, a large-scale detection dataset with over 13,000 categories.**
code: [mm_grounding_dino/README.md](configs/mm_grounding_dino/README.md)
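Open-vocabulary detectors in this family take their detection categories as a single text caption of period-separated phrases rather than as fixed class indices. A minimal sketch of that prompt construction, under the separator convention documented for Grounding DINO (the helper name is ours, for illustration only; see the linked README for the actual inference API):

```python
def classes_to_caption(class_names: list[str]) -> str:
    """Join detection category names into a Grounding-DINO-style caption.

    Grounding DINO and MM-Grounding-DINO consume categories as one
    caption in which phrases are separated by ' . ' and the caption
    ends with ' .', e.g. 'cat . dog . person .'.
    """
    return " . ".join(name.strip().lower() for name in class_names) + " ."


print(classes_to_caption(["cat", "dog", "person"]))  # cat . dog . person .
```

Because the prompt is free text, fine-tuning and zero-shot inference can share one model: swapping the category vocabulary only changes the caption, not the network.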

<div align=center>
<img width=960 src="https://github.com/open-mmlab/mmdetection/assets/17425982/9c216387-02be-46e6-b0f2-b856f80f6d84"/>
<img src="https://github.com/open-mmlab/mmdetection/assets/17425982/fb14d1ee-5469-44d2-b865-aac9850c429c"/>
</div>

We are excited to announce our latest work on real-time object recognition tasks, **RTMDet**, a family of fully convolutional single-stage detectors. RTMDet not only achieves the best parameter-accuracy trade-off on object detection from tiny to extra-large model sizes but also obtains new state-of-the-art performance on instance segmentation and rotated object detection tasks. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models are [here](configs/rtmdet).
45 changes: 6 additions & 39 deletions README_zh-CN.md
@@ -102,51 +102,18 @@ MMDetection is an open-source object detection toolbox based on PyTorch. It is part of the [Ope

### Highlight

**v3.2.0** was released on 2023.10.12:
**v3.3.0** was released on 2024.1.5:

**1. A Large Collection of SOTA Detection Transformer Models**
(1) Supported four newer and stronger SOTA Transformer models: [DDQ](configs/ddq/README.md), [CO-DETR](projects/CO-DETR/README.md), [AlignDETR](projects/AlignDETR/README.md), and [H-DINO](projects/HDINO/README.md)
(2) Based on CO-DETR, MMDet released a model reaching 64.1 mAP on COCO
(3) Algorithms such as DINO support AMP/Checkpoint/FrozenBN, which can effectively reduce GPU memory usage
**MM-Grounding-DINO: easy performance gains, fully open source from data to evaluation**

**2. [A Comprehensive Performance Comparison between CNN and Transformer](projects/RF100-Benchmark/README_zh-CN.md)**
RF100 is a collection of 100 real-world datasets spanning 7 domains. It can be used to verify the performance differences between Transformer models such as DINO and CNN-based algorithms under different scenarios and data volumes. Users can use this benchmark to quickly check the robustness of their own algorithms across scenarios.
Grounding DINO is a detection pre-training model that unifies 2D open-vocabulary object detection and phrase grounding and has wide applications, but its training code was never open sourced, which is why MM-Grounding-DINO was proposed. It not only serves as an open-source reproduction of Grounding DINO: starting from rebuilt data types and exploring different dataset combinations and initialization strategies, it greatly improves on Grounding DINO's performance. It is also evaluated along multiple dimensions, including OOD, REC, Phrase Grounding, OVD, and Finetune, to fully probe the strengths and weaknesses of grounding pre-training, hoping to inspire future work.

<div align=center>
<img src="https://github.com/open-mmlab/mmdetection/assets/17425982/86420903-36a8-410d-9251-4304b9704f7d"/>
</div>

**3. Support for [GLIP](configs/glip/README.md) and [Grounding DINO](configs/grounding_dino/README.md) fine-tuning, the only library that supports Grounding DINO fine-tuning**
The Grounding DINO in MMDet is the only implementation that supports fine-tuning, and it scores one point higher on COCO than the official version; GLIP is also above the official release.
We also provide a detailed workflow for training and evaluating Grounding DINO on custom datasets. Everyone is welcome to try it.

| Model | Backbone | Style | COCO mAP | Official COCO mAP |
| :----------------: | :------: | :-------: | :--------: | :---------------: |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 |
| Grounding DINO-T | Swin-T | Finetune | 58.1(+0.9) | 57.2 |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | |
| Grounding DINO-R50 | R50 | Scratch | 48.9(+0.8) | 48.1 |

**4. Support for the open-vocabulary detection algorithm [Detic](projects/Detic_new/README.md) and multi-dataset joint training**

**5. Easily train detection models with [FSDP and DeepSpeed](projects/example_largemodel/README_zh-CN.md)**

| ID | AMP | GC of Backbone | GC of Encoder | FSDP | Peak Mem (GB) | Iter Time (s) |
| :-: | :-: | :------------: | :-----------: | :--: | :-----------: | :-----------: |
| 1 | | | | | 49 (A100) | 0.9 |
| 2 | √ | | | | 39 (A100) | 1.2 |
| 3 | | √ | | | 33 (A100) | 1.1 |
| 4 | √ | √ | | | 25 (A100) | 1.3 |
| 5 | | √ | √ | | 18 | 2.2 |
| 6 | √ | √ | √ | | 13 | 1.6 |
| 7 | | √ | √ | √ | 14 | 2.9 |
| 8 | √ | √ | √ | √ | 8.5 | 2.4 |
arXiv technical report: https://arxiv.org/abs/2401.02361

**6. Support for [V3Det](configs/v3det/README.md), a super-large-vocabulary detection dataset with 13k+ categories**
code: [mm_grounding_dino/README.md](configs/mm_grounding_dino/README.md)

<div align=center>
<img width=960 src="https://github.com/open-mmlab/mmdetection/assets/17425982/9c216387-02be-46e6-b0f2-b856f80f6d84"/>
<img src="https://github.com/open-mmlab/mmdetection/assets/17425982/fb14d1ee-5469-44d2-b865-aac9850c429c"/>
</div>

We are excited to introduce our latest work on real-time object recognition, RTMDet, a family of fully convolutional single-stage detection models. RTMDet not only achieves the best parameter-accuracy trade-off for object detection from tiny to extra-large model sizes, but also attains state-of-the-art results on real-time instance segmentation and rotated object detection. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models are available [here](configs/rtmdet).