# 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

> [4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks](https://arxiv.org/abs/1904.08755)

<!-- [ALGORITHM] -->

## Abstract

In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks and are faster than the 3D counterpart in some cases.

<div align=center>
<img src="https://user-images.githubusercontent.com/72679458/225243534-cd0ed738-4224-4e7c-bcac-4f4c8d89f3a9.png" width="800"/>
</div>
## Introduction

We implement MinkUNet with the [TorchSparse](https://github.com/mit-han-lab/torchsparse) backend and provide the results and checkpoints on the SemanticKITTI dataset.
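The configs listed below follow the standard OpenMMLab workflow, so a model can be built directly from one of them. The snippet is only a minimal sketch: it assumes an mmdetection3d v1.1 installation with TorchSparse available, and `register_all_modules` and the `MODELS` registry are the usual OpenMMLab 2.0 entry points rather than anything specific to this README.

```python
# Minimal sketch: build a MinkUNet segmentor from one of the configs below.
# Assumes mmdetection3d v1.1.x with the TorchSparse backend installed;
# helper names follow OpenMMLab 2.0 conventions and may vary between releases.
from mmengine.config import Config

from mmdet3d.registry import MODELS
from mmdet3d.utils import register_all_modules

register_all_modules()  # register mmdet3d models, datasets and transforms

cfg = Config.fromfile('configs/minkunet/minkunet_w16_8xb2-15e_semantickitti.py')
model = MODELS.build(cfg.model)  # sparse-convolution UNet for point clouds
print(type(model).__name__)
```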
## Results and models

### SemanticKITTI

|    Method    | Lr schd | Mem (GB) | mIoU |                                                                                                                                                                                                                           Download                                                                                                                                                                                                                           |
| :----------: | :-----: | :------: | :--: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| MinkUNet-W16 |   15e   |   3.4    | 60.3 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737-0d8ec25b.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737.log) |
| MinkUNet-W20 |   15e   |   3.7    | 61.6 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718-c3b92e6e.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718.log) |
| MinkUNet-W32 |   15e   |   4.9    | 63.1 | [model](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth) \| [log](https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710.log) |
**Note:** We follow the implementation in the original SPVNAS [repo](https://github.com/mit-han-lab/spvnas); W16/W20/W32 indicate different channel widths.

**Note:** With the TorchSparse backend, model performance is unstable and may fluctuate by about 1.5 mIoU across different random seeds.
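To sanity-check a downloaded checkpoint, single-sample inference along the following lines should work. This is only a sketch: `init_model` and `inference_segmentor` are assumed to be available from `mmdet3d.apis` as for other mmdetection3d segmentation models, the point-cloud path is a placeholder, and the exact return type may differ between 1.1 release candidates.

```python
# Sketch: run inference on one SemanticKITTI-format point cloud with a
# downloaded checkpoint. API names are assumptions; verify them against
# your installed mmdetection3d version.
from mmdet3d.apis import inference_segmentor, init_model

config = 'configs/minkunet/minkunet_w32_8xb2-15e_semantickitti.py'
checkpoint = 'minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth'
model = init_model(config, checkpoint, device='cuda:0')

# 'demo.bin' is a placeholder for a SemanticKITTI-style .bin point cloud.
output = inference_segmentor(model, 'demo.bin')
# Depending on the release, `output` is a Det3DDataSample or a (results, data)
# tuple; the per-point predictions live in `pred_pts_seg.pts_semantic_mask`.
```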
## Citation

```latex
@inproceedings{choy20194d,
    title={4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks},
    author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={3075--3084},
    year={2019}
}
```
The commit also adds the corresponding model metafile for these checkpoints:

```yaml
Collections:
  - Name: MinkUNet
    Metadata:
      Training Techniques:
        - AdamW
      Architecture:
        - MinkUNet
    Paper:
      URL: https://arxiv.org/abs/1904.08755
      Title: '4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks'
    README: configs/minkunet/README.md
    Code:
      URL: https://github.com/open-mmlab/mmdetection3d/blob/1.1/mmdet3d/models/segmentors/minkunet.py#L13
      Version: v1.1.0rc4

Models:
  - Name: minkunet_w16_8xb2-15e_semantickitti
    In Collection: MinkUNet
    Config: configs/minkunet/minkunet_w16_8xb2-15e_semantickitti.py
    Metadata:
      Training Data: SemanticKITTI
      Training Memory (GB): 3.4
      Training Resources: 8x A100 GPUs
    Results:
      - Task: 3D Semantic Segmentation
        Dataset: SemanticKITTI
        Metrics:
          mIoU: 60.3
    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w16_8xb2-15e_semantickitti/minkunet_w16_8xb2-15e_semantickitti_20230309_160737-0d8ec25b.pth

  - Name: minkunet_w20_8xb2-15e_semantickitti
    In Collection: MinkUNet
    Config: configs/minkunet/minkunet_w20_8xb2-15e_semantickitti.py
    Metadata:
      Training Data: SemanticKITTI
      Training Memory (GB): 3.7
      Training Resources: 8x A100 GPUs
    Results:
      - Task: 3D Semantic Segmentation
        Dataset: SemanticKITTI
        Metrics:
          mIoU: 61.6
    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w20_8xb2-15e_semantickitti/minkunet_w20_8xb2-15e_semantickitti_20230309_160718-c3b92e6e.pth

  - Name: minkunet_w32_8xb2-15e_semantickitti
    In Collection: MinkUNet
    Config: configs/minkunet/minkunet_w32_8xb2-15e_semantickitti.py
    Metadata:
      Training Data: SemanticKITTI
      Training Memory (GB): 4.9
      Training Resources: 8x A100 GPUs
    Results:
      - Task: 3D Semantic Segmentation
        Dataset: SemanticKITTI
        Metrics:
          mIoU: 63.1
    Weights: https://download.openmmlab.com/mmdetection3d/v1.1.0_models/minkunet/minkunet_w32_8xb2-15e_semantickitti/minkunet_w32_8xb2-15e_semantickitti_20230309_160710-7fa0a6f1.pth
```