Conversation
Update the matching part in retinanet_loss.
Need to check it again to see if there is any room for speed improvement.
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
This is really awesome, thanks a lot for the PR! I'll have a closer look at it next week, let us know the result of the training!
Finishing the training.
No worries about X_101_32x8d training, we can do it on our side.
Once again thanks a lot for this awesome PR!
This is not a complete review yet.
One question I have is that I think we might want to move `_C.RETINANET` into `_C.MODEL.RETINANET`, but let's wait for @rbgirshick to comment on that.
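For concreteness, the move would look roughly like this in the yacs defaults (a minimal sketch; the default value shown is illustrative, not taken from the PR):

```python
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()

# before the move: _C.RETINANET = CN() lived at the top level
# after the move: nested under MODEL, like the other per-model options
_C.MODEL.RETINANET = CN()
_C.MODEL.RETINANET.NUM_CLASSES = 81  # illustrative default (80 COCO classes + background)
```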
@chengyangfu nice work! Do you know what implementation differences might have caused the improvement in box AP relative to the Detectron implementation? I'm also curious if you need to use a C++ implementation of sigmoid focal loss or if you can simply use a Python implementation.
Generate Empty BoxLists instead of [] in retinanet_infer
@rbgirshick The following is the Python version of Focal Loss I tested (imports added here for completeness):

```python
import torch
import torch.nn.functional as F

def forward(self, inputs, targets):
    # inputs: (N, C) class logits; targets: (N,) class indices, with 0 = background
    N = inputs.size(0)
    C = inputs.size(1)

    # one-hot encode the targets, then drop the background column
    class_mask = inputs.new_zeros((N, C))
    ids = targets.view(-1, 1)
    class_mask.scatter_(1, ids, 1.)
    class_mask = class_mask[:, 1:]
    inputs = inputs[:, 1:]

    # p_t: probability assigned to the correct binary label of every entry
    P = torch.sigmoid(inputs)
    PC = P * class_mask + (1 - P) * (1 - class_mask)

    # alpha-balanced focal weight: alpha_t * (1 - p_t)^gamma
    alpha = self.alpha * class_mask + (1 - self.alpha) * (1 - class_mask)
    focal_weight = alpha * (1 - PC).pow(self.gamma)

    loss = F.binary_cross_entropy_with_logits(inputs, class_mask,
                                              focal_weight)
    return loss
```
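To try the function standalone, one can wrap it in a small module; the `FocalLoss` wrapper and the shapes below are illustrative, not part of the PR:

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    # hypothetical container supplying the alpha/gamma used by forward() above
    def __init__(self, alpha=0.25, gamma=2.0):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma

FocalLoss.forward = forward  # attach the forward() defined above

criterion = FocalLoss()
logits = torch.randn(8, 81)            # 80 foreground classes + background column 0
targets = torch.randint(0, 81, (8,))   # random labels, purely a smoke test
print(criterion(logits, targets).item())
```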
Add NUM_DETECTIONS_PER_IMAGE
hi @laibe
Hi @chengyangfu, thanks for the benchmark! I believe this is a consequence of the operations not being fused in PyTorch 1.0.0. I think there have been some improvements recently that made it better, but I'd need to check. cc @ailzhang for the performance timings and memory. Let's keep the CUDA implementation for now then, and dispatch to the Python implementation if the tensor is on CPU, how does that sound? Then this will be ready to be merged. Once again, thanks a lot for this awesome contribution!
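A dispatch along those lines could look like the sketch below; `sigmoid_focal_loss_cuda` and `sigmoid_focal_loss_python` are placeholder names, not the PR's actual symbols:

```python
def sigmoid_focal_loss(logits, targets, gamma, alpha):
    # fused CUDA kernel on GPU, pure-PyTorch fallback on CPU
    if logits.is_cuda:
        return sigmoid_focal_loss_cuda(logits, targets, gamma, alpha)    # placeholder name
    return sigmoid_focal_loss_python(logits, targets, gamma, alpha)      # placeholder name
```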
@zimenglan-sysu-512 what's your training setup? I was using:
thanks @laibe, after using the CUDA version,
The numbers might make sense given the current fusion logic in JIT. @wanchaol @zou3519 could you also help check on the JIT numbers? Thanks!
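For reference, the fusion question concerns the pointwise chain inside the loss; here is a minimal sketch of scripting just that chain so the fuser can combine it into one kernel (illustrative, not the PR's code):

```python
import torch

@torch.jit.script
def focal_weight(p, mask, alpha: float, gamma: float):
    # pointwise chain that the JIT fuser is expected to combine
    pc = p * mask + (1 - p) * (1 - mask)
    a = alpha * mask + (1 - alpha) * (1 - mask)
    return a * (1 - pc).pow(gamma)
```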
@fmassa
Great, thanks!
That's an awesome contribution @chengyangfu, thanks a lot for all your effort!
@chengyangfu thanks a lot
Ah... I finally realized why the model zoo does not have these trained weights... Did removing OUT_CHANNELS: 256 from the backbone destroy the trained networks? I hope someone updates/converts these weights :( Edit: OK, never mind this comment. It was just giving scores lower than 0.7 (on the whole image). Try predictions on COCO_val2014_000000355257.jpg.
Could you please make the Sigmoid Focal Loss CUDA implementation support FP16? Thank you.
Can I use this model to train my custom dataset as in #521?
Supplementary performance of retinanet_r101fpn_2x on COCO minival:
Hi @chengyangfu :)
* Add RetinaNet parameters in cfg.
* Hot fix.
* Add the RetinaNet head module.
* Add the function to generate the anchors for RetinaNet.
* Add the SigmoidFocalLoss CUDA operator.
* Fix the bug in the extra layers.
* Change the normalizer for SigmoidFocalLoss.
* Support multiscale in training.
* Add RetinaNet training script.
* Add the inference part of RetinaNet.
* Fix the bug when building the extra layers in RetinaNet. Update the matching part in retinanet_loss.
* Add the first version of the inference of RetinaNet. Need to check it again to see if there is any room for speed improvement.
* Remove the retinanet_R-50-FPN_2x.yaml first.
* Optimize the RetinaNet postprocessing.
* Quick fix.
* Add script for training RetinaNet with ResNet101 backbone.
* Move cfg.RETINANET to cfg.MODEL.RETINANET.
* Remove the variables which are not used.
* Revert boxlist_ops. Generate empty BoxLists instead of [] in retinanet_infer.
* Remove the unused commented lines. Add NUM_DETECTIONS_PER_IMAGE.
* Remove the unused code.
* Move RetinaNet-related files under modeling/rpn/retinanet.
* Add retinanet_X_101_32x8d_FPN_1x.yaml script. This model is not fully validated; I only trained it for around 5000 iterations and everything was fine.
* Set RETINANET.PRE_NMS_TOP_N to 0 in level 5 (P7), because the previous setting may generate zero detections and could cause the program to break. This follows the original Detectron setting.
* Fix the RPN-only bug when the training ends.
* Minor improvements.
* Add comments and a Python-only implementation.
* Bug fix and remove commented code.
* Keep generalized_rcnn the same. Move build_retinanet inside build_rpn.
* Add USE_C5 in MODEL.RETINANET.
* Add two configs using P5 to generate P6.
* Fix the bug when loading the Caffe2 ImageNet pretrained model.
* Reduce the code duplication of RPN loss and RetinaNet loss.
* Remove the unused comment.
* Remove the hard-coded number of classes.
* Share the forward part of RPN inference.
* Fix the bug in RPN inference.
* Remove the conditional part in the inference.
* Bug fix: add the utils file for permute and flatten of the box prediction layers.
* Update the comment.
* Quick fix: add import of cat.
* Quick fix: forgot to include an import.
* Adjust the normalization part according to Detectron's setting.
* Use the bbox reg normalization term.
* Clean the code according to the recent review.
* Use the CUDA version for training now, and the Python version for training on CPU.
* Rename the directory to retinanet.
* Make the train and val datasets consistent with the Mask R-CNN setting.
* Add comment.
Hi,
This PR contains the RetinaNet implementation. The following table contains the models which use ResNet50 and ResNet101 as backbones.
GPU: Pascal Titan X
PyTorch: v1.0.0
Inference time is measured with batch size 1.
Add `_C.TEST.DETECTIONS_PER_IMG = 100`.
After applying DETECTIONS_PER_IMG, the mAP drops by 0.1.
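The cap itself just keeps the top-scoring boxes per image; a minimal sketch, assuming the repo's `BoxList` with a "scores" field (the helper name is my own):

```python
import torch

def limit_detections(boxlist, detections_per_img=100):
    # keep at most `detections_per_img` boxes, ranked by classification score
    scores = boxlist.get_field("scores")
    if len(scores) <= detections_per_img:
        return boxlist
    _, keep = torch.topk(scores, detections_per_img)
    return boxlist[keep]
```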
Not implemented parts:
Updated 02/02/2018
Identify the reason why this branch gets higher AP:

* this branch: 19.7/39.9/49.0
* add *4 in classification loss normalization: 19.6/39.3/48.2
Updated 01/30/2018
After updating PyTorch to v1.0.0, the inference time is reduced by around 15~20%.
Updated the inference time in the table.
Updated 01/26/2018
Add the RetinaNet_X-101-32x8d-FPN_1x model.
AP: 39.8
Inference time: 0.200 seconds.
Updated 01/25/2018
In my first version, I accidentally used P5 to generate P6, instead of C5, which is what Detectron and the paper use.
The following table compares the performance of these two settings.
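For context, the extra levels are produced by two stride-2 convolutions, and reading from C5 versus P5 is just a choice of input. A minimal sketch modeled on the repo's `LastLevelP6P7` (details such as weight initialization are omitted, so treat this as illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class LastLevelP6P7(nn.Module):
    # produces P6 and P7 for RetinaNet from either C5 (paper/Detectron) or P5
    def __init__(self, in_channels, out_channels):
        super(LastLevelP6P7, self).__init__()
        self.p6 = nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, x):
        # x is C5 or P5, depending on the USE_C5 setting
        p6 = self.p6(x)
        p7 = self.p7(F.relu(p6))
        return [p6, p7]
```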
Updated 01/23/2018
Trained the model without the "divide by 4" in the regression loss.
Performance:
Updated 11/20/2018
The matching part is slightly different from the Detectron version.
In Detectron's matching, anchors with IoU >= 0.5 are considered positive examples, and anchors with IoU <= 0.4 are negative examples. Then, for the low-quality matches (the best prediction for each gt), Detectron only keeps the low-quality examples with IoU >= 0.4.
P.S. In Detectron, some anchors occur in both fg_inds and bg_inds. Although in Line 230 Detectron removes all the low-quality positive examples with IoU < 0.4, I think the num_fg calculation in Line 231 is not correct.
I also tested the threshold used for the low-quality positive examples, from 0.5 down to 0.0.
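For reference, here is a minimal sketch of the matching rule described above (the function name and the (num_gt, num_anchors) IoU layout are my own choices, not the PR's code):

```python
import torch

def match_anchors(iou, fg_thresh=0.5, bg_thresh=0.4):
    # iou: (num_gt, num_anchors) pairwise IoU matrix
    matched_vals, matches = iou.max(dim=0)   # best gt for every anchor
    labels = torch.full_like(matches, -1)    # -1 = between thresholds, ignored
    labels[matched_vals >= fg_thresh] = 1    # confident positives
    labels[matched_vals <= bg_thresh] = 0    # confident negatives

    # low-quality matches: the best anchor(s) for each gt box are promoted
    # to positive, but (as described above) only if they clear bg_thresh
    highest_per_gt, _ = iou.max(dim=1)
    lq = (iou == highest_per_gt[:, None]).nonzero()
    anchor_idx = lq[:, 1]
    keep = iou[lq[:, 0], anchor_idx] >= bg_thresh
    labels[anchor_idx[keep]] = 1
    return matches, labels
```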