
The SSD for object detection on Fluid. #7402

Closed
qingqing01 opened this issue Jan 10, 2018 · 2 comments

qingqing01 commented Jan 10, 2018

The details of the SSD algorithm for object detection are not introduced here.

The implementation comparison among Paddle, Caffe, and TensorFlow.

First, compare the SSD implementations of the three frameworks; the correspondence is as follows. The TensorFlow Object Detection API is fine-grained and flexible, but perhaps a little complex. In it, SSD, Faster R-CNN and R-FCN share some implementation, such as the box encoder/decoder, non_max_suppression, the region similarity calculator and so on. The loss implementation in Caffe and Paddle is a coarse-grained operator, which makes it a little hard to read the code and understand the overall algorithm from it. So it may be better to split the loss into several sub-operations.

| Paddle | Caffe | TensorFlow |
| --- | --- | --- |
| PriorBoxLayer | PriorBoxLayer | anchor_generators |
| MultiBoxLossLayer | MultiBoxLossLayer, Transpose, Flatten, Concat | 1. box_coder_builder: box encoder and box decoder<br>2. matcher_builder: argmax_matcher / bipartite_matcher<br>3. region_similarity_calculator_builder: iou/ioa/neg_sq_dist_similarity<br>4. losses_builder: softmax_loss, hard_example_miner, target_assigner, smoothL1, tf.image.non_max_suppression, tf.gather and so on |
| DetectionOutputLayer | DetectionOutputLayer, Transpose, Flatten, Concat | post_processing_builder: box decoder, batch_multiclass_non_max_suppression, multiclass_non_max_suppression, tf.image.non_max_suppression |

SSD on Fluid.

  • 1). anchor_box_op: generate anchor boxes on the fly for one CNN layer.

    • Input:
      • Input(1): the input image with shape [N, C1, H1, W1]
      • Input(2): the layer from which to generate anchor boxes, with shape [N, C2, H2, W2]
    • Attr: min_size(int), max_size(int), aspect_ratio(int), variance(int), flip(bool), clip(bool)
    • Output: anchor boxes with shape [2, H, W, M, 4]. H * W * M is the total number of anchor boxes for Input(2).
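As a rough sketch of what anchor generation for one feature map does (this is plain NumPy, not Fluid code; the function name, argument names and the 2-box-plus-aspect-ratios layout are illustrative assumptions following Section 2.2 of the SSD paper):

```python
import numpy as np

def anchor_boxes_for_layer(feat_h, feat_w, img_h, img_w,
                           min_size, max_size, aspect_ratios):
    # Hypothetical sketch: for each feature-map cell, emit one box of
    # min_size, one of sqrt(min_size * max_size), and one per aspect ratio,
    # all normalized to [0, 1] image coordinates.
    boxes = []
    step_x, step_y = img_w / feat_w, img_h / feat_h
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * step_x, (i + 0.5) * step_y
            sizes = [(min_size, min_size),
                     (np.sqrt(min_size * max_size),) * 2]
            for ar in aspect_ratios:
                sizes.append((min_size * np.sqrt(ar), min_size / np.sqrt(ar)))
            for w, h in sizes:
                boxes.append([(cx - w / 2) / img_w, (cy - h / 2) / img_h,
                              (cx + w / 2) / img_w, (cy + h / 2) / img_h])
    # shape [H * W * M, 4], with M = 2 + len(aspect_ratios)
    return np.array(boxes)
```

With a 2x2 feature map and one extra aspect ratio, this yields 2 * 2 * 3 = 12 boxes, matching the H * W * M count above.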
  • 2). Python API of prior_box_op: must handle multiple CNN layers.

    • Args: the arguments in Section 2.2 of the SSD paper
      • a list of CNN layers from which to generate anchor boxes
      • min_ratio, max_ratio, aspect_ratios, anchor_box_variance
      • the minimum dimension of the input image
    • Output:
      • anchor boxes, a Tensor with shape [Np, 4]; Np is the total number of anchor boxes over the multiple CNN layers.
      • the variance of the anchor boxes, a Tensor with shape [Np, 4]
  • 3). iou_similarity_op: compute similarity based on the Intersection over Union (IOU) metric.

    • Input:
      • Input(1): the ground-truth boxes, a LoDTensor with shape [Ng, 4]; Ng is the total number of ground-truth boxes in the batch.
      • Input(2): the generated anchor boxes, a LoDTensor with shape [Np, 4]
    • Output: the IOU metric, a LoDTensor with shape [Ng, Np]
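The IOU computation itself is standard; a minimal NumPy sketch (boxes as [xmin, ymin, xmax, ymax], function name illustrative):

```python
import numpy as np

def iou_similarity(gt, anchors):
    # Broadcast [Ng, 1, 4] against [1, Np, 4] to get an [Ng, Np] IOU matrix.
    gt = np.asarray(gt, dtype=float)[:, None, :]
    an = np.asarray(anchors, dtype=float)[None, :, :]
    # Intersection width/height, clamped at zero for disjoint boxes.
    ix = np.maximum(0, np.minimum(gt[..., 2], an[..., 2]) - np.maximum(gt[..., 0], an[..., 0]))
    iy = np.maximum(0, np.minimum(gt[..., 3], an[..., 3]) - np.maximum(gt[..., 1], an[..., 1]))
    inter = ix * iy
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    area_a = (an[..., 2] - an[..., 0]) * (an[..., 3] - an[..., 1])
    return inter / (area_g + area_a - inter)  # [Ng, Np]
```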
  • 4). bipartite_match_op

    • Input:
      • Input(1): the IOU metric, a LoDTensor with shape [Ng, Np]
      • Input(2): ground-truth boxes, a LoDTensor with shape [Ng, 4]
    • Output:
      • Output(1): the matched indices, a LoDTensor with shape [N, Np]; N is the batch size and Ng >= N.
      • Output(2): the matched IOU metric, a LoDTensor with shape [N, Np]
      • Output(3): the matched target labels, a LoDTensor with shape [N, Np]. Output(1) saves the ground-truth box index, not the label; this output saves the ground-truth label.
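The greedy bipartite matching used in the Caffe SSD implementation can be sketched for one image as follows (NumPy, not Fluid; -1 marks unmatched anchors, mirroring Output(1) above):

```python
import numpy as np

def bipartite_match(sim):
    # sim: [Ng, Np] IOU matrix for one image. Repeatedly take the globally
    # largest entry, record the match, then exclude that row and column.
    sim = np.asarray(sim, dtype=float).copy()
    ng, npr = sim.shape
    match = -np.ones(npr, dtype=int)  # per-anchor matched gt index, -1 = none
    for _ in range(min(ng, npr)):
        g, p = np.unravel_index(np.argmax(sim), sim.shape)
        if sim[g, p] <= 0:
            break
        match[p] = g
        sim[g, :] = -1  # each gt box matches at most one anchor here
        sim[:, p] = -1  # each anchor matches at most one gt box
    return match
```

(The full SSD matcher additionally matches any anchor whose IOU exceeds a threshold; that step is omitted here for brevity.)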
  • 5). box_coder_op

    • Supports encoding and decoding. Here the inputs are anchor boxes and ground-truth boxes.
    • The output is a LoDTensor with shape [Ng, Np, 4]
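The SSD-style center/size encoding (with per-coordinate variance) that such an operator would implement can be sketched for a single box pair; decode inverts encode (NumPy sketch, function names and default variance are illustrative):

```python
import numpy as np

def encode_box(gt, anchor, variance=(0.1, 0.1, 0.2, 0.2)):
    # Encode one gt box against one anchor (boxes as [xmin, ymin, xmax, ymax]).
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + gw / 2, gt[1] + gh / 2
    return np.array([(gcx - acx) / aw / variance[0],
                     (gcy - acy) / ah / variance[1],
                     np.log(gw / aw) / variance[2],
                     np.log(gh / ah) / variance[3]])

def decode_box(code, anchor, variance=(0.1, 0.1, 0.2, 0.2)):
    # Exact inverse of encode_box.
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    cx = code[0] * variance[0] * aw + acx
    cy = code[1] * variance[1] * ah + acy
    w = np.exp(code[2] * variance[2]) * aw
    h = np.exp(code[3] * variance[3]) * ah
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
```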
  • 6). softmax_with_loss_op: compute the confidence loss for each prior classification prediction

    • Input:
      • Input(1): Classification prediction input with shape [N, Np, Nc], Nc is the class number.
      • Input(2): matched target label with shape [N, Np]
    • Output:
      • classification loss [N, 1]
  • 7). mine_hard_examples_op

    • Input:
      • Input(1): the classification loss with shape [N, Np]
      • Input(2): the localization loss, if needed. For now, the default demos in Caffe and TensorFlow only use the classification loss.
      • Input(3): the matched indices
    • Output:
      • the negative indices, a LoDTensor with shape [Neg, 1]
      • The matched indices will also be changed: the hard example indices will be labeled -1.
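Hard negative mining selects, per image, the unmatched anchors with the largest classification loss, capped at a fixed ratio to the number of positives. A NumPy sketch for one image (function name and the 3:1 default ratio are illustrative; Caffe's default is also 3):

```python
import numpy as np

def mine_hard_examples(cls_loss, match_indices, neg_pos_ratio=3.0):
    # cls_loss: [Np] per-anchor classification loss for one image.
    # match_indices: [Np] matched gt index per anchor, -1 = unmatched.
    cls_loss = np.asarray(cls_loss, dtype=float)
    match = np.asarray(match_indices)
    neg_mask = match < 0
    num_pos = int((match >= 0).sum())
    num_neg = min(int(neg_pos_ratio * num_pos), int(neg_mask.sum()))
    # Rank the unmatched anchors by loss, descending, and keep the hardest.
    neg_ids = np.where(neg_mask)[0]
    hardest = neg_ids[np.argsort(-cls_loss[neg_ids])][:num_neg]
    return np.sort(hardest)  # selected negative anchor indices
```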
  • 8). target_assign_op

    • Input:
      • Input(1): the localization predictions
      • Input(2): the matched indices after mine_hard_examples_op
      • Input(3): the encoded ground-truth boxes with shape [Ng, Np, 4]
      • Input(4): the variance of the anchor boxes, [Np, 4]
    • Output:
      • The encoded ground-truth boxes for each localization offset prediction, a LoDTensor with shape [N, Np, 4]
  • 9). smooth_l1_op

    • Input:
      • Input(1): the localization offset predictions
      • Input(2): the encoded ground-truth boxes for each localization offset prediction
    • Output:
      • the localization loss, a Tensor with shape [N, 1]
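For reference, the element-wise smooth L1 function (as used for SSD localization loss, here with the common sigma parameterization) is:

```python
import numpy as np

def smooth_l1(pred, target, sigma=1.0):
    # 0.5 * (sigma * x)^2          if |x| < 1 / sigma^2
    # |x| - 0.5 / sigma^2          otherwise
    x = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return np.where(x < 1.0 / sigma ** 2,
                    0.5 * (sigma * x) ** 2,
                    x - 0.5 / sigma ** 2)
```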
  • 10). batch_multiclass_nms_op

    • Input:
      • Input(1): the decoded localization predictions after box_coder_op.
      • Input(2): the classification predictions
    • Output:
      • a LoDTensor with shape [Ng, 6] (label, score, xmin, ymin, xmax, ymax)
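The single-class greedy NMS at the core of this operator can be sketched as follows (NumPy; the 0.45 default threshold matches the Caffe SSD demo but is an assumption here):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    # Greedy NMS for one class: keep the highest-scoring box, drop all
    # remaining boxes overlapping it above the threshold, repeat.
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        ix = np.maximum(0, np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]))
        iy = np.maximum(0, np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]))
        inter = ix * iy
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]
    return keep
```

The batch/multiclass version runs this per class per image and concatenates the results, which is where the [Ng, 6] LoDTensor output shape comes from.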
  • 11). transpose_op, concat_op, softmax_with_loss_op and smooth_l1_op: these operators have already been implemented.

Data Structure

In Caffe, since each input image may have a different number of ground-truth boxes, for convenience of calculation the input Blobs (similar to tensors) for the ground-truth boxes and anchor boxes are converted into std::map<int, vector<NormalizedBBox> > and std::vector<NormalizedBBox>. NormalizedBBox is a struct.

But in Fluid we have LoDTensor, so Tensor (or LoDTensor) in / Tensor (or LoDTensor) out for each operator may be enough.
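To illustrate the LoDTensor idea for variable-length ground truth (a plain-NumPy analogy, not the actual Fluid implementation): a batch of 3 images with 2, 0 and 3 ground-truth boxes is stored as one flat [5, 4] tensor plus an offset vector.

```python
import numpy as np

# Flat storage for all ground-truth boxes in the batch.
gt_boxes = np.array([[0.1, 0.1, 0.3, 0.3],   # image 0
                     [0.5, 0.5, 0.9, 0.9],   # image 0
                     [0.2, 0.2, 0.4, 0.4],   # image 2
                     [0.0, 0.0, 0.5, 0.5],   # image 2
                     [0.6, 0.1, 0.8, 0.3]])  # image 2
lod = [0, 2, 2, 5]  # image i owns rows lod[i]:lod[i + 1]

def boxes_of(i):
    # Recover the variable-length box list for one image.
    return gt_boxes[lod[i]:lod[i + 1]]
```

This avoids the per-image std::map / std::vector conversion Caffe needs.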

If there is any problem with the above descriptions, please help correct it. Thank you.


wanghaox commented Jan 10, 2018

  1. The output shape of anchor_box_op is [2, H, W, M, 4].
  2. Which OP is used to generate the negative indices in Fluid?
  3. Should box_coder_op be designed as an OP, or just implemented as a function?
  4. I think LoDTensor is OK for ground-truth boxes; I used it in my code.
  5. How should DetectionOutputLayer be implemented in Fluid?
  6. What are the roles of mine_hard_examples_op and target_assign_op?

@qingqing01

@wanghaox Thanks for your review. I have updated the descriptions above and added more comments.

  1. Which OP is used to generate the negative indices in Fluid?
  2. How should DetectionOutputLayer be implemented in Fluid?
  3. Should box_coder_op be designed as an OP, or just implemented as a function?
  • The mine_hard_examples_op is used to generate the negative indices.
  • The box_coder_op and batch_multiclass_nms_op are used to get the detection outputs.
  • The smooth_l1_op is used to compute the localization loss, and one of its inputs is the encoded ground-truth boxes. The batch_multiclass_nms_op is used to get the detection outputs, and one of its inputs is the decoded localization predictions. So a dedicated box_coder operator is probably better. Of course, we can reconsider this during implementation.

Thank you.
