The SSD for object detection on Fluid. #7402
Comments
@wanghaox Thanks for your review. I have updated the descriptions above and added more comments.
Thank you.
25 tasks
Closed
This was referenced Jan 25, 2018
This was referenced Feb 6, 2018
The details of the SSD algorithm for object detection are not introduced here.
The implementation comparison among Paddle, Caffe, and TensorFlow.
First, we compare the SSD implementations in the three frameworks; the correspondence is as follows. The TensorFlow Object Detection API is fine-grained and flexible, but perhaps a little complex. Its SSD, Faster-RCNN and R-FCN models share some implementation, such as the box encoder/decoder, non_max_suppression, and the region similarity calculator. In Caffe and Paddle, the loss is implemented as a single coarse-grained operator, which makes it a little hard to read the code and understand the overall algorithm from it. So it may be better to split the loss into several sub-operations.
Transpose
Flatten
Concat
box encoder and box decoder
2. matcher_builder:
argmax_matcher / bipartite_matcher
3. region_similarity_calculator_builder:
iou/ioa/neg_sq_dist_similarity
4. losses_builder:
softmax_loss, hard_example_miner, target_assigner,
smoothL1 tf.image.non_max_suppression,
tf.gather ... and so on.
Transpose
Flatten
Concat
box decoder,
batch_multiclass_non_max_suppression,
multiclass_non_max_suppression,
tf.image.non_max_suppression
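The detection-output path above ends in non-maximum suppression. As a rough illustration only, here is a minimal pure-Python sketch of greedy single-class NMS. The function names `pair_iou` and `greedy_nms`, the corner box format (xmin, ymin, xmax, ymax), and the default threshold are assumptions made for this example; they are not the API of TensorFlow, Caffe, or Paddle.

```python
def pair_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax). Format is assumed."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold (hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(pair_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 1, 1), (0, 0, 1, 0.9), (2, 2, 3, 3)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

A multiclass variant would presumably run this once per class on that class's scores and then merge the results, which is how the batch/multiclass wrappers listed above relate to the single-class primitive.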
SSD on Fluid.
1). anchor_box_op: generate anchors on the fly corresponding to one CNN layer.
2). Python API of prior_box_op: must handle multiple CNN layers.
3). iou_similarity_op: compute similarity based on Intersection over Union (IOU) metric.
4). bipartite_match_op
5). box_coder_op
6). softmax_with_loss_op: compute the confidence loss for each prior's classification prediction.
7). mine_hard_examples_op
8). target_assign_op
prediction, it's a LoDTensor with shape [N, Np, 4]
9). smooth_l1_op
10). batch_multiclass_nms_op
11). transpose_op, concat_op, softmax_with_loss_op and smooth_l1_op: these operators have already been implemented.
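To make the roles of iou_similarity_op and bipartite_match_op more concrete, here is a minimal pure-Python sketch of what they might compute. It assumes boxes in (xmin, ymin, xmax, ymax) form and a greedy matching strategy in which the globally best remaining (ground-truth, prior) pair is matched first. The function names and the `-1` "unmatched" convention are assumptions for illustration, not the actual Fluid operator interfaces.

```python
def box_iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes. Format is assumed."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union


def iou_similarity(gt_boxes, prior_boxes):
    """Similarity matrix of shape [num_gt][num_prior]."""
    return [[box_iou(g, p) for p in prior_boxes] for g in gt_boxes]


def bipartite_match(sim):
    """Greedy bipartite matching on a [num_gt][num_prior] matrix.

    Returns a list of length num_prior holding the matched
    ground-truth row index, or -1 if the prior is unmatched.
    Each ground-truth box is matched to at most one prior.
    """
    num_gt = len(sim)
    num_prior = len(sim[0]) if num_gt else 0
    match = [-1] * num_prior
    free_rows = set(range(num_gt))
    while free_rows:
        best = None  # (similarity, row, col) of the best free pair
        for r in sorted(free_rows):
            for c in range(num_prior):
                if match[c] != -1 or sim[r][c] <= 0:
                    continue
                if best is None or sim[r][c] > best[0]:
                    best = (sim[r][c], r, c)
        if best is None:
            break  # no remaining pair overlaps at all
        _, r, c = best
        match[c] = r
        free_rows.remove(r)
    return match


gt = [(0, 0, 2, 2), (3, 3, 5, 5)]
priors = [(0, 0, 2, 2), (0, 0, 2, 1), (3, 3, 5, 4), (10, 10, 11, 11)]
sim = iou_similarity(gt, priors)
match_indices = bipartite_match(sim)
```

Keeping the similarity computation separate from the matching step mirrors the split proposed above: the matcher only sees a matrix, so a different similarity metric could be swapped in without touching it.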
Data Structure
In Caffe, each input image may have a different number of ground-truth boxes, so for convenience of calculation the input Blobs (similar to tensors) for the ground-truth boxes and anchor boxes are converted into std::map<int, vector<NormalizedBBox> > and std::vector<NormalizedBBox>, where NormalizedBBox is a struct. In Fluid, however, we have LoDTensor, so Tensor (or LoDTensor) in and Tensor (or LoDTensor) out for each operator may be enough.
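As a rough illustration of why a LoDTensor-style layout can replace the per-image map, the sketch below packs a variable number of ground-truth boxes per image into one flat list plus an offset table. The helper names `pack_boxes` and `boxes_of_image` are hypothetical, and the offset-based layout is a simplified stand-in for the idea, not the actual LoDTensor implementation.

```python
def pack_boxes(per_image_boxes):
    """Flatten per-image box lists into (flat_boxes, offsets).

    offsets[i]:offsets[i + 1] is the slice of flat_boxes that
    belongs to image i, so offsets has len(per_image_boxes) + 1 entries.
    """
    flat, offsets = [], [0]
    for boxes in per_image_boxes:
        flat.extend(boxes)
        offsets.append(len(flat))
    return flat, offsets


def boxes_of_image(flat, offsets, i):
    """Recover the boxes of image i from the packed representation."""
    return flat[offsets[i]:offsets[i + 1]]


# Three images with 2, 1 and 0 ground-truth boxes respectively.
batch = [
    [(0.1, 0.1, 0.5, 0.5), (0.2, 0.3, 0.9, 0.8)],
    [(0.0, 0.0, 1.0, 1.0)],
    [],
]
flat, offsets = pack_boxes(batch)
```

With this layout, an operator can take one dense [total_boxes, 4] input together with the offsets, rather than a map keyed by image index.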
If there are any problems with the descriptions above, please help correct them. Thank you.