
On the bilinear in your implementation. #1

Closed
tianzhi0549 opened this issue Mar 26, 2020 · 29 comments

Comments

@tianzhi0549

masks = interpolate(masks_per_image, size = (o_h,o_w), mode="bilinear", align_corners=False)

The default bilinear interpolation in PyTorch is not aligned, which might significantly degrade the performance, in particular for small objects.

Please try the aligned bilinear.

import torch.nn.functional as F

def aligned_bilinear(tensor, factor):
    # upsample an N x C x H x W tensor by an integer factor with aligned bilinear interpolation
    assert tensor.dim() == 4
    assert factor >= 1
    assert int(factor) == factor

    if factor == 1:
        return tensor

    h, w = tensor.size()[2:]
    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
    oh = factor * h + 1
    ow = factor * w + 1
    tensor = F.interpolate(
        tensor, size=(oh, ow),
        mode='bilinear',
        align_corners=True
    )
    tensor = F.pad(
        tensor, pad=(factor // 2, 0, factor // 2, 0),
        mode="replicate"
    )

    return tensor[:, :, :oh - 1, :ow - 1]
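
For reference, a minimal usage sketch of how the interpolate call quoted at the top could be swapped for the aligned version. It assumes masks_per_image is an N x C x H x W tensor and that (o_h, o_w) is an exact integer multiple of its spatial size; the variable names follow the snippet quoted above, not this repo's code.

# unaligned call quoted above:
# masks = interpolate(masks_per_image, size=(o_h, o_w), mode="bilinear", align_corners=False)

# aligned replacement, assuming o_h == factor * h and o_w == factor * w
h, w = masks_per_image.size()[2:]
factor = o_h // h
masks = aligned_bilinear(masks_per_image, factor)
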
@Epiphqny
Owner

@tianzhi0549 Thanks for pointing it out, I will try it and update the new result later.

@tianzhi0549
Author

@Epiphqny I also note that it seems you are using absolute coordinates as the input to the mask heads, which is not correct. It is important to use relative coordinates here because we hope the generated filters are position-independent.

@Epiphqny
Owner

@tianzhi0549 The coordinates in this implementation range from -1 to 1. What do you mean by relative coordinates? Should they be 0-1 instead?

@tianzhi0549
Author

@Epiphqny aim-uofa/AdelaiDet#10. You can refer to the explanation here.

@Epiphqny
Owner

@tianzhi0549 OK, I will try that.

@Epiphqny
Owner

@tianzhi0549 It sounds like the relative coordinates are in some way like the center-ness... but implemented in a different way. Just my opinion.

@tianzhi0549
Author

@Epiphqny They may be similar in some aspects, but they are designed for totally different purposes ...

@Epiphqny
Owner

@tianzhi0549 Yes, both are interesting ideas!

@Epiphqny
Owner

Epiphqny commented Apr 3, 2020

@tianzhi0549 Hi, I replaced the original upsampling with the aligned version and used the upsampled mask to calculate the loss; the AP is now 37.1. But this is still the absolute-coordinate version. I will post new results after the training of the relative-coordinate version has finished.

@tianzhi0549
Author

@Epiphqny Great! For the memory usage issue, you could limit the maximum number of samples used to compute masks during training. Using relative coordinates might also significantly boost the performance.
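
Not code from this repo, just a rough sketch of what capping the number of mask samples could look like; max_masks_per_batch, pos_inds, and the random subsampling are all hypothetical choices.

import torch

# hypothetical: keep at most `max_masks_per_batch` positive instances when
# computing the mask loss, to bound GPU memory during training
max_masks_per_batch = 64  # assumption, tune to the available memory
if pos_inds.numel() > max_masks_per_batch:
    keep = torch.randperm(pos_inds.numel(), device=pos_inds.device)[:max_masks_per_batch]
    pos_inds = pos_inds[keep]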

@Epiphqny
Owner

Epiphqny commented Apr 7, 2020

@tianzhi0549 Perhaps there is some problem in my implementation of relative coordinates; it only achieves 36.9 mAP, which is worse than the absolute-coordinate version.

@tianzhi0549
Author

@Epiphqny If possible, you could push your code to a new branch of the repo; I can help check it.

@Epiphqny
Owner

Epiphqny commented Apr 7, 2020

Hi @tianzhi0549, I have added the code to the relative_coordinate branch. Thank you very much for the help!

@tianzhi0549
Author

@Epiphqny Are you sure this line is correct?

x_range = torch.linspace(-1, 1, w, device=self.masks.device)

@Yuxin-CV

Yuxin-CV commented Apr 8, 2020

@Epiphqny Hi~ Thanks for sharing your code!
It seems that the setting of IMS_PER_BATCH and BASE_LR in your config is incorrect.

IMS_PER_BATCH: 4
BASE_LR: 0.01 # Note that RetinaNet uses a different default learning rate

IMS_PER_BATCH and BASE_LR should be changed together according to the Linear Scaling Rule: you need to set the learning rate proportional to the batch size if you use a different number of GPUs or images per GPU, e.g., IMS_PER_BATCH = 4 & BASE_LR = 0.0025.
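
In other words (a quick numeric illustration of the rule, not code from either repo):

# Linear Scaling Rule: keep BASE_LR / IMS_PER_BATCH constant.
# The RetinaNet-style reference config uses IMS_PER_BATCH = 16, BASE_LR = 0.01.
reference_batch, reference_lr = 16, 0.01
ims_per_batch = 4
base_lr = reference_lr * ims_per_batch / reference_batch  # -> 0.0025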

I also found a similar problem in your Yolact_fcos repo:
https://github.com/Epiphqny/Yolact_fcos/blob/b131542a930499523343d3fd660088e7e372c317/configs/Yolact/Base-FCOS.yaml#L16-L18

Though changing IMS_PER_BATCH and BASE_LR according to the Linear Scaling Rule cannot guarantee reproducing the results in the paper, I think it can help you obtain a very close result. @tianzhi0549 @Epiphqny

@Epiphqny
Owner

Epiphqny commented Apr 8, 2020

@Yuxin-CV Thank you very much for pointing that out, I will try the Linear Scaling Rule later.

@Epiphqny
Owner

Epiphqny commented Apr 8, 2020

@tianzhi0549 Sorry, I cannot find the problem in this line, could you point it out directly?

@tianzhi0549
Author

@Epiphqny I would suggest that you compute all the coordinate transformations on the scale of the input image. After you get the final relative coordinates, you can normalize them by a constant scale. Please make sure that even after normalization, the locations generating the filters are always at (0, 0).
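
A rough, self-contained sketch of what this could look like in code (not the official implementation; mask_stride, the grid sizes, the center location, and the constant norm_const are all example assumptions):

import torch

# image-scale coordinates of an H_mask x W_mask mask-feature grid (stride = mask_stride),
# shifted so that the image-scale location (cx, cy) that generated the filters is the origin,
# then normalized by a constant scale norm_const
mask_stride, H_mask, W_mask = 8, 100, 152
cx, cy, norm_const = 320.0, 240.0, 128.0  # example values

ys = torch.arange(H_mask, dtype=torch.float32) * mask_stride
xs = torch.arange(W_mask, dtype=torch.float32) * mask_stride
y_grid, x_grid = torch.meshgrid(ys, xs)  # default "ij" indexing
rel_coords = torch.stack([(x_grid - cx) / norm_const,
                          (y_grid - cy) / norm_const], dim=0)
# even after normalization, the location that generated the filters stays at (0, 0)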

@Epiphqny
Owner

Epiphqny commented Apr 8, 2020

@tianzhi0549 I have subtracted the center coordinate in

coords_feat = grid - offset_xy

and the values at the center locations are zero.

@Yuxin-CV

Yuxin-CV commented Apr 8, 2020

> @Yuxin-CV Thank you very much for pointing that out, I will try the Linear Scaling Rule later.

Personally, I think you could try the R-50 1x lr_schedule with input_size = 800 and batch_size = 16 first, before using a stronger backbone and a longer lr_schedule. You can get the results in less than 1 day if you have access to 4 or 8 GPUs.
Looking forward to your result! @Epiphqny

@Yuxin-CV

Yuxin-CV commented Apr 8, 2020

BTW, I wonder how you @tianzhi0549 implement the forward_mask() part in the official code?
Do you simply use a for loop just like @Epiphqny's implementation:

# for each image
for i in range(N):
    inds = (im_idxes == i).nonzero().flatten()
    ins_num = inds.shape[0]
    if ins_num > 0:
        controllers = controllers_pred[inds]
        mask_feat = masks_feat[None, i]
        weights1 = controllers[:, :80].reshape(-1, 8, 10).reshape(-1, 10).unsqueeze(-1).unsqueeze(-1)
        bias1 = controllers[:, 80:88].flatten()
        weights2 = controllers[:, 88:152].reshape(-1, 8, 8).reshape(-1, 8).unsqueeze(-1).unsqueeze(-1)
        bias2 = controllers[:, 152:160].flatten()
        weights3 = controllers[:, 160:168].unsqueeze(-1).unsqueeze(-1)
        bias3 = controllers[:, 168:169].flatten()
        conv1 = F.conv2d(mask_feat, weights1, bias1).relu()
        conv2 = F.conv2d(conv1, weights2, bias2, groups=ins_num).relu()
        # masks_per_image = F.conv2d(conv2, weights3, bias3, groups=ins_num)[0].sigmoid()
        masks_per_image = F.conv2d(conv2, weights3, bias3, groups=ins_num)
        masks_per_image = aligned_bilinear(masks_per_image, self.strides[0])[0].sigmoid()
        for j in range(ins_num):
            ind = inds[j]
            mask_gt = masks_t[i][matched_idxes[ind]].float()
            mask_pred = masks_per_image[j]
            mask_loss += self.dice_loss(mask_pred, mask_gt)

or some other highly optimized implementation, e.g., a CUDA kernel?

@Yuxin-CV

Yuxin-CV commented Apr 8, 2020

Hi @Epiphqny~
I also found that the mask_loss normalization factor N_pos in your code is not reduced across GPUs.

if batch_ins > 0:
    mask_loss = mask_loss / batch_ins

I think it is better to use num_pos_avg as the normalization factor, which is the average number of positive samples across different GPUs.
pos_inds = torch.nonzero(labels != num_classes).squeeze(1)
num_pos_local = pos_inds.numel()
num_gpus = get_world_size()
total_num_pos = reduce_sum(pos_inds.new_tensor([num_pos_local])).item()
num_pos_avg = max(total_num_pos / num_gpus, 1.0)

mask_loss = mask_loss / num_pos_avg

@Yuxin-CV

Yuxin-CV commented Apr 9, 2020

> @tianzhi0549 I have subtracted the center coordinate in coords_feat = grid - offset_xy, and the values at the center locations are zero.

@Epiphqny I think the rel. coord. should be location-specific, just like:

# pseudocode, for a location (x, y) on the input image:
x_range = torch.arange(W_mask)
y_range = torch.arange(H_mask)
y_grid, x_grid = torch.meshgrid(y_range, x_range)
y_rel_coord = normalize_to(y_grid - y / mask_stride, -1, 1)  # normalize_to is pseudocode
x_rel_coord = normalize_to(x_grid - x / mask_stride, -1, 1)
rel_coord = torch.cat([x_rel_coord, y_rel_coord])

@tianzhi0549 Am I right? Could you provide the official code snippet of rel. coord.? Thanks!

@Epiphqny
Owner

Epiphqny commented Apr 9, 2020

@Yuxin-CV Please modify the code and train the model, then report the result here. I will update if there is an improvement. I don't have a spare GPU to train the model now.

@Yuxin-CV

Yuxin-CV commented Apr 9, 2020

> @Yuxin-CV Please modify the code and train the model, then report the result here. I will update if there is an improvement. I don't have a spare GPU to train the model now.

OK

@tianzhi0549
Author

@Epiphqny For your information: aim-uofa/AdelaiDet#23 (comment). Thank you :-).

@Epiphqny
Owner

@tianzhi0549 OK, thanks for providing the code.

@guangdongliang

guangdongliang commented Apr 9, 2021

@tianzhi0549 I got the same result in your docker using "aligned_bilinear" and "F.interpolate"!

@chufengt

@tianzhi0549
One question about aligned_bilinear:
I noticed that other interpolation operations in detectron2 and adet require align_corners=False (e.g. image and mask resizing).
Should we change those other align_corners settings to True when using CondInst?
Thanks.
