Alternate depth normalization #10
Comments
Thank you so much for pointing this out and for your insightful suggestion! I will look into it and experiment with the normalization you mentioned in the depth-related experiments to see whether it yields better performance.
Another comment is that the use of RandomResizedCrop in augmentation during training might largely break the connection between scale and the object being imaged. It might be good to maintain that information by applying the same (224/h) scaling factor during training (where h is the height of the crop). Actually, since w can be very different from h in RandomResizedCrop, something like sqrt((224/h)*(224/w)) might be better since it preserves information about the area of the object.
@jbrownkramer Thank you for your comments. If possible, could you please provide the implementation you mentioned, so that I can find some time later to run experiments on it? Thanks.
Here you go. You should be able to replace RandomResizedCrop in RGBD_Processor_Train with this. It is untested code, FYI.
You should also be able to replace the
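A minimal sketch of one way such a replacement could look, assuming the RGB-D sample arrives as a (C, H, W) tensor with the disparity channel last; the class name, channel layout, and default parameters here are assumptions rather than the project's actual code:

```python
import math
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as F

class DisparityAwareRandomResizedCrop(torch.nn.Module):
    """Crop-and-resize like RandomResizedCrop, but also rescale the disparity
    channel by sqrt((size/h) * (size/w)) so the implied focal length stays
    consistent with the crop of height h and width w, as suggested above."""

    def __init__(self, size=224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                 interpolation=T.InterpolationMode.BILINEAR):
        super().__init__()
        self.size = size
        self.scale = scale
        self.ratio = ratio
        self.interpolation = interpolation

    def forward(self, img):
        # Sample a crop box exactly as torchvision's RandomResizedCrop would.
        i, j, h, w = T.RandomResizedCrop.get_params(img, self.scale, self.ratio)
        out = F.resized_crop(img, i, j, h, w, [self.size, self.size],
                             interpolation=self.interpolation)
        # Compensate the disparity channel for the effective focal-length change.
        out[-1] = out[-1] * math.sqrt((self.size / h) * (self.size / w))
        return out
```

Whether bilinear interpolation is appropriate for the disparity channel (it blurs across depth discontinuities) is a separate question from the scaling itself.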
The justification in the paper for using disparity is "scale normalization". I know that this comes from OmniVore and ImageBind.
However, this does not actually achieve scale normalization.
What could scale normalization mean? Disparity images are not scale invariant in the way RGB images are: if you bring a thing closer it will have larger disparities, as opposed to RGB images where colors stay the same. Instead it must mean something like: an object that appears with the same "disparity" in two images should take up the same number of pixels in both.
To achieve this, you should use f/depth instead of bf/depth. This makes sense because b is an arbitrary value associated with the particular camera setup that you have, and it provides you no information about the geometry of the scene you are looking at. If you change b physically, the depth image does not change, but the disparity does.
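As a quick sanity check on that point, with made-up numbers: changing the baseline b changes bf/depth but leaves f/depth untouched.

```python
# Toy check with made-up numbers: two stereo rigs viewing the same point.
f = 600.0   # focal length in pixels
Z = 2.0     # depth in meters
for b in (0.05, 0.12):   # two hypothetical baselines in meters
    print(f"b={b}:  bf/Z = {b * f / Z:.1f},  f/Z = {f / Z:.1f}")
# bf/Z differs between rigs (15.0 vs 36.0) while f/Z stays 300.0,
# so only f/Z describes the scene independently of the camera pair.
```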
One other suggested improvement: when you resize to 224, you're actually implicitly changing the focal length. So if h is the original height of the image, I would suggest computing "disparity" as
(224/h) * f / depth
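Put together, a small sketch of this normalization, assuming a PyTorch depth map in meters and a per-image focal length in pixels; the helper name and the handling of invalid depths are assumptions:

```python
import torch

def depth_to_normalized_disparity(depth, focal_px, orig_h, target_size=224, eps=1e-6):
    # depth: (H, W) tensor of metric depth; focal_px: focal length of the original
    # image in pixels; orig_h: original image height before resizing to target_size.
    disparity = focal_px / depth.clamp(min=eps)            # avoid divide-by-zero
    disparity = torch.where(depth > 0, disparity, torch.zeros_like(depth))  # invalid depths -> 0
    return (target_size / orig_h) * disparity
```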
If normalization is having any positive effect, I bet this improved normalization will do better.