Alternate depth normalization #10
Comments
Thank you so much for pointing this out and for your insightful suggestion! I will look into it and experiment with the normalization you mentioned in the depth-related experiments to see whether it yields better performance.
Another comment is that the use of RandomResizedCrop in augmentation during training might largely break the connection between scale and the object being imaged. It might be good to maintain that information by applying the same (224/h) scaling factor during training (where h is the height of the crop). Actually, since w can be very different from h in RandomResizedCrop, something like sqrt((224/h)*(224/w)) might be better since it preserves information about the area of the object.
@jbrownkramer Thank you for your comments. If possible, could you please provide the implementation you mentioned, so that I can find some time later to run experiments on it? Thanks.
Here you go. You should be able to replace RandomResizedCrop in RGBD_Processor_Train with this. It is untested code, FYI.
You should also be able to replace the
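A minimal sketch of one way such a replacement could look, assuming the RGB-D sample arrives as a (C, H, W) tensor with the disparity channel last; the class name, channel layout, and default parameters here are assumptions rather than the project's actual code:

```python
import math
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as F

class DisparityAwareRandomResizedCrop(torch.nn.Module):
    """Crop-and-resize like RandomResizedCrop, but also rescale the disparity
    channel by sqrt((size/h) * (size/w)) so the implied focal length stays
    consistent with the crop of height h and width w, as suggested above."""

    def __init__(self, size=224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                 interpolation=T.InterpolationMode.BILINEAR):
        super().__init__()
        self.size = size
        self.scale = scale
        self.ratio = ratio
        self.interpolation = interpolation

    def forward(self, img):
        # Sample a crop box exactly as torchvision's RandomResizedCrop would.
        i, j, h, w = T.RandomResizedCrop.get_params(img, self.scale, self.ratio)
        out = F.resized_crop(img, i, j, h, w, [self.size, self.size],
                             interpolation=self.interpolation)
        # Compensate the disparity channel for the effective focal-length change.
        out[-1] = out[-1] * math.sqrt((self.size / h) * (self.size / w))
        return out
```

Whether bilinear interpolation is appropriate for the disparity channel (it blurs across depth discontinuities) is a separate question from the scaling itself.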
The justification in the paper for using disparity is "scale normalization". I know that this comes from OmniVore and ImageBind.
However, this does not actually achieve scale normalization.
What could scale normalization mean? Disparity images are not scale invariant in the way RGB images are: if you bring a thing closer it will have larger disparities, as opposed to RGB images where colors stay the same. Instead it must mean something like: an object that appears with the same "disparity" in two images should take up the same number of pixels in both.
To achieve this, you should use f/depth instead of bf/depth. This makes sense because b is an arbitrary value associated with the particular camera setup that you have, and it provides you no information about the geometry of the scene you are looking at. If you change b physically, the depth image does not change, but the disparity does.
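As a quick sanity check on that point, with made-up numbers: changing the baseline b changes bf/depth but leaves f/depth untouched.

```python
# Toy check with made-up numbers: two stereo rigs viewing the same point.
f = 600.0   # focal length in pixels
Z = 2.0     # depth in meters
for b in (0.05, 0.12):   # two hypothetical baselines in meters
    print(f"b={b}:  bf/Z = {b * f / Z:.1f},  f/Z = {f / Z:.1f}")
# bf/Z differs between rigs (15.0 vs 36.0) while f/Z stays 300.0,
# so only f/Z describes the scene independently of the camera pair.
```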
One other suggested improvement: when you resize to 224, you're actually implicitly changing the focal length. So if h is the original height of the image, I would suggest computing "disparity" as
(224/h) * f / depth
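Put together, a small sketch of this normalization, assuming a PyTorch depth map in meters and a per-image focal length in pixels; the helper name and the handling of invalid depths are assumptions:

```python
import torch

def depth_to_normalized_disparity(depth, focal_px, orig_h, target_size=224, eps=1e-6):
    # depth: (H, W) tensor of metric depth; focal_px: focal length of the original
    # image in pixels; orig_h: original image height before resizing to target_size.
    disparity = focal_px / depth.clamp(min=eps)            # avoid divide-by-zero
    disparity = torch.where(depth > 0, disparity, torch.zeros_like(depth))  # invalid depths -> 0
    return (target_size / orig_h) * disparity
```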
If normalization is having any positive effect, I bet this improved normalization will do better.