A few questions #1

Open
nemtiax opened this issue May 7, 2021 · 2 comments

Comments

nemtiax commented May 7, 2021

Really cool work! I'm going through the paper and the code and I had a few questions:

  1. Is there an easy way to run inference on an individual image? It looks like the answer is no, but it shouldn't be too hard for me to patch something together using evaluate.py as a guide. I figured I'd check before going through the effort, though.

  2. I'm having a bit of trouble understanding the output format. After running evaluate.py, I get a bunch of pkl files in results/eval/test/gt//original/m6_p1_s4096_k1_ic10_rs1. Are these predictions or ground truth (unsure because of the 'gt' in the path)?

  3. When looking at one of the 9-tuples found in the 'best_models' entry of the pkl files, for example:

[0.8778091, 1.0231334, 0.01, -1.4559864, -1.8047982, -0.76489663, -0.14193119, -0.21265107, 1.7678456]

My understanding is that the first three values are extents in x, y and z, and the last three values are the translation (so effectively the center coordinates?). But I'm not entirely sure what to make of the middle three values - I think they're the rotation, but I was naively expecting four values for angle-axis format. I'd like to transform these into a rotation matrix, but 3D geometry is not my strong suit, so I suspect I'm missing something obvious.

Thanks for any help you can provide!

fkluger (Owner) commented May 8, 2021

Thanks a lot for your feedback, always appreciated!

Modifying evaluate.py for inference on individual images should be relatively trivial, but I haven't done that yet. I am, however, planning to add a separate demo script for that purpose. (Might take a little while though, because of other priorities.) Right now the main purpose of the code is to reproduce the results from the paper on the NYU dataset.

The gt in the path indicates that the results were computed using the 'ground truth' depth, i.e. the depth maps provided by the NYU dataset, so those files contain predicted cuboids rather than ground-truth annotations. For RGB input, you have to run python evaluate.py --depth_model bts, and the results are then stored in results/eval/test/bts/.....

The 9-tuple consists of [a_x, a_y, a_z, r_1, r_2, r_3, t_x, t_y, t_z].
a = [a_x, a_y, a_z] is the size of the cuboid.
r = [r_1, r_2, r_3] (rotation) and t = [t_x, t_y, t_z] (translation) describe the pose of the cuboid, with r being the angle-axis representation of the rotation, so you are basically right. The angle-axis notation only needs three values, as r/norm(r) denotes the axis of rotation and norm(r) is the angle.
You can convert r into a rotation matrix R using the torchgeometry lib for example:
R = tgm.angle_axis_to_rotation_matrix(r)
A 3D point x is then transformed into the cuboid-centric coordinate frame via R(x-t), i.e. R @ (x - t) in Python.
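
For reference, here is a minimal, self-contained sketch of the same decoding, with SciPy's Rotation substituted for torchgeometry (the angle-axis convention is the same; the example point is made up):

import numpy as np
from scipy.spatial.transform import Rotation

# One of the 9-tuples from the 'best_models' entry of a result pkl:
params = np.array([0.8778091, 1.0231334, 0.01,
                   -1.4559864, -1.8047982, -0.76489663,
                   -0.14193119, -0.21265107, 1.7678456])

a = params[0:3]  # cuboid size [a_x, a_y, a_z]
r = params[3:6]  # angle-axis rotation: axis = r / norm(r), angle = norm(r)
t = params[6:9]  # translation [t_x, t_y, t_z]

R = Rotation.from_rotvec(r).as_matrix()  # 3x3 rotation matrix

x = np.array([0.0, 0.0, 2.0])  # an arbitrary 3D point in the camera frame
x_cuboid = R @ (x - t)         # the same point in the cuboid-centric frame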

vibe007 commented Jun 21, 2021

It would be super helpful to see a side-by-side render of the test image and predicted cuboids when testing the code (for example, the images you have in the paper)! Right now I'm just getting pkls...
