A few questions #1

Open
nemtiax opened this issue May 7, 2021 · 2 comments

Comments

nemtiax commented May 7, 2021

Really cool work! I'm going through the paper and the code and I had a few questions:

  1. Is there an easy way to run inference on an individual image? It looks like the answer is no, but it shouldn't be too hard for me to patch something together using evaluate.py as a guide. I figured I'd check before going through the effort, though.

  2. I'm having a bit of trouble understanding the output format. After running evaluate.py, I get a bunch of pkl files in results/eval/test/gt//original/m6_p1_s4096_k1_ic10_rs1. Are these predictions or ground truth (unsure because of the 'gt' in the path)?

  3. When looking at one of the 9-tuples found in the 'best_models' entry of the pkl files, for example:

[0.8778091, 1.0231334, 0.01, -1.4559864, -1.8047982, -0.76489663, -0.14193119, -0.21265107, 1.7678456]

My understanding is that the first three values are extents in x, y and z, and the last three values are the translation (so effectively the center coordinates?). But I'm not entirely sure what to make of the middle three values - I think they're the rotation, but I was naively expecting four values for angle-axis format. I'd like to transform these into a rotation matrix, but 3D geometry is not my strong suit, so I suspect I'm missing something obvious.

Thanks for any help you can provide!

fkluger (Owner) commented May 8, 2021

Thanks a lot for your feedback, always appreciated!

Modifying evaluate.py for inference on individual images should be relatively trivial, but I haven't done that yet. I am, however, planning to add a separate demo script for that purpose. (Might take a little while though, because of other priorities.) Right now the main purpose of the code is to reproduce the results from the paper on the NYU dataset.

The gt in the path indicates that the results were computed using the 'ground truth' depth, i.e. the depth maps provided by the NYU dataset, so those files contain predicted cuboids rather than ground-truth annotations. For RGB input, you have to run python evaluate.py --depth_model bts, and the results are then stored in results/eval/test/bts/.....

The 9-tuple consists of [a_x, a_y, a_z, r_1, r_2, r_3, t_x, t_y, t_z].
a = [a_x, a_y, a_z] is the size of the cuboid.
r = [r_1, r_2, r_3] (rotation) and t = [t_x, t_y, t_z] (translation) describe the pose of the cuboid, with r being the angle-axis representation of the rotation, so you are basically right. The angle-axis notation only needs three values, as r/norm(r) denotes the axis of rotation and norm(r) is the angle.
You can convert r into a rotation matrix R using the torchgeometry lib for example:
R = tgm.angle_axis_to_rotation_matrix(r)
A 3D point x is then transformed into the cuboid-centric coordinate frame via R(x-t), i.e. R @ (x - t) in Python.
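
For reference, here is a minimal, self-contained sketch of the same decoding, with SciPy's Rotation substituted for torchgeometry (the angle-axis convention is the same; the example point is made up):

import numpy as np
from scipy.spatial.transform import Rotation

# One of the 9-tuples from the 'best_models' entry of a result pkl:
params = np.array([0.8778091, 1.0231334, 0.01,
                   -1.4559864, -1.8047982, -0.76489663,
                   -0.14193119, -0.21265107, 1.7678456])

a = params[0:3]  # cuboid size [a_x, a_y, a_z]
r = params[3:6]  # angle-axis rotation: axis = r / norm(r), angle = norm(r)
t = params[6:9]  # translation [t_x, t_y, t_z]

R = Rotation.from_rotvec(r).as_matrix()  # 3x3 rotation matrix

x = np.array([0.0, 0.0, 2.0])  # an arbitrary 3D point in the camera frame
x_cuboid = R @ (x - t)         # the same point in the cuboid-centric frame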

vibe007 commented Jun 21, 2021

It would be super helpful to see a side-by-side render of the test image and predicted cuboids when testing the code (for example, the images you have in the paper)! Right now I'm just getting pkls...
