
How to use custom camera model and camera parameters using onnx inference #103

Closed
alpereninci opened this issue Jun 4, 2024 · 5 comments

Comments

@alpereninci

alpereninci commented Jun 4, 2024

I have a custom camera with an intrinsic calibration, and I have a problem with ONNX inference. Should I pass the "cam_model" parameter to the model, or is post-processing enough?

I think the post-processing in test_onnx.py is not complete.

I would like to run inference with the "metric3d_vit_small" model.

I see this in "do_test.py":

ori_focal = (intrinsic[0] + intrinsic[1]) / 2
canonical_focal = canonical_space['focal_length']
cano_label_scale_ratio = canonical_focal / ori_focal
..
rgb, _, pad, resize_label_scale_ratio = resize_for_input(rgb, forward_size, canonical_intrinsic, [ori_h, ori_w], 1.0)
label_scale_factor = cano_label_scale_ratio * resize_label_scale_ratio

And in the vit.raft5.small.py config file:

max_value = 200
# configs of the canonical space
data_basic=dict(
    canonical_space = dict(
        # img_size=(540, 960),
        focal_length=1000.0,
    ),
    depth_range=(0, 1),
    depth_normalize=(0.1, max_value),
    crop_size = (616, 1064),  # % 28 == 0
    clip_depth_range=(0.1, 200),
    vit_size=(616, 1064)
)

During ONNX inference, should I use canonical_space['focal_length'] = 1000 (from the config) and a normalize scale of 1 (also from the config)?
How should I use cx and cy? Are they important parameters?

Also, what should I do if I change the input resolution? The given input resolution is (H, W) = (616, 1064). What would happen, and what should I do, if I downsample the image to (308, 532)?
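
For a frame that is already at the (616, 1064) input size, my current understanding of the de-canonical scaling is roughly the following sketch (pred_depth here stands for the raw network output, which is my assumption about what the ONNX model returns):

fx, fy = intrinsic[0], intrinsic[1]            # my calibrated focal lengths, in pixels
ori_focal = (fx + fy) / 2
canonical_focal = 1000.0                       # canonical_space['focal_length'] from the config
cano_label_scale_ratio = canonical_focal / ori_focal

# the network predicts depth in the canonical (focal = 1000) space, so I assume
# the metric depth for my camera is recovered by inverting that ratio
pred_depth_metric = pred_depth / cano_label_scale_ratio   # = pred_depth * ori_focal / 1000.0

Is that the right way to combine the quoted pieces?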

@YvanYin
Owner

YvanYin commented Jun 6, 2024

@Owen-Liuyuxuan does ONNX support a custom camera?

@Owen-Liuyuxuan
Collaborator

@alpereninci cc: @YvanYin
For now, the ONNX scripts in this repo and the provided ONNX model do not directly support a custom camera, so we may have to compute the scale outside the ONNX computation.

In ros2_vision_inference, I demonstrate how to feed the camera matrix $P$ as an additional input to the ONNX model and get correctly scaled depth for any perspective camera.

I trimmed the projection and coordinate-transform code from ros2_vision_inference down to showcase the changes we need:

## Change the model export script
import torch

class Metric3DExportModel(torch.nn.Module):
    def __init__(self, meta_arch, is_export_rgb=True):
        super().__init__()
        self.meta_arch = meta_arch
        self.register_buffer('rgb_mean', torch.tensor([123.675, 116.28, 103.53]).view(1, 3, 1, 1).cuda())
        self.register_buffer('rgb_std', torch.tensor([58.395, 57.12, 57.375]).view(1, 3, 1, 1).cuda())
        self.input_size = (616, 1064)

    def normalize_image(self, image):
        image = image - self.rgb_mean
        image = image / self.rgb_std
        return image

    def forward(self, image, P):
        original_image = image.clone()
        image = self.normalize_image(image)
        with torch.no_grad():
            pred_depth, confidence, output_dict = self.meta_arch.inference({'input': image})
            canonical_to_real_scale = (P[:, 0, 0, None, None] + P[:, 1, 1, None, None]) / 2.0 / 1000.0  # 1000.0 is the focal length of the canonical camera
            print(canonical_to_real_scale.shape, pred_depth.shape)
            pred_depth = pred_depth * canonical_to_real_scale # now the depth is metric
        return pred_depth

## In testing
dummy_P = np.zeros([1, 3, 4], dtype=np.float32)
outputs = ort_session.run(None, {"image": dummy_image, "P": dummy_P})
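
With a real calibration, $P$ is simply filled with the pinhole parameters instead of zeros. A minimal usage sketch (the intrinsic values and the ONNX file name below are placeholders, not from the repo):

import numpy as np
import onnxruntime as ort

fx, fy, cx, cy = 1000.0, 1000.0, 532.0, 308.0              # placeholder intrinsics for the resized 1064x616 frame
P = np.array([[[fx, 0.0, cx, 0.0],
               [0.0, fy, cy, 0.0],
               [0.0, 0.0, 1.0, 0.0]]], dtype=np.float32)   # shape (1, 3, 4), same as dummy_P

ort_session = ort.InferenceSession("metric3d_vit_small.onnx")          # placeholder path
image = np.random.rand(1, 3, 616, 1064).astype(np.float32) * 255.0     # placeholder frame: NCHW RGB in 0-255, normalization happens inside the graph
pred_depth = ort_session.run(None, {"image": image, "P": P})[0]        # metric depth, thanks to the in-graph rescaling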

Any resize/reshape operations applied before reaching the (616, 1064) input size should be accompanied by corresponding changes in the camera matrix $P$.
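
For example, if the frame is simply resized to the network input without keeping the aspect ratio, a sketch of keeping $P$ consistent could look like this (function and variable names are mine; the repo's resize_for_input additionally keeps the aspect ratio and pads, which would need the same treatment):

import cv2
import numpy as np

def resize_image_and_P(image, P, target_hw=(616, 1064)):
    # rows 0 and 1 of a projection matrix scale with the pixel coordinates they produce,
    # so a resize by (sx, sy) multiplies fx/cx by sx and fy/cy by sy
    h, w = image.shape[:2]
    th, tw = target_hw
    resized = cv2.resize(image, (tw, th))
    P_new = P.copy().astype(np.float32)
    P_new[0, :] *= tw / w
    P_new[1, :] *= th / h
    return resized, P_new

Here P is a single 3x4 matrix; add the batch dimension afterwards.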

But if you are considering changing the input size of the network, I have not succeeded in doing that myself; I am afraid there could be errors inside the ViT network. @YvanYin any ideas on changing the network input size?

@alpereninci
Author

Thanks for your reply @Owen-Liuyuxuan. Actually, I am considering changing the input size of the network.

@Owen-Liuyuxuan
Collaborator

After slightly changing the input shape of the network, the ONNX model works (at least with no obvious errors). However, I am not sure about the generalization ability and the metric accuracy. I will give it a test.

@Owen-Liuyuxuan
Collaborator

Owen-Liuyuxuan commented Jun 6, 2024

I have tried it on personal data. It works, but the canonical camera focal length is not necessarily 500. I believe you could try to fine-tune the parameter for your use case.

For my scene, it is about 1000/sqrt(2); I don't know why, though.
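
If you want to calibrate it for your own scene, one simple option (just a sketch, not something in the repo) is to estimate the effective canonical focal length from a few pixels whose true distance you know, using depths predicted by the exported model above (which bakes in 1000.0):

import numpy as np

def estimate_canonical_focal(pred_depth_at_points, measured_depth, assumed_canonical_focal=1000.0):
    # exported model: pred = true * assumed_canonical_focal / effective_canonical_focal
    # => effective_canonical_focal = assumed_canonical_focal * pred / true
    ratios = np.asarray(pred_depth_at_points) / np.asarray(measured_depth)
    return assumed_canonical_focal * float(np.median(ratios))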

BTW, for TensorRT usage, we need to clear the cache every time before changes to constant parameters take effect, so I suggest doing the tuning on GPU first.
