Option to return the depth map for just the ROI of the detected object. #125

Open · Luxonis-Brandon opened this issue Jun 1, 2020 · 0 comments
Labels: enhancement (New feature or request)

Luxonis-Brandon (Contributor) commented Jun 1, 2020

Start with the why:

As it stands now, DepthAI returns the XYZ location of the detected object using a distance (z) averaged over the sub-region defined by the padding_factor (i.e. some subset of the overall bounding-box area).
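A rough host-side re-creation of that averaging, for illustration only (the real computation happens on-device; the function name, the normalized bounding-box convention, and the shrink-toward-center reading of padding_factor are assumptions here):

```python
import numpy as np

def average_depth_over_padded_roi(depth, bbox, padding_factor=0.3):
    # depth: HxW uint16 depth frame (mm); bbox: normalized (x1, y1, x2, y2).
    h, w = depth.shape
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Shrink the box toward its center by padding_factor (assumed semantics).
    half_w = (x2 - x1) / 2.0 * padding_factor
    half_h = (y2 - y1) / 2.0 * padding_factor
    xs, xe = int((cx - half_w) * w), int((cx + half_w) * w)
    ys, ye = int((cy - half_h) * h), int((cy + half_h) * h)
    sub = depth[ys:ye, xs:xe]
    valid = sub[sub > 0]              # zero means "no depth measured"
    return float(valid.mean()) if valid.size else 0.0
```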

This works for a bunch of use cases, and the padding_factor allows tuning it for custom objects.

But for some objects, there may be more information of interest on the host side. An example is when the padding factor ends up averaging the depth of the object together with something else that falls within this sub-region of interest.

An example of such an issue is shown below, where the padding factor (shown in blue) is partially over both the person's neck and the wall in the background, so the location of the person and the wall is averaged:

[Image: example detection in which the padding-factor region (blue) overlaps both the person's neck and the wall behind them]

To correct the above issue, it is possible to run the inference directly on the right camera with the -cam right option, which aligns the depth output and the object detector exactly. But in some cases this is undesirable, for example when the neural inference requires color information.

So as it stands now, if the user on the host wants the depth information corresponding to the region of interest, they need to (see the sketch after this list):

  • request the meta_out to get the region of interest
  • request the whole depth frame (1280 × 720 × 2 bytes ≈ 1.84 MB/frame)
  • pull out the ROI from the whole depth frame.
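As a concrete illustration of that third step, a minimal host-side sketch, assuming the full depth frame has already arrived as a 1280x720 uint16 numpy array and the bounding box came from meta_out in normalized coordinates (the names here are hypothetical, not DepthAI API calls):

```python
import numpy as np

def crop_roi_depth(depth_frame: np.ndarray, bbox) -> np.ndarray:
    """Cut the detection's ROI out of the full depth frame on the host.

    depth_frame: 1280x720 uint16 depth image (mm), pulled over USB
    bbox: normalized (x_min, y_min, x_max, y_max) from the detection metadata
    """
    h, w = depth_frame.shape
    x_min, y_min, x_max, y_max = bbox
    # Convert normalized coordinates to pixel indices, clamped to the frame.
    x1, x2 = max(0, int(x_min * w)), min(w, int(x_max * w))
    y1, y2 = max(0, int(y_min * h)), min(h, int(y_max * h))
    return depth_frame[y1:y2, x1:x2]

# Usage: roi_depth = crop_roi_depth(depth_frame, detection_bbox)
```

The crop itself is cheap; the cost is that the entire 1.84 MB frame has to cross the USB link first, which is exactly the problem described next.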

The disadvantages of this are twofold:

  • Some hosts are USB2-only, so 1.84 MB/frame is too much to handle at 30 FPS (that's 55 MB/s, and USB2 can only sustain ~30 MB/s), and some hosts simply can't keep up with that much data regardless.
  • So on these sorts of hosts, the frame rate drops dramatically if the user wants the depth information from the ROI in order to run more sophisticated algorithms for extracting the object location (e.g. keeping only the front-most depth results above a threshold; see the sketch after this list).
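As one example of such an algorithm, a minimal sketch of the front-most-depth idea, operating on the roi_depth crop from the earlier snippet (the fixed depth band is just one illustrative choice):

```python
import numpy as np

def front_most_depth(roi_depth: np.ndarray, band_mm: float = 150.0) -> float:
    """Estimate object distance from only the nearest depth samples in the ROI.

    Keeps samples within band_mm of the closest valid reading, discarding
    background (e.g. the wall behind the person) before averaging.
    """
    valid = roi_depth[roi_depth > 0]           # zero means "no measurement"
    if valid.size == 0:
        return 0.0
    nearest = float(valid.min())
    front = valid[valid <= nearest + band_mm]  # keep the front-most band only
    return float(front.mean())
```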

So for the example of a person detector, the ROI is often only ~10% of the frame (or less), which brings the bandwidth down to 5.5 MB/s (instead of 55 MB/s), fitting easily within USB2 and using far less host-side CPU. The arithmetic is worked below.
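For reference, the bandwidth arithmetic behind those figures:

```python
FPS = 30
FULL_FRAME_BYTES = 1280 * 720 * 2         # uint16 depth: ~1.84 MB/frame
full_rate = FULL_FRAME_BYTES * FPS / 1e6  # ~55.3 MB/s, over USB2's ~30 MB/s
roi_rate = full_rate * 0.10               # ROI at ~10% of the frame: ~5.5 MB/s
print(f"full frame: {full_rate:.1f} MB/s, ROI only: {roi_rate:.1f} MB/s")
```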

So if we allowed the option to return the actual depth data for -only- the region of interest (ROI) output from the neural network, the host could process this depth data without the high bandwidth, high host load (and potentially low frame rate) associated with pulling the whole depth frame.

The what:

  • Implement an API option which returns the depth for the region(s) of interest (ROIs) from the object detector.
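Purely as an illustration of what that option might look like from the host side (the stream name and config keys below are hypothetical, invented for this sketch, not existing DepthAI API):

```python
# Hypothetical configuration -- illustrative only, not the current API.
config = {
    'streams': ['metaout', 'depth_roi'],   # 'depth_roi' is a made-up stream
    'ai': {'blob_file': 'mobilenet-ssd.blob'},
    'depth': {'return_roi_depth': True},   # made-up flag for this proposal
}

# Per detection, the host would then receive a small packet holding the
# bounding box plus only the uint16 depth crop for that box, instead of
# the full 1280x720 frame.
```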