Start with the **why**:

As it stands now, DepthAI returns the XYZ location of the detected object using a distance (z) averaged over the `padding_factor` specified here (i.e. some subset area of the overall bounding box).

This works for a bunch of use cases, and the `padding_factor` allows tuning it for custom objects.

But for some objects, there may be more information of interest on the host side. An example is when the padding factor ends up averaging the depth of an object together with something else that falls within this sub-region of interest.
An example of such an issue is shown below, where the padding factor (shown in blue) is partially over both the person's neck and the wall in the background, so the location of the person and the wall is averaged.
To correct the above issue, it is possible to run the inference directly on the right camera with the `-cam right` option, which aligns the depth and the object detector exactly. But in some cases this is undesirable, for example if the neural inference requires color information.
So as it stands now, if the user on the host wants the depth information corresponding to the region of interest, they need to do the following (see the sketch after this list):
1. Request the `meta_out` stream to get the region of interest.
2. Request the whole depth frame (1280×720×2 bytes = 1.84 MB/frame).
3. Pull out the ROI from the whole depth frame.
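For reference, the host-side cropping step (3) is straightforward once both streams are in hand. Below is a minimal sketch, assuming the depth frame has already been received as a 1280×720 uint16 numpy array and that the detection's bounding box comes in normalized coordinates; the shrink-toward-center interpretation of `padding_factor` is also an assumption for illustration, not tied to a specific DepthAI API version:

```python
import numpy as np

def crop_depth_roi(depth_frame: np.ndarray, bbox, padding_factor: float = 0.3):
    """Crop the depth ROI for one detection.

    depth_frame: (720, 1280) uint16 array, depth in millimeters (assumed).
    bbox: (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates,
          as reported in the detection metadata (assumed layout).
    padding_factor: fraction of the bounding box kept around its center,
                    loosely mirroring the padding_factor behavior described above.
    """
    h, w = depth_frame.shape
    x_min, y_min, x_max, y_max = bbox

    # Shrink the box around its center by the padding factor.
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * padding_factor / 2
    half_h = (y_max - y_min) * padding_factor / 2

    # Convert to pixel indices, clamped to the frame bounds.
    x0, x1 = max(0, int((cx - half_w) * w)), min(w, int((cx + half_w) * w))
    y0, y1 = max(0, int((cy - half_h) * h)), min(h, int((cy + half_h) * h))
    return depth_frame[y0:y1, x0:x1]
```

The point is that this crop is cheap; the expensive part is step 2, shipping the full 1.84 MB frame across the link just to keep this small slice.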
The disadvantages of this are twofold:
Some hosts are USB2-only, so 1.84 MB/frame is too much to handle at 30 FPS (that's 55 MB/s, while USB2 can only sustain ~30 MB/s), and some hosts simply can't keep up with that much data.

On these sorts of hosts, the frame rate drops significantly if the user wants the depth information from the ROI in order to run more sophisticated algorithms for pulling out the object location (e.g. keeping only the front-most depth results above a threshold, sketched below).
For the example of a person detector, the ROI is often only ~10% of the frame (or less), which brings the bandwidth down to 5.5 MB/s (instead of 55 MB/s), easily fitting within USB2 (and with far less host-side CPU use).
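As a concrete example of the kind of host-side processing this unlocks, here is one way the "front-most depth above a threshold" idea might look. The percentile choice, the noise floor, and the treatment of zero (invalid) depth values are all assumptions for illustration:

```python
import numpy as np

def front_most_depth(depth_roi: np.ndarray, min_mm: int = 300, percentile: float = 10.0):
    """Estimate object distance from the front-most valid depth values in an ROI.

    depth_roi: cropped uint16 depth array in millimeters, 0 = invalid (assumed).
    min_mm: discard readings closer than this (assumed noise floor).
    percentile: keep only the closest N% of valid pixels, so background
                pixels (e.g. the wall behind the person) don't skew the result.
    """
    valid = depth_roi[depth_roi > min_mm]
    if valid.size == 0:
        return None  # no usable depth in this ROI
    cutoff = np.percentile(valid, percentile)
    front = valid[valid <= cutoff]
    return float(front.mean())
```

On the person/wall example above, this would report the distance to the person's neck rather than an average of the neck and the wall.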
So if we allowed the option to return the actual depth data for *only* the region of interest (ROI) output from the neural network, the host could process this depth data without the high bandwidth and high host load (and potentially low frame rate) associated with having to pull the whole depth frame.
The **what**:
Implement an API option which returns the depth for the region(s) of interest (ROIs) from the object detector (a hypothetical sketch of host-side usage follows below).
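To make the request concrete, here is one rough sketch of how such an option might be consumed on the host. Everything here is hypothetical; neither the synced-packet call nor the per-ROI accessors exist in the current API, they only illustrate the shape of the proposed feature:

```python
# Hypothetical host-side loop -- device.get_synced_packets(), .detections,
# and .rois are invented names, not current DepthAI API.
for nnet_packet, depth_roi_packet in device.get_synced_packets():
    for detection, roi_depth in zip(nnet_packet.detections,
                                    depth_roi_packet.rois):
        # roi_depth is a small uint16 array covering only the detection's
        # bounding box, so only kilobytes cross USB instead of the full
        # 1.84 MB depth frame.
        distance_mm = front_most_depth(roi_depth)  # reuses the sketch above
        print(detection.label, distance_mm)
```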