-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improvement] Re-write KF Hungarian Node on C++ to avoid performance pitfalls #30
Comments
These are discrete objects / detections (e.g. lines, human models, etc) not laser scans, so there shouldn't be that many, but you're entirely right that this should be moved to C++ to get performance improvements since this algorithm is naturally going to be pretty slow as it is. There could still be dozens of obstacles being tracked in a given scene. I think its worth us discussing internally if this is still a sensible thing to work on this year given we've leading into November 1 with about ~7 weeks of useful time remaining in 2022 before the holidays (which I assume we're getting very little done during). I would say though converting this node to C++ would be a nice, discrete, and manageable project that could set us up well for when we revisit this to only need to worry about the detectors. Though it could be that we have other things we can work on instead that are more inline with what we've been doing so far in 2022 so we can start fresh in 2023. |
To be honest, I just tried this with some data we had. And it wasn't horribly non-performant, at least on my developer laptop. It took about ~10% of a core. I was expecting something drastically more. Especially since:
@AlexeyMerzlyakov how many "obstacles" did you have to make it take ~1 second per callback, if I might ask? |
Sure. For simple testing I did not use any objects detecting / ML front-end. Instead of this, I've just treated all laser scan points, each as obstacles. So, number of obstacles was around 340-360 per a frame. Yes, this is far from the situation where real obstacles will be detected by some detection front-end (connected to neural networks or something else), but anyway isn't reasonable to start from (even in rooms with lots of objects, where the number might be similar, I believe)? In this case, KF Hungarian algorithm will match nearest points as the same obstacle and then will do the dynamic detection through the Kalman filter.
This is not about CPU usage. Python as a translation language is working slowly, even not eating 100% of CPU. So, in this ticket we are talking about processing time of each frame, rather than CPU/RAM loads. @vinnnyr, may I ask you to share, which obstacle detectors are you using, how many obstacles (in average) do you operating with and what processing time per frame was on your system, to understand the extent of the problem in more real applications (comparing that I've been using)? |
Certainly, but also why we have set which classes of interest we track so that we only track moving things. So I think this makes alot of sense to have in C++. I've seen some forks where people are doing some development based on this repo recently and I think it would be a nice capability to offer even if we haven't formally finished the detectors yet, so I think its worth spending some time this year on a C++ version of this node in particular - for all the reasons @AlexeyMerzlyakov mentions and that it is getting some traction. This will be the foundation we build our dynamic pipeline empire from! |
So the obstacle detector I am using is a Euclidean Cluster off of a lidar point cloud (with a bunch of preprocessing). I think in a given frame, we only expect to have ~10 objects max. I haven't yet measured latency (this was just an experiment, not something we are actually running yet), the latency seemed acceptable but obviously more scrutiny is needed from my end. |
During the writing of simple laser scanner obstacles detector for KF Hungarian node, I've experienced very slow performance of main callback from
kf_hungarian_node.py
, which is around~1.3 seconds
per each callback call. Typically, laser scanner produces 360 points (obstacles) which makes the following part of code to be the performance bottleneck ofO(360*360)
complexity:This part executes average in
~1 second
on my PC. Allowing the simplification ofnp.linalg.norm()
heuristic toabs(delta_x) + abs(delta_y) + abs(delta_z)
decreases the execution time of the part of code above up to~0.3 seconds
which is better, but not enough for real-time processing of laser scanners, operating e.g. at 5-50Hz frequencies.The simple experiment repeating the part of hot code from above on both Python and C++ interfaces, attached to current ticket as hotcode_check.zip archive.
Execution estimates gives
~25x
speed increase when switching on C++ for this hot part of the code:Therefore, I am proposing (and ready to try) to re-write the KF Hungarian node on a C++ which is expected to make callbacks to operate much faster.
The following table shows the migration of base methods for Python->C++ KF node code:
What do you think about such activity?
The text was updated successfully, but these errors were encountered: