-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
node topic statistics broken #1626
Comments
I experienced the same behavior (#1621 - got closed). Would be nice if it will be fixed. |
#1621 thanks, we found the same bug at the same time , lel |
…erformance optimization included
I proposed the same code change in #1627 as described in #1625. It would work bit that is not enough, the bigger picture is a performance issue aswell. (we are talking about a function that is called EVERY time) I tested it and as described in the images above, it will fix the 'CPU leaking" (ultrahigh performance hunger overload when used in a busy - topic heavy - environment) I understand that these are two separate things, I just want to get a better picture of the necessary steps towards this feature. |
I also investigated a bit more. So for my test:
Edit 1: This is without the patch proposed in #1625, im working on Ubuntu 16.04, Kernel 4.15.0-45-generic with x86 64 bit and use Im using another node to monitor values of processes. This information's are collected with the psutil library. I want to share the plots of the process information's since they show that there is a problem. The first plot is without statistics enabled: The second plot is with statistics enabled: One can see different process information of the subscriber-node "/Sub20/" over the time. These are described in detail in the psutil documentation (https://psutil.readthedocs.io/en/latest/). Note that the plots has different scaling's depending on the actual value's. I just want to point out the difference in CPU-usage - from around 6-10 % without statistics enabled up to 10-90 % with statistics enabled - and an increasing in physical memory usage over time with statistics enabled - see memory_info_rss. Surely not much but still visible. CPU usage is in the top left corner. Edit 2: The following plot shows on the x-axis the time and on the y-axis number of evaluated messages per statistic-message. The topic has a high frequency (100Hz) and one can see the effect of the dynamic time window in combination with wrong use of the frequency value leading to only 4 messages in 120 sec. There is a correlation between the time a statistic message was send and:
Edit 3: Checking out an other setup i changed the parameter /statistics_window_max_size to 10 seconds and this somehow fixes high cpu usage and increase in memory usage. There are still some differences in comparison to the plot without statistics enabled but nothing looks critical to me. What looks more of a problem to me is that there are still connections with only a few or no messages on statistics. This might be caused by the high frequency and large number of (20) subscriber's in one node. |
Hello,
after we tried testing the contents of the /statistics topic via 'rosparam set enable_statistics true' in our robot, our hosts cpu usage went nuts.
After that i tried to debug the roscpp statistics class ...
As can be seen in the image, the spikes occur when the class calculates and publishes it's contents. IMPORTANT: the time window is of the publishes is doubled over time (halfing the frequency each time)
I found the problem here (kinetic and melodic):
ros_comm/clients/roscpp/src/libros/statistics.cpp
Line 116 in 194c776
This line has to be changed to this:
if (stats_it->second.last_publish + ros::Duration(1/pub_frequency_) < received_time)
By doing that, we get a much more stable (and not so resource heavy) frequency in the timing analysis:
I didn't like the performance and the way this
std::map<std::string, struct StatData> map_
object was handled inside the callback ...so i tried to tweak some things (by only using the original iterator and directly changing the data structures instead of doing multiple (slow, binary tree) map lookups of the topic names. (i guess a
<unordered_map>
is not suitable because there are some code parts, where the order of the elements inside the map matters.The results, after messing around with the code and changing each access to the map to it's original iterator, can be seen here:
i can provide a merge request if needed, but i first want to get some feedback on this issue ...
I really dont want to deploy my own fork of roscpp, so i tried providing my results here.
Maybe there's something wrong with our system or i did a mistake, so please lecture me if i did a boo boo :) ...
thanks.
Nick Fiege
StreetScooter GmbH
Autonomous Logistics
The text was updated successfully, but these errors were encountered: