
High CPU load on Pixhawk 4 (fmu-v5) #11604

Closed · DanielePettenuzzo opened this issue Mar 8, 2019 · 18 comments
Labels: Board: Pixhawk fmu-v5, bug

Comments

DanielePettenuzzo (Contributor) commented Mar 8, 2019

After flashing master I noticed very high CPU loads when arming the vehicle. Depending on my parameters, the CPU load ranges between 85 and 100%. Even when disarmed, the CPU load is much higher than usual.

Log: https://review.px4.io/plot_app?log=6d8f3085-a624-4cbb-8974-5ae992ba51d7

This log is from my bench setup with no external SPI or I2C sensors.

[screenshot: ph4_cpu_issue]

FYI @LorenzMeier @dagar @davids5 @bkueng

DanielePettenuzzo added the bug and Board: Pixhawk fmu-v5 labels on Mar 8, 2019
DanielePettenuzzo (Contributor, Author)

I did a bisect on master, and the commit that introduces this issue is b40f8d5.

That is the commit where we disabled the d-cache on the F7.
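
For reference, a minimal sketch of the bisect workflow used to narrow this down (the known-good commit below is a placeholder, not the actual one used):

git bisect start
git bisect bad master                  # current master shows the high CPU load
git bisect good <last-known-good-sha>  # placeholder: a commit that still behaved normally
# build, flash, and check the CPU load for each candidate, then mark it with
# 'git bisect good' or 'git bisect bad' until git reports the first bad commit
# (b40f8d5 in this case), then:
git bisect reset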

LorenzMeier (Member)

@davids5 is there a way to revert the patch if we add additional safeguards?

davids5 (Member) commented Mar 8, 2019

@DanielePettenuzzo - can you provide more information: parameters, top output while armed, etc.?

@LorenzMeier - IIRC the original testing indicated a 10-15% increase, so we should look for the root cause, which may be something we can mitigate.

DanielePettenuzzo (Contributor, Author)

@davids5 yes, I usually run top and then arm the vehicle. As for parameters, I can see it with all default parameters as well. The parameters from this log https://review.px4.io/plot_app?log=6d8f3085-a624-4cbb-8974-5ae992ba51d7 gave me the results in the plot above.

DanielePettenuzzo (Contributor, Author)

My bench Pixhawk is powered with the power module and is only connected to a companion computer via an FTDI on TELEM2. Nothing else is plugged into the Pixhawk.

davids5 (Member) commented Mar 8, 2019

What is the data rate on TELEM2?

DanielePettenuzzo (Contributor, Author)

The following is the MAVLink configuration on TELEM2:
MAV_1_CONFIG = 102
MAV_1_MODE = onboard -> highest data rate mode (e.g. IMU data at 250 Hz)
MAV_1_RATE = 100000 B/s -> MAVLink maximum data rate
SER_TEL2_BAUD = 1500000 -> UART baud rate

Moreover, we don't only have the regular onboard stream; we also enable log streaming to the companion computer on this serial port.

This telemetry port alone gives us a ~10% CPU increase.
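
For anyone trying to reproduce this, a minimal sketch of applying the same configuration from the PX4 system console (values as listed above; the MAV_1_MODE index used for "Onboard" is an assumption here, and MAV_1_CONFIG changes normally need a reboot to take effect):

param set MAV_1_CONFIG 102       # map MAVLink instance 1 to TELEM2
param set MAV_1_MODE 2           # assumed index for the Onboard profile
param set MAV_1_RATE 100000      # maximum data rate in B/s
param set SER_TEL2_BAUD 1500000  # UART baud rate
reboot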

davids5 (Member) commented Mar 8, 2019

@DanielePettenuzzo

I am assuming there is very little data to the FMU. Is that a correct assumption?

The baud rate translates to 6.67 µs per char, while the configured data rate works out to 10 µs per char. I measured the F4 in the past and it was a ~2 µs code path to send a char, but SPI interrupts would delay that and limit the bandwidth.
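
Spelling that arithmetic out (assuming standard 8N1 framing, i.e. 10 bits on the wire per byte):

1,500,000 baud / 10 bits per char = 150,000 chars/s  ->  ~6.67 µs per char on the wire
MAV_1_RATE = 100,000 B/s                             ->  ~10 µs per char at the configured data rate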

The F7 is different from the F4 because of the d-cache on the RX buffers, so I am not clear how this is affected when the cache is off.

I am wondering if flow control is activated and the load increase is related to the data that is accessed to manage flow control. To see if this is a factor, we should measure the load under three conditions.

Please try the following: break out the CTS line going into the FMU. Leave it connected as normal; then, once the load has peaked:

  1. Tie it to GND, let it sit for 2-3 seconds, and record the load.
  2. Tie it to 3.3 V, let it sit for 2-3 seconds, and record the load.
  3. Reconnect it.

Is there any effect on the load?

Also do a measurement at 1 Mbps and see what the load is.

dagar (Member) commented Mar 8, 2019

@DanielePettenuzzo are all 3 MAVLink instances being used? (mavlink status will show this.)

Convincing ourselves the d-cache is safe to re-enable might take some time. As a quick workaround, simply dropping unneeded streams from each MAVLink instance will help (see the sketch below).

If you're able to fully replicate this on the bench, capturing top output at the CPU peak would be helpful. Is it the position controller/flight task that is changing significantly?
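
For the quick workaround above, a sketch of inspecting the instances and throttling individual streams from the console (the device path and stream names here are examples, not taken from this setup):

mavlink status                                      # shows each instance, its device, and tx/rx rates
mavlink stream -d /dev/ttyS2 -s HIGHRES_IMU -r 50   # lower a stream's rate on that device
mavlink stream -d /dev/ttyS2 -s ATTITUDE -r 0       # rate 0 should drop the stream entirely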

dagar (Member) commented Mar 9, 2019

According to the log, flow control wasn't active for the first mavlink instance (telemetry_status_0).

Here's a plot of cpuload (green), mavlink 0 tx rate (red), and navigation state (light blue).

[screenshot: plot of cpuload, mavlink 0 tx rate, and navigation state]

We should take a closer look at what the position controller (and/or navigator) is doing in the DESCEND nav state that requires nearly 20% of the CPU.

dagar (Member) commented Mar 9, 2019

I'm not suggesting this as a direct fix for the issue, but #11176 will help reduce the MAVLink CPU load.

DanielePettenuzzo (Contributor, Author)

@davids5 we are not using flow control on the FTDI link to the companion computer. I tried what you asked anyway: when the vehicle is disarmed the load is about 46%, when I arm it the load goes up to 90%, and when I connect CTS to either GND or 3.3 V it goes up to 98%.

DanielePettenuzzo (Contributor, Author)

Yesterday we had a crash because all motors shut down. Here is a log of the crash: https://review.px4.io/plot_app?log=16f466ec-f02f-4e11-8133-92a7571c7efa I don't know whether it's related to the CPU issue or not.

dagar (Member) commented Mar 14, 2019

Yesterday we had a crash because all motors shut down. Here is a log of the crash: https://review.px4.io/plot_app?log=16f466ec-f02f-4e11-8133-92a7571c7efa I don't know whether it's related to the CPU issue or not.

@DanielePettenuzzo this log is much worse with several oddities. Can you share more logs from this vehicle?

Missing sensor messages (not dropped) and estimator time slip throughout the entire log.


Only a preflight top was captured, but the CPU usage is quite unusual.

In particular, CPU usage of these high-priority processes is 2-3x worse than normal.

PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
278 px4io                       14385  6.893   880/ 1484 252 (252)  w:sem 11
457 ekf2                        51853 21.578  4568/ 6572 250 (250)  w:sem 20
459 mc_att_control              27966 12.187   944/ 1660 251 (251)  w:sem 18

I2C is incredibly slow (not the culprit, but it should be fixed):

ll40ls_i2c_read: 1472 events, 3456445us elapsed, 2348.13us avg, min 1380us max 14495us 1672.250us rms
ist8310_read: 3166 events, 4547831us elapsed, 1436.46us avg, min 859us max 13214us 1518.052us rms
ist8310_read: 3168 events, 4513333us elapsed, 1424.66us avg, min 861us max 11229us 1541.079us rms
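
For reference, those timings are driver perf counters; on the console they can be dumped in one go with the perf command, or per driver (the per-driver commands below assume the usual driver status pattern):

perf             # dump all perf counters, including the I2C read timings above
ll40ls status    # per-driver status output, which typically includes its perf counters
ist8310 status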

davids5 (Member) commented Mar 14, 2019

I have reviewed the system's ISR timing. I did not have the LidarLite on I2C to test.

There are no issues with the driver re-entering (interrupt storms).
There are some SPI transactions at > 100 µs. But running TELEM2 at 1.5 Mbps uses a lot of CPU: 1.5 Mbps is 6.66 µs per character. There are several levels of ISR service (so the times quoted are less than the total).

-common_isr (arm)
--up_doirq - This is instrumented
---usart isr (tx) - This is instrumented

The effect of the d-cache on the TX portion of the ISR is significant. The time in the ISR increases to ~2.5 times with the cache off. This starves the CPU.

             IRQ   out (µs)      in (µs)       us (total)
No d-cache   100   1.84  (28%)   4.84  (72%)   6.68         14.97006
d-cache       70   4.8   (72%)   1.875 (28%)   6.675        10.48689
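
A rough back-of-the-envelope check, assuming the per-character ISR times above and the configured TELEM2 rate of ~100,000 B/s (roughly 100,000 transmitted characters per second):

d-cache off: 4.84  µs in the ISR per char x 100,000 chars/s  ->  ~48% of the CPU spent in serial TX ISRs
d-cache on:  1.875 µs in the ISR per char x 100,000 chars/s  ->  ~19% of the CPU spent in serial TX ISRs

If those assumptions hold, the ~30% difference is in the same ballpark as the load increase reported here.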

bys1123 (Contributor) commented Mar 31, 2019

I tried opening MAVLink on TELEM2 for RC telemetry, and the CPU load is around 60%. Here is the log:
https://logs.px4.io/plot_app?log=47609dfb-3d5c-4c68-9e0b-2d549712b997

LorenzMeier (Member)

Fixed by #11769 once it gets merged.

thomasgubler (Contributor)

Fixed by #11769 once it gets merged.

#11769 was merged, closing
