Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PreFlightCheck: add checks for CPU load #14570

Merged
merged 1 commit into from
Apr 16, 2020
Merged

PreFlightCheck: add checks for CPU load #14570

merged 1 commit into from
Apr 16, 2020

Conversation

MaEtUgR
Copy link
Member

@MaEtUgR MaEtUgR commented Apr 2, 2020

Describe problem solved by this pull request
We had cases where people were accidentally taking of with a too highly loaded system even already when it was disarmed. There can be a lot of reasons like custom WIP changes or a combination of board and configuration that exceeds the expected load.

Describe your solution
I add a preflight check that will prevent arming depending on CPU load because it's a low hanging fruit and would have saved crashes before.

Test data / coverage
Untested so far, SITL doesn't provide cpuload, not even fake so I'll test it on a pixhawk 4.

if (hrt_elapsed_time(&cpuload_sub.get().timestamp) < 200_ms) {

// avionics rail
// Check avionics rail voltages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

@dagar
Copy link
Member

dagar commented Apr 2, 2020

A few notes that immediately come to mind.

  1. For cpuload I think it should be an average to prevent any false positives from a small temporary cpu spike).
  2. Similarly we should add something that prevents arming if the logger watchdog had triggered (boosting the logger priority to max). If that happens prematurely it can really drag the rest of the system down. This is actually why I introduced the logger_status message.
  3. We should add a simple cpuload publisher for linux.
  4. Instead of the check silently succeeding when the message isn't present I think it would be better to have it fail. On platforms where it's not available (or not wanted) we can have a parameter to explicitly disable it.

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Apr 2, 2020

@dagar Good points, thanks for your ideas. I'll try to cover them.
For 2. I'm not clear how that would look like since logger_status doesn't contain a "watchdog_triggered" flag and I don't know the details.

@dagar
Copy link
Member

dagar commented Apr 2, 2020

For 2. I'm not clear how that would look like since logger_status doesn't contain a "watchdog_triggered" flag and I don't know the details.

Right, it needs to be added. Similarly something could be added here to require an sd card for operation (optionally) and possibly the offboard logging scenario as well (#13200).

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Apr 9, 2020

  1. For cpuload I think it should be an average to prevent any false positives from a small temporary cpu spike).

I just wanted to add a filter and checked the sampling of the cpu load in load_mon: It gets ampled with 1Hz https://github.com/PX4/Firmware/blob/74e99faedff29b863bb374f4e046e894dc6e4784/src/modules/load_mon/load_mon.cpp#L68 and already takes the average during that time: https://github.com/PX4/Firmware/blob/74e99faedff29b863bb374f4e046e894dc6e4784/src/modules/load_mon/load_mon.cpp#L192 I would have configured the filter with ~1 second time constant so it wouldn't make sense.

@dagar
Copy link
Member

dagar commented Apr 9, 2020

  1. For cpuload I think it should be an average to prevent any false positives from a small temporary cpu spike).

I just wanted to add a filter and checked the sampling of the cpu load in load_mon: It gets ampled with 1Hz

https://github.com/PX4/Firmware/blob/74e99faedff29b863bb374f4e046e894dc6e4784/src/modules/load_mon/load_mon.cpp#L68

and already takes the average during that time:
https://github.com/PX4/Firmware/blob/74e99faedff29b863bb374f4e046e894dc6e4784/src/modules/load_mon/load_mon.cpp#L192

I would have configured the filter with ~1 second time constant so it wouldn't make sense.

Ok, thanks for looking into it. I might look into getting a higher rate cpuload published (maybe with average included) later. Those occasional cpu spikes are now somewhat more concerning...

@MaEtUgR MaEtUgR force-pushed the cpu-load-check branch 2 times, most recently from c4567d8 to 17778ee Compare April 9, 2020 16:00
@MaEtUgR
Copy link
Member Author

MaEtUgR commented Apr 9, 2020

  1. We should add a simple cpuload publisher for linux.
  2. Instead of the check silently succeeding when the message isn't present I think it would be better to have it fail. On platforms where it's not available (or not wanted) we can have a parameter to explicitly disable it.

I added a parameter that lets you set the threshold and disable the check. So it doesn't silently pass but has to be disabled for SITL.

Can we iterate in a separate pr on adding the logger_status and SITL CPU load publishing?

@MaEtUgR MaEtUgR marked this pull request as ready for review April 9, 2020 16:08
@dagar
Copy link
Member

dagar commented Apr 9, 2020

Can we iterate in a separate pr on adding the logger_status and SITL CPU load publishing?

Yes.

@MaEtUgR MaEtUgR force-pushed the cpu-load-check branch 3 times, most recently from 1d076c5 to 21d8583 Compare April 15, 2020 12:10
@MaEtUgR
Copy link
Member Author

MaEtUgR commented Apr 15, 2020

I rebased because there was a conflict with #14628 and added an esccaped % to the error message. I tested it on a pixhawk 4 and it works like expected when I start hogging the CPU too much with unnecessary operations.

@dagar Can we get this in as a first iteration?

@MaEtUgR MaEtUgR changed the title PreFlightCheck: add checks for CPU and RAM load PreFlightCheck: add checks for CPU load Apr 15, 2020
@MaEtUgR
Copy link
Member Author

MaEtUgR commented Apr 16, 2020

I rebased again after a conflict with #14633. Still ready in my eyes.

julianoes
julianoes previously approved these changes Apr 16, 2020
src/modules/commander/commander_params.c Outdated Show resolved Hide resolved
Copy link
Contributor

@julianoes julianoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

       _~
    _~ )_)_~
    )_))_))_)
    _!__!__!_
    \______t/
  ~~~~~~~~~~~~~
  Ship it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants