-
Notifications
You must be signed in to change notification settings - Fork 13.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sensors health reporting improvements #13025
Sensors health reporting improvements #13025
Conversation
…riptions or sensors enabled param changes
…cs and health flags
…tches the sensor instance. Report preflight metrics
…s. Addapt the health_flag_helper to this changes
…d by the sensors module, when performing the preflight checks
…e result to the vahicle_status
note: changing the vehicle status health flag field from uint32 to uint64 variable, requires the same changes in the mavlink message since mavlink messages just copies the value over form uorb to mavlink msg. Although at the moment there is exactly 32 enum values defined so it shouldnt be a problem |
There's quite a lot to digest here and some of this will need careful/deliberate testing. It might ultimately be easier to bring it in separate chunks where we can exhaustively test a single thing at a time.
Why does it make sense for this to be user configurable? Shouldn't it be information provided by each board? |
Unfortunately we can't change the existing fields in mavlink. This would have to be new fields at the end or a new message entirely. |
One thought here to consider is if we should be leaning towards using device ids whenever possible instead of indexes. These are ultimately much easier to decode back to the particular sensor, bus, calibration, etc.
I'm not really a fan of the subsystem_info topic. I think we can do better than Present/Enabled/Ok, and it's not very helpful for debugging logs. A higher level sensor status message (maybe by type like sensor_accel_status) could be interesting. Something that helps you reconstruct the valuable data equivalent to |
@dagar thanks for the review. I would be interested in working on this if it is something that would end up being merged. It is essential for our application to have real time sensors health reporting and flight readiness status. |
@dusan19 I think it would be great if you could make a pull request just for the two bugs that you found and mentioned.
As @dagar already answered, unfortunately this can't be changed. This means that this message is limited to 32 different sensors (or things). Right now, the spec already lists 25 options which means that we don't have a lot of room for whatever will come up in the future. This, to me, means that we need to come up with a new message (or better mechanism) to report preflight requirements and errors, sensor status, etc.. |
Great, thanks @dusan19! |
@@ -1739,14 +1739,21 @@ Commander::run() | |||
|
|||
/* update subsystem info which arrives from outside of commander*/ | |||
do { | |||
if (subsys_sub.updated()) { | |||
if ( (updated=subsys_sub.updated()) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An assignment in a condition rings a bell for me.
what about a while loop since the condition is anyways at the beginning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions. |
FYI I haven't forgotten about this. For each sensor type I'm introducing a sensors status msg (eg We're also getting pretty close to having multi-EKF support merged (#14650), where we'll have a lot more data to work with (eg estimated biases for all sensors). |
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions. |
Thanks for the contribution! Given how much the state estimation framework has moved, I think taking a fresh pass is simpler than solving the many merge conflicts here. My apologies for this sitting in the review queue for so long - we're currently working to make this a much more timely process. |
FYI quite a lot of this has been addressed over the past year. |
Main issues:
Related bugs:
More detailed explanation and problem examples:
Sensors health bitmask is set in commander::status.onboard_control_sensors_health. It is set by preflight tests function, and by the health reported on the subsystem_info topic by sensors module.
Preflight checks set the bit in the health bitmask that matches the sensor instance. For example SUBSYSTEM_TYPE_GYRO bit is set for gyro 0, and SUBSYSTEM_TYPE_GYRO2 for gyro 1. On the other hand sensors module sets the bits considering the number of healthy sensors. Which means that in case of a gyro 0 failure, commander would set SUBSYSTEM_TYPE_GYRO to false, and sensors would set SUBSYSTEM_TYPE_GYRO2 to false. That behavior is not consistent.
Solution: sensors module should set the health of a specific sensor according to its index and not to the number of healthy sensors. This way we know which sensor failed by looking at the vehicle_status message. Especially useful when using external sensors, to be able to know which one to replace.
Furthermore, the imu, mag and baro health should be evaluated in the sensors module and reported over subsystem_info topic. Also preflight metrics and relevant flags should be reported by the sensors module, and be used in the commanders preflight checks. Preflight checks shouldn't subscribe to the topics reported by the imu, mag and baro drivers but only to the relevant topics from the sensors module.
Preflight checks do not consider health reported by sensors module (no data, stale data, timeout, high error count, high error density).
I could reproduce the following scenario: I set the required number of healthy sensors to 2 instead of default 1, enabled 2 mags, and had stale data on the second mag. The checks passed even though the second mag was broken.
Solution: use the health reported by the sensors module together with the preflight metrics and health flags from the sensor_preflight topic.
Preflight checks before used the max_mandatory_*_count constant to determine if the sensor was required. So by setting this to 2 would mean that 2 sensors were required to pass the checks, but not any two, it had to be the first two, (instances 0 and 1), so in my case mag 1 broke, so healthy mags 0 and 2 wouldn't allow arming. This is changed, such that each sensor has a prameter that specifies how many healthy sensors of that type have to be healthy to allow flying. The sensors are not ordered by their index. If the param is set to two, any two sensors of that type that are healthy would allow arming. Although primary sensor has to be healthy as a separate checks.
voted_sensors_update::parameters_update() sets the enabled flags for each sensor using the CAL_*i_EN parameter.
CAL_*_EN and CAL_*_ID params are handled inconsistently in parameters_update(). Enabled parameters are handled such that CAL_MAGi_EN refers to the sensor on uorb instance i.
The ID param is searched for, which means that if uorb instance i has the same id as param CAL_MAGj_ID then CAL_MAGj calibration will be set for driver i. But still CAL_MAGi_EN will enable driver i.
Fix is made such that enabled is set for the driver with the ID that's set with CAL_*_ID param.
Other fixes and improvements:
Commander runs preflight tests every 2 seconds, if the vehicle is unarmed. This way we have up to date health checks, and flight readiness, which can be reported over mavlink to the base station such that the user is informed about the system state and failures all the time and not only when arming is requested. Or be used otherwise.
I encountered a bug where I had 1 mag enabled but got mag sensors inconsistency failure. This happened because mag 2 was disabled but still had priority set to 100 instead of zero and was used in the inconsistency checks. This happens when the mag data is published after the parameter update in the voted_senors_update has already finished, and the enabled flag was not set to false for this sensor. When the data is available on the topic for the first time the mag is given the priority and after that the enabled flag is set to false, but the data is still pushed to the ecl data validator.
Solution: when the parameter update is run after the first publication of the late mag, set the priority to 0 when setting the enabled flag to false, and skip pushing the data to the validator so the validator doesn't set the used flag to true for this mag