
uORB Subscription callbacks with WorkItem scheduling on publication #12207

Merged: 2 commits, Jun 17, 2019

Conversation

@dagar (Member) commented Jun 7, 2019

The goal of this PR is to introduce a new module scheduling mechanism that combines the best aspects of tasks and work queues.

Background

Historically, PX4 modules have either run as individual tasks (primarily) or out of the NuttX work queues (HPWORK/LPWORK). Each approach has pros and cons.

Tasks

  • + minimal latency (the task wakes as soon as data is published)
  • - dedicated stack (wasted memory)
  • - memory overhead per task (fairly significant)
  • - poll() and context switches are relatively slow

Work queues

  • + minimal overhead (shared stack, no additional NuttX task)
  • - fixed cycle schedule, not event driven
  • - usually shared with unrelated drivers/modules (non-deterministic)
  • - HPWORK and LPWORK are each a single task

Pull Request

This PR adds a new uORB Subscription class (uORB::SubscriptionCallbackWorkItem) that has a callback mechanism to schedule a cycle of a WorkItem when new data is published.

As an example I've also updated px4fmu (the PWM output module) and mc_att_control (the multicopter attitude + rate controller) to run in the rate_ctrl WQ rather than as two separate tasks.
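For a sense of the pattern, here's a minimal sketch of a module using the new class (the module name is hypothetical, and the include paths and constructor signatures are approximate for this era of the code):

```cpp
#include <px4_work_queue/WorkItem.hpp>
#include <uORB/SubscriptionCallback.hpp>
#include <uORB/topics/sensor_gyro.h>

// Hypothetical consumer: the subscription's callback schedules one cycle
// of this WorkItem on the rate_ctrl work queue whenever new sensor_gyro
// data is published, instead of the module blocking in poll().
class GyroConsumer : public px4::WorkItem
{
public:
	GyroConsumer() : px4::WorkItem(px4::wq_configurations::rate_ctrl) {}

	bool init()
	{
		// from here on, every sensor_gyro publication schedules Run()
		return _sensor_gyro_sub.registerCallback();
	}

private:
	void Run() override
	{
		sensor_gyro_s gyro;

		if (_sensor_gyro_sub.update(&gyro)) {
			// run the controller on the fresh sample
		}
	}

	uORB::SubscriptionCallbackWorkItem _sensor_gyro_sub{this, ORB_ID(sensor_gyro)};
};
```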

New PX4 Work queue + ORB callback scheduling (PR)

  • + minimal overhead (shared stack, no additional NuttX task)
  • + scheduled on relevant data publication (event driven, minimal latency)
  • + PX4 work queues are relatively inexpensive and we can create many of them (threads within a single task)
  • + PX4 work queues group together inherently serialized processes (sharing a physical bus or other workflow; see the sketch after this list)
  • + no context switch if scheduling is back to back in the same WQ (thread)
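To make the "inexpensive to create many" and "grouping" points concrete: each PX4 work queue is just a small static description that the WQ manager turns into a thread. A rough sketch (field names, stack sizes, and priorities are illustrative, not the exact values in this PR):

```cpp
// Illustrative only: a PX4 work queue is defined by a name, a stack that is
// shared by every work item attached to the queue, and a scheduling priority.
struct wq_config_t {
	const char *name;          // thread name, e.g. "wq:rate_ctrl"
	uint16_t stacksize;        // single stack shared by all attached items
	int8_t relative_priority;  // priority relative to the other queues
};

// Modules that are inherently serialized attach to the same queue and run
// back to back on one thread, e.g. the PWM output + rate controller here,
// or all drivers sharing the SPI1 bus.
static constexpr wq_config_t rate_ctrl{"wq:rate_ctrl", 1500, 0};
static constexpr wq_config_t SPI1{"wq:SPI1", 1250, -1};
```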

Example (compared to master)

Generic multicopter on a pixracer

master

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                   12660 26.662   284/  748   0 (  0)  READY  3
   1 hpwork                        271  0.629   704/ 1780 249 (249)  w:sig  6
   2 lpwork                         34  0.139   712/ 1780  50 ( 50)  w:sig  8
   3 init                         1755  0.000  1888/ 2604 100 (100)  w:sem  3
   4 wq:manager                      2  0.000   824/ 1172 244 (244)  w:sem  5
  17 dataman                         1  0.000   752/ 1180  90 ( 90)  w:sem  4
  20 wq:lp_default                   7  0.000   848/ 1244 205 (205)  w:sem  5
  22 wq:I2C1                         0  0.000   360/ 1244 248 (248)  w:sem  5
  26 wq:hp_default                 121  0.209   664/ 1244 244 (244)  w:sem  5
 144 wq:SPI1                      9095 22.183   888/ 1244 254 (254)  READY  5
 152 wq:SPI4                         0  0.000   360/ 1244 251 (251)  w:sem  5
 165 wq:SPI2                       199  0.489   544/ 1244 253 (253)  w:sem  5
 179 sensors                      1592  3.708  1336/ 1964 238 (238)  READY 11
 183 commander                     415  0.699  1632/ 3212 175 (140)  w:sig  6
 185 commander_low_prio              1  0.000   680/ 2996  50 ( 50)  w:sem  6
 198 mavlink_if0                  2213  4.338  1632/ 2532 100 (100)  READY  4
 199 mavlink_rcv_if0               178  0.349  1408/ 2836 175 (175)  w:sem  4
 242 gps                            48  0.139  1064/ 1516 209 (209)  w:sem  4
 285 mavlink_if1                  3724  8.607  1656/ 2492 100 (100)  w:sig  4
 286 mavlink_rcv_if1               198  0.419  1424/ 2836 175 (175)  w:sem  4
 302 fmu                           859  2.169   888/ 1324 241 (241)  w:sem  8
 427 ekf2                         3664  9.167  4520/ 6572 239 (239)  READY  4
 429 mc_att_control               1446  3.848  1000/ 1660 240 (240)  w:sem  5
 437 mc_pos_control                137  0.279   936/ 1860 239 (239)  w:sem  4
 456 navigator                     158  0.279   992/ 1764 105 (105)  w:sem  4
 623 log_writer_file                 0  0.000   376/ 1164  60 ( 60)  w:sem 28
 490 mavlink_if2                  2312  5.178  1656/ 2540 100 (100)  w:sig  4
 492 mavlink_rcv_if2               207  0.489  1424/ 2836 175 (175)  w:sem  4
 585 logger                        428  1.119  1352/ 3644 234 (234)  w:sem 28
 635 top                          1580  4.548  1312/ 1684 255 (255)  RUN    3

Processes: 30 total, 6 running, 24 sleeping, max FDs: 54
CPU usage: 69.00% tasks, 4.34% sched, 26.66% idle
DMA Memory: 5120 total, 1024 used 1024 peak
Uptime: 45.223s total, 12.661s idle

nsh> free
	     total       used       free    largest
Umem:       231280     191664      39616      31328

PR

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                   26165 29.802   284/  748   0 (  0)  READY  3
   1 hpwork                        508  0.635   592/ 1780 249 (249)  w:sig  6
   2 lpwork                         63  0.141   712/ 1780  50 ( 50)  w:sig  8
   3 init                         1752  0.000  1864/ 2604 100 (100)  w:sem  3
   4 wq:manager                      2  0.000   760/ 1172 244 (244)  w:sem  4
 581 top                          1553  3.884  1320/ 1684 255 (255)  RUN    3
  17 dataman                         1  0.000   752/ 1180  90 ( 90)  w:sem  4
  20 wq:lp_default                  12  0.000   784/ 1500 205 (205)  w:sem  4
  22 wq:I2C1                         0  0.000   360/ 1244 248 (248)  w:sem  4
 138 wq:SPI1                     16933 22.033   888/ 1244 254 (254)  READY  4
 143 wq:SPI4                         0  0.000   360/ 1244 251 (251)  w:sem  4
 153 wq:SPI2                       370  0.494   544/ 1244 253 (253)  w:sem  4
 171 sensors                      2918  3.742  1368/ 1964 238 (238)  w:sem 11
 173 commander                     650  0.706  1632/ 3212 140 (140)  READY  6
 174 commander_low_prio              2  0.000   560/ 2996  50 ( 50)  w:sem  6
 184 mavlink_if0                  4017  4.519  1648/ 2532 100 (100)  READY  4
 186 mavlink_rcv_if0               313  0.423  1336/ 2836 175 (175)  w:sem  4
 191 wq:hp_default                 214  0.211  1104/ 1500 244 (244)  w:sem  4
 233 gps                            82  0.141  1064/ 1516 209 (209)  w:sem  4
 275 mavlink_if1                  6933  8.686  1648/ 2492 100 (100)  w:sig  4
 277 mavlink_rcv_if1               338  0.423  1336/ 2836 175 (175)  w:sem  4
 317 wq:rate_ctrl                 3440  4.661  1008/ 1500 255 (255)  w:sem  4
 414 ekf2                         6951  9.251  4520/ 6572 239 (239)  w:sem  4
 423 mc_pos_control                226  0.282   952/ 1860 239 (239)  w:sem  4
 434 navigator                     227  0.141   992/ 1764 105 (105)  w:sem  4
 560 log_writer_file                 0  0.000   376/ 1164  60 ( 60)  w:sem 28
 471 mavlink_if2                  4150  5.084  1648/ 2540 100 (100)  w:sig  4
 475 mavlink_rcv_if2               359  0.353  1424/ 2836 175 (175)  READY  4
 540 logger                        726  0.776  1352/ 3644 234 (234)  w:sem 28

Processes: 29 total, 6 running, 23 sleeping, max FDs: 54
CPU usage: 66.60% tasks, 3.60% sched, 29.80% idle
DMA Memory: 5120 total, 1024 used 1024 peak
Uptime: 81.630s total, 26.165s idle

nsh> free
	     total       used       free    largest
Umem:       231344     186336      45008      36656

Summary

                    Relevant CPU (%)                                 Memory (bytes free)
Master (90bf26b)    6.017% (fmu 2.169% + mc_att_control 3.848%)      39616 B
PR (cfbdf2a)        4.661% (wq:rate_ctrl)                            45008 B

By moving only two tasks to a work queue, overall CPU load decreased by a couple of percentage points and we saved over 5 kB of RAM.

The downside? End-to-end control latency increased slightly (~90 us), but this is a result of the combined mc_att_control, which can easily be split in a follow-up to this PR.

There are a number of other opportunities to merge multiple tasks into work queues like this, which will bring significant memory savings and performance improvements. Perhaps more importantly for some, this structure will enable us to run the multicopter rate controller significantly faster without choking. The small CPU savings above come mainly from reducing the number of context switches; at a paltry 250 Hz that cost is small, but it becomes crippling at significantly higher rates (1 - 4 kHz).

@davids5 (Member) previously approved these changes Jun 7, 2019:

@dagar this makes sense to me, but life cycle management needs to be documented.

Review comment on ROMFS/px4fmu_common/init.d/rcS (outdated, resolved)
@dagar (Member, Author) commented Jun 8, 2019

Here's the same comparison again, but at 1 kHz using the change from #12145.

                            Relevant CPU (%)                               Memory (bytes free)
PR #12145 (7e755b99449cd)   18.82% (fmu 8.96% + mc_att_control 9.87%)      39504 B
This PR @ 1 kHz (7f57a3b)   9.18% (wq:rate_ctrl)                           44560 B

@dagar force-pushed the pr-orb_subscription_callback branch 2 times, most recently from 2f9e550 to c62eb55 on June 9, 2019 15:44
@dagar force-pushed the pr-orb_subscription_callback branch from c62eb55 to ff6cc9d on June 10, 2019 00:35
@dagar changed the title from "[WIP] uORB create SubscriptionCallback with WorkItem scheduling callbacks on publication + examples" to "uORB Subscription callbacks with WorkItem scheduling on publication" on June 10, 2019
@dagar marked this pull request as ready for review June 10, 2019 00:36
@dagar force-pushed the pr-orb_subscription_callback branch 2 times, most recently from c1fba18 to 571dc35 on June 10, 2019 00:39
@dagar (Member, Author) commented Jun 13, 2019

As discussed on the dev call, here's a combined branch for testing: #12256

@bkueng (Member) left a comment:

From a design point of view, what I'm still missing is how you plan to handle failover, and sensor timeouts in particular?

Review comments on src/modules/uORB/Subscription.hpp and src/modules/uORB/SubscriptionCallback.hpp (outdated, resolved)
@dagar (Member, Author) commented Jun 13, 2019

> From a design point of view, what I'm still missing is how you plan to handle failover, and sensor timeouts in particular?

There's also a callback on sensor_correction, which contains the selected gyro and is only published when it changes.

6516ae8#diff-69f93e040a677c931e797b5203c03387R101

6516ae8#diff-e1d4d75170cc48374d5f144ebfca43d9R105

As for most other modules, they generally don't do anything on a poll timeout other than set up the poll again. We'll still need to be careful and check things on a case-by-case basis.
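As a hedged sketch of that failover path (the helper name and member fields are hypothetical, on a variant of the module above that keeps one subscription per gyro instance), the pattern is simply to move the registered callback when the selection changes:

```cpp
// Hypothetical helper: keep one SubscriptionCallbackWorkItem per gyro
// instance and move the registered callback whenever sensor_correction
// reports a different selected instance. Only the selected gyro's
// publications then schedule this WorkItem.
void GyroConsumer::update_gyro_selection()
{
	sensor_correction_s corr;

	// sensor_correction is only published when something actually changed
	if (_sensor_correction_sub.update(&corr)) {
		if (corr.selected_gyro_instance != _selected_gyro) {
			_sensor_gyro_sub[_selected_gyro].unregisterCallback();
			_selected_gyro = corr.selected_gyro_instance;
			_sensor_gyro_sub[_selected_gyro].registerCallback();
		}
	}
}
```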

@dagar force-pushed the pr-orb_subscription_callback branch 2 times, most recently from 0e7a407 to 312999b on June 13, 2019 15:07
@dagar force-pushed the pr-orb_subscription_callback branch from 312999b to a67e42b on June 13, 2019 18:07
@dagar requested a review from bkueng June 13, 2019 18:45
@dagar (Member, Author) commented Jun 13, 2019

Updated the interval to microseconds to properly handle PWM rates (50-400 Hz, i.e. intervals of 2500 - 20000 us).
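For example, assuming the setter is named set_interval_us, capping the callback rate for a PWM output would look like this (hypothetical usage):

```cpp
// Hypothetical usage: limit how often publications schedule the work item.
// 2500 us = 400 Hz (fastest PWM rate), 20000 us = 50 Hz (slowest).
_sensor_gyro_sub.set_interval_us(2500);
_sensor_gyro_sub.registerCallback();
```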

@dagar (Member, Author) commented Jun 14, 2019

Testing of this PR with several migrated modules (#12256) looks good. Unless there's any objection, let's merge this (not yet used anywhere) and continue iterating in the context of updating the appropriate modules (#12224, #12225, #12226, etc). Any remaining concerns about failover or timeout handling need to be addressed at that level anyway.

@dagar requested review from julianoes and davids5 June 14, 2019 14:17
@bkueng (Member) left a comment:

Re timeout: that works then. I was under the impression you wanted to pull the rate controller into the sensor acquisition thread.

@dagar (Member, Author) commented Jun 17, 2019

> Re timeout: that works then. I was under the impression you wanted to pull the rate controller into the sensor acquisition thread.

Still a possibility, but only if we need to. I'm actually considering going the other direction and pulling the filtering out of each driver and into the rate controller thread.

@dagar merged commit 136962d into PX4:master Jun 17, 2019
@dagar deleted the pr-orb_subscription_callback branch June 17, 2019 14:26