PX4 general work queue #11570

dagar · 2019-03-02T17:03:41Z

Continuation of #11261.

More background to come, but in short this general PX4 work queue will allow us to finally unblock several important things:

SPI DMA IMU drivers
running rate controllers at 1 kHz or faster
better all around real time system performance
significant memory savings

Closes #8814

Design Notes

Each PX4 module that previously used hrt_calls, HPWORK, or LPWORK can now easily be moved into this framework. These classes inherit either WorkItem to be run as needed, or ScheduledWorkItem if a fixed interval is desired.
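The inheritance pattern described above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the code from this PR: the class bodies, `RunPending()`, `Tick()`, `ScheduleOnInterval()` and `ImuDriver` are assumptions made up for the example, and time is advanced by hand instead of by a real timer.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

class WorkItem;

// Hypothetical queue: holds pending items and runs them in order.
class WorkQueue {
public:
	void Add(WorkItem *item) { _pending.push_back(item); }
	int RunPending(); // runs everything queued, returns how many items ran
private:
	std::deque<WorkItem *> _pending;
};

// Hypothetical base class: run as needed via ScheduleNow().
class WorkItem {
public:
	explicit WorkItem(WorkQueue &wq) : _wq(wq) {}
	virtual ~WorkItem() = default;
	void ScheduleNow() { _wq.Add(this); }
	virtual void Run() = 0;
private:
	WorkQueue &_wq;
};

inline int WorkQueue::RunPending() {
	int ran = 0;
	while (!_pending.empty()) {
		WorkItem *item = _pending.front();
		_pending.pop_front();
		item->Run();
		++ran;
	}
	return ran;
}

// Hypothetical fixed-interval variant: queues itself whenever it is due.
class ScheduledWorkItem : public WorkItem {
public:
	explicit ScheduledWorkItem(WorkQueue &wq) : WorkItem(wq) {}
	void ScheduleOnInterval(uint32_t interval_us) { _interval_us = interval_us; }
	// Stand-in for a timer callback; now_us is supplied by the caller.
	void Tick(uint64_t now_us) {
		if (_interval_us > 0 && now_us >= _next_us) {
			_next_us = now_us + _interval_us;
			ScheduleNow();
		}
	}
private:
	uint32_t _interval_us{0};
	uint64_t _next_us{0};
};

// Example module: a driver that only has to implement Run().
class ImuDriver : public ScheduledWorkItem {
public:
	explicit ImuDriver(WorkQueue &wq) : ScheduledWorkItem(wq) {}
	void Run() override { ++samples; }
	int samples{0};
};
```

A module would construct an `ImuDriver` on a queue, call `ScheduleOnInterval()`, and let the queue's thread call `RunPending()`; in the real framework the timer and the queue thread do this for you.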

The reason for the (somewhat awkward) WorkQueueManager is to get every queue (a pthread) running within the same task group on NuttX. This significantly reduces overhead, and when combined with uORB changes (#11176) has no disadvantages. The cost of each work queue is little more than the stack (1-2 kB).

The other option for scheduling WorkItems is something I've bolted into uORB itself. The common pattern throughout most important PX4 modules is a task that polls a uORB topic for updates. Many of these processes are already fundamentally serialized, so the coexistence of all these tasks wastes quite a lot of memory. Any work pipeline (a collection of tasks) that's fundamentally serial can be moved to share the same WorkQueue. Then we can lean on the uORB publication to schedule appropriate work.
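As a rough illustration of letting a publication schedule the next stage of a serial pipeline, here is a small sketch. It does not use the uORB API; `Topic`, `RegisterCallback()`, `Publish()` and this `WorkQueue` are hypothetical names invented for the example, and everything runs on the caller's thread.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical queue shared by every stage of one serial pipeline.
class WorkQueue {
public:
	void Add(std::function<void()> work) { _pending.push_back(std::move(work)); }
	// Drains the queue, including work added while draining.
	int RunPending() {
		int ran = 0;
		while (!_pending.empty()) {
			std::function<void()> work = std::move(_pending.front());
			_pending.pop_front();
			work();
			++ran;
		}
		return ran;
	}
private:
	std::deque<std::function<void()>> _pending;
};

// Hypothetical topic: publishing schedules each registered stage on its
// queue instead of waking a dedicated task that polls for updates.
template <typename T>
class Topic {
public:
	void RegisterCallback(WorkQueue &wq, std::function<void(const T &)> stage) {
		_subscribers.push_back({&wq, std::move(stage)});
	}
	void Publish(const T &value) {
		for (const Subscriber &sub : _subscribers) {
			const std::function<void(const T &)> stage = sub.stage;
			// The queued work owns a copy of the sample and the stage.
			sub.wq->Add([stage, value] { stage(value); });
		}
	}
private:
	struct Subscriber {
		WorkQueue *wq;
		std::function<void(const T &)> stage;
	};
	std::vector<Subscriber> _subscribers;
};
```

With two topics chained on one queue (sensor sample in, setpoint out), a single `Publish()` followed by one `RunPending()` runs both stages back to back on the same stack, which is the memory saving the paragraph above is after.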

TODO:

SITL unit tests shutdown problem
unit tests for new containers

dagar · 2019-03-02T17:35:22Z

(Repeating from #11261 (comment))
@julianoes @bkueng any idea why starting an additional task in SITL is causing the client to hang in the px4-shutdown call? I've opened #11525 for investigation.

See the CI failure in Jenkins (http://ci.px4.io:8080/blue/organizations/jenkins/PX4%2FFirmware/detail/PR-11570/1/pipeline) or run make tests locally to reproduce. I was only able to reproduce locally in docker with the cpus limited to 2. The "work queue manager" is a task started in SITL, but nothing in the unit tests is even using it.

src/include/containers/Queue.hpp

dagar · 2019-03-08T17:53:48Z

> (Repeating from #11261 (comment))
> @julianoes @bkueng any idea why starting an additional task in SITL is causing the client to hang in the px4-shutdown call? I've opened #11525 for investigation.

@bkueng unfortunately no change after #11601.

julianoes · 2019-03-12T11:13:22Z

src/include/containers/BlockingQueue.hpp

+	}
+
+	bool empty() const { return _count == 0; }
+	bool full() const { return _count == N; }


julianoes (Mar 12, 2019): Should _count be protected by _mutex?

dagar (Mar 13, 2019): I don't believe so. Even if you called empty() or full() in the middle of another thread pushing or popping, you still have to go through the locked methods to actually do anything with the queue.

Can you think of a problematic case?

julianoes (Mar 14, 2019): I'm just thinking "writing and reading of shared data simultaneously can lead to undefined behaviour", right? And I'm assuming this blocking queue should be thread-safe.
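For reference, under the C++ memory model an unsynchronized read of a non-atomic variable that another thread may be writing is a data race, which is undefined behaviour regardless of what the caller does with the result. One way to sidestep the question is to take the lock in empty() and full() as well. The sketch below is only an illustration of that option: it uses std::mutex and std::condition_variable rather than the pthread primitives in the PR, and the member names are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical fixed-capacity blocking queue (not the PR's BlockingQueue).
template <typename T, std::size_t N>
class BlockingQueue {
public:
	void push(const T &value) {
		std::unique_lock<std::mutex> lock(_mutex);
		_not_full.wait(lock, [this] { return _count < N; });
		_data[_tail] = value;
		_tail = (_tail + 1) % N;
		++_count;
		_not_empty.notify_one();
	}

	T pop() {
		std::unique_lock<std::mutex> lock(_mutex);
		_not_empty.wait(lock, [this] { return _count > 0; });
		T value = _data[_head];
		_head = (_head + 1) % N;
		--_count;
		_not_full.notify_one();
		return value;
	}

	// The observers take the same lock, so _count is never read while
	// another thread is in the middle of modifying it.
	bool empty() const {
		std::lock_guard<std::mutex> lock(_mutex);
		return _count == 0;
	}

	bool full() const {
		std::lock_guard<std::mutex> lock(_mutex);
		return _count == N;
	}

private:
	mutable std::mutex _mutex; // mutable so const observers can lock
	std::condition_variable _not_empty;
	std::condition_variable _not_full;
	T _data[N]{};
	std::size_t _head{0};
	std::size_t _tail{0};
	std::size_t _count{0};
};
```

The answer from empty() or full() can still be stale by the time the caller acts on it, so this only removes the data race; it does not make check-then-act sequences atomic.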

src/include/containers/LockGuard.hpp

src/platforms/common/px4_work_queue/WorkQueueManager.hpp

julianoes · 2019-03-14T07:46:04Z

The tests shut down when you comment out pthread_cond_destroy in BlockingList and BlockingQueue.

julianoes · 2019-03-14T07:57:19Z

It also works like this:

pthread_cond_broadcast(&_cv);
pthread_cond_destroy(&_cv);
pthread_mutex_destroy(&_mutex);

julianoes · 2019-03-14T10:02:13Z

@dagar however, with the broadcast addition the posix_wqueue_test test makes my whole Linux system stall (as in I can't start anything anymore and get "fork failed: resource temporarily unavailable"), not sure why!

dagar · 2019-03-15T03:53:27Z

> The tests shut down when you comment out pthread_cond_destroy in BlockingList and BlockingQueue.

Oh, I wonder if it's because they're both static, and we're not really shutting things down properly yet.

> Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.

I'll allocate them separately in WorkQueueManagerStart().
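A shutdown order that avoids destroying a condition variable with waiters still blocked on it might look like the sketch below: set an exit flag, broadcast, join the waiting thread, and only then destroy. The `Worker` struct and both functions are hypothetical and only illustrate the ordering; they are not the WorkQueueManager code.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical worker state, owned by whoever starts the thread.
struct Worker {
	pthread_mutex_t mutex;
	pthread_cond_t cv;
	bool should_exit{false};
	bool exited{false};
};

// Waits on the condition variable until asked to exit.
static void *worker_main(void *arg) {
	Worker *w = static_cast<Worker *>(arg);
	pthread_mutex_lock(&w->mutex);
	while (!w->should_exit) {
		pthread_cond_wait(&w->cv, &w->mutex);
	}
	w->exited = true;
	pthread_mutex_unlock(&w->mutex);
	return nullptr;
}

// Returns true if the worker had finished before the condition variable
// and mutex were destroyed.
inline bool start_and_stop_worker() {
	Worker w;
	pthread_mutex_init(&w.mutex, nullptr);
	pthread_cond_init(&w.cv, nullptr);

	pthread_t thread;
	const bool started = pthread_create(&thread, nullptr, worker_main, &w) == 0;

	if (started) {
		// 1. Ask the worker to stop and wake it up.
		pthread_mutex_lock(&w.mutex);
		w.should_exit = true;
		pthread_cond_broadcast(&w.cv);
		pthread_mutex_unlock(&w.mutex);

		// 2. Wait until no thread can still be blocked on the cv.
		pthread_join(thread, nullptr);
	}

	// 3. Only now is it safe to destroy the primitives.
	const bool exited = w.exited;
	pthread_cond_destroy(&w.cv);
	pthread_mutex_destroy(&w.mutex);
	return started && exited;
}
```

The broadcast alone is not enough, because a woken thread still has to reacquire the mutex and leave the wait; the join is what guarantees nobody is using the condition variable when it is destroyed.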

AlexisTM · 2019-03-15T09:19:29Z

src/platforms/common/px4_work_queue/test/wqueue_start.cpp

+
+		daemon_task = px4_task_spawn_cmd("wqueue",
+						 SCHED_DEFAULT,
+						 SCHED_PRIORITY_MAX - 5,


AlexisTM (Mar 15, 2019): Isn't it supposed to use PX4_WQ_HP_BASE now?

dagar (Mar 15, 2019): It's a bit arbitrary here, and I don't think it even matters. This task gets the work queues going then exits. It probably doesn't even need to exist now that I look at it.

AlexisTM · 2019-03-15T09:19:59Z

src/platforms/common/px4_work_queue/WorkQueueManager.cpp

+			}
+
+			// priority
+			param.sched_priority = SCHED_PRIORITY_MAX + wq->relative_priority;


AlexisTM (Mar 15, 2019): Isn't it supposed to use PX4_WQ_HP_BASE?

dagar (Mar 15, 2019): Well, I'd say no, it's not supposed to use PX4_WQ_HP_BASE, but that might have been a better way to structure it.

I actually wanted to directly reference the task priority defines for each WQ, but on Linux those are calls (eg sched_get_priority_max(SCHED_FIFO)) that don't seem to have accessible compile time constants.

Then combined with the initial use case (primary SPI IMU driver highest priority thing in the system) and differences like the priority range (min to max) being so different between NuttX and Linux, I ended up with this simplification of first using priorities relative to max for the wq, then everything else underneath.

A bit later on I'm imagining the need to take another pass at this and make the priority table runtime configurable based on sensor configuration and even current health.
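The "relative to max" scheme can be sketched with the POSIX calls mentioned above. The struct `wq_config_sketch_t` and the function `resolve_priority()` are hypothetical names for this example (only the `relative_priority` field mirrors the diff), and the clamping to the valid range is an addition of the sketch, not something taken from the PR.

```cpp
#include <cassert>
#include <sched.h>

// Hypothetical per-queue configuration; relative_priority is <= 0 and is
// counted down from the highest priority the platform offers.
struct wq_config_sketch_t {
	const char *name;
	int relative_priority;
};

// Resolves the absolute priority at runtime, because the maximum is a
// function call on Linux rather than a compile-time constant.
inline int resolve_priority(const wq_config_sketch_t &wq, int policy = SCHED_FIFO) {
	const int max = sched_get_priority_max(policy);
	const int min = sched_get_priority_min(policy);
	int priority = max + wq.relative_priority;

	// Clamp so a large offset still yields a valid priority on platforms
	// with a narrow range (an assumption of this sketch).
	if (priority < min) { priority = min; }
	if (priority > max) { priority = max; }

	return priority;
}
```

The result would then be stored in `sched_param::sched_priority` before creating the queue's thread, as the diff line above does with `SCHED_PRIORITY_MAX + wq->relative_priority`.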

julianoes · 2019-03-15T09:44:19Z

@dagar I think I would appreciate if you make incremental commits with fixes after reviews. I have no idea what has changed whenever you force push and have to review everything again.

The tests now seem to pass in CI and for me locally though, which is good :).

src/platforms/common/px4_work_queue/WorkItem.cpp

src/platforms/common/px4_work_queue/WorkQueueManager.cpp

dagar · 2019-03-15T17:07:48Z

> @dagar I think I would appreciate if you make incremental commits with fixes after reviews. I have no idea what has changed whenever you force push and have to review everything again.

This is probably a bigger discussion than we should get into on a PR, but I would prefer that as well if we can move away from a rebase workflow, at least by default. We're still getting incremental commits in master that have negative value. Incremental history that changed something, then changed it back, doesn't fully work, etc. It's well intentioned, but often makes it harder or misleading when debugging (git blame), bisecting, backporting to an older branch, finding the original pull request (designs and discussion), or even just casually browsing.

If I were going to propose something I'd say in most cases we structure development such that each pull request corresponds to a single change that makes sense to squash and merge on completion. We'd open a pull request, add commits in response to reviews, testing, etc, merge (yes merge!) in new changes from master, and then when we're completely done we press "Squash and merge" to end up with a single, clean, atomic commit in master that references the full, true development history in all its gory detail.

julianoes · 2019-03-15T17:12:42Z

> It's well intentioned, but often makes it harder or misleading when debugging (git blame), bisecting, backporting to an older branch, finding the original pull request (designs and discussion), or even just casually browsing.

Right, also true!

julianoes · 2019-03-15T17:13:35Z

Let me just complain for whatever role I'm doing at the moment 😄. When I'm reviewing I want it that way and when bisecting another!

dagar · 2019-03-15T17:37:04Z

> Let me just complain for whatever role I'm doing at the moment 😄. When I'm reviewing I want it that way and when bisecting another!

I don't mind complaining if it's accompanied by enough interest to push through the process of actually getting things changed on a wider scale. I only bothered with the rant because I think this might actually be a case where we can nearly please everyone (I'm sure someone will disagree).

I dream of a day where we can leverage Jenkins and the hardware test rack to automatically track down a regression via bisect.

julianoes · 2019-03-15T19:50:55Z

Sorry, I was just annoyed trying to do the review again but given the reasons you said I agree with you and it's fine! Didn't mean to come across that ranty.

davids5

@dagar - This looks ready (except for my signed question). Can you add a UML diagram of the class hierarchy and one sequence diagram as documentation?

src/include/containers/BlockingQueue.hpp

src/include/containers/LockGuard.hpp

davids5 · 2019-05-09T11:39:53Z

src/platforms/common/px4_work_queue/ScheduledWorkItem.cpp

+	dev->ScheduleNow();
+}
+
+void ScheduledWorkItem::ScheduleDelayed(uint32_t delay_us)


davids5 (May 9, 2019): Please add RT on these so we can have real time and non real time scheduling.

dagar (May 9, 2019): I was intentionally trying to keep that separate to prevent abuse. Let's talk about options separately.

davids5 · 2019-05-09T13:43:14Z

src/platforms/common/px4_work_queue/WorkItem.cpp

+	}
+}
+
+bool WorkItem::Init(const wq_config_t &config)


davids5 (May 9, 2019): It would be good to start enforcing the constructor rules now and set a correct pattern of two-phase construction as discussed on the devcall.
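The thread does not spell out the constructor rules from the devcall, so the following is only a generic sketch of two-phase construction: the constructor does nothing that can fail, and a separate Init() does the fallible work and reports the result. `Sensor`, `ready()` and the buffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical module using two-phase construction.
class Sensor {
public:
	// Phase 1: cheap and cannot fail; no allocation, no hardware access.
	Sensor() = default;

	// Phase 2: everything that can fail, with the outcome returned to the
	// caller instead of being hidden inside the constructor.
	bool Init(std::size_t buffer_len) {
		if (buffer_len == 0) {
			return false;
		}

		// nothrow new returns nullptr on failure instead of throwing.
		_buffer.reset(new (std::nothrow) int[buffer_len]());
		_ready = (_buffer != nullptr);
		return _ready;
	}

	bool ready() const { return _ready; }

private:
	std::unique_ptr<int[]> _buffer;
	bool _ready{false};
};
```

The caller constructs the object, checks the return value of Init(), and only schedules or uses the object when it succeeded.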

davids5 · 2019-05-09T13:46:32Z

src/platforms/common/px4_work_queue/WorkItem.hpp

+namespace px4
+{
+
+class WorkItem : public IntrusiveQueueNode<WorkItem *>


davids5 (May 9, 2019): Great, yet another term that hides the meaning. :) "Queue of copies" and "Queue of references" say what needs to be known.

dagar (May 9, 2019): How does it hide the meaning? Isn't the template argument telling you what it is?
Open to specific suggestions.

src/platforms/common/px4_work_queue/WorkQueue.cpp

dagar · 2019-05-09T15:06:19Z

> @dagar - This looks ready (except for my signed question). Can you add a UML diagram of the class hierarchy and one sequence diagram as documentation?

Let's find a standard way to do this. I don't think there's ultimately much value if it ends up as another out-of-date document floating around Google Drive. I wonder if we can get acceptable results from doxygen with appropriate markup.

dagar · 2019-05-17T17:42:41Z

Thanks for the reviews everyone. I'll bring this in with the corresponding driver changes in #11571.

dagar mentioned this pull request Mar 2, 2019

PX4 general work queue and move all drivers to new work queue #11571

Merged

dagar added the Admin: Enhancement (improvement) 💡 label Mar 2, 2019

dagar added this to the Release v1.10.0 milestone Mar 2, 2019

dagar self-assigned this Mar 2, 2019

dagar requested review from bkueng and julianoes March 2, 2019 17:07

This was referenced Mar 2, 2019

PX4 general work queue #11261

Closed

[WIP] uORB: add px4 work queue call back mechanism on publish #11489

Closed

dagar force-pushed the pr-px4_general_wq branch from 08f9125 to a3086e2 Compare March 2, 2019 18:00

dagar mentioned this pull request Mar 2, 2019

containers add Queue and testing #11574

Merged

weekly-digest bot mentioned this pull request Mar 3, 2019

Weekly Digest (24 February, 2019 - 3 March, 2019) #11577

Closed

bkueng reviewed Mar 5, 2019


src/include/containers/Queue.hpp (outdated)

dagar force-pushed the pr-px4_general_wq branch from a3086e2 to 28e951e Compare March 8, 2019 17:23

dagar force-pushed the pr-px4_general_wq branch 2 times, most recently from 8a32064 to e97dfec Compare March 8, 2019 22:31

weekly-digest bot mentioned this pull request Mar 10, 2019

Weekly Digest (3 March, 2019 - 10 March, 2019) #11614

Closed

julianoes reviewed Mar 12, 2019


dagar force-pushed the pr-px4_general_wq branch from e97dfec to 375f84d Compare March 13, 2019 14:24

dagar force-pushed the pr-px4_general_wq branch from 375f84d to 1bc2527 Compare March 15, 2019 03:43

AlexisTM reviewed Mar 15, 2019


julianoes reviewed Mar 15, 2019

src/platforms/common/px4_work_queue/WorkItem.cpp

src/platforms/common/px4_work_queue/WorkItem.cpp

src/platforms/common/px4_work_queue/WorkQueueManager.cpp

dagar force-pushed the pr-px4_general_wq branch from 1bc2527 to 990d8a0 Compare March 16, 2019 19:32

dagar mentioned this pull request Mar 21, 2019

fix bmi055: increase DLPF from 62.5 to 500 #11694

Merged

dagar force-pushed the pr-px4_general_wq branch from 990d8a0 to ac6c467 Compare April 24, 2019 17:45

dagar force-pushed the pr-px4_general_wq branch from ac6c467 to 1c7b014 Compare May 8, 2019 16:32

dagar requested review from julianoes, bkueng and davids5 May 8, 2019 16:33

davids5 previously approved these changes May 9, 2019


dagar dismissed davids5’s stale review via 541b400 May 9, 2019 15:04

dagar force-pushed the pr-px4_general_wq branch from 1c7b014 to 541b400 Compare May 9, 2019 15:04

PX4 general work queue

3c49eae

dagar force-pushed the pr-px4_general_wq branch from 541b400 to 3c49eae Compare May 9, 2019 17:46

weekly-digest bot mentioned this pull request May 12, 2019

Weekly Digest (5 May, 2019 - 12 May, 2019) #12005

Closed

dagar closed this May 17, 2019

LorenzMeier deleted the pr-px4_general_wq branch January 18, 2021 14:32
