Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PX4 general work queue #11570

Closed
wants to merge 1 commit into from
Closed

PX4 general work queue #11570

wants to merge 1 commit into from

Conversation

dagar
Copy link
Member

@dagar dagar commented Mar 2, 2019

Continuation of #11261.

More background to come, but in short this general px4 work queue will allow us to finally unblock several important things.

  • SPI DMA IMU drivers
  • running rate controllers at 1 kHz or faster
  • better all around real time system performance
  • significant memory savings

Closes #8814

Design Notes

Each PX4 module that previously used hrt_calls, HPWORK, or LPWORK can now easily be moved into this framework. These classes inherit either WorkItem to be run as needed, or ScheduledWorkItem if a fixed interval is desired.

The reason for the (somewhat awkward) WorkQueueManager is to get every queue (a pthread) running within the same task group on NuttX. This significantly reduces overhead, and when combined with uORB changes (#11176) has no disadvantages. The cost of each work queue is little more than the stack (1-2 kB).

The other option for scheduling WorkItems is something I've bolted into uORB itself. The common pattern throughout most important PX4 modules is a task that polls a uORB topic for updates. Many of these processes are already fundamentally serialized, so the coexistence of all these tasks wastes quite a lot of memory. Any work pipeline (a collection of tasks) that's fundamentally serial can be moved to share the same WorkQueue. Then we can lean on the uORB publication to schedule appropriate work.

TODO:

  • SITL unit tests shutdown problem
  • unit tests for new containers

@dagar
Copy link
Member Author

dagar commented Mar 2, 2019

(Repeating from #11261 (comment))
@julianoes @bkueng any idea why starting an additional task in SITL is causing the client to hang in the px4-shutdown call? I've opened #11525 for investigation.

See the CI failure in Jenkins (http://ci.px4.io:8080/blue/organizations/jenkins/PX4%2FFirmware/detail/PR-11570/1/pipeline) or run make tests locally to reproduce. I was only able to reproduce locally in docker with the cpus limited to 2. The "work queue manager" is a task started in SITL, but nothing in the unit tests is even using it.

@dagar dagar force-pushed the pr-px4_general_wq branch from a3086e2 to 28e951e Compare March 8, 2019 17:23
@dagar
Copy link
Member Author

dagar commented Mar 8, 2019

(Repeating from #11261 (comment))
@julianoes @bkueng any idea why starting an additional task in SITL is causing the client to hang in the px4-shutdown call? I've opened #11525 for investigation.

@bkueng unfortunately no change after #11601.

@dagar dagar force-pushed the pr-px4_general_wq branch 2 times, most recently from 8a32064 to e97dfec Compare March 8, 2019 22:31
}

bool empty() const { return _count == 0; }
bool full() const { return _count == N; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should _count be protected by _mutex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so, even if you called empty() or full() in the middle of another thread pushing or popping you still have to go through the locked methods to actually do anything with the queue.

Can you think of a problematic case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking "writing and reading of shared data simultaneously can lead to undefined behaviour", right? And I'm assuming this blocking queue should be thread-safe.

src/include/containers/LockGuard.hpp Outdated Show resolved Hide resolved
@dagar dagar force-pushed the pr-px4_general_wq branch from e97dfec to 375f84d Compare March 13, 2019 14:24
@julianoes
Copy link
Contributor

The tests shut down when you comment out pthread_cond_destroy in BlockingList and BlockingQueue.

@julianoes
Copy link
Contributor

It also works like this:

pthread_cond_broadcast(&_cv);
pthread_cond_destroy(&_cv);
pthread_mutex_destroy(&_mutex);

@julianoes
Copy link
Contributor

julianoes commented Mar 14, 2019

@dagar however, with the broadcast addition the posix_wqueue_test test makes my whole Linux system stall (as in I can't start anything anymore and get fork failed: resource temporarily unavailable, not sure why!

@dagar dagar force-pushed the pr-px4_general_wq branch from 375f84d to 1bc2527 Compare March 15, 2019 03:43
@dagar
Copy link
Member Author

dagar commented Mar 15, 2019

The tests shut down when you comment out pthread_cond_destroy in BlockingList and BlockingQueue.

Oh, I wonder if it's because they're both static, and we're not really shutting things down properly yet.

Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.

I'll allocate them separately in WorkQueueManagerStart().


daemon_task = px4_task_spawn_cmd("wqueue",
SCHED_DEFAULT,
SCHED_PRIORITY_MAX - 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it supposed to use: PX4_WQ_HP_BASE now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit arbitrary here, and I don't think even matters. This task gets the work queues going then exits. It probably doesn't even need to exist now that I look at it.

}

// priority
param.sched_priority = SCHED_PRIORITY_MAX + wq->relative_priority;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it supposed to use PX4_WQ_HP_BASE ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'd say no it's not supposed to use PX4_WQ_HP_BASE, but that might have been a better way to structure it.

I actually wanted to directly reference the task priority defines for each WQ, but on Linux those are calls (eg sched_get_priority_max(SCHED_FIFO)) that don't seem to have accessible compile time constants.

Then combined with the initial use case (primary SPI IMU driver highest priority thing in the system) and differences like the priority range (min to max) being so different between NuttX and Linux, I ended up with this simplification of first using priorities relative to max for the wq, then everything else underneath.

A bit later on I'm imagining the need to take another pass at this and make the priority table runtime configurable based on sensor configuration and even current health.

@julianoes
Copy link
Contributor

@dagar I think I would appreciate if you make incremental commits with fixes after reviews. I have no idea what has changed whenever you force push and have to review everything again.

The tests seem to pass now for CI and for me locally now though which is good :).

@dagar
Copy link
Member Author

dagar commented Mar 15, 2019

@dagar I think I would appreciate if you make incremental commits with fixes after reviews. I have no idea what has changed whenever you force push and have to review everything again.

This is a probably a bigger discussion than we should get into on a PR, but I would prefer that as well if we can move away from a rebase workflow, at least by default. We're still getting incremental commits in master that have negative value. Incremental history that changed something, then changed it back, doesn't fully work, etc. It's well intentioned, but often makes it harder or misleading when debugging (git blame), bisecting, backporting to an older branch, finding the original pull request (designs and discussion), or even just casually browsing.

If I were going to propose something I'd say in most cases we structure development such that each pull request corresponds to a single change that makes sense to squash and merge on completion. We'd open a pull request, add commits in response to reviews, testing, etc, merge (yes merge!) in new changes from master, and then when we're completely done we press "Squash and merge" to end up with a single, clean, atomic commit in master that references the full, true development history in all its gory detail.

@julianoes
Copy link
Contributor

It's well intentioned, but often makes it harder or misleading when debugging (git blame), bisecting, backporting to an older branch, finding the original pull request (designs and discussion), or even just casually browsing.

Right, also true!

@julianoes
Copy link
Contributor

Let me just complain for whatever role I'm doing at the moment 😄. When I'm reviewing I want it that way and when bisecting another!

@dagar
Copy link
Member Author

dagar commented Mar 15, 2019

Let me just complain for whatever role I'm doing at the moment . When I'm reviewing I want it that way and when bisecting another!

I don't mind complaining if it's accompanied by enough interest to push through the process of actually getting things changed on a wider scale. I only bothered with the rant because I think this might actually be a case where we can nearly please everyone (I'm sure someone will disagree).

I dream of a day where we can leverage Jenkins and the hardware test rack to automatically track down a regression via bisect.

@julianoes
Copy link
Contributor

Sorry, I was just annoyed trying to do the review again but given the reasons you said I agree with you and it's fine! Didn't mean to come across that ranty.

davids5
davids5 previously approved these changes May 9, 2019
Copy link
Member

@davids5 davids5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dagar - This looks ready (except for my signed question). Can you add a UML of the class hierarchy and a 1 sequence diagram as documentation?

src/include/containers/BlockingQueue.hpp Outdated Show resolved Hide resolved
src/include/containers/BlockingQueue.hpp Outdated Show resolved Hide resolved
src/include/containers/LockGuard.hpp Outdated Show resolved Hide resolved
dev->ScheduleNow();
}

void ScheduledWorkItem::ScheduleDelayed(uint32_t delay_us)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add RT on these so we can have real time and non real time scheduling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was intentionally trying to keep that separate to prevent abuse. Let's talk about options separately.

}
}

bool WorkItem::Init(const wq_config_t &config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to start enforcing the constructor rules now and set a correct pattern of 2 phase construct as discussed on the devcall.

namespace px4
{

class WorkItem : public IntrusiveQueueNode<WorkItem *>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great yet another term that hides the meaning. :) Queue of copies and Queue of references says what need to be known.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it hide the meaning? Isn't the template argument telling you what it is?
Open to specific suggestions.

@dagar
Copy link
Member Author

dagar commented May 9, 2019

@dagar - This looks ready (except for my signed question). Can you add a UML of the class hierarchy and a 1 sequence diagram as documentation?

Let's find a standard way to do this, I don't think there's ultimately much value if it ends up as another out of date document floating around google drive. I wonder if we can get acceptable results form doxygen with appropriate markup.

@dagar
Copy link
Member Author

dagar commented May 17, 2019

Thanks for the reviews everyone. I'll bring this in with the corresponding driver changes in #11571.

@dagar dagar closed this May 17, 2019
@LorenzMeier LorenzMeier deleted the pr-px4_general_wq branch January 18, 2021 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PX4 general work queue
5 participants