SPI: support for ISR and work_queue driven transfers #11302

dakejahl · 2019-01-25T20:43:08Z

Added an _is_locked variable to the SPI class. It is a simple bit mask in which each bit position corresponds to an SPI bus number. When running SPI::transfer from an ISR, LockMode is selected as LOCK_NONE (because you cannot pend on a semaphore in interrupt context). I have added a check of the _is_locked flag for the given bus when LockMode is LOCK_NONE, which prevents an ISR driven SPI::transfer from clobbering a transfer already in progress from a workqueue/task/etc.
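A minimal sketch of the mechanism described above, not the PR's actual code. Only the names _is_locked, LOCK_NONE, LOCK_THREADS and LOCK_PREEMPTION come from this thread; `try_transfer`, `mark_locked`, `mark_unlocked`, `bus_is_locked`, `kOk` and `kDeferred` are hypothetical names chosen for illustration, and the real bus exchange is left out.

```cpp
#include <cstdint>

// Lock mode names as used in this thread; the enum itself is an assumption.
enum class LockMode { LOCK_NONE, LOCK_THREADS, LOCK_PREEMPTION };

// One bit per SPI bus: bit N set means a transfer is in progress on bus N.
// NOTE: a plain integer is not safe against concurrent writers.
static uint32_t g_is_locked = 0;

constexpr int kOk = 0;        // illustrative return values (hypothetical)
constexpr int kDeferred = -1;

inline bool bus_is_locked(unsigned bus) { return (g_is_locked & (1u << bus)) != 0; }
inline void mark_locked(unsigned bus) { g_is_locked |= (1u << bus); }
inline void mark_unlocked(unsigned bus) { g_is_locked &= ~(1u << bus); }

// Decision taken at the top of a transfer.
int try_transfer(unsigned bus, LockMode mode)
{
	if (mode == LockMode::LOCK_NONE) {
		// ISR context: waiting is not possible, so back off if the bus is busy.
		if (bus_is_locked(bus)) {
			return kDeferred;
		}

		return kOk; // the exchange would happen here
	}

	mark_locked(bus);
	// ... take the bus lock and perform the exchange here ...
	mark_unlocked(bus);
	return kOk;
}
```

With bus 1 marked busy, `try_transfer(1, LockMode::LOCK_NONE)` returns `kDeferred`, while the same call for bus 2 still proceeds.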

dakejahl · 2019-01-25T20:45:53Z

Next steps: test this with two IMUs (6500/9250) running on the same SPI bus (SPI1). Schedule one using the new px4_workq and the other with the current hrt_call_every. Then look at the spi_isr_deferred perf counter to see whether the ISR driven transfer is occasionally being aborted.

dakejahl · 2019-01-25T21:07:23Z

Merged into #11261 and tested like so

	if (_whoami == MPU_WHOAMI_6500) {
		// 6500: run from the new PX4 work queue
		ScheduleOnInterval(_call_interval - MPU9250_TIMER_REDUCTION, 10000);

	} else {
		// any other part (the 9250 in this test): keep the existing HRT callout
		hrt_call_every(&_call,
			       1000,
			       _call_interval - MPU9250_TIMER_REDUCTION,
			       (hrt_callout)&MPU9250::measure_trampoline, this);
	}

This appears to be working correctly

And I am seeing perf counts for deferred ISR driven transfers

which match the bad transfers perf counter from a failed SPI::transfer

@dagar if you want to test it yourself
https://github.com/PX4/Firmware/tree/spi_locking_workq_and_isr

davids5

@dakejahl I am a tad bit uncomfortable with the removal of result, but I guess it can be added back if we extend the upstream NuttX SPI API.

Please verify the 2 comments raised

src/lib/drivers/device/nuttx/SPI.cpp

dakejahl · 2019-01-26T18:08:55Z

I removed result because _transfer was always returning OK. And yeah nuttx SPI exchange has no return value.

dakejahl · 2019-01-27T22:16:03Z

I need to do one more test on this Monday. I had left out applying the bus number to the bitmask when checking if the bus was locked. This shouldn't have had any effect with the way the system runs currently, I just want to sanity check it.

dakejahl · 2019-01-28T17:23:11Z

Okay I think this is done now. I fixed an issue where I wasn't checking the bit position in the _is_locked bit mask. So @dagar the ISR driven cycle is actually deferring much less than what was apparent previously (previously I was just checking whether anyone was using any SPI bus, i.e. activity on SPI2).

This actually isn't relevant though because the default LOCK_MODE is LOCK_PREEMPTION anyways (enters critical section for transfer). However I changed it to LOCK_THREADS for the purpose of demonstrating this _is_locked mechanism is functional.

Bench tested on a Teal with 2 IMUs (6500/9250) on SPI1, one running from the new wq and the other from an hrt_call_every.

src/lib/drivers/device/nuttx/SPI.cpp

bkueng · 2019-01-29T06:35:16Z

src/lib/drivers/device/nuttx/SPI.cpp

+
+void SPI::lock(struct spi_dev_s *dev)
+{
+	_is_locked |= (1 << _device_id.devid_s.bus);


@dakejahl this needs to be atomic. It's a shared state potentially written to by multiple threads.
You can use the API in #11328.

Also this assumes that all SPI instances for a given bus run on the same thread. Can you add this as comment here to make the assumption explicit?

FYI once we're sure there's no longer a mix of PX4 SPI drivers running out of HRT and threads we can drop all of this and lean on the underlying nuttx SPI_LOCK (semaphore).

What I meant to emphasize was that this only exists for the case of a potential HRT and thread SPI conflict.

this needs to be atomic. It's a shared state potentially written to by multiple threads.
You can use the API in #11328.

Cool! I'll add that in as soon as #11328 is merged.

dakejahl

Also this assumes that all SPI instances for a given bus run on the same thread.

Could you explain please? I don't think it does.

bkueng · 2019-01-30T08:58:06Z

Could you explain please? I don't think it does.

The implementation is not correct if this does not hold. This is what can go wrong:

1. 2 threads enter SPI::lock with the same bus id.
2. Both set _is_locked.
3. One of the threads (let's say thread1) enters the critical section, the other one (thread2) blocks.
4. thread1 leaves the CS and clears the bit in _is_locked.
5. thread2 enters the CS, while the bit in _is_locked is cleared.
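The interleaving above can be replayed deterministically in a single thread. This is only a sketch: `race_flags`, `set_bit`, `clear_bit`, `bit_is_set` and `isr_sees_lock_while_thread2_in_cs` are hypothetical names, and the critical section itself is reduced to comments.

```cpp
#include <cstdint>

static uint32_t race_flags = 0; // stand-in for the _is_locked bit mask

void set_bit(unsigned bus) { race_flags |= (1u << bus); }
void clear_bit(unsigned bus) { race_flags &= ~(1u << bus); }
bool bit_is_set(unsigned bus) { return (race_flags & (1u << bus)) != 0; }

// Replays the sequence step by step and returns what an ISR would observe
// while thread2 is inside the critical section.
bool isr_sees_lock_while_thread2_in_cs(unsigned bus)
{
	race_flags = 0;

	set_bit(bus);   // thread1 enters SPI::lock and sets the bit
	set_bit(bus);   // thread2 enters SPI::lock; the bit is already set, so nothing changes
	// thread1 runs its critical section, thread2 blocks
	clear_bit(bus); // thread1 leaves the critical section and clears the bit
	// thread2 is now inside the critical section

	return bit_is_set(bus);
}
```

The function returns false for any valid bus number: the single bit cannot record that a second holder is still active, so an ISR checking the flag at that point would start its own transfer.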

dagar · 2019-01-30T15:30:20Z

@dakejahl #11328 has merged.


dakejahl · 2019-01-30T21:19:26Z

@bkueng Thanks for the explanation! I've updated the code. Lock free atomic operations are a new concept to me, please let me know if this doesn't look right.

edit: Oh, ignore me. I need to check if the lock bit is already set before setting, otherwise it can potentially be "set" twice.
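One way to check and set the bit in a single step is an atomic fetch-or, which returns the previous value of the mask. The sketch below uses `std::atomic` as a stand-in; the API from #11328 may differ, and `try_mark_locked`, `atomic_mark_unlocked` and `atomic_bus_is_locked` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>

static std::atomic<uint32_t> atomic_lock_bits{0};

// Sets the bit for `bus` and reports whether this caller was the one that set it.
bool try_mark_locked(unsigned bus)
{
	const uint32_t mask = 1u << bus;
	const uint32_t previous = atomic_lock_bits.fetch_or(mask);
	return (previous & mask) == 0; // false: the bit was already set by someone else
}

void atomic_mark_unlocked(unsigned bus)
{
	atomic_lock_bits.fetch_and(~(1u << bus));
}

bool atomic_bus_is_locked(unsigned bus)
{
	return (atomic_lock_bits.load() & (1u << bus)) != 0;
}
```

A second call to `try_mark_locked` for the same bus returns false until `atomic_mark_unlocked` is called, which is the "already set" case mentioned in the edit above.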

davids5 · 2019-01-30T21:55:01Z

@dakejahl @dagar - Is it true that in the end (all done), there is only 1 thread per bus and the possibility of hrt interruption?

If so, there is a 1:1 mapping of bus to semaphore, so this reduces to:

interrupt context && dev->semaphore count == 0 -> exit

So the loser is always running in interrupt context. Therefore the test does not need protection, as it is non-interruptible (no nesting or high priority IRQ). So dev->semaphore count == 0 => exit. The sem_wait is already wrapped in a DI, EI pair and is safe.
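The reduction above can be sketched as a pure decision function. `FakeSpiBus`, `Decision` and `decide` are hypothetical; the real semaphore lives inside the NuttX SPI device and, as noted later in the thread, is not directly accessible from the PX4 SPI class.

```cpp
// Hypothetical model of a bus whose lock is a semaphore with an initial count of 1.
struct FakeSpiBus {
	int sem_count = 1; // 1: free, 0: held by the bus thread
};

enum class Decision { Proceed, Defer, WaitOnSemaphore };

Decision decide(bool in_interrupt_context, const FakeSpiBus &bus)
{
	if (in_interrupt_context) {
		// The read needs no extra protection here, assuming no nested or
		// higher priority interrupt touches the same bus.
		return (bus.sem_count == 0) ? Decision::Defer : Decision::Proceed;
	}

	// Thread context: allowed to block on the semaphore as usual.
	return Decision::WaitOnSemaphore;
}
```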

Is the problem a shared bus? Baro and NuttX driver on the same bus?

dagar · 2019-01-30T23:10:14Z

I believe the only reason that wasn't done is accessing the somewhat hidden spidev semaphore.

dakejahl · 2019-01-31T01:34:51Z

Yeah I wasn't sure how to get at that semaphore. btw thanks everyone for reviewing, lots of little nuances I wasn't aware of with this.

bkueng

I need to check if the lock bit is already set before setting, otherwise it can potentially be "set" twice.

That's fine, if the at-most-one-thread-per-bus assumption holds (same as @davids5's question: 'Is it true that in the end (all done), there is only 1 thread per bus and the possibility of hrt interruption?').

src/lib/drivers/device/nuttx/SPI.cpp

davids5 · 2019-01-31T11:26:07Z

The "at-most-one-thread-per-bus assumption" has a complication we need to resolve to make it hold.

An SPI bus on some platforms can have a device that is accessed from a device driver in NuttX in the foreground. The params module can use an MTD derivative such as an AT25 EEPROM or a Ramtron FRAM, and should have only non-hrt threads running it. This needs to be true because the OS is used for the reader-writer serialization (we should check with a test build with PX4_VERIFY_ASSERT(!up_interrupt_context()) in that driver)!

FMUv4 has an ms5611 barometer and the FRAM on SPI2. The barometer data sheet effectively states: do not wiggle the SPI pins during conversions. Hence this complication, which I am not 100% sure solves the noise issues.

The only way this worked is because A) the ms5611 driver's bus transactions are on a work queue (an ISR cannot use OS functions that wait, such as sem_wait) and B) it is not on a bus that has hrt threaded devices.

We need to catch a potential misuse: PX4_SPI_BUS_BARO == PX4_SPI_BUS_RAMTRON and some hrt device also on bus.
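A compile-time check is one way such a misuse could be caught. The sketch below is an assumption about how a board layout might be described: `BoardSpiLayout`, `layout_is_safe` and `kExampleBoard` are hypothetical and only mirror the PX4_SPI_BUS_BARO == PX4_SPI_BUS_RAMTRON condition quoted above.

```cpp
// Hypothetical board description; the real defines are PX4_SPI_BUS_BARO etc.
struct BoardSpiLayout {
	int baro_bus;                 // bus number of the barometer
	int ramtron_bus;              // bus number of the FRAM
	unsigned hrt_driven_bus_mask; // bit N set: an hrt driven device sits on bus N
};

// Unsafe only when baro and FRAM share a bus AND an hrt driven device is on it too.
constexpr bool layout_is_safe(const BoardSpiLayout &b)
{
	return !(b.baro_bus == b.ramtron_bus
		 && (b.hrt_driven_bus_mask & (1u << b.baro_bus)) != 0);
}

// Example: baro and FRAM share bus 2, hrt driven sensors are on bus 1 only.
constexpr BoardSpiLayout kExampleBoard{2, 2, 1u << 1};
static_assert(layout_is_safe(kExampleBoard), "hrt device on the shared baro/FRAM bus");
```

Moving an hrt driven device onto bus 2 in `kExampleBoard` would make the `static_assert` fail at build time.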

Which is the same problem the lock is needed for:

The default is locking_mode(LOCK_PREEMPTION), right-red path - this locks the hrt threads out.
The baro uses locking_mode(LOCK_THREADS), the blue path, which shares the dev->semaphore with the NuttX driver.

1. hrt takes the left-red path - assume it owns the bus ONLY because of 2
2. The default non hrt (threads init or wq) take the right-red path - this locks the hrt threads out
3. baro (and the OS's drivers in spirit) takes right-blue path

The root problem is 1 can trounce on 3.

We should look at moving the param module's IO to the SPIx bus thread (and enforce this architecturally). If we do that, everything will be on ONE bus thread and we can then drop the sem_wait and use only the atomic for ISR protection.

We should also change the parameter to not be written during flight. That will fully solve the baro noise issue. If we had adequate holdup time and power fail detection we could commit them with an orderly shutdown, as can be done with a SW reboot. But that is a whole other issue.


dakejahl · 2019-01-31T18:38:04Z

The "at-most-one-thread-per-bus assumption" has a complication we need to resolve to make it hold.

I think this makes sense architecturally and we should enforce it and document it.

We should also change the parameter to not be written during flight.

You are referring to _locking_mode? I agree.

That will fully solve the baro noise issue. If we had adequate holdup time and power fail detection we could commit them with an orderly shutdown, as can be done with a SW reboot. But that is a whole other issue.

I think the way the ms5611 collects is:

Schedules the measurement on the hp_wq -> spi::transfer (tells ms5611 to measure)
Reschedules the collection phase on the hp_wq for some time in the future. Meanwhile ms5611 is performing a measurement, the SPI bus is not locked, and param module can use the bus.

I think we would need to keep the SPIx (spi2 on fmu-v4) bus locked for the entirety of the measure/collect of the ms5611 if the condition is "don't wiggle SPI while doing a conversion".

Yeah?
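The difference between the two behaviours discussed here can be modelled as follows. `BaroCycleSketch` is hypothetical and contains no real ms5611 or SPI calls; it only tracks whether another bus user, such as the param module, could get onto the bus between the measure and collect phases.

```cpp
class BaroCycleSketch
{
public:
	explicit BaroCycleSketch(bool hold_across_conversion)
		: _hold_across_conversion(hold_across_conversion) {}

	// Phase 1: tell the sensor to start a conversion.
	void measure()
	{
		_bus_held = true;
		// ... send the convert command here ...

		if (!_hold_across_conversion) {
			_bus_held = false; // behaviour described above: bus released right away
		}
	}

	// Phase 2: read the result some time later.
	void collect()
	{
		_bus_held = true;
		// ... read the conversion result here ...
		_bus_held = false;
	}

	bool bus_free_for_others() const { return !_bus_held; }

private:
	bool _hold_across_conversion;
	bool _bus_held{false};
};
```

With `hold_across_conversion` set to true the bus stays held between `measure()` and `collect()`, which is the "keep it locked for the entire measure/collect" variant, at the cost of blocking every other user of that bus for the conversion time.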

davids5 · 2019-01-31T18:53:04Z

We should also change the parameter to not be written during flight.

You are referring to _locking_mode? I agree

No, I mean the param save to the MTD.

I think the way the ms5611 collects is:

Schedules the measurement on the hp_wq -> spi::transfer (tells ms5611 to measure)
Reschedules the collection phase on the hp_wq for some time in the future. Meanwhile ms5611 is performing a measurement, the SPI bus is not locked, and param module can use the bus.

I think we would need to keep the SPIx (spi2 on fmu-v4) bus locked for the entirety of the measure/collect of the ms5611 if the condition is "don't wiggle SPI while doing a conversion".

I agree but only on HW where it matters and it can be done and that is case by case.


dakejahl · 2019-02-02T00:30:33Z

Okay, maybe we revisit this once #11261 is merged? I added the comment for the one thread per SPI bus assumption.

dagar · 2019-03-02T17:21:52Z

Another option to consider is to simply prevent the SPI transfer from an interrupt. Once #11571 is merged the only concern is 3rd party drivers rebased on newer PX4. I think it would be acceptable to fail at runtime with an error (or even assert) if it points to instructions that show how trivial it is to migrate.
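A sketch of what rejecting interrupt-context transfers could look like. `in_interrupt_context` stands in for NuttX's up_interrupt_context() (mentioned earlier in this thread) so the example stays self-contained; `g_in_interrupt`, `guarded_transfer` and the choice of -EPERM are assumptions for illustration.

```cpp
#include <cerrno>

// Hypothetical test hook replacing the OS query for "am I in an ISR?".
static bool g_in_interrupt = false;
bool in_interrupt_context() { return g_in_interrupt; }

// Refuses to touch the bus from an ISR instead of trying to arbitrate with threads.
int guarded_transfer()
{
	if (in_interrupt_context()) {
		// A real driver could also log a pointer to migration instructions here.
		return -EPERM;
	}

	// ... lock the bus and perform the exchange here ...
	return 0;
}
```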

dakejahl requested a review from dagar January 25, 2019 20:43

dakejahl mentioned this pull request Jan 25, 2019

PX4 general work queue #11261

Closed

davids5 reviewed Jan 26, 2019


weekly-digest bot mentioned this pull request Jan 27, 2019

Weekly Digest (20 January, 2019 - 27 January, 2019) #11307

Closed

bkueng reviewed Jan 29, 2019

dakejahl commented Jan 29, 2019

dagar added Admin: Enhancement (improvement) 💡 NuttX (OS) labels Jan 29, 2019

dagar added this to the Release v1.10.0 milestone Jan 29, 2019

dakejahl and others added 8 commits January 30, 2019 13:57

Added a member variable to SPI that is a bit mask indicating whether or not the given bus is locked. This is necessary to defer ISR context SPI transfers that would otherwise stomp on an ongoing SPI transfer

63367b4

Cleaning up the SPI file

58bf87b

added a perf_counter for deferred ISRs. Cleaned a bit more

7c5c051

replaced include guards with pragma once. Made _is_locked properly static.

82b33c5

Added in the bitmask check on _is_locked, this was missing. Added back -EINVAL. Updated file header.

d818ec4

changed _is_locked to a uint32 and started the bus numbering at 0

f9b97d2

removed accidental extra ampersand for bit compare

b99ddda

Update src/lib/drivers/device/nuttx/SPI.cpp

b7dce1c

Co-Authored-By: dakejahl <37091262+dakejahl@users.noreply.github.com>

dakejahl force-pushed the pr-spi_correct_locking branch from 0abc38c to b7dce1c Compare January 30, 2019 20:57

made _is_locked an atomic type variable

a1fab7f

bkueng reviewed Jan 31, 2019


Update src/lib/drivers/device/nuttx/SPI.cpp

0934e8e

Co-Authored-By: dakejahl <37091262+dakejahl@users.noreply.github.com>

Added in comment about the 1 thread per spi bus assumption that must hold for the isr lock

c53fa1f

dagar mentioned this pull request Mar 2, 2019

PX4 general work queue #11570

Closed

2 tasks

weekly-digest bot mentioned this pull request Mar 3, 2019

Weekly Digest (24 February, 2019 - 3 March, 2019) #11577

Closed

dakejahl closed this Jun 21, 2019

dakejahl deleted the pr-spi_correct_locking branch July 8, 2019 04:18
