
Add asynchronous versions of most embedded-hal traits using GATs #285

Closed
wants to merge 32 commits

Conversation

@lachlansneff
Contributor

lachlansneff commented Jun 21, 2021

GATs (generic associated types), among other things, let you return futures from trait methods.

The lifetimes can be a bit tricky but it results in a pretty clean trait.

For example:

use core::future::Future;

/// Async transfer in place.
pub trait TransferInPlace<Word: 'static> {
    /// Error type
    type Error;

    /// Associated future for the `transfer_inplace` method.
    type TransferInPlaceFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    /// Writes `words` to the slave from the `readwrite` buffer and reads words into the same buffer.
    /// This method uses a single `readwrite` buffer.
    fn transfer_inplace<'a>(&'a mut self, readwrite: &'a mut [Word]) -> Self::TransferInPlaceFuture<'a>;
}
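
For a sense of how this reads at the call site, here's a minimal usage sketch (the `read_id` helper and the `0x9F` command byte are hypothetical, purely for illustration):

async fn read_id<S: TransferInPlace<u8>>(spi: &mut S) -> Result<[u8; 4], S::Error> {
    // Hypothetical command byte followed by dummy words; the reply is
    // read back into the same buffer.
    let mut buf = [0x9F, 0, 0, 0];
    spi.transfer_inplace(&mut buf).await?;
    Ok(buf)
}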

It's important to note that GATs are only available on nightly at the moment, so this uses feature attributes. I'd like to discuss how this should be exposed (likely through a feature until GATs are stabilized).

Also, some traits that don't use futures are duplicated across the nb, blocking, and futures modules. Those should probably be merged.

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ryankurte (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@Dirbaio
Member

Dirbaio commented Jun 22, 2021

Nice to see more interest in async! :)

Some general feedback:

1. Can't use DMA

I see the proposed traits are modeled after the existing nb traits, which read/write one word at a time. This rules out implementing them with DMA: the CPU needs to get involved for every single word, waking the async task and having it start the next word's write. Waking an async task is not free (it can take a few µs), and for fast IO (eg multi-MHz SPI) the time to wake the task will likely be similar to the time to send one word. In that case, the behavior degrades to "blocking, but crappier": the CPU has no time to run other tasks anyway, and it's slower and clunkier than real blocking.

Note that nb traits suffer from this same problem. However, this limitation was "okay-ish", since it's impossible to impl nb traits with DMA anyway (they can't borrow a buffer for the entire duration of the operation, only for the instant when polled). With futures it's possible to allow DMA impls, so it'd be a mistake not to.

embassy-traits reads/writes entire slices (example) so that impls can set up DMA to read/write the entire slice in one go and only wake the task when it's fully done.
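
Roughly, a slice-based async write trait has this shape (names here are illustrative, not embassy's exact API):

use core::future::Future;

pub trait Write<Word: 'static> {
    type Error;

    type WriteFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    /// Completes only once the entire `words` slice has been written,
    /// so an impl can hand the whole slice to DMA in one go.
    fn write<'a>(&'a mut self, words: &'a [Word]) -> Self::WriteFuture<'a>;
}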

However, doing DMA with "naked" slices is unsound, but only if futures are leaked. Since DMA reads/writes directly from the slice, it has to be stopped if the future is dropped mid-operation. This is easy; embassy does that no problem. However, in Rust you can leak things (not run Drop) without unsafe. If you do that, DMA will not be stopped and can keep writing to the buffer after it's deallocated. Detailed explanation of the problem here.
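
As a sketch of the drop side (the `DmaChannel` type is a hypothetical stand-in, not a real HAL API):

/// Hypothetical handle standing in for a real DMA channel.
struct DmaChannel;

impl DmaChannel {
    fn stop(&mut self) { /* abort the in-flight transfer in a real impl */ }
}

/// Future for an in-progress transfer; conceptually it also borrows
/// the buffer the hardware is writing into.
struct TransferFuture<'a> {
    dma: &'a mut DmaChannel,
}

impl Drop for TransferFuture<'_> {
    fn drop(&mut self) {
        // Cancelling here keeps a *dropped* future sound: the hardware
        // stops before the buffer borrow ends. mem::forget skips Drop
        // entirely, which is exactly the leak problem described above.
        self.dma.stop();
    }
}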

To fully solve this, the traits would have to use embedded-dma, but then the ergonomics heavily suffer, especially without alloc.

Embassy "solves" it by forbidding the user to leak futures. Dropping them is fine, leaking them is not. This means embassy's API is not fully safe, but in practice it's really really easy to use it correctly, since there's only a handful of ways to leak things (mem::forget, ManuallyDrop, Box::leak, Rc/Arc cycles), and they're all really easy to avoid. IMO this is the right tradeoff: super efficient DMA use, almost-safe API, super ergonomic. I expect this to be very controversial for adoption in embedded-hal though, as many people are of the "must be fully safe at any cost" opinion.

The third solution would be to fix the Leak Problem in Rust itself. IMO the Leak Problem is a Rust design issue. There have been some discussions; it might be fixable with a Leak auto trait that works similarly to Sized. However, it seems no one feels this pain the way we (embedded) do, since we don't want alloc. For example, io-uring has the same problem (io-uring is essentially DMA done by the kernel instead of the hardware) and they solve it by owning the buffers with Box/Vec.

And no, Pin doesn't prevent the leak problem. Pinning the buffer doesn't. Pinning the future doesn't.

2. Maybe not add all traits

I'd wait until there's clear demand for traits before adding them.

  • Async Uart, Spi and I2c are definitely wanted.
  • Async ADC is likely to have no benefit, since usually taking an ADC sample is fast enough that the core won't have time to do useful work on other tasks during it.
  • I'm not sure about Capture; its usage is quite niche.

3. Lifetimes

Lifetimes are indeed complicated. embassy-traits has where Self: 'a (I guess this is where you took it from?) because embassy drivers can borrow pins/peripherals instead of owning them, so eg an Uart driver is Uart<'a, UART1> instead of the more common Uart<UART1>.

If we drop this requirement then it can be simplified (I think that bound can be simply removed), but I would like to keep it. It's nice that drivers can be non-'static.

(I just wanted to point out the reason behind these bounds, since it's not very obvious).

4. Iterator traits

I think iterator-based traits like WriteIter shouldn't exist in async flavor because 1. it forces CPU involvement for each word and 2. utility is arguably limited since the iter itself is not async (should be a Stream, which is a whole other can of worms).

@lachlansneff
Contributor Author

@Dirbaio

With futures it's possible to allow DMA impls, so it'd be a mistake not to.

I agree with this. And I'm personally fine with an unsafe API to improve the ergonomics, as long as it's easy to use safely.

I'm unsure if FullDuplex should be copied over to the futures module because reading and writing single words to SPI is probably going to be nearly instant.

@ryankurte
Contributor

ryankurte commented Jun 23, 2021

hey thanks for the PR! looks generally good, and async is going to be great fun when we get it worked out ^_^

i'm in agreement with @eldruin here (#172 (comment)), the GAT approach seems workable to me, definitely need to demonstrate / prove each of them in a couple of use cases (linux-embedded-hal may be one good HAL candidate as there's already some async in the sub-projects).

@Dirbaio also makes some great points, perhaps we could focus on SPI and I2C (and UART?) to reduce the scope of the initial implementation?

embassy-traits reads/writes entire slices (example) so that impls can set up DMA to read/write the entire slice in one go and only wake the task when it's fully done.

this approach is what i would expect / seems reasonable to me in that it should be implementable with or without DMA; i don't think it'd be useful to expose embedded-dma or other strange constraints in this interface.

the one caveat would seem to be UART; to me the UART read/write would ideally be represented as a stream/sink, though i understand these aren't yet in core. i would be okay with a futures dependency for now if there's a no_std and alloc-free way of getting it in.

i think UART differs from the other peripherals because the transactions are unbounded; circular buffering / DMA in this case seems like a configuration / implementation detail that shouldn't need to be exposed to HAL consumers. i can see an argument for a slice-based write to avoid the byte-wise copy/flush (and improve the ergonomics of the internal buffer), but it doesn't seem like there's an unsurprising way to implement a serial read via slice without pushing complexity up to users.

it also doesn't seem like there's an obvious/ergonomic way to address wakers at the moment (#172 (comment)), but the proposed wake-every-poll approach would seem to do the job for now.

For reference, couple of prior works on the topic (that i think y'all have already mentioned, just useful to have a list):

@Dirbaio
Member

Dirbaio commented Jun 24, 2021

UART read/write would ideally be represented as a stream/sink

These have the same problem as per-word traits: you need CPU involvement for each word, so you can't eg write a big slice with DMA. Maybe it could be done with a stream/sink of slices, but that gets ugly fast.

it doesn't seem like there's an unsurprising way to implement a serial read via slice without pushing complexity up to users.

"Read a slice of words" is the lowest common denominator of what most hardware out there can do, while not sacrificing much:

  • It allows efficient DMA use if you know how much data you're going to receive upfront.
  • It still allows stuff like "read until newline" by reading 1-word slices in a loop (see the sketch below).
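
Something like this (assuming a slice-based read trait shaped like what's discussed above; all names illustrative):

use core::future::Future;

/// Hypothetical slice-based async read trait; `read` completes once
/// the whole `buf` has been filled.
pub trait Read<Word: 'static> {
    type Error;

    type ReadFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    fn read<'a>(&'a mut self, buf: &'a mut [Word]) -> Self::ReadFuture<'a>;
}

/// Reads bytes one at a time until a newline, returning how many bytes
/// were written into `out`.
async fn read_line<R: Read<u8>>(uart: &mut R, out: &mut [u8]) -> Result<usize, R::Error> {
    let mut n = 0;
    while n < out.len() {
        let mut byte = [0u8; 1];
        uart.read(&mut byte).await?; // 1-word slice: completes per byte
        out[n] = byte[0];
        n += 1;
        if byte[0] == b'\n' {
            break;
        }
    }
    Ok(n)
}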

@ryankurte
Contributor

These have the same problem as per-word traits: you need CPU involvement for each word, so you can't eg write a big slice with DMA. Maybe it could be done with a stream/sink of slices, but that gets ugly fast.

yeah, though they'd work well implemented on top of IRQs or DMA and a ring-buffer. i think the ideals differ a bit depending on whether you look at this as an abstraction for physical UART peripherals vs. a UART abstraction for drivers to use.

"Read a slice of words" is the lowest common denominator of what most hardware out there can do, while not sacrificing much:

seems alright so long as we're clear that any call might return a partially filled slice (say you had a [u8; 1024] buffer: you'd expect to RX data roughly as it came in, not to fill the buffer first, i think. the opposite case would mean you'd almost always have to use byte-wise reads for responsiveness.)

@eldruin
Member

eldruin commented Jun 27, 2021

I would like to note that if we only partially fill the receive array, we need to return the number of words read. Alternatively, we could return a slice into the input array, although that would result in an interface like #286.

@Dirbaio
Member

Dirbaio commented Jun 29, 2021

seems alright so long as we're clear that any call might return a partially filled slice (say you had a [u8; 1024] buffer: you'd expect to RX data roughly as it came in, not to fill the buffer first, i think. the opposite case would mean you'd almost always have to use byte-wise reads for responsiveness.)

In practice that would degenerate to byte-by-byte read anyway. The future would have to complete as soon as 1 byte has been received, and while the CPU is processing that byte no more bytes would be read. The hardware FIFO for UARTs is typically tiny, usually just 1..4 bytes.

The "allow short writes" solution makes a lot of sense when the IO is backed by a big buffer, and read calls can copy big chunks off it. That's what std::io::Read does. However I don't think forcing implementations to buffer is a good idea. With DMA hardware it's nice that the hardware can read/write directly to the slices provided by the user.

embassy-nrf has two uart drivers: Uarte, which implements uart::Read and uart::Write, waits for the entire slice to be read/written, and can therefore do zerocopy IO; and BufferedUarte, which has ring buffers that the hardware reads/writes to, and impls io::AsyncRead and io::AsyncWrite. These copy to/from the ring buffers, so it's not zerocopy IO, but it can be more performant since the hardware can keep reading/writing to the ring buffer even when the CPU is busy with other things. AsyncRead allows "short reads" like what you're proposing.

Embassy also has a ReadUntilIdle trait, although I think that one is maybe too specific: https://github.com/embassy-rs/embassy/blob/d49adc98beebe5b7982204150ebdc0a485fefab2/embassy-traits/src/uart.rs#L23-L25

It really depends on the use case. I don't think there's One Uart Trait To Rule Them All.

  • Terminal for a human to type in? use uart::Read with 1-byte slices. Humans are slow, you won't miss keypresses.
  • Custom protocol where you reasonably control the timing and frame sizes? maybe you can get away with uart::Read or uart::ReadUntilIdle, which is nice because you get zerocopy IO.
  • Complex protocol, hard to predict timings/sizes, want max performance? use BufferedUarte with AsyncRead/AsyncWrite.

Also, having generic io::AsyncRead, io::AsyncWrite representing "a pipe of bytes" is nicer because they're more general. For example, a PPP driver could require them, which would allow doing PPP over serial, PPP over TCP, PPP over Unix pipe...

TLDR:

  • IMO uart traits should be "wait until entire slice has been read" to allow zerocopy
  • For more advanced usecases generic io traits work better
  • io traits can be implemented for uart with ringbuffers.

@lachlansneff
Contributor Author

Now that GATs are no longer an incomplete feature, it's looking likely that they'll land before the end of the year! Maybe it's time to get this in as an unproven feature?

@burrbull
Member

burrbull commented Aug 4, 2021

Maybe just an async feature?

@eldruin
Member

eldruin commented Aug 4, 2021

We have decided against an unproven feature. See here.
The best way would be to demonstrate that these traits work well by creating HAL implementations for them and also demonstrating their use in a driver.
If only some of the traits are proven, we can always merge those and add the others as they are shown to work. I assume once a couple async traits are merged, community interest will drive implementations for the rest fairly quickly.

@burrbull
Member

burrbull commented Aug 4, 2021

I propose to hide them under an async feature anyway and document that it is unstable.
Before stabilization, the feature gate keeps the use of the unstable feature opt-in.
After stabilization, it simply enables the "additional" functionality.
Unlike the unproven feature, we never need to delete it.
Also, we won't need to increase the MSRV.

@lulf

lulf commented Nov 3, 2021

@lachlansneff I see there is one unresolved comment from @Dirbaio. I volunteer to continue this work if you don't have time for it.

@Dirbaio From embassy POV, do you think these traits look sufficiently complete that we can give a go at making the embassy implementations support them? (I can make a PR for it)

@Dirbaio
Member

Dirbaio commented Nov 3, 2021

@lulf Yes, the traits here look great and I don't think there will be any issue switching embassy to them. It'd be great to do it, it'll help test the traits :)

I have one major concern about merging this PR: async fn in traits is coming. It'll allow using actual async fns, which will desugar to GAT + regular fn like the traits in this PR. It is very likely that async fn traits will become the idiomatic way, and switching from GATs to async fn will be a breaking change.
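
For example, an async fn in a trait like this (future syntax, shown as a comment since it doesn't compile today):

// trait Write {
//     type Error;
//     async fn write(&mut self, buf: &[u8]) -> Result<(), Self::Error>;
// }

would desugar to approximately the GAT form this PR uses:

use core::future::Future;

trait Write {
    type Error;

    type WriteFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    fn write<'a>(&'a mut self, buf: &'a [u8]) -> Self::WriteFuture<'a>;
}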

IMO we should release this in a separate crate, for example async-embedded-hal. This crate can get breaking changes as needed, releasing v0.1, v0.2, v0.3... When we consider it "proven", instead of releasing a v1.0 we can merge it into the main embedded-hal as originally planned.

Gating the async traits behind an async feature won't work if we plan to make breaking changes to them, even if we declare "if you enable the async feature you're not covered by semver". There will be cases where e.g. async traits got broken in v1.1 -> v1.2 so you have to pin embedded-hal = "=1.1", but v1.2 added some unrelated blocking trait that your HAL wants to implement so your HAL requires v1.2 and now you can't build at all.

@lachlansneff
Contributor Author

lachlansneff commented Nov 3, 2021

@lulf

@lachlansneff I see there is one unresolved comment from @Dirbaio. I volunteer continuing this work if you don't have time for it.

I'll finish making those changes today or tomorrow.

I'm with @Dirbaio on this: a separate crate async-embedded-hal or embedded-hal-async would be the right move until async trait is in nightly. I'll go ahead and make the crate in the next day or so.

If anyone has an opinion about the name, I'm fine with either.

@burrbull
Member

burrbull commented Nov 3, 2021

+1 for embedded-hal-async. Easier to find on crates.io.

@derekdreery

derekdreery commented Nov 22, 2021

@Dirbaio I have one major concern about merging this PR: async fn in traits is coming. It'll allow using actual async fns, which will desugar to GAT + regular fn like the traits in this PR. It is very likely that async fn traits will become the idiomatic way, and switching from GATs to async fn will be a breaking change.

I've had the following issue with SPI using embassy: I want to write out bits to a display as quickly as possible (since there are a lot to write and my SPI is not that fast). To do this, I want to use 2 buffers for my DMA: one buffer being DMA'd out to the peripheral while I fill the other.

Now, a difference between async fn and futures with GATs is that with GATs the futures are a concrete type. That type could have a method like start_dma or similar that starts the operation immediately. For the async fn version, remember that futures do nothing until polled, so we have to poll the future (await the async fn). This complicates how you implement the parallel write mentioned above.

So basically what I'm saying is that I'd like a concrete type for the future, plus a method like start_dma that eagerly starts writing to the peripheral without waiting for the future to be polled. Maybe there is an elegant solution I've missed; if so, please tell me. Otherwise, I feel like we're losing ergonomics if we use async fn, at least for my use case.
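
Concretely, the pattern I'm after looks something like this (every name here is hypothetical, assuming a slice-write trait shaped like the ones discussed above, with the eager-start capability expressed as a bound on the future type):

use core::future::Future;

/// Hypothetical extension for concrete future types that can start the
/// hardware transfer eagerly, before being polled.
trait StartDma {
    fn start_dma(&mut self);
}

/// Hypothetical slice-write trait whose future also supports StartDma.
trait Write<Word: 'static> {
    type Error;

    type WriteFuture<'a>: Future<Output = Result<(), Self::Error>> + StartDma + 'a
    where
        Self: 'a;

    fn write<'a>(&'a mut self, words: &'a [Word]) -> Self::WriteFuture<'a>;
}

/// Double-buffered write: start DMA on one buffer while filling the other.
async fn render<S: Write<u8>>(
    spi: &mut S,
    frame_a: &[u8],
    frame_b: &mut [u8],
) -> Result<(), S::Error> {
    let mut tx = spi.write(frame_a);
    tx.start_dma();     // eagerly kick off the transfer for frame_a
    frame_b.fill(0x00); // stand-in for the work of filling the other buffer
    tx.await            // completes once the frame_a transfer is done
}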

@lulf

lulf commented Dec 16, 2021

@lachlansneff I think starting with embedded-hal-async is nice. What is the preferred way to go about this? Merge this PR and then reorganize e-h into two crates, or do we want to see that reorg happen in this PR?

@lachlansneff
Contributor Author

@lachlansneff I think starting with embedded-hal-async is nice. What is the preferred way to go about this? Merge this PR and then reorganize e-h into two crates, or do we want to see that reorg happen in this PR?

Sorry, the semester just finished, so I didn't have time to set this up. I have it half set up on my laptop. I was thinking of just creating the crate and dropping this PR.

@ryankurte
Contributor

ryankurte commented Dec 19, 2021

given the fundamentals (generic type bounds for now, async trait one day) are pretty clear now, i wonder whether it might be useful for us (@rust-embedded/hal) to create the embedded-hal-async crate here with the usual tooling and a bunch of warnings about its alpha-ness? then we could take a similar approach to e-h and collaboratively build it up component by component.

Dirbaio mentioned this pull request Dec 20, 2021
@Dirbaio
Member

Dirbaio commented Dec 20, 2021

I've opened a PR adding embedded-hal-async. #334

@Dirbaio
Member

Dirbaio commented May 4, 2022

This can probably be closed, now that we have embedded-hal-async.

@eldruin
Member

eldruin commented May 4, 2022

Indeed. Thank you everybody!

eldruin closed this May 4, 2022