Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmartLedsAdapter::new does not have async impl for Rmt::new_async? #4

Open
brainstorm opened this issue Jun 24, 2024 · 18 comments
Open

Comments

@brainstorm
Copy link

Mentioned in esp-rs matrix and here too, flagging it here:

error[E0277]: the trait bound `esp_hal::rmt::ChannelCreator<Async, 0>: TxChannelCreator<'_, _, _>` is not satisfied
  --> src/main.rs:27:37
   |
27 |     let led = SmartLedsAdapter::new(rmt.channel0, io.pins.gpio38, rmt_buffer, &clocks);
   |               --------------------- ^^^^^^^^^^^^ the trait `TxChannelCreator<'_, _, _>` is not implemented for `esp_hal::rmt::ChannelCreator<Async, 0>`
   |               |
   |               required by a bound introduced by this call
   |
   = help: the following other types implement trait `TxChannelCreator<'d, T, P>`:
             <esp_hal::rmt::ChannelCreator<Blocking, 0> as TxChannelCreator<'d, esp_hal::rmt::Channel<Blocking, 0>, P>>
             <esp_hal::rmt::ChannelCreator<Blocking, 1> as TxChannelCreator<'d, esp_hal::rmt::Channel<Blocking, 1>, P>>
             <esp_hal::rmt::ChannelCreator<Blocking, 2> as TxChannelCreator<'d, esp_hal::rmt::Channel<Blocking, 2>, P>>
             <esp_hal::rmt::ChannelCreator<Blocking, 3> as TxChannelCreator<'d, esp_hal::rmt::Channel<Blocking, 3>, P>>
@bjoernQ
Copy link

bjoernQ commented Jun 24, 2024

I somehow missed that on Matrix.

There is no async-version of smart_leds_trait AFAIK but probably we can just implement an async-write on SmartLedsAdapter

@brainstorm
Copy link
Author

brainstorm commented Jun 24, 2024

Thanks! Perhaps worth opening another issue/discussion although applies to this particular case too:

What do you think about the use of ::new() builder for both blocking and async? As mentioned over matrix:

Regarding blocking vs async ::new() methods in esp_hal, I've observed that there's I2s::new() which is is async (when feature-gated w/ async), but then there's Uart::new_async_with_config and Rmt::new_async() (instead of just using ::new() for both modes like in the I2s case).

I personally like ::new() for both blocking and async if possible, so if I have to implement SmartLedsAdapter's new async builder method which variant would be "preferred"?:

SmartLedsAdapterAsync::new()
SmartLedsAdapter::new()            # <-- with features = ["async"], <3
SmartLedsAdapter::new_async()

Is it worth making this uniform across peripherals in esp-hal or is it too painful of a breaking change at this point?

/cc @MabezDev

@Dominaezzz
Copy link

Seeing as the SmartLedsAdapter struct itself doesn't provide the async-ness, but the rmt channel does. It makes sense to just have a single new() function, then the write method can be conditionally async or blocking depending on what the rmt channel's config is.

(Though one could argue that since this struct's sole purpose was to provide an adapter for smart_leds, maybe it shouldn't have an async variant without an upstream trait, but no reason to be strict I guess)

Is it worth making this uniform across peripherals in esp-hal or is it too painful of a breaking change at this point?

This could probably be done for the drivers that mandate DMA, as the async mode could be inferred from the dma channel as well. (Personally I'd say it's too soon as the DMA based apis aren't mature enough)

@brainstorm
Copy link
Author

brainstorm commented Jun 30, 2024

Thanks for the comments and ideas!

Seeing as the SmartLedsAdapter struct itself doesn't provide the async-ness

Indeed, that Rmt struct member, Option<TX>, where TX is pointing to pub trait TxChannel: private::TxChannelInternal<crate::Blocking> does seem to seal the deal on blocking, unfortunately :_/

(...) then the write method can be conditionally async or blocking

I'm not sure if limiting the async-ness to just the write method as @Dominaezzz suggests would help for too long since the RX side would eventually need async as well (for other applications or even for, say, RMT async-driven RX code for photosensors (or LEDs being used as photosensors :P)?

For now I'm being cheeky, breaking stuff locally, patching and defaulting to async for write (don't know if the #cfg's/conditionals/gating needed here would be accepted as "clean" or good?):

on smartleds-esp-hal:

diff --git a/esp-hal-smartled/src/lib.rs b/esp-hal-smartled/src/lib.rs
index 98164779..86a91249 100644
--- a/esp-hal-smartled/src/lib.rs
+++ b/esp-hal-smartled/src/lib.rs
@@ -30,7 +30,7 @@ use esp_hal::{
     clock::Clocks,
     gpio::OutputPin,
     peripheral::Peripheral,
-    rmt::{Error as RmtError, PulseCode, TxChannel, TxChannelConfig, TxChannelCreator},
+    rmt::{asynch::TxChannelAsync, Error as RmtError, PulseCode, TxChannelConfig, TxChannelCreatorAsync},
 };
 use smart_leds_trait::{SmartLedsWrite, RGB8};
 
@@ -73,7 +73,7 @@ macro_rules! smartLedBuffer {
 /// interaction functionality using the `smart-leds` crate
 pub struct SmartLedsAdapter<TX, const BUFFER_SIZE: usize>
 where
-    TX: TxChannel,
+    TX: TxChannelAsync,
 {
     channel: Option<TX>,
     rmt_buffer: [u32; BUFFER_SIZE],
@@ -82,7 +82,7 @@ where
 
 impl<'d, TX, const BUFFER_SIZE: usize> SmartLedsAdapter<TX, BUFFER_SIZE>
 where
-    TX: TxChannel,
+    TX: TxChannelAsync,
 {
     /// Create a new adapter object that drives the pin using the RMT channel.
     pub fn new<C, O>(
@@ -93,7 +93,7 @@ where
     ) -> SmartLedsAdapter<TX, BUFFER_SIZE>
     where
         O: OutputPin + 'd,
-        C: TxChannelCreator<'d, TX, O>,
+        C: TxChannelCreatorAsync<'d, TX, O>,
     {
         let config = TxChannelConfig {
             clk_divider: 1,
@@ -160,7 +160,7 @@ where
 
 impl<TX, const BUFFER_SIZE: usize> SmartLedsWrite for SmartLedsAdapter<TX, BUFFER_SIZE>
 where
-    TX: TxChannel,
+    TX: TxChannelAsync,
 {
     type Error = LedAdapterError;
     type Color = RGB8;
@@ -168,7 +168,7 @@ where
     /// Convert all RGB8 items of the iterator to the RMT format and
     /// add them to internal buffer, then start a singular RMT operation
     /// based on that buffer.
-    fn write<T, I>(&mut self, iterator: T) -> Result<(), Self::Error>
+    async fn write<T, I>(&mut self, iterator: T) -> Result<(), Self::Error>
     where
         T: IntoIterator<Item = I>,
         I: Into<Self::Color>,
@@ -187,14 +187,15 @@ where
         *seq_iter.next().ok_or(LedAdapterError::BufferSizeExceeded)? = 0;
 
         // Perform the actual RMT operation. We use the u32 values here right away.
-        let channel = self.channel.take().unwrap();
-        match channel.transmit(&self.rmt_buffer).wait() {
-            Ok(chan) => {
-                self.channel = Some(chan);
+        let mut channel = self.channel.take().unwrap();
+        match channel.transmit(&self.rmt_buffer).await {
+            Ok(_) => {
+                //self.channel = Some(chan);
                 Ok(())
             }
-            Err((e, chan)) => {
-                self.channel = Some(chan);
+            //Err((e, chan)) => {
+            Err(e) => {
+                //self.channel = Some(chan);
                 Err(LedAdapterError::TransmissionError(e))
             }
         }

Then adding async on the fn write trait member from smart-leds-trait crate, effectively having a breaking version of the (now async) trait, leveraging the relatively new (and simpler!) support for async traits in nightly.

@brainstorm
Copy link
Author

brainstorm commented Jun 30, 2024

Oh, scratch that comment about what to convert to async @Dominaezzz: I just realised that smart-leds-trait only implements TX because it only has the SmartLedsWrite trait definition... and what I commented about the photosensors (RX case) would be implemented via the already fully async RMT impl 🤦🏻

With this out of the way, it seems like I just need to carefully gate my changes through #[cfg(feature = "async")], so it might not be too tricky after all.

@tschundler
Copy link
Contributor

Digging around with this today (see comments in esp-rs/esp-hal#1779) I seem to be stuck. The async context switching is surprisingly slow. Sometimes so slow that by the first time the future is polled, the RMT has already read 32 pulses. And wake up triggered by interrupt is particularly bad. (though I am also suspicious I may be doing something wrong with the interrupts)

I have this kind of working in my branch (esp-rs/esp-hal@main...tschundler:esp-hal:async-led-iterator) in very messy code with no interrupts. Maybe it can serve as inspiration?

I may personally give up on async and focus on my original goal in esp-rs/esp-hal#1768 of iterator input for RMT since that part I have working well. I'm well beyond my depth at this point. I'm debugging by panicking and dumping RMT registers. I feel there has to be a better way.

@Dominaezzz
Copy link

The async context switching is surprisingly slow.

Are you running in release mode?

@tschundler
Copy link
Contributor

tschundler commented Jul 14, 2024

The async context switching is surprisingly slow.

Are you running in release mode?

Yes. Even though the cargo generate template has some optimization in debug mode, I also tried with --release

Digging around FastLED I see a lot of mention of IRAM for timing critical stuff. Maybe that is a factor? Specifically for Embassy itself?

Fwiw without async, I get working LED signals in Wokwi at 40MHz stimulated, so that gives time for 5x more code than that is running, which is why I feel the issue is the overhead of async executor. But I also haven't played with cargo asm to see if the compiler is failing to inline something when the code is expressed differently.

@MabezDev
Copy link
Member

@tschundler Maybe async is an issue, but you'll likely also want to place the rmt driver in RAM (esp-idf has an option for this), for better perf.

We already have a config option for this with SPI, once esp-rs/esp-hal#2156 is merged we could add one for RMT.

@jessebraham jessebraham transferred this issue from esp-rs/esp-hal Oct 3, 2024
@robamu
Copy link

robamu commented Oct 12, 2024

I actually implemented this before I saw the issue.

I opened a PR here:

#6

I successfully used it to drive a 46 LED lightstrip asynchronously :)

@tschundler
Copy link
Contributor

I wasn't able to get this working well enough for Burning Man so priority of fixing it up dropped. I'll play with your PR when I get a chance. I have a variety of LEDs with different timing tolerances.

Skimming it... So you didn't work around the bug that async RMT right now can't refill the buffer and only supports one-shot tx, rather you're sending a bunch of single shorts. As long as the next single shot is less than the latch time, that would work. That's simpler than my approach trying to fix RMT to allow arbitrary input lengths in async mode. (And then doing to conversion to pulse-codes on the fly)

Have you checked how long it is between chunks? I wonder if it could do the pulse code conversion while RMT is sending in the SmartLEDs code. My goals are parallel output and lower memory usage to support >2k total LEDs.

@robamu
Copy link

robamu commented Oct 13, 2024

I actually measured the time compared to the blocking API. Writing all 46 LEDs with async API takes around 3-4 ms, blocking API takes ~2 ms (but is blocking, which incurred issues with the IR receiver handling) . Considering than my LED fine ticker timer is 20 ms, this was not a problem for me. I actually saw that the blocking API is doing some magic of continously filling the other buffer, and doing that with the async API would probably require changes or a custom driver, because the async API does not listen to the threshold interrupts.

I did not measure the time between chunks yet, but I can try to do that.

@robamu
Copy link

robamu commented Oct 14, 2024

Each write takes around 52 microseconds. (all tests of course done with --release version)

@tschundler
Copy link
Contributor

That's close to what I got testing your code. It work with only 2 of 6 types of LEDs that I have handy. For others, the extra 20-some microsecond delay between pixels is treated as the latch time, and you only get one pixel working.

So I think it is reasonable with comments and we can work on improving it.

@robamu
Copy link

robamu commented Oct 17, 2024

Hmm, does that mean that these pixels would require a new transfer to be started immediately? The current async API does not allow this from what I can see, custom (interrupt) code would probably required to do this :/

@tschundler
Copy link
Contributor

Hmm, does that mean that these pixels would require a new transfer to be started immediately? The current async API does not allow this from what I can see, custom (interrupt) code would probably required to do this :/

The new transfer needs to be within 6us judging by https://wp.josh.com/2014/05/13/ws2812-neopixels-are-not-so-finicky-once-you-get-to-know-them/amp/

If this gap is longer than ~6,000ns, it will become a reset pulse (TLL) to the next pixel in the string.

despite the datasheet saying 50us for latch time. Though it seems some clones do have a longer latch time.

@robamu
Copy link

robamu commented Oct 17, 2024

Okay, that's unfortunate. Just so I understand correctly: The amount of time required for the RMT to finish, wake the task again, context switch back to start the next transfer is longer than those 6 us? I don't see another reliable way to deal with this other than interrupts or blocking/polling just like the blocking API does..

@tschundler
Copy link
Contributor

The amount of time required for the RMT to finish, wake the task again, context switch back to start the next transfer is longer than those 6 us?

Yes, it looks like it is taking 20uS (~30us to send 24 bits + ~50uS total per pixel)

I found that when I faked async by doing a loop of polling I had no problem, and I forgot how, but I estimated the async context switch to take about 8us, which seemed long, but I was also suspicious that I'm doing something incorrectly.


I've realized that testing my variant, I should probably flip pins for when ISR is running and when the task is awake, then I can see the timing relationship better.

(And in my case, since my real goal is parallel output, maybe I2S/LCD out is what I should really be looking into)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

6 participants