Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DMA based on embedded-dma traits #153

Merged
merged 49 commits into from
Feb 13, 2021
Merged

Implement DMA based on embedded-dma traits #153

merged 49 commits into from
Feb 13, 2021

Conversation

richardeoin
Copy link
Member

@richardeoin richardeoin commented Oct 17, 2020

DMA implementation based on the embedded-dma traits and the existing implementation in the stm32f4xx-hal. Currently some parts are very similar - the Stream trait and its implementation are completely identical similar, and much of dma/mod.rs is similar. However it remains to be seen how much will stay the same when implementing the BDMA and/or MDMA. However some changes are required since we'd like to support the BDMA and MDMA also.

Implementation / examples for:

  • Memory-to-Memory
  • SPI TX
  • I2C RX

TODO:

  • BDMA implementation
  • Confirm that owning the target peripheral is the right API design
  • Do the compiler fences also need matching synchronisation Memory barriers for the Cortex-M7
  • Add an example using RTIC - Thanks ryan-summers!

Contributions to this branch are very welcome, especially implementations and examples for other peripherals.

Ref #80

@richardeoin richardeoin marked this pull request as draft October 17, 2020 21:43
@antoinevg
Copy link
Contributor

I've just pushed a proof-of-concept example using the SAI peripheral with DMA:

https://github.com/antoinevg/stm32h7xx-hal/blob/sai-dma/examples/sai-dma-passthru.rs

I coded this up before you pushed this branch so I'll probably need to go back and update it for your changes!

@antoinevg antoinevg mentioned this pull request Oct 18, 2020
@antoinevg
Copy link
Contributor

antoinevg commented Oct 18, 2020

Please excuse me if I'm a little off-track here but I'm still very new to embedded-hal implementation!

re: Soundness of taking a mutable reference to the target peripheral, rather than consuming it completely

Are we assuming that a given peripheral will only ever have a single DMA stream associated with it?

The SAI peripheral, in particular, has two channels: A & B

Each of these channels has its own DMA stream associated with it.

Usually configured as one stream each for transmitting and receiving audio data.

So there are two related issues here I think:

  1. If we are passing a mutable ref or moving the entire peripheral into the Transfer constructor for SAI Channel A then I can no longer create a second Transfer for SAI Channel B.
  2. Because the data register for each SAI channel is a sub-register we cannot pass it into the peripheral_target_address! macro.

Would it make sense to modify Transfer to just accept the address of the data register rather than the whole peripheral?

i.e. something like:

let mut dma1_str0: Transfer<_, _, MemoryToPeripheral, _> = Transfer::init(
    dma1_stream.0,
    sai1.target_address(SaiChannel::ChannelA),
    tx_buffer,
    None,
    dma1_config,
);

let mut dma1_str1: Transfer<_, _, PeripheralToMemory, _> = Transfer::init(
    dma1_stream.1,
    sai1.target_address(SaiChannel::ChannelB),
    rx_buffer,
    None,
    dma1_config,
);

And also modify the peripheral_target_address! macro to allow us to specify a sub-register using something like:

peripheral_target_address!(
    (pac::SAI1, cha.dr, u32, M2P, DMAReq::SAI1A_DMA),
    (pac::SAI1, chb.dr, u32, P2M, DMAReq::SAI1B_DMA),
);

@mattico
Copy link
Contributor

mattico commented Oct 18, 2020

@antoinevg

My expectation is that it would work like Serial.split() but returning some Channel<> struct that's just used for DMA.

I'm not familiar with the SAI peripheral but for example if you need to ensure the peripheral settings aren't changed during the DMA transfer, do it like Serial.split() and consume the serial port while the DMA is happening, then you can have a function that consumes the Channels and gives you access to the peripheral again.

If you don't need to ensure that peripheral settings aren't changed during the DMA transfer you can just have the peripheral store an Option<Channel> that you take() so you can't get multiple copies of the channel.

If you need to ensure that only some peripheral settings aren't changed during the DMA transfer you can do something like Serial.split() but also return a version of the peripheral with some typestate so that certain functions aren't implemented in that state (see spi::Enabled and spi::Disabled).

@mattico
Copy link
Contributor

mattico commented Oct 18, 2020

I've had some DMA code using the serial peripheral: https://github.com/mattico/stm32h7xx-hal/tree/add-dma-example
I'm going to rebase it on this and try to modify the HAL serial to support DMA.

@mattico
Copy link
Contributor

mattico commented Oct 18, 2020

@richardeoin Re: Do the compiler fences also need matching synchronisation barriers for the Cortex-M7?

https://community.st.com/s/article/FAQ-DMA-is-not-working-on-STM32H7-devices has good information about this if you haven't seen it.

From what I remember, I couldn't get DMA to work with the serial port without disabling the dcache on the DMA buffers. No combination of dsb() compiler_fence() or even not enabling the global dcache (!) worked. (so I was probably doing something wrong...)

@adamgreig
Copy link
Member

With DMA in a cached region you'd have to use the cache control operations to specifically invalidate (on DMA writes) or clean (on DMA reads) the memory in use for DMA.

@antoinevg
Copy link
Contributor

@mattico

My expectation is that it would work like Serial.split() but returning some Channel<> struct that's just used for DMA.
<schnip/>

Thank you for taking the time to answer.

I think I understood about 1% of what you were saying there, I really meant it when I said I'm new to embedded-hal development! 🤣

Could you show what you mean in a few lines of hypothetical code? Alternatively, is there a tutorial or documentation I could read that can help bridge the gaps between the working SAI/DMA implementation code I posted earlier in this thread, my two questions and your reply?

I could probably piece things together on my own from the sources but time is expensive and it would be cool if I could put the little I have towards contributing functionality rather than reverse engineering!

@richardeoin
Copy link
Member Author

Hi @antoinevg @mattico ! Those examples are great, thanks for sharing them. Also nice that you're trying to update them to work with this PR.

Indeed DMA with the SAI isn't well supported without several changes, so right now you're running a bit ahead of where this PR is @antoinevg

Are we assuming that a given peripheral will only ever have a single DMA stream associated with it?

Yes, the current instances of peripheral_target_address!( assume that a given peripheral only has one associated DMA stream (at once). That's a result of us representing ownership to the compiler in blocks of 'peripheral', which is the approach most current PACs take (but not the only possible approach).

Indeed the approach taken by the serial HAL would be a good option. Here's the function where the serial HAL (which owns the serial PAC) is split into two ZSTs. Here we're using ZSTs to represent ownership, whilst not consuming resources at runtime.

You'd then implement the DMA like so:

peripheral_target_address!(
    (Rx<pac::USART1>, rdr, u8, P2M, DMAReq::USART1_RX_DMA),
    (Tx<pac::USART1>, tdr, u8, M2P, DMAReq::USART1_TX_DMA),
);

Potentially this approach would work for the SAI channels too.

@richardeoin
Copy link
Member Author

@mattico The link about memory layout and dcache is useful, although it's a separate issue (I think). This PR doesn't consider the dcache at all yet, it just assumes it's off.

@richardeoin
Copy link
Member Author

@antoinevg I added an extra option to the peripheral_target_address! macro, so the proposal from your comment above should compile now if you add it!

peripheral_target_address!(
    (pac::SAI1, cha.dr, u32, M2P, DMAReq::SAI1A_DMA),
    (pac::SAI1, chb.dr, u32, P2M, DMAReq::SAI1B_DMA),
);

If you have a neater way of modifying the macro that's also great, I didn't spend too much time thinking about it.

Of course it doesn't solve the problem with actually initialising both streams at the same time, since they both want a mutable reference to pac::SAI1

@antoinevg
Copy link
Contributor

antoinevg commented Oct 19, 2020

@antoinevg I added an extra option to the peripheral_target_address! macro, so the proposal from your comment above should compile now if you add it!

Awesome, working nicely thank you!

Of course it doesn't solve the problem with actually initialising both streams at the same time, since they both want a mutable reference to pac::SAI1

We also need access to pac::SAI1 after DMA has started up.

The SAI peripheral is a bit weird in that it needs DMA to be up & running before setting you can set its dmaen and en bits.

I see that transfer.start(|sai1| {}) supplies a reference to the peripheral's register block so I could just twiddle the bits directly.

That said, wouldn't it be better to rather have hal::SAI1 passed so we can call e.g. sai1.enable_dma() and sai1.is_fifo_empty() methods?

EDIT:

It gets worse… 😅

It looks like the SAI peripheral needs the DMA to be fully configured before it can be successfully configured otherwise DMA stalls after the first interrupt is fired and the flag registers are corrupted.

Which of course we can't do because the Transfer constructor needs the configured SAI peripheral.

@richardeoin
Copy link
Member Author

I see that transfer.start(|sai1| {}) supplies a reference to the peripheral's register block

Yup, this closure allows access to whatever memory the Transfer has exclusive mutable access to. That that would be the place to set the dmaen and dma bits. I experimented with implementing TargetAddress for both the PAC and HAL types for I2C. It seems to work well, so I think that can be done for other peripherals too.

It looks like the SAI peripheral needs the DMA to be fully configured before it can be successfully configured

Ah, you mean that the DMA must be configured before the call to SAI1.i2s_ch_a(...)? That does make it very difficult to give the transfer exclusive ownership to anything.

If you're willing to use unsafe, then working around this is quite easy. The most obvious escape hatch that already exists is the steal() method.

let mut sai1_a_for_dma = unsafe { pac::Peripherals::steal().SAI1 };
let mut sai1_b_for_dma = unsafe { pac::Peripherals::steal().SAI1 };

let mut transfer_ch0: Transfer<_, _, PeripheralToMemory, _> = Transfer::init(
        streams.0,
        &mut sai1_a_for_dma,
        ...);
let mut transfer_ch1: Transfer<_, _, MemoryToPeripheral, _> = Transfer::init(
        streams.1,
        &mut sai1_b_for_dma,
        ...);

The usefulness of this construction is an argument that TargetAddress should be implemented directly on the PAC/HAL struct itself, rather than a mutable reference to them. Then you could write:

let mut transfer_ch0: Transfer<_, _, PeripheralToMemory, _> = Transfer::init(
        streams.0,
        unsafe { pac::Peripherals::steal().SAI1 },
        ...);

@mtthw-meyer
Copy link
Contributor

FWIW when I was working on SAI I was thinking the SAI would be handed off to the DMA.

@richardeoin richardeoin force-pushed the dma branch 2 times, most recently from 17aa9e4 to 335f183 Compare October 23, 2020 20:20
@antoinevg
Copy link
Contributor

antoinevg commented Oct 25, 2020

Thank you for the suggestions @richardeoin, they helped a lot.

We've now got a first iteration of the sai_dma_passthru example implemented against embedded_dma:

antoinevg@e01b7e8

There are still a few things which are not ideal, but hopefully this gets us a step closer.

Notes:

  1. I had to add a type-alias for the other DMA Request Multiplexer for SAI3/4 in src/dma/dma.rs:755
  2. It would be nice if we could implement TargetAddress for u16's
  3. It doesn't look like dma::Transfer has support for configuring the circular buffer.
  4. I couldn't figure out the type signature for dma::Transfer 🤯 so I needed to reach for the hatch in the interrupt handler. This is also blocking me from doing an implementation using rtic.

@richardeoin
Copy link
Member Author

richardeoin commented Oct 25, 2020

Great! The example looks reasonable already, and it's very useful to have to play around with.

Some quick responses to your notes:

  1. You almost certainly should not do that. DMA1/2 do not have any request lines from SAI4, those lines go to the BDMA instead. So SAI4 transfers (unlike SAI1/2/3) should be defined in bdma.rs instead.
  2. Yes, I think for some peripherals it would be useful to implement more than one of u32,u16,u8. At the moment it's not possible because MemSize is an associated type, not a generic type. It wouldn't be too complicated to add, but it would add a 5th(!) parameter to the type signature of dma::Transfer.
  3. Indeed there's no support for the circular buffer, I haven't looked at that at all.
  4. The type signature is something like dma::Transfer<dma::dma::Stream1<DMA1>, &mut stm32::SAI1, dma::PeripheralToMemory, &mut [u32; 128]>. To use it with RTIC you'll need to specify the static lifetime for the two references. That shouldn't be too hard for the buffer (&'static mut [u32; 128]) but it's more complicated with the peripheral. It's another good reason that TargetAddress should be implemented directly on the PAC/HAL struct itself, rather than a mutable reference. So I'll probably change that soon.

Even though some debug probes might load it, we certainly shouldn't assume that
in the examples.
The BDMA does not support all of the functionality in Stream. For now
unsupported methods are just left as no-ops.
As well as the PAC structure.

Also add the option to include an extra layer of
indirection for the target register
Allow use of mem::transmute to elide the intermediate types
Without barrier instructions, the Cortex-M7 core can reorder the execution of
transfers to normal and device memory with respect to each other.

This does not correspond to the compiler_fence calls, which are concerned with
generating the correct _program order_. The Cortex-M7 core does not reorder the
transfers to device memory (the DMA configuration) with respect to each other.
The remaining core Stream trait should apply to MDMA streams also, although
implementing this is future work
@richardeoin richardeoin mentioned this pull request Jan 4, 2021
3 tasks
@richardeoin
Copy link
Member Author

@ryan-summers Could you take a look at the merge above? I had to re-work much of #173 to merge #171, in the end it was quite simple but I hope it still works (On hardware I only checked that remainder was zero after non-cancelled transfers).

@ryan-summers
Copy link
Member

I gave it a review and did some hardware testing with the dma branch after the merge and everything is looking good from my end

Move methods that we can't implement for the MDMA from the `Stream` trait to the
`DoubleBufferedStream` trait (the MDMA is not double-buffered).
@richardeoin richardeoin mentioned this pull request Jan 7, 2021
6 tasks
The lack of support for constant source buffers was first raised by
@ryan-summers here
#153 (comment)

This solution adds a 5th generic type parameter to `Transfer`, and uses this to
implement transfer for two different type constraints on `BUF`.

The additional generic type parameter will also be useful for MDMA support,
which will also need its own versions of `init`, `next_transfer`,
`next_transfer_with` and so on.

The downside is the additional complexity of another type parameter
@richardeoin
Copy link
Member Author

Breaking change: Added additional generic type on Transfer. See #178

Updating smoltcp on DMA branch
@bors bors bot closed this Jan 31, 2021
@bors bors bot deleted the branch master January 31, 2021 20:07
@richardeoin richardeoin reopened this Jan 31, 2021
@richardeoin
Copy link
Member Author

Reopened after it was closed by an errant bors

Copy link
Member

@ryan-summers ryan-summers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We can always implement future changes if we want to make some things cleaner (like supporting full duplex DMA), but this is a wonderful start

@richardeoin
Copy link
Member Author

bors r+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants