Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAMx7x (E70, S70, V70, V71) DCD Support #693

Merged
merged 39 commits into from
Jul 21, 2021
Merged

Conversation

HiFiPhile
Copy link
Collaborator

@HiFiPhile HiFiPhile commented Mar 3, 2021

Describe the PR
Implementation of the SAM70 DCD driver.

Additional context
Tested with cdc_dual_ports, hid_composite, uac2_headset.

TODO:

  • Fix EP0 bugs
  • Test with more examples
  • Clean up & format code
  • Add DMA & FIFO support
  • Integrate BSP support

Any help is appreciated :)

@HiFiPhile HiFiPhile force-pushed the dcd_same70 branch 2 times, most recently from 630509d to d8acce7 Compare March 3, 2021 20:52
@hathach hathach added the Porting Adding new mcu/board/os support label Mar 4, 2021
@hathach
Copy link
Owner

hathach commented Mar 4, 2021

great PR, I will help and test it out whenever I could. Please keep it updated 👍 👍

@HiFiPhile
Copy link
Collaborator Author

HiFiPhile commented Mar 5, 2021

With DMA and Dual Bank the throughput is quite good.
Running at 150MHz in ITCM, with cdc_xfer_cb commented out

    // TODO search for wanted char first for better performance
    for(uint32_t i=0; i<xferred_bytes; i++)
    {
      tu_fifo_write(&p_cdc->rx_ff, &p_cdc->epout_buf[i]);

      // Check for wanted char and invoke callback if needed
      if ( tud_cdc_rx_wanted_cb && ( ((signed char) p_cdc->wanted_char) != -1 ) && ( p_cdc->wanted_char == p_cdc->epout_buf[i] ) )
      {
        tud_cdc_rx_wanted_cb(itf, p_cdc->wanted_char);
      }
    }

I got 126us for 4096 bytes OUT transfer, that's 32.5MB/s !

Even without DMA & Dual Bank I got about 12.5MB/s.

Unfortunately I can't run cdc_dual_ports in HS mode with dual bank as the packet ram (4KB) is too small.

@hathach
Copy link
Owner

hathach commented Mar 7, 2021

@HiFiPhile though are pretty good number, and we haven't got into the speed optimization of development stage just yet 😄

@perigoso
Copy link
Collaborator

perigoso commented Jun 9, 2021

whats missing on this PR? I have a device that will end up using this driver, and it would be convenient for me to have it merged, can i help in any way?

@HiFiPhile
Copy link
Collaborator Author

Hi, at the moment this PR is tested on bulk transfer but other types of transfer remain untested (should work)

I've done some more work about fifo support but also remain untested.

@hathach
Copy link
Owner

hathach commented Jun 10, 2021

Hi, at the moment this PR is tested on bulk transfer but other types of transfer remain untested (should work)

I've done some more work about fifo support but also remain untested.

Getting cdc_msc and hid_composite running is good enough for PR to merge. The rest can be done as follow up PRs. It is not easy to get everything right for the 1st pr.

@HiFiPhile
Copy link
Collaborator Author

I have a device that will end up using this driver, and it would be convenient for me to have it merged, can i help in any way?

I'll push all works, could you test BSP integration and build with gcc ? Since I'm developing with IAR and my own BSP layer.

@perigoso
Copy link
Collaborator

I can build with GCC, but the only board i have with this device is custom, so Ill have to use my own BSP as well

Signed-off-by: Rafael Silva <perigoso@riseup.net>
Signed-off-by: Rafael Silva <perigoso@riseup.net>
Signed-off-by: Rafael Silva <perigoso@riseup.net>
Signed-off-by: Rafael Silva <perigoso@riseup.net>
Signed-off-by: Rafael Silva <perigoso@riseup.net>
@perigoso
Copy link
Collaborator

perigoso commented Jun 11, 2021

Got it to compile fine on gcc, just needed to swap some macros that are deprecated in the sam cmsis, i guess it doesnt error in IAR, as for actually testing in hardware I can't do it just yet, I'll Pr the changes in your branch forgot i have acces to the repo, so i just pushed directly, hope thats ok with everyone

@HiFiPhile
Copy link
Collaborator Author

HiFiPhile commented Jun 11, 2021

Today I've tested MSC also works, maybe this weekend I'll test more examples.

On Cortex-M7 cache coherence is something need to be take care of.

I use 128KB ITCM + 128KB DTCM + 128KB SRAM, code section on ITCM, readwrite section on DTCM, SRAM is only for DMA buffer. So I can set SRAM region to non-cached in MPU so no need to Clean/Invalidate cache.

@perigoso perigoso changed the title SAME70 DCD Support SAM7x (E70, S70, V70, V71) DCD Support Jun 11, 2021
@perigoso perigoso force-pushed the dcd_same70 branch 2 times, most recently from a2dc16e to b18262d Compare June 11, 2021 22:40
@HiFiPhile
Copy link
Collaborator Author

HiFiPhile commented Jul 21, 2021

Try change only BOARD_DEVICE_RHPORT_SPEED in tusb_conf.h, I use TUD_OPT_HIGH_SPEED in dcd init.

I've no problem running high speed with my AMD Arch pc, could you try my compiled cdc_dual_ports
cdc_fs_8c6cd53.zip
cdc_hs_8c6cd53.zip


I've a Debian pc who need device qualifier otherwise it will fail, or maybe there is bug in my stall function
With device qualifier added it works.
Fixed by 5492d91


I've a GL3520 hub who doesn't work in FS mode, no setup packet received after bus reset. But HS mode works !

@hathach
Copy link
Owner

hathach commented Jul 21, 2021

@HiFiPhile with latest push the cdc_msc now work great, yeah !!! For analyzer, you are right about the cable, it only works with very short cable (0.3m length), while other 0.5-1m doesn't, even that those cables all work with other HS board such as lpc18xx. I guess same7x PHY is more peaky than lpc.

Finally it works, though I go further to test with net lwip example, but it failed that one. Look like setup is still missing, it is probably overrun or race condition.

image

Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superb work !! Thank you very much for your PR and effort to troubleshoot the issue with the PR. I have tested cdc_msc, hid_composite example both work greats. There is still issue with net_lwip_webserver and webusb examle, it is probably due to race condition and/or the missing setup etc ... Which we could definitely fix later with followup PRs. The race condition is more severe with HS since bus is much faster and so does the mcu.

PS: @HiFiPhile I am about to merge just now, do you have any pending code or thing that want to push.

@HiFiPhile
Copy link
Collaborator Author

do you have any pending code or thing that want to push.

I'll take a look of net_lwip_webserver if I can make a quick fix, otherwise we can come back later.

@HiFiPhile
Copy link
Collaborator Author

It's simply because SAME70 need different EP for IN and OUT.

Now WebUSB echo works but I can't see \r\nTinyUSB WebUSB device example\r\n which is received in BusHound.

@hathach
Copy link
Owner

hathach commented Jul 21, 2021

It's simply because SAME70 need different EP for IN and OUT.

Now WebUSB echo works but I can't see \r\nTinyUSB WebUSB device example\r\n which is received in BusHound.

Ah thanks, I didn't notice that. Great, merge now, #975 will make it easier for cross-platform application to detect these requirement

@hathach hathach merged commit c737aa6 into hathach:master Jul 21, 2021
@HiFiPhile HiFiPhile deleted the dcd_same70 branch July 21, 2021 17:23
@fjgpereira
Copy link

I managed to get tinyusb/CDC workin on a SAMS70, but had to adapt the code to
the new microchip include files from Harmony3 - XC32.
Detected a few cache/DMA issues that were solved with 2 clean/invalidate cache instructons.
Thank you very much.

@HiFiPhile
Copy link
Collaborator Author

Hi @fjgpereira ,

had to adapt the code to the new microchip include files from Harmony3 - XC32.

So with Harmony3 the register defines are different ?

Detected a few cache/DMA issues that were solved with 2 clean/invalidate cache instructions.

In the DCD driver I haven't included cache clean/invalidate since it's configuration dependant. For example in my case I use DTCM/ITCM and set normal sram to non-cached.

I can add a define to enable them when needed.

@hathach
Copy link
Owner

hathach commented Jul 29, 2021

maybe you could add an PR for cache, it is typically required on M7 mcu. I did something similar for rt10xx
https://github.com/hathach/tinyusb/blob/master/src/portable/nxp/transdimension/dcd_transdimension.c#L48

@fjgpereira
Copy link

Hi,

The include files i found on the Harmony3 project seem to have suffered a few changes, but for the most part the symbols are the same.
For instance most macros now require a value/parameter, etc, the USBHS_DEVDMA array now is spelled in upercase, etc.

In case it has any use, I placed a patch/diff file with the changes at [http://inocam.com/~fjp/dcd_samx7x_H3.diff]

Regarding cache, everything looked to be working fine with ICache enabled and DCache disabled, but to enable DCache
required the invalidate/clean instructions. Maybe it can be placed inside #ifdefs.

I did not investigate the reason why, but It was also necessary to add a __DSB() barrier
to dcd_event_setup_received() on usbd.c after a call to memcpy(), or else Setup events (GetStatus, etc.) were
not processed correctly.

@HiFiPhile
Copy link
Collaborator Author

maybe you could add an PR for cache, it is typically required on M7 mcu. I did something similar for rt10xx
https://github.com/hathach/tinyusb/blob/master/src/portable/nxp/transdimension/dcd_transdimension.c#L48

I think it's better to add a on/off switch, also for dcd_transdimension.

Clean/invalidate cache will cause hardfault in case of it is not enabled.

How I edited your comment :D

@hathach
Copy link
Owner

hathach commented Jul 30, 2021

I think it's better to add a on/off switch, also for dcd_transdimension.
Clean/invalidate cache will cause hardfault in case of it is not enabled.
How I edited your comment :D

I didn't see any issue with rt10xx so far, I don't remember caching within stock example. If it is an issue on other MCU, it is better to detect if cached is enabled or not. Adding an macro is easy, but that config file is growing and already complicated to user (than I would like it to).

@fjgpereira so Microchip has different naming for the Harmony than the asf4, that is a pain, though if there is a macro to detect which framework is used, we could have macro to pick up correct one. I am not familiar with Harmony, do you know if Microchip plan to replace existing asf4 entirely with it.

@fjgpereira
Copy link

fjgpereira commented Jul 30, 2021

Don't know about Microchip plans, but I looked into the include files supplied with XC32 compiler and found that it also uses the new naming conventions.

I didn't check if there are any standard macros to detect the Microchip tools/versions, but as the registers pointer names are different, maybe you can use something like this:

#ifdef USBHS_REGS
#define USBHS USBHS_REGS
#define PMC PMC_REGS
#define __MP__ (1)
#else
#define __MP__
#define USBHS_DEVDMA UsbhsDevdma
#endif

and then add __MP__ after each macro.

@HiFiPhile
Copy link
Collaborator Author

@hathach I think it's also possible to include register defines in tinyUSB, like dcd_transdimension.

@hathach
Copy link
Owner

hathach commented Jul 30, 2021

It is possible, but (transdimention) tdi is a bit extreme since all naming and register address and mapping is totally different between lpc and rt10xx. It is probably needed if we also bundle sam3u etc into the same dcd. However, it will depend on actual register naming/mapping from microchip. The naming here is not that bad, it seems only take an extra argument for some macros. Anyway, I would prefer to review actual code before making any decision at all.

@HiFiPhile
Copy link
Collaborator Author

I did not investigate the reason why, but It was also necessary to add a __DSB() barrier
to dcd_event_setup_received() on usbd.c after a call to memcpy(), or else Setup events (GetStatus, etc.) were
not processed correctly.

How did you configure the MPU regions ?

@HiFiPhile
Copy link
Collaborator Author

I'm making changes in #985 .

@fjgpereira
Copy link

Actually most of the CPU initialization code and peripheral libraries is generated automatically by Harmony3 and I usually only look inside that code when I find some bug or hardware feature not implemented by the default code generated automatically, so I probably am not able to help.

However, MPU is disabled in the Harmony3 graphical tool, but I didn't check if it applies any default configuration.
Looking at the generated code, I also don't find any MPU initialization
(that is: I cannot find the MPU init code that can be found on H3 examples that make use of the MPU)

Regarding that problem on dcd_evfent_setup_received(), I initially noticed that the TU_LOG2_VAR() debug messages were displaying wrong bmRequestType/direction bits on GetStatus requests. Then I noticed the problem did not happen when some log message was inserted after the memcpy(), that lead to conclude that any code that caused "delay" in that spot would solve the problem, so I guess it can be related to cache coherency issues.

@HiFiPhile
Copy link
Collaborator Author

I also don't find any MPU initialization

That's strange... so basically Harmony3 enable cache without prior MPU region configuration ?

But by default the memory type of USB FIFO is 'Device Memory' so it's not cached...

@fjgpereira
Copy link

That's strange... so basically Harmony3 enable cache without prior MPU region configuration ?

I think that is the default Harmony3 settings for new projects.

When the MPU is enabled, it does offer a recommended configuration with 6 regions, ITCM, DTCM, SRAM, EBI_SMC, EBI_SRAM and QSPI, ranging from 0x0 to 0x8ffffffff, but I don't know what applies to 0xA0100000 without specifically defining another region.

@perigoso
Copy link
Collaborator

And that is why Harmony is not used

@fjgpereira
Copy link

And that is why Harmony is not used

So far I've found a few bugs in the auto-generated code, specially with unusual device configurations, but all in all, I think it has saved me a lot of time. However, I almost only use CPU and peripheral initialization code, avoiding other libraries and RTOS.
The tools for pin assignment were specially useful to draw the board schematics, as it helped choose peripherals and assign pins to each function. I think it helped get the first prototype PCBs working without mistakes.

@HiFiPhile
Copy link
Collaborator Author

And that is why Harmony is not used

I think it has saved me a lot of time

I think it's ture for every MCU companion software I've used (STM32Cube, MCUXpresso, Simplicity studio, etc.) while pin tool is practical for PCB planning, their generated code is very pathetic. I always ended with direct register access.

@perigoso
Copy link
Collaborator

Sure you can save a lot of time, and if you're lucky you can have a good experience with it. But you run the risk of having a veeery bad time, it's just not reliable.

@fjgpereira
Copy link

Sure you can save a lot of time, and if you're lucky you can have a good experience with it. But you run the risk of having a veeery bad time, it's just not reliable.

I suspected that and decided to use just a minimal set of features.
That's why I decided to try tinyusb instead of the Harmony's USB stack
that has lots of dependencies with other libraries and rtos.
For the same reason, avoided the E70 GMAC and corresponding TCP/IP stack and
instead chose an WizNet W6100 for TCP/IP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Porting Adding new mcu/board/os support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants