Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix D-Cache invalidation at boot and during MPU setup on Cortex-M7 #35258

Merged
merged 2 commits into from
May 18, 2021

Conversation

aurel32
Copy link
Collaborator

@aurel32 aurel32 commented May 13, 2021

This PR fixes issues introduced by d6b5023 and reported as #35089, #35107 and #35220.

It's not clear to me if we should check the status of the D-Cache, or if it is ensured to be disabled at reset, so I took the safe way.

@aurel32 aurel32 requested review from galak, erwango, mnkp and FRASTM May 13, 2021 13:40
@github-actions github-actions bot added the area: ARM ARM (32-bit) Architecture label May 13, 2021
@aurel32 aurel32 added area: Memory Protection platform: Microchip SAM Microchip SAM Platform (formerly Atmel SAM) platform: STM32 ST Micro STM32 bug The issue is a bug, or the PR is fixing a bug area: ARM ARM (32-bit) Architecture and removed area: ARM ARM (32-bit) Architecture labels May 13, 2021
On reset we do not know what is the status of the D-Cache, nor its
content.

If it is disabled, do not try to clean it, as it might contains random
data for random addresses, and this might just create a bus fault.
Invalidating it is enough.

If it is enabled, it means its content is not random.
SCB_InvalidateDCache() will clean it, invalidate it and disable it.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
@aurel32 aurel32 force-pushed the cortex-m7-d-cache-invalidation branch from 84d88ac to ae3170a Compare May 13, 2021 13:47
@galak galak self-assigned this May 13, 2021
@galak
Copy link
Collaborator

galak commented May 13, 2021

LGTM. Would like to get validation from @mnkp and @erwango / @ABOSTM that this fixes the 2 issues on STM32 as well before we merge this.

@galak galak added the block: HW Test Testing on hardware required before merging label May 13, 2021
@MaureenHelm MaureenHelm requested a review from mmahadevan108 May 13, 2021 15:32
@ioannisg
Copy link
Member

ioannisg commented May 14, 2021

Will test this on SAM E70, hang on

Confirmed, this makes the SAM E70 no crash, thanks @aurel32 for the fix

@mnkp
Copy link
Member

mnkp commented May 14, 2021

It would be good to test it on an NXP chip. They use a different MPU.

@galak
Copy link
Collaborator

galak commented May 14, 2021

It would be good to test it on an NXP chip. They use a different MPU.

Agreed, will ask @MaureenHelm to try this out. @mnkp does this address the issue you are seeing on Atmel SAM?

@mnkp
Copy link
Member

mnkp commented May 14, 2021

@mnkp does this address the issue you are seeing on Atmel SAM?

Sorry, I didn't mention since @ioannisg has already reported the test result. I've also tested the PR on SAM E70 and it fixes #35107.

In case CONFIG_NOCACHE_MEMORY=y, the D-Cache need to be clean and
invalidated before enabling the MPU to make sure no data from a
__nocache__ region is present in the D-Cache.

If the D-Cache is disabled, SCB_CleanInvalidateDCache() shall not be
used as it might contains random data for random addresses, and this
might just create a bus fault.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
@aurel32 aurel32 force-pushed the cortex-m7-d-cache-invalidation branch from ae3170a to 6c46bc9 Compare May 14, 2021 19:58
@aurel32
Copy link
Collaborator Author

aurel32 commented May 14, 2021

It would be good to test it on an NXP chip. They use a different MPU.

Agreed, will ask @MaureenHelm to try this out. @mnkp does this address the issue you are seeing on Atmel SAM?

I think more testing is definitely welcome. Note however that at the end the issue is not linked with the MPU, but just related to the fact that the SCB_CleanInvalidateDCache() is now called earlier when the cache is not yet enabled.

Copy link
Member

@MaureenHelm MaureenHelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested a few networking samples with mimxrt1060_evk and not seeing any issues.

The i.MX RT family (CM7) has the standard Arm MPU. It's only the Kinetis family (CM4) that has the NXP MPU.

@erwango erwango requested a review from ABOSTM May 17, 2021 06:54
@ABOSTM
Copy link
Collaborator

ABOSTM commented May 17, 2021

I confirm issue #35089 is fixed by this PR.
But I could not reproduce the #35220, test is always successful even before this PR,
so I cannot confirm this fixes the issue.
@FRASTM can you check on your side ?

@galak
Copy link
Collaborator

galak commented May 18, 2021

@FRASTM any updates on testing this?

@yashi
Copy link
Collaborator

yashi commented May 20, 2021

Why do we need to clean the d-cache in z_arm_mpu_init()? What data in no-cache area are we trying to write back to the memory? If there is nothing we need to keep, can we just call SCB_InvalidateDCache()?

Just for me to understand the context:

  • CONFIG_NOCACHE_MEMORY means we have uncached region in the system memory.
  • CONFIG_INIT_ARCH_HW_AT_BOOT means force Zephyr to initialize the hardware to reset state. For this case, we explicitly ask Zephyr not to initialize the hardware (which implicitly means that hardware is clean and initialized by PoR, no?)
  • The D-Cache is enabled at init_dcache() at PRE_KERNEL_1 after z_arm_mpu_init().
  • The commit ac3d943 added CB_CleanInvalidateDCache() in the function by @CristXu .

@CristXu
Copy link
Contributor

CristXu commented May 20, 2021

Why do we need to clean the d-cache in z_arm_mpu_init()? What data in no-cache area are we trying to write back to the memory? If there is nothing we need to keep, can we just call SCB_InvalidateDCache()?

Just for me to understand the context:

  • CONFIG_NOCACHE_MEMORY means we have uncached region in the system memory.
  • CONFIG_INIT_ARCH_HW_AT_BOOT means force Zephyr to initialize the hardware to reset state. For this case, we explicitly ask Zephyr not to initialize the hardware (which implicitly means that hardware is clean and initialized by PoR, no?)
  • The D-Cache is enabled at init_dcache() at PRE_KERNEL_1 after z_arm_mpu_init().
  • The commit ac3d943 added CB_CleanInvalidateDCache() in the function by @CristXu .

Hi,
This story is from here, usb: mcux RT1060 EVK - when using on-chip memory, USB fails, the ocram which means on-chip memory, the ocram is cachable when boot, and so that , some global varibles is in the ocram already. But after boot, we will configure the mpu, and we place the non-cache area on the ocram, then if we only invalidate the dcache, the variable will gone, then the usb will fail. so we need to clean&invalidate the ocram first, then configure it as non-cache area to save the old variable.

Regards,
Crist

@yashi
Copy link
Collaborator

yashi commented May 20, 2021

Hi @CristXu,

Thank you for your insight! 😃 Let me clarify a bit.

  • Before Zephyr boots
    • you have some values in OCRAM, which Zephyr needs.
    • OCRAM is cacheable at the time.
    • Because OCRAM is cacheable, the values might only be in the D-Cache and not in OCRAM.
  • After Zephyr is booted
    • OCRAM is configured be non-cacheable
    • But the USB needs the values in OCRAM
  • Thus,
    • you need to make sure the values are in OCRAM, not in the D-Cache.

My question to @CristXu is

  • Is it possible to clean/flash the cache to OCRAM before booting Zephyr?

Another question for Zephyr community is

  • Is it Zephyr's responsibility to clean/flash the contents of D-cache while booting? What happen if a board with CONFIG_NOCACHE_MEORY=y gets here from a bootloader which made D-cache dirty?

Don't take me wrong. I'm not against anything. I just want to know the protocol / requirements / expectations for booting. In fact, I think it's a good idea to have a region of memory for passing data. But in this case, the data is passed by the cache, not the OCRAM, AFAICS.

@CristXu
Copy link
Contributor

CristXu commented May 20, 2021

Hi @CristXu,

Thank you for your insight! 😃 Let me clarify a bit.

  • Before Zephyr boots

    • you have some values in OCRAM, which Zephyr needs.
    • OCRAM is cacheable at the time.
    • Because OCRAM is cacheable, the values might only be in the D-Cache and not in OCRAM.
  • After Zephyr is booted

    • OCRAM is configured be non-cacheable
    • But the USB needs the values in OCRAM
  • Thus,

    • you need to make sure the values are in OCRAM, not in the D-Cache.

My question to @CristXu is

  • Is it possible to clean/flash the cache to OCRAM before booting Zephyr?

Another question for Zephyr community is

  • Is it Zephyr's responsibility to clean/flash the contents of D-cache while booting? What happen if a board with CONFIG_NOCACHE_MEORY=y gets here from a bootloader which made D-cache dirty?

Don't take me wrong. I'm not against anything. I just want to know the protocol / requirements / expectations for booting. In fact, I think it's a good idea to have a region of memory for passing data. But in this case, the data is passed by the cache, not the OCRAM, AFAICS.

Yes, i agree with you. the values need update to the OCRAM after configure this area as the non-cacheable area. But sorry, i do not have any idea of clean/flash the cache to OCRAM before booting zephyr.

@FRASTM
Copy link
Collaborator

FRASTM commented May 20, 2021

The pb of the #35220 is not fixed with this PR. The
Because the the soc/arm/st_stm32/stm32f7 does not define the CONFIG_INIT_ARCH_HW_AT_BOOT
then the Cache is not disabled here in the arch/arm/core/aarch32/cortex_m/scb.c

@galak
Copy link
Collaborator

galak commented May 20, 2021

@FRASTM would it make sense for stm32f7 to define CONFIG_INIT_ARCH_HW_AT_BOOT?

@FRASTM
Copy link
Collaborator

FRASTM commented May 20, 2021

I do not really know what are the impacts of setting the CONFIG_INIT_ARCH_HW_AT_BOOT for the soc/stm32f7 family
especially on the boot phase

@aurel32
Copy link
Collaborator Author

aurel32 commented May 20, 2021

The pb of the #35220 is not fixed with this PR. The

Note that on my side I was able to reproduce the issue from #35220 on a stm32f723e discovery board, and that patch was fixing the problem. That's actually how I produced that patch.

Because the the soc/arm/st_stm32/stm32f7 does not define the CONFIG_INIT_ARCH_HW_AT_BOOT
then the Cache is not disabled here in the arch/arm/core/aarch32/cortex_m/scb.c

The PR is composed of two patches. One fixes the issue in arch/arm/core/aarch32/mpu/arm_mpu.c. This is the originally reported issue.

The other one fixes the issue for CONFIG_INIT_ARCH_HW_AT_BOOT, because i realized that it suffers from exactly the same issue.

Therefore there is no need to select this option for the STM32F7, instead we should understand why the fix doesn't work in your case. For me the bus fault happened in the SCB_CleanInvalidateDCache() function called from arch/arm/core/aarch32/mpu/arm_mpu.c. After this PR, this function should not be called anymore, if the cache is disabled. Could you please check where the bus fault happens on your case? That might help to understand the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Memory Protection block: HW Test Testing on hardware required before merging bug The issue is a bug, or the PR is fixing a bug platform: Microchip SAM Microchip SAM Platform (formerly Atmel SAM) platform: STM32 ST Micro STM32
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants