-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating sequential-storage, adding flash erase support #884
Conversation
There was a breaking change to the on-disk data format in 0.7, so yes this will break things. Also, I'm very close to releasing v2.0 with some API changes that make dealing with the keys and values easier (and more straightforward). |
src/hardware/flash.rs
Outdated
|
||
// Note: While this is supported, multiple writes to the same word can trigger ECC errors due to | ||
// incorrect writes. | ||
impl embedded_storage_async::nor_flash::MultiwriteNorFlash for Flash {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, so I've isolated the problem with this PR.
The comment here is indeed the problem. The datasheet specifies:
It is not recommended to overwrite a flash word that is not virgin. The result may lead to an
inconsistent ECC code that will be systematically reported by the embedded flash memory,
as described in Section 4.7.7: Error correction code error (SNECCERR/DBECCERR)
Datasheet Section 4.3.9
My hope was that we could always set any bits from 1->0 without triggering the ECC faults. However, this appears not to be the case. I performed some manual write testing during the Stabilizer hardware::setup
process doing the following:
- Erase flash sector starting at 0x8100_0000, which is Bank 2, Sector 1 (flash sector = minimum erase size, 128KB)
- Write 0xAA to a full flash page at the start of the sector (flash page = minimum write length, 256 bits, 32 bytes)
At this point, I tried a number of test cases with varying results:
- Write 0x00 to the entire 256 bit flash page previously written with 0xAA
- This is acceptable and does not cause ECC faults
- Write 0xAA to the entire 256 bit flash page again (redundantly writing)
- This is acceptable and does not cause ECC faults
- Writing 0x00 to the first byte, and writing 0xAA to the remaining bytes
- This triggers an ECC fault
- Writing 0x00 to the first two bytes, writing 0xAA to remaining bytes
- This triggers an ECC fault
- Writing 0x00 to the first 4 bytes, writing 0xAA to remaining bytes
- This is acceptable and does not cause ECC faults
Currently, sequential-storage
"erases" an element by clearing the CRC (writing it to zero) from an item header. An item header is composed of a CRC (16 bit) + Length (16-bit) forming a 32-bit word. Thus, this operation is equivalent to test 4 above, which was observed to trigger ECC faults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I believe section 4.3.12 explains why this behavior is being observed:
During each program operation, a 10- bit ECC code is associated to each 256-bit data flash
word, and the resulting 266-bit flash word information is written in non-volatile memory
I suspect the ECC code can also only be written from 1->0. Based on how we manipulate the flash page, some of the ECC bits may go from 0->1 even if the original flash word only has bits going from 1->0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryan-summers This (modifying existing data, e.g. as deletion marker) sounds like a general issue for sequential-storage on top of ECC flashes. Maybe file an issue there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's why it requires the trait bound MultiwriteNorFlash
. I was hoping we could safely implement the trait, but we can't. I removed the implementation.
@jordens I've worked around bus faults by instead writing an empty serialized value into the map, which gives us |
Oh yes. That sounds good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Modulo the question of whether we need the double postcard serialization layers.
The fix is pretty small to eliminate the double serialization layer. Will push it shortly after testing. |
Fixes #861 by updating
clear <setting>
to actually erase the requested setting from flash memory instead of overwriting it with the current default value.This PR also updates to
sequential-storage
v1.0 by using embassy-futures to complete the async operations in a blocking manner. This shouldn't matter at all because the underlying impl is blocking anyways. However, the version bump may cause users to lose their flash settings.To test this, I cleared flash settings and rebooted the device. I then reprogrammed it with a new default broker of "mqtt2" and verified that the device used the new default of mqtt2 after reprogramming, which indicated that nothing was loaded from flash.