Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating sequential-storage, adding flash erase support #884

Merged
merged 14 commits into from
Jun 19, 2024

Conversation

ryan-summers
Copy link
Member

Fixes #861 by updating clear <setting> to actually erase the requested setting from flash memory instead of overwriting it with the current default value.

This PR also updates to sequential-storage v1.0 by using embassy-futures to complete the async operations in a blocking manner. This shouldn't matter at all because the underlying impl is blocking anyways. However, the version bump may cause users to lose their flash settings.

To test this, I cleared flash settings and rebooted the device. I then reprogrammed it with a new default broker of "mqtt2" and verified that the device used the new default of mqtt2 after reprogramming, which indicated that nothing was loaded from flash.

@ryan-summers ryan-summers marked this pull request as ready for review April 22, 2024 14:42
serial-settings/src/lib.rs Outdated Show resolved Hide resolved
src/settings.rs Outdated Show resolved Hide resolved
serial-settings/src/lib.rs Outdated Show resolved Hide resolved
@diondokter
Copy link

However, the version bump may cause users to lose their flash settings.

There was a breaking change to the on-disk data format in 0.7, so yes this will break things.

Also, I'm very close to releasing v2.0 with some API changes that make dealing with the keys and values easier (and more straightforward).
v1.0 is fine and without known bugs, so you can upgrade to that just fine if you don't want to do the refactoring. Going from v1 to v2 won't have changes on-disk.


// Note: While this is supported, multiple writes to the same word can trigger ECC errors due to
// incorrect writes.
impl embedded_storage_async::nor_flash::MultiwriteNorFlash for Flash {}
Copy link
Member Author

@ryan-summers ryan-summers Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so I've isolated the problem with this PR.

The comment here is indeed the problem. The datasheet specifies:

It is not recommended to overwrite a flash word that is not virgin. The result may lead to an
inconsistent ECC code that will be systematically reported by the embedded flash memory,
as described in Section 4.7.7: Error correction code error (SNECCERR/DBECCERR)

Datasheet Section 4.3.9

My hope was that we could always set any bits from 1->0 without triggering the ECC faults. However, this appears not to be the case. I performed some manual write testing during the Stabilizer hardware::setup process doing the following:

  1. Erase flash sector starting at 0x8100_0000, which is Bank 2, Sector 1 (flash sector = minimum erase size, 128KB)
  2. Write 0xAA to a full flash page at the start of the sector (flash page = minimum write length, 256 bits, 32 bytes)

At this point, I tried a number of test cases with varying results:

  1. Write 0x00 to the entire 256 bit flash page previously written with 0xAA
    • This is acceptable and does not cause ECC faults
  2. Write 0xAA to the entire 256 bit flash page again (redundantly writing)
    • This is acceptable and does not cause ECC faults
  3. Writing 0x00 to the first byte, and writing 0xAA to the remaining bytes
    • This triggers an ECC fault
  4. Writing 0x00 to the first two bytes, writing 0xAA to remaining bytes
    • This triggers an ECC fault
  5. Writing 0x00 to the first 4 bytes, writing 0xAA to remaining bytes
    • This is acceptable and does not cause ECC faults

Currently, sequential-storage "erases" an element by clearing the CRC (writing it to zero) from an item header. An item header is composed of a CRC (16 bit) + Length (16-bit) forming a 32-bit word. Thus, this operation is equivalent to test 4 above, which was observed to trigger ECC faults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I believe section 4.3.12 explains why this behavior is being observed:

During each program operation, a 10- bit ECC code is associated to each 256-bit data flash
word, and the resulting 266-bit flash word information is written in non-volatile memory

I suspect the ECC code can also only be written from 1->0. Based on how we manipulate the flash page, some of the ECC bits may go from 0->1 even if the original flash word only has bits going from 1->0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryan-summers This (modifying existing data, e.g. as deletion marker) sounds like a general issue for sequential-storage on top of ECC flashes. Maybe file an issue there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's why it requires the trait bound MultiwriteNorFlash. I was hoping we could safely implement the trait, but we can't. I removed the implementation.

@ryan-summers
Copy link
Member Author

@jordens I've worked around bus faults by instead writing an empty serialized value into the map, which gives us remove_item-like behavior without needing to write flash multiple times.

@jordens
Copy link
Member

jordens commented Jun 18, 2024

@jordens I've worked around bus faults by instead writing an empty serialized value into the map, which gives us remove_item-like behavior without needing to write flash multiple times.

Oh yes. That sounds good.

src/settings.rs Outdated Show resolved Hide resolved
jordens
jordens previously approved these changes Jun 19, 2024
Copy link
Member

@jordens jordens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Modulo the question of whether we need the double postcard serialization layers.

@ryan-summers
Copy link
Member Author

The fix is pretty small to eliminate the double serialization layer. Will push it shortly after testing.

@ryan-summers ryan-summers added this pull request to the merge queue Jun 19, 2024
Merged via the queue into main with commit 3ef00cb Jun 19, 2024
7 of 8 checks passed
@jordens jordens deleted the fix/861/menu-clear branch June 19, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

USB/flash clear/save behavior w.r.t. default changes
3 participants