-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC for handling Erase WiFi Setting after OTA #6965
base: master
Are you sure you want to change the base?
Conversation
Issue: Sometimes when an ESP8266 is reflashed/upgraded the WiFi does not work. Then a serial flash with Erase Flash with WiFi setting is recommended. I have seen this more often when changing SDK by OTA. We don't have an erase WiFi for OTA. The PR tries to present a proof of concept solution. Actually it describes two different methods. There are 3 cases to consider when the firmware is updated by OTA. The new firmware: 1. has the same Flash Configuration as the old. 2. has a larger Flash Configuration than the old. 3. has a smaller Flash Configuration then the old. In theory after an OTA and before a restart, the flash could be erased for _case 1_. _Case 2_ is a problem because the size exceeds what the SPIEraseSector expects and fails. We have to wait for a restart, after which the SDK will have updated the values in `flashchip`. _Case 3_ is potentially unsafe because we could be erasing the code that is running. At app_entry() `flashchip` properties appear to be reset to a default 4MByte flash. Even an ESP8285 reported 4MByte. The value of `flashchip->chip_size` was changed at the 2nd SPIRead. The 1st read was to address 0, 4 bytes. To erase the flash WiFi area, I choose to wait until the SDK has finished its adjustments to the `flashchip` structure. Then to begin erasing WiFi sectors. I implemented two methods: 1. The first runs after the SDK calls `user_init()` and flash code execution is available (No IRAM needed), but restarts to be sure the SDK does not get confused about its sectors being erased. 2. The 2nd method runs as early as possible required IRAM. The sectors are erased before that 2nd read is processed. The SDK allowing the SDK to start off thinking the sectors were blank at boot. I also added an example/test sketch for exercising the feature. It is OTAEraseConfig.
Just to clarify, that only affects OTA to or from SDK 3.x.x? Or some settings (rfcal or credentials format) are incompatible even between 2.2.x versions? |
@mhightower83 In the latter case, we'd better rely on @mcspr |
@mcspr |
The problem with In effect, as the SDK starts up, by the time of the 2nd That said, I think I noticed a download result, that implied the serial download tool was changing the IDE configured value to the real chip size on the fly. |
According to your findings, My concern is about the API calls: |
It looks like the My take or interpretation so far is that
I see
Using the RTOS SDK for inspiration, if I am reading this right, it would appear the command issues a flash chip command to read an ID from the actual flash chip. |
What do you think of reading the first byte of flash at boot and then, if it does not reflects what We would have
|
Currently, I see the following when building and downloading the example sketch with the generic option selected with Flash information reported after using
Flash information reported after using
So currently it appears the I am concerned about the sealed-up devices that cannot be reflashed serially. Maybe there is a need to respect the original value of legacy devices. An old device may not expect the upgrade in size. The location of the EEPROM will move. We could copy it; however, if there are objects inside, pointing to places in flash, that could create an issue. And there are some more fancy EEPROM libraries out there in use that may wonder where their data went. If we could somehow increase the size and keep the old pointer offsets until a sketch could handle the migration to a larger size. Migration, referring to moving of EEPROM and SPIFFs, etc. to their new home. |
In case anyone is trying this, there appears to be a problem with the linker not using the erase_config module. It appears in my effort to reduce the code down to what was needed, I took away the one direct reference to this module that caused it to be linked in. There is also an issue of Soft WDT when doing a system_restart() with method 1. It appears I have to keep the call to __real_system_restart_local() instead. I'll try and sprinkle in some printf's I think I can make those work now to a degree. Before, all the code reduction, I was using a hack together deferred print library to get a peek of what was going on during this garbled print period. |
And code to ensure the UART speed stayed at 74880 for a more complete viewing of messages. Well at least, it stays at 74880 for a longer time than it was. Added pinMode filter to prevent UART TX pin from being disabled during user_init().
This is just a PoC an alternate for preinit() can be created later.
int8_t rssi_max; | ||
int8_t rssi_min; | ||
int8_t rssi; | ||
} wifi_health_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't these members initialized by default?
Some of them need to be initialized at values other than 0 (e.g. the "min" values and the "rssi_max"), or else it is quite easy to make mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing!
C++ is still new to me. My experience lies with C. I didn't realize you could now initialize inside a structure under C++. I know classes and structures have strong similarities. I have never seen a structure initializing elements directly before. I searched and found some examples. It certainly makes complex initialization like this easier. Thanks for the pointer.
int32_t rssi = WiFi.RSSI(); | ||
if (rssi < 10) { | ||
wifiHealth.rssi_max = max((int8_t)rssi, wifiHealth.rssi_max); | ||
wifiHealth.rssi_min = min((int8_t)rssi, wifiHealth.rssi_min); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here an example of things that may to wrong if the wifiHealth members aren't properly initialized.
In this example the struct is initialized, but that's easy to overlook when working with examples.
advange of C++ element initialization options.
method of handling direct ROM calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little puzzled overall with this. It seems to go to a lot of trouble to overwrite a sector in flash with new data after an OTA, and provides 3 ways of doing it only configurable via core file change/rebuild.
Wouldn't it be faster/safer/easier to just have that as part of the eboot command/work? eboot already can erase/write the entire flash, so it feels like another bit in the command there and maybe an address passed in eboot_cmd[xx] might be enough to handle this, no?
@@ -325,6 +325,36 @@ FlashMode_t EspClass::getFlashChipMode(void) | |||
return mode; | |||
} | |||
|
|||
#ifdef ERASE_CONFIG_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the Arduino.h
change, ERASE_CONFIG_H
seems like it is always defined. Therefore, can we remove the #if/#else/#endif blocking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Back when I did this, I was asked to not add it to eboot because there was very little space left and other activities that needed the space. So, I tried to get the task done before the SDK started. There are three methods because I don't like any of them. Since it is a PoC I kept all the 3 ideas as methods to pick from. I was hoping for some indication of which one might be better.
RE: ERASE_CONFIG_H is more for development. I left it in since this is a PoC and it makes it clear where code is changing. Since the Arduino IDE does not have a flexible way of setting a global, it also makes it easy to remove the feature by editing Arduino.h.
@@ -62,6 +62,11 @@ void UpdaterClass::_reset() { | |||
} | |||
|
|||
bool UpdaterClass::begin(size_t size, int command, int ledPin, uint8_t ledOn) { | |||
#if defined(ERASE_CONFIG_H) && !defined(HOST_MOCK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERASE_CONFIG_H always defined, drop the condition (leaving host_mock)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
@@ -5,6 +5,7 @@ | |||
#include <flash_utils.h> | |||
#include <MD5Builder.h> | |||
#include <functional> | |||
#include <erase_config.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arduino.h already includes this file always, now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
@@ -65,6 +66,13 @@ class UpdaterClass { | |||
*/ | |||
bool begin(size_t size, int command = U_FLASH, int ledPin = -1, uint8_t ledOn = LOW); | |||
|
|||
#ifdef ERASE_CONFIG_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same #if comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
@@ -199,6 +207,9 @@ class UpdaterClass { | |||
int _ledPin = -1; | |||
uint8_t _ledOn; | |||
|
|||
#ifdef ERASE_CONFIG_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
#define ERASE_CFG__FIX_DIVIDER() do {} while(0) | ||
#endif | ||
|
||
#ifdef ERASE_CONFIG_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always true #if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
|
||
*/ | ||
|
||
#define ERASE_CONFIG_METHOD 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in general seems like this is not the right way to handle it because ytou need to edit core files and rebuild on a case-by-case basis.
if I understand correctly, this is equivalent to 1, 2, or 3 described above. Right now it's always doing 3). If 3) is always good, why support the other 2 methods (i.e. not seeing value in the extra code to do the same thing 2 other ways).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explained above
*/ | ||
if (0U == ebcmd->magic && | ||
0U == ebcmd->crc32 && | ||
ACTION_COPY_RAW == ebcmd->action && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the RTC and (presently) unused bits of the eboot_cmd the right spot for these options? We're not in eboot anymore. when this code runs. Also, we're moving to eboot-cmd in flash which would end up wiping the whole cmd anyway (flash erase once update done). I don't have the right answer yet...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot has changed in eboot since this PoC was submitted. I have been waiting for the dust to settle on the recent eboot activities. I agree eboot is the right place for this if you have the room to accommodate. These ugly options were the best alternative I could come up with.
ERASE_CFG__ETS_FLUSH(0); | ||
} | ||
|
||
#if 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constant ifdef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not sure I get why you're overloading pinMode as I don't see it used anywhere in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug code to preserve UART operation during the SDK/Core startup so a few more debug messages could print out. Without it, the UART was killed and some messages garbled.
extern cont_t* g_pcont; | ||
extern "C" void call_user_start(); | ||
|
||
extern "C" void IRAM_ATTR app_entry_redefinable(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the other methods probably need the "noextra4k" variants as well.
…anges. Fixed missed edit (rom_uart_div_modify) for debug build.
It sounds like the PoC has outlived its usefulness if it had any. |
@mhightower83 To be clear, the principle of what you accomplished here is absolutely necessary. What isn't clear is the approach. |
I'm keen to finalise #6538 is it worth my working through this PR to see how they might mesh? |
@davisonja yes, I think you're the person best suited to assess whether #6538 and this eboot idea would work together. I think we would need:
Points 1 and 2 should be easy enough. With those we will know how much space is left in the eboot sector. I think that the eboot alternative to this PR would need: I'm not sure who should implement points A or B, I think that's up to you both 😛 |
…o pr-erase-config
Status: FLASH_MAP_SUPPORT has not been properly incorporated.
Issue: Sometimes when an ESP8266 is reflashed/upgraded the WiFi does not work.
Then a serial flash with Erase Flash with WiFi setting is recommended. I have
seen this more often when changing SDK by OTA. We don't have an erase WiFi for
OTA.
This PR should not be merged. It presents three Proof of Concept solutions. The intent is that one of these solutions would be chosen, developed further, and incorporate in a new PR.
There are 3 cases to consider when the firmware is updated by OTA. The new firmware:
In theory after an OTA and before a restart, the flash could be erased for
case 1. Case 2 is a problem because the size exceeds the size specified in
flashchip. That size is used by SPIEraseSector to validate the callers request
and fails when too large. We have to wait for a restart, after which the SDK
will have updated the values in
flashchip
. Case 3 is potentially unsafebecause we could be erasing the code that is running.
At
app_entry()
flashchip
properties appear to be reset to a default 4MByteflash. Even an ESP8285 reported 4MByte. The value of
flashchip->chip_size
waschanged at the 2nd SPIRead by the SDK. The 1st read was to address 0, 4 bytes.
To erase the flash WiFi area for methods 1 and 2, I choose to wait until the SDK
has finished its adjustments to the
flashchip
structure. Then to begin erasingWiFi sectors. Method 3 runs before the SDK starts. It temporarily updates
flashchip->chip_size
. Then does the erase and puts everything back beforestarting the SDK.
Summary of the three methods:
The first runs after the SDK calls
user_init()
and flash code execution isavailable (No IRAM needed), but restarts to be sure the SDK does not get
confused about its sectors being erased.
The 2nd method runs as early as possible requires IRAM. The sectors are
erased before the 2nd read is processed. This allows the SDK to start off
thinking the sectors were blank at boot.
Similar to method 2 runs as early as possible, turns on flash code execution
so more of the initialization code can be moved to flash. Directly modifies the
size element in
flashchip
structure in ROM data, dRAM. This allows the flasherase to succeed.
The original flash size is restored before starting the SDK with the assumption
that the SDK will handle the size change properly. It should be noted that
only changing the size value in the
flashchip
structure, is equivalent to what theesptool.py is doing, when it flashes the firmware.
I also added an example/test sketch for exercising the feature. It is
OTAEraseConfig. It also gathers some WiFi signal/connection statistics.
Edited: Added Method 3, corrected some grammar.