Add new function `esp_hal::try_init` #2618

speelbarrow · 2024-11-27T14:55:30Z

Thank you for your contribution!

We appreciate the time and effort you've put into this pull request.
To help us review it efficiently, please ensure you've gone through the following checklist:

Submission Checklist 📝

I have updated existing examples or added new ones (if applicable).
I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
My changes were added to the CHANGELOG.md in the proper section.
I have added necessary changes to user code to the Migration Guide.
My changes are in accordance to the esp-rs API guidelines

Extra:

I have read the CONTRIBUTING.md guide and followed its instructions.

Pull Request Details 📖

Description

Adds a new function esp_hal::try_init. This function initializes the system only if it hasn't been already, then steals the peripherals and returns them.

The main use case for this functionality is for panic handlers that require peripheral access (e.g. to notify the user that a panic has occurred on a hardware display). The panic handler should not rely on the prior initialization of the system. However, if it has not yet been initialized, stealing the peripherals is insufficient. In contrast, trying to re-initialize the system from inside the panic handler causes the panic handler to get caught in a death loop if the main program has already called init. The try_init function will check if the system has already been initialized and, if not, will call init_internal (where I have moved the implementation of init, aside from the taking of the peripherals). The output is always the stolen peripherals, and as such the function is marked as unsafe.

Previously, Peripherals::take() would unsafely check an unmangled static mut bool. This PR replaces that with an AtomicBool to ensure (attempt) safety.

Testing

Some simple tests have been added to hil-test, but I cannot verify its success as I do not own a JTAG-compatible board.

In lieu of this, I wrote a small crate to verify my changes worked as expected. The board used is the Wemos LOLIN32 with a built-in SSD1306 OLED display (unofficial info about the board can be found here, I cannot find any reference to it from Wemos themselves). The ESP module on the board is an ESP32-WROOM-32.

The src/main.rs (contents below) file contains several commented-out main functions with explanations as to what each one is for. Each one was flashed to the board and executed separately and successfully. I will continue to try to implement similar tests to the main functions above in hil-test, but I'm still working out how I could do so in a board-agnostic fashion.

#![no_std]
#![no_main]
#![allow(unused)]
#![feature(future_join)]

use core::{fmt::Write, future::join};
use embassy_time::Delay;
use embedded_hal_async::delay::DelayNs;
use esp_backtrace as _;
use esp_hal::{entry, gpio::GpioPin, i2c::master::I2c, peripherals::I2C0, timer::timg::TimerGroup};
use esp_println::println;
use ssd1306::{
    mode::{DisplayConfig, DisplayConfigAsync},
    rotation::DisplayRotation,
    size::DisplaySize128x64,
    I2CDisplayInterface, Ssd1306, Ssd1306Async,
};

// This is the use case for the API proposed in this PR.
#[no_mangle]
fn custom_pre_backtrace() {
    let mut display = {
        let peripherals = unsafe { esp_hal::try_init(Default::default()) };

        Ssd1306::new(
            I2CDisplayInterface::new(
                I2c::new(peripherals.I2C0, Default::default())
                    .with_sda(peripherals.GPIO5)
                    .with_scl(peripherals.GPIO4),
            ),
            DisplaySize128x64,
            DisplayRotation::Rotate0,
        )
        .into_terminal_mode()
    };

    _ = display.init();
    _ = display.clear();

    _ = display.write_str("PANICKED\nConnect USB for details");
}

/*
Used between executions of other `main` functions in this file, otherwise display state may persist between flashings.

#[entry]
fn main() -> ! {
    let peripherals = esp_hal::init(Default::default());
    let mut display = Ssd1306::new(
        I2CDisplayInterface::new(
            I2c::new(peripherals.I2C0, Default::default())
                .with_sda(peripherals.GPIO5)
                .with_scl(peripherals.GPIO4),
        ),
        DisplaySize128x64,
        DisplayRotation::Rotate0,
    )
    .into_terminal_mode();
    display.init().unwrap();
    display.clear().unwrap();

    println!("done");

    loop {}
}
*/

/*
Panic after initialization

#[entry]
fn main() -> ! {
    const MESSAGE: &'static str = "Hello, World!";

    let peripherals = esp_hal::init(Default::default());
    let delay = esp_hal::delay::Delay::new();

    let mut display = Ssd1306::new(
        I2CDisplayInterface::new(
            I2c::new(peripherals.I2C0, Default::default())
                .with_sda(peripherals.GPIO5)
                .with_scl(peripherals.GPIO4),
        ),
        DisplaySize128x64,
        DisplayRotation::Rotate0,
    )
    .into_terminal_mode();
    display.init().unwrap();

    display.clear().unwrap();
    for character in MESSAGE.chars() {
        display.print_char(character).unwrap();
        delay.delay_millis(100);
    }

    panic!("A panic message");
}
*/

/*
Panic without initialization

#[entry]
fn main() -> ! {
    panic!("A panic message")
}
*/

/*
Demonstrates that:
- `init` still behaves the same as it did before (i.e. panics if you try to call it more than once)
- When the second call to `init` panics, the panic handler still operates as expected

#[entry]
fn main() -> ! {
    esp_hal::init(Default::default());
    esp_hal::init(Default::default());
    unreachable!()
}
*/

/*
Panic after initialization in an asynchronous context

#[esp_hal_embassy::main]
async fn main(_spawner: embassy_executor::Spawner) {
    const MESSAGE: &'static str = "Hello, World!";

    let peripherals = esp_hal::init(Default::default());

    esp_hal_embassy::init(TimerGroup::new(peripherals.TIMG0).timer0);

    let mut delay = Delay;

    let mut display = Ssd1306Async::new(
        I2CDisplayInterface::new(
            I2c::new(peripherals.I2C0, Default::default())
                .with_sda(peripherals.GPIO5)
                .with_scl(peripherals.GPIO4)
                .into_async(),
        ),
        DisplaySize128x64,
        DisplayRotation::Rotate0,
    )
    .into_terminal_mode();
    display.init().await.unwrap();

    display.clear().await.unwrap();
    for character in MESSAGE.chars() {
        display.print_char(character).await.unwrap();
        delay.delay_ms(100).await;
    }

    panic!("A panic message");
}
*/

/*
Panic without initialization in an asynchronous context

#[esp_hal_embassy::main]
async fn main(_spawner: embassy_executor::Spawner) {
    panic!("A panic message")
}
*/

MabezDev · 2024-11-27T17:38:09Z

Thanks for the PR! What is the use case for this API?

speelbarrow · 2024-11-27T18:25:55Z

Thanks for the PR! What is the use case for this API?

Still working on getting the functionality exactly how I want it, will update the thread shortly!

speelbarrow · 2024-11-27T18:44:57Z

Thanks for the PR! What is the use case for this API?

Still working on getting the functionality exactly how I want it, will update the thread shortly!

@MabezDev

Done! I'm ready to move this out of drafts, but please let me know if I need to implement more hil-tests before doing so.

esp-hal/src/peripheral.rs

esp-hal/src/lib.rs

esp-hal/src/peripheral.rs

esp-hal/src/lib.rs

hil-test/Cargo.toml

hil-test/tests/init.rs

MabezDev · 2024-11-28T10:53:54Z

Previously, Peripherals::take() would unsafely check an unmangled static mut bool. This PR replaces that with an AtomicBool to ensure (attempt) safety.

Is there a specific issue you were running into where using Peripherals::steal() in the panic handler didn't work? Or is this strictly an attempt to make esp_hal::init safer?

I have some thoughts on this, and sorry for the questions, I just need to understand the rationale for the PR.

speelbarrow · 2024-11-29T04:47:00Z

@MabezDev

Previously, Peripherals::take() would unsafely check an unmangled static mut bool. This PR replaces that with an AtomicBool to ensure (attempt) safety.

Is there a specific issue you were running into where using Peripherals::steal() in the panic handler didn't work? Or is this strictly an attempt to make esp_hal::init safer?

stealing them without having called init (I haven't bothered to check exactly what part of it) causes a panic*. Calling init after having already called init also causes a panic*, specifically so as to not allow init to steal twice. But the rest of the function runs fine if you run it a second time (on my board. Perhaps there are other cases where it would not work that I haven't considered). The panic handler cannot (afford to) panic in either of those cases. So, once I was suggesting publicizing the "are we init yet?" boolean (albeit in a roundabout way) I figured I better make my implementation as safe as I could.

I have some thoughts on this, and sorry for the questions, I just need to understand the rationale for the PR.

No worries! I'm happy to answer any and all questions.

*or something else that causes the panic handler to spontaneously restart half way through execution

(P.S. sorry, meant to post this reply last night and forgot to hit send)

bugadani · 2024-11-29T08:26:29Z

I still don't understand what you are trying to do. Why is init not the very first thing you call? Or if it is, how exactly are you running into a panicking situation?

jessebraham · 2024-11-29T10:57:51Z

It's not clear to me how calling steal() could possibly result in a panic, having looked at the implementation 🤔

speelbarrow · 2024-11-29T14:46:19Z

I still don't understand what you are trying to do. Why is init not the very first thing you call?

@bugadani Its not that init is not the very first thing I call, its that the panic handler cannot guarantee whether or not it has been called. And I don't think a panic handler should fail based on whether or not init has been called in the main program body.

MabezDev · 2024-11-29T15:15:26Z

panic handler cannot guarantee whether or not it has been called

I don't think we can make any assumptions from the panic handler. Particularly if the panic is a result of UB (stack overflow corrupting some state etc) or a hardware fault and we're panicking from the exception handler. Calling try_init in a system where the INITIALIZED flag has been corrupted by stack overflow would result in init running twice. It's also important to note that try_init isn't a good idea on multicore chips, as the other core can still run whilst the other is in a panicked state and running init twice could interfere with the other core.

I think the best course of action is to assume nothing and pretend to init by calling Peripherals::steal(). I don't think try_init is the silver bullet we hoped it would be here.

Thank you for the PR, I'm inclined to close this based on my reasoning above, but I'll wait a little while to be convinced otherwise if you have some more arguments for it :).

speelbarrow · 2024-11-29T15:19:18Z

It's not clear to me how calling steal() could possibly result in a panic, having looked at the implementation 🤔

@jessebraham It's not steal that leads to the panic, it's I2c::new. The panic does not occur if I2c::new is preceded by a call to init (specifically lines 506-558) at any point during program execution.

What I have introduced here with try_init is a way to call init[506:558] such that:

the code only executes if init (specifically, Peripherals::take) hasn't yet
if init has been executed already, a panic does not occur

speelbarrow · 2024-11-29T15:43:10Z

@MabezDev

Calling try_init in a system where the INITIALIZED flag has been corrupted by stack overflow would result in init running twice.

Calling init as it is implemented now is susceptible to the same issue, since the check relies on a static mut bool. It could (in either case) be resolved by using OnceBool (assuming you trust it to handle corruption helpfully), but that would add an additional dependency which might be undesirable.

It's also important to note that try_init isn't a good idea on multicore chips, as the other core can still run whilst the other is in a panicked state and running init twice could interfere with the other core.

This is why I used an AtomicBool. If try_init ran before init gets called on the second core it would result in only one additional panic (avoiding a death loop). If init on the second core got called first, try_init would see that and function as a wrapper to Peripherals::steal.

I think the best course of action is to assume nothing and pretend to init by calling Peripherals::steal(). I don't think try_init is the silver bullet we hoped it would be here.

I'd be happy to do that if Peripherals::steal() was an effective workaround to the issue I'm facing, but it isn't. The logic I need to (conditionally) execute from the panic handler is everything other than Peripherals::steal in init (lines 506-558).

Thank you for the PR, I'm inclined to close this based on my reasoning above, but I'll wait a little while to be convinced otherwise if you have some more arguments for it :).

If you'd still like to close it, go right ahead. I've already determined that I can implement what I want by taking advantage of the #[no_mangle]d symbol that currently stores the check. This PR would just let me do that cleaner.

speelbarrow force-pushed the try-init branch 3 times, most recently from 58ceaff to 6bfebd8 Compare November 27, 2024 15:05

speelbarrow changed the title ~~Implement try-init feature~~ Add new function try_init Nov 27, 2024

speelbarrow changed the title ~~Add new function try_init~~ Add new function esp_hal::try_init Nov 27, 2024

speelbarrow force-pushed the try-init branch from 6bfebd8 to 99c9ee4 Compare November 27, 2024 18:41

speelbarrow force-pushed the try-init branch from 99c9ee4 to 45cb161 Compare November 27, 2024 18:52