Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miri fails when unwinding in safe code #6603

Closed
JoJoJet opened this issue Nov 14, 2022 · 10 comments
Closed

Miri fails when unwinding in safe code #6603

JoJoJet opened this issue Nov 14, 2022 · 10 comments
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior D-Complex Quite challenging from either a design or technical perspective. Ask for help! P-Unsound A bug that results in undefined compiler behavior
Milestone

Comments

@JoJoJet
Copy link
Member

JoJoJet commented Nov 14, 2022

Bevy version

f7c8eb7

What you did

I created this test:

#[test]
#[should_panic]
fn ub_find() {
    #[derive(Resource, Default)]
    struct SyncCounter(RwLock<()>);

    fn assert_non_sync(counter: Res<SyncCounter>) {
        // Panic if multiple instances of this system try to access `counter` at once.
        let _guard = counter.0.try_write().unwrap();

        // Make sure other systems have a chance to conflict with this one before dropping the guard.
        std::thread::sleep(std::time::Duration::from_millis(100));
    }

    let mut world = World::default();
    world.init_resource::<SyncCounter>();

    let mut stage = SystemStage::parallel()
        .with_system(assert_non_sync)
        .with_system(assert_non_sync)
        .with_system(assert_non_sync);

    stage.run(&mut world);
}

What were you expecting

The multiple instances of assert_non_sync should conflict with each other and safely cause a panic.

What went wrong

The following miri failure is triggered: https://github.com/JoJoJet/bevy/actions/runs/3457086747/jobs/5770279259

Additional information

I investigated a bit further on this branch. The test seems to randomly not fail sometimes, but eventually this failure was triggered:

https://github.com/JoJoJet/bevy/actions/runs/3457314695/jobs/5770662745

Some theories from @BoxyUwU https://discord.com/channels/691052431525675048/749335865876021248/1041500126880989184.

@JoJoJet JoJoJet added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Nov 14, 2022
@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events D-Complex Quite challenging from either a design or technical perspective. Ask for help! P-Unsound A bug that results in undefined compiler behavior and removed S-Needs-Triage This issue needs to be labelled labels Nov 14, 2022
@hymm
Copy link
Contributor

hymm commented Nov 14, 2022

can you test if this PR helps? #6524

@JoJoJet
Copy link
Member Author

JoJoJet commented Nov 14, 2022

It seems to. Hopefully miri isn't just spuriously passing again.

@james7132
Copy link
Member

error: Undefined Behavior: not granting access to tag <41945721> because that would remove [Unique for <41937186>] which is protected because it is an argument of call 11782684
    --> /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     |
490  | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not granting access to tag <41945721> because that would remove [Unique for <41937186>] which is protected because it is an argument of call 11782684
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <41945721> was created by a SharedReadWrite retag at offsets [0xc8..0xe0]
    --> crates/bevy_ecs/src/lib.rs:1200:5
     |
1200 |     }
     |     ^
help: <41937186> is this argument
    --> crates/bevy_ecs/src/system/function_system.rs:396:26
     |
396  |     unsafe fn run_unsafe(&mut self, input: Self::In, world: &World) -> Self::Out {
     |                          ^^^^^^^^^
     = note: BACKTRACE:
     = note: inside `std::ptr::drop_in_place::<std::borrow::Cow<'_, str>> - shim(Some(std::borrow::Cow<'_, str>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<system::function_system::SystemMeta> - shim(Some(system::function_system::SystemMeta))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<system::function_system::FunctionSystem<(), (), (system::system_param::Res<'_, tests::ub_find::SyncCounter>,), (), for<'a> fn(system::system_param::Res<'a, tests::ub_find::SyncCounter>) {tests::ub_find::assert_non_sync}>> - shim(Some(system::function_system::FunctionSystem<(), (), (system::system_param::Res<'_, tests::ub_find::SyncCounter>,), (), for<'a> fn(system::system_param::Res<'a, tests::ub_find::SyncCounter>) {tests::ub_find::assert_non_sync}>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<std::boxed::Box<dyn system::system::System<Out = (), In = ()>>> - shim(Some(std::boxed::Box<dyn system::system::System<Out = (), In = ()>>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<schedule::system_container::SystemContainer> - shim(Some(schedule::system_container::SystemContainer))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<[schedule::system_container::SystemContainer]> - shim(Some([schedule::system_container::SystemContainer]))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `<std::vec::Vec<schedule::system_container::SystemContainer> as std::ops::Drop>::drop` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3029:13
     = note: inside `std::ptr::drop_in_place::<std::vec::Vec<schedule::system_container::SystemContainer>> - shim(Some(std::vec::Vec<schedule::system_container::SystemContainer>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<schedule::stage::SystemStage> - shim(Some(schedule::stage::SystemStage))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
note: inside `tests::ub_find` at crates/bevy_ecs/src/lib.rs:1200:5
    --> crates/bevy_ecs/src/lib.rs:1200:5
     |
1200 |     }
     |     ^
note: inside closure at crates/bevy_ecs/src/lib.rs:1179:18
    --> crates/bevy_ecs/src/lib.rs:1179:18
     |
1177 |     #[test]
     |     ------- in this procedural macro expansion
1178 |     #[should_panic]
1179 |     fn ub_find() {
     |                  ^
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

For posterity, here's the miri output, since GitHub Action's logs are TLL'ed.

@hymm
Copy link
Contributor

hymm commented Nov 14, 2022

I wonder if the issue is that we should be using FallibleTask rather than Task in TaskPool::scope. So that when the scope gets dropped we can cancel().await the tasks instead of just having them just doing set_cancel() when the task is dropped.

@JoJoJet
Copy link
Member Author

JoJoJet commented Nov 17, 2022

Looks like miri still fails periodically, even with #6524: https://github.com/JoJoJet/bevy/actions/runs/3484882520/jobs/5829897858

error: Undefined Behavior: Data race detected between Deallocate on thread `tests::ub_find` and Read on thread `TaskPool (0)` at alloc18680771+0x8
    --> /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:113:14
     |
113  |     unsafe { __rust_dealloc(ptr, layout.size(), layout.align()) }
     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between Deallocate on thread `tests::ub_find` and Read on thread `TaskPool (0)` at alloc18680771+0x8
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE:
     = note: inside `std::alloc::dealloc` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:113:14
     = note: inside `<std::alloc::Global as std::alloc::Allocator>::deallocate` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:250:22
     = note: inside `downcast_rs::alloc::alloc::box_free::<dyn system::system::System<In = (), Out = ()>, std::alloc::Global>` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:348:9
     = note: inside `std::ptr::drop_in_place::<std::boxed::Box<dyn system::system::System<In = (), Out = ()>>> - shim(Some(std::boxed::Box<dyn system::system::System<In = (), Out = ()>>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<schedule::system_container::SystemContainer> - shim(Some(schedule::system_container::SystemContainer))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<[schedule::system_container::SystemContainer]> - shim(Some([schedule::system_container::SystemContainer]))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `<std::vec::Vec<schedule::system_container::SystemContainer> as std::ops::Drop>::drop` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:[302](https://github.com/JoJoJet/bevy/actions/runs/3484882520/jobs/5829897858#step:6:303)9:13
     = note: inside `std::ptr::drop_in_place::<std::vec::Vec<schedule::system_container::SystemContainer>> - shim(Some(std::vec::Vec<schedule::system_container::SystemContainer>))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
     = note: inside `std::ptr::drop_in_place::<schedule::stage::SystemStage> - shim(Some(schedule::stage::SystemStage))` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:490:1
note: inside `tests::ub_find` at crates/bevy_ecs/src/lib.rs:1225:5
    --> crates/bevy_ecs/src/lib.rs:1225:5
     |
1225 |     }
     |     ^
note: inside closure at crates/bevy_ecs/src/lib.rs:1201:18
    --> crates/bevy_ecs/src/lib.rs:1201:18
     |
1199 |     #[test]
     |     ------- in this procedural macro expansion
1200 |     #[should_panic]
1201 |     fn ub_find() {
     |                  ^
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

@hymm
Copy link
Contributor

hymm commented Nov 17, 2022

My next guess to why this is happening is that the Task abstraction just asks the future to cancel when it is dropped, but doesn't wait for the future to cancel. We might need to use FailableTask instead and explicitly wait for them to cancel when the scope is dropped.

@james7132 james7132 removed this from the 0.9.1 milestone Nov 17, 2022
@hymm
Copy link
Contributor

hymm commented Nov 19, 2022

can you test if this new PR helps? #6696

edit: looks like I've got some issue's to fix before this is ready. Fixed

@JoJoJet
Copy link
Member Author

JoJoJet commented Nov 20, 2022

Seems like that fixed the unsoundness (miri passed 10 times).

This test still fails inconsistently. It's supposed to panic, but sometimes it doesn't. I don't know if the panic is getting swallowed up, or if the test itself is just inconsistent.

@hymm
Copy link
Contributor

hymm commented Nov 21, 2022

would a https://doc.rust-lang.org/std/sync/struct.Barrier.html work better here than a sleep?

Might be getting swallowed, if we're not propagating errors correctly anymore. The code for that in scope has significantly changed behavior with these PR's.

@JoJoJet
Copy link
Member Author

JoJoJet commented Nov 21, 2022

Didn't know about that type, TIL.

After investigating, I think it's just that the test is inconsistent when run on CI. If the test is only given one thread, then it won't panic as expected and the test fails. This became more obvious after I changed it to use a Barrier, since it started deadlocking instead of just silently failing. I don't believe any panics are getting swallowed here.

All this to say, I believe that #6696 fixes this issue.

@james7132 james7132 added this to the 0.9.1 milestone Nov 22, 2022
bors bot pushed a commit that referenced this issue Nov 23, 2022
# Objective

- Fixes #6603

## Solution

- `Task`s will cancel when dropped, but wait until they return Pending before they actually get canceled. That means that if a task panics, it's possible for that error to get propagated to the scope and the scope gets dropped, while scoped tasks in other threads are still running. This is a big problem since scoped task can hold life-timed values that are dropped as the scope is dropped leading to UB.

---

## Changelog

- changed `Scope` to use `FallibleTask` and await the cancellation of all remaining tasks when it's dropped.
@bors bors bot closed this as completed in 8eb8ad5 Nov 23, 2022
cart pushed a commit that referenced this issue Nov 30, 2022
# Objective

- Fixes #6603

## Solution

- `Task`s will cancel when dropped, but wait until they return Pending before they actually get canceled. That means that if a task panics, it's possible for that error to get propagated to the scope and the scope gets dropped, while scoped tasks in other threads are still running. This is a big problem since scoped task can hold life-timed values that are dropped as the scope is dropped leading to UB.

---

## Changelog

- changed `Scope` to use `FallibleTask` and await the cancellation of all remaining tasks when it's dropped.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this issue Feb 1, 2023
# Objective

- Fixes bevyengine#6603

## Solution

- `Task`s will cancel when dropped, but wait until they return Pending before they actually get canceled. That means that if a task panics, it's possible for that error to get propagated to the scope and the scope gets dropped, while scoped tasks in other threads are still running. This is a big problem since scoped task can hold life-timed values that are dropped as the scope is dropped leading to UB.

---

## Changelog

- changed `Scope` to use `FallibleTask` and await the cancellation of all remaining tasks when it's dropped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior D-Complex Quite challenging from either a design or technical perspective. Ask for help! P-Unsound A bug that results in undefined compiler behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants