segfault when enabling EGL hardware-acceleration and stopping the wayland loop #1443

Open
m4rch3n1ng opened this issue Jun 3, 2024 · 5 comments

Comments

@m4rch3n1ng
Contributor

m4rch3n1ng commented Jun 3, 2024

trying to spawn smallvil with --command "kitty" (or alacritty) doesn't actually spawn them, but instead gives me this error:

image

i found out that to fix that, i have to enable EGL hardware-acceleration like i did in m4rch3n1ng@0d083cf.

now, kitty and alacritty spawn just fine, but closing the winit window will now cause a segfault:

image
image

for me this also happens in my own compositor and in niri, but i can't get it to reproduce in anvil and i don't know how to fix it.

i don't know where exactly this issue lies, but since i don't actually use unsafe there, i don't think it should segfault.

the segfault happens both when closing the winit window using toplevel.send_close() and stopping the event loop using loop_signal.stop(), with a winit and a udev backend and in both release and debug mode.

@YaLTeR
Contributor

YaLTeR commented Jun 4, 2024

From what I've noticed, it happens when you did renderer.bind_wl_display() and have an EGL client running when closing the compositor. A dmabuf client does not cause segfaults, which is likely why it doesn't repro in anvil (which has dmabuf feedback implemented in winit).

Not unlikely that this has something to do with drop order, but I'm pretty sure I tried different drop orders a while ago and they all caused this segfault.

@m4rch3n1ng
Contributor Author

> it happens when you did renderer.bind_wl_display() and have an EGL client running when closing the compositor

that's the most consistent way i can reproduce this segfault in my own compositor too: opening weston-terminal and closing the compositor doesn't cause the segfault, but with kitty it does. in the reproduction i did with smallvil (m4rch3n1ng@8ad0494), however, it also segfaults with weston-terminal and even when not opening anything at all, so it's even weirder there.

@m4rch3n1ng
Contributor Author

i don't know how helpful that is, but you can "fix" the segfault (in the sense that it doesn't happen anymore) by leaking a clone of the DisplayHandle, so you are probably correct about the drop order being at fault.

diff --git a/smallvil/src/main.rs b/smallvil/src/main.rs
index 6c4c46e33f..78e0f7aaea 100644
--- a/smallvil/src/main.rs
+++ b/smallvil/src/main.rs
@@ -29,6 +29,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
 
     let display: Display<Smallvil> = Display::new()?;
     let display_handle = display.handle();
+
+    let _display_handle = display_handle.clone();
+    let _leak = Box::leak(Box::new(_display_handle));
+
     let state = Smallvil::new(&mut event_loop, display);
 
     let mut data = CalloopData {

i don't know how to change the drop order, so i'm sorry for not being able to provide more, but i'll try to get back to it some other time i guess
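To illustrate why leaking a clone can hide a drop-order problem, here is a minimal standalone sketch using only the standard library. It assumes the handle behaves like a reference-counted pointer, which is an assumption about DisplayHandle and not something confirmed in this thread. `Backend` and `backend_dropped` are hypothetical names invented for the example and are not smithay types.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for the resource behind a handle (not a smithay type).
struct Backend {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Backend {
    fn drop(&mut self) {
        // Record that the destructor actually ran.
        self.dropped.set(true);
    }
}

// Returns true if the Backend destructor ran once the handle went out of scope.
fn backend_dropped(leak_clone: bool) -> bool {
    let dropped = Rc::new(Cell::new(false));
    {
        // Assumption: the handle is modelled as a plain Rc.
        let handle = Rc::new(Backend { dropped: dropped.clone() });
        if leak_clone {
            // Same pattern as the diff above: the leaked clone is never dropped,
            // so the reference count never reaches zero.
            let _leak: &'static mut Rc<Backend> = Box::leak(Box::new(handle.clone()));
        }
        // `handle` is dropped here.
    }
    dropped.get()
}

fn main() {
    println!("without leak: destructor ran = {}", backend_dropped(false));
    println!("with leak:    destructor ran = {}", backend_dropped(true));
}
```

With the leaked clone the destructor never runs, so whatever teardown code would have crashed is simply skipped. That makes the leak a workaround and not a fix.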

@m4rch3n1ng
Contributor Author

did some more testing, and found out that the segfault in my smallvil reproduction example does not happen at the same place as in mayland and niri.

using rust-gdb, the segfault in smallvil happens at:

ffi::egl::UnbindWaylandDisplayWL(**self.display, wayland as _);

in smallvil you can properly fix the segfault by dropping the event_loop before the data is dropped.

diff --git a/smallvil/src/main.rs b/smallvil/src/main.rs
index 6c4c46e33f..d3ec176df3 100644
--- a/smallvil/src/main.rs
+++ b/smallvil/src/main.rs
@@ -55,5 +55,7 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         // Smallvil is running
     })?;
 
+    drop(event_loop);
+
     Ok(())
 }
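As background for the diff above: Rust drops local variables in reverse order of declaration, so a value declared first (like event_loop in smallvil's main) is normally dropped last. An explicit `drop` moves that forward. The sketch below shows both orders with only the standard library. `Noisy`, `implicit_order` and `explicit_order` are hypothetical names for this example, and the two variables only stand in for the real event loop and data.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Default behaviour: locals are dropped in reverse declaration order.
fn implicit_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _event_loop = Noisy { name: "event_loop", log: log.clone() };
        let _data = Noisy { name: "data", log: log.clone() };
        // Scope ends: `_data` is dropped first, then `_event_loop`.
    }
    let order = log.borrow().clone();
    order
}

// With an explicit drop, `event_loop` is destroyed before `data`.
fn explicit_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let event_loop = Noisy { name: "event_loop", log: log.clone() };
        let _data = Noisy { name: "data", log: log.clone() };
        drop(event_loop);
        // Scope ends: only `_data` is left to drop.
    }
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", implicit_order()); // ["data", "event_loop"]
    println!("{:?}", explicit_order()); // ["event_loop", "data"]
}
```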

in niri and in my own compositor the segfault happens at https://github.com/Smithay/wayland-rs/blob/48e74e9ef497d62b770b9b862ee4cb0876dc2494/wayland-backend/src/sys/server_impl/mod.rs#L434

and dropping the event_loop earlier doesn't do anything.

@m4rch3n1ng
Contributor Author

i did a bisection on my own compositor. when the segfault first appears, in m4rch3n1ng/mayland@85c9e66, it also happens at

ffi::egl::UnbindWaylandDisplayWL(**self.display, wayland as _);

but dropping the event_loop earlier doesn't fix it there; instead the segfault moves to where it happens now.

a commit later, after a refactor with winit in m4rch3n1ng/mayland@e1d9046, it just happens at https://github.com/Smithay/wayland-rs/blob/48e74e9ef497d62b770b9b862ee4cb0876dc2494/wayland-backend/src/sys/server_impl/mod.rs#L434 and hasn't changed since.
