Applications using wgpu hang forever on bleeding edge Linux with Nvidia drivers 545.29.06 on GNOME / Wayland #4775
Comments
This is probably a duplicate of #4689, but I'm just gonna add what I've found so far here. This is where we hang forever: wgpu/wgpu-hal/src/vulkan/instance.rs, Line 960 in ebcfd25
Out of curiosity, I added some instrumentation:
    let fences = &[sc.fence];
    unsafe {
        // Query the fence before blocking. This is racy: the fence may be
        // signaled between this check and the wait below.
        let status = sc.device.raw.get_fence_status(sc.fence)
            .map_err(crate::DeviceError::from)?;
        println!("wait: {}", status);
        // Block with no timeout (!0 is u64::MAX nanoseconds).
        sc.device.raw.wait_for_fences(fences, true, !0)
            .map_err(crate::DeviceError::from)?;
        sc.device.raw.reset_fences(fences).map_err(crate::DeviceError::from)?
    }
It seems to hang (although checking the status is racy) when the fence is not already signaled:
Note that vulkan-tools |
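To make the failure mode above concrete, here is a minimal std-only sketch of a fence wait with a bounded timeout. It is a model, not wgpu or ash code: `ModelFence` and `WaitOutcome` are hypothetical names invented for illustration, and a `Condvar` stands in for the driver. The point is only that a wait with an effectively infinite timeout (like `!0` in the snippet above) never returns if nothing signals the fence, while a finite timeout turns the hang into a result the caller can log or handle.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for a GPU fence: a flag plus a condition variable.
/// Not part of wgpu or ash; it only models signal / wait / reset.
#[derive(Default)]
pub struct ModelFence {
    signaled: Mutex<bool>,
    cv: Condvar,
}

/// Result of a bounded wait (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    Signaled,
    TimedOut,
}

impl ModelFence {
    /// Mark the fence as signaled and wake every waiter.
    pub fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Non-blocking status query, analogous to the instrumentation above.
    pub fn status(&self) -> bool {
        *self.signaled.lock().unwrap()
    }

    /// Wait until the fence is signaled or `timeout` elapses.
    pub fn wait(&self, timeout: Duration) -> WaitOutcome {
        let guard = self.signaled.lock().unwrap();
        // Keep waiting while the flag is still false; spurious wakeups are
        // handled by `wait_timeout_while` re-checking the predicate.
        let (guard, _timeout_result) = self
            .cv
            .wait_timeout_while(guard, timeout, |signaled| !*signaled)
            .unwrap();
        if *guard {
            WaitOutcome::Signaled
        } else {
            WaitOutcome::TimedOut
        }
    }

    /// Return the fence to the unsignaled state.
    pub fn reset(&self) {
        *self.signaled.lock().unwrap() = false;
    }
}
```

With this model, waiting on a fence that nobody signals returns `WaitOutcome::TimedOut` after the given duration instead of blocking forever. Whether a bounded wait is appropriate in the real swapchain code is a separate design question that this sketch does not answer.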
This article also seems to suggest that using timeline semaphores is recommended over fences for host synchronization, so it might still be a worthwhile change in wgpu: |
Using a semaphore works for me, although the patch I wrote isn't pretty. Preferably it should be waited on when submitting a command buffer. |
Thanks for the investigation into this!
Fences should still work. Either way you can't use timeline semaphores for swapchain stuff, you can only use binary semaphores. Does vkcube break if converted to wait for a fence? This sounds like a driver bug that needs to be reported to Nvidia. |
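For readers unfamiliar with the distinction drawn here, the following is a rough std-only model of timeline semaphore semantics: a monotonically increasing 64-bit counter that a waiter can block on until it reaches a target value. `ModelTimeline` is a hypothetical name for this sketch and not a Vulkan, ash, or wgpu type. A binary semaphore, by contrast, has only a signaled and an unsignaled state, which is the kind the comment above says swapchain operations require.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical model of a timeline semaphore: a counter that only goes up.
#[derive(Default)]
pub struct ModelTimeline {
    value: Mutex<u64>,
    cv: Condvar,
}

impl ModelTimeline {
    /// Raise the counter to `value`. In this model, a value that does not
    /// increase the counter is simply ignored.
    pub fn signal(&self, value: u64) {
        let mut current = self.value.lock().unwrap();
        if value > *current {
            *current = value;
            self.cv.notify_all();
        }
    }

    /// Wait until the counter is at least `target`, or `timeout` elapses.
    /// Returns true if the target was reached.
    pub fn wait(&self, target: u64, timeout: Duration) -> bool {
        let guard = self.value.lock().unwrap();
        let (guard, _timeout_result) = self
            .cv
            .wait_timeout_while(guard, timeout, |v| *v < target)
            .unwrap();
        *guard >= target
    }
}
```

Because the counter never decreases, a waiter for value 2 is satisfied by a signal of 3, and there is no reset step as there is with a fence.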
So this is the patch I'm using on vulkan-tools.
I'm currently not able to reproduce it with
It's hard to say why. If someone has some other code they'd like me to run, I'd be happy to. |
Sounds good, any idea where? In the meanwhile since I'm not super familiar with |
I am not familiar enough with Vulkan to know what the best thing to do here is, but the Nvidia driver does seem to be violating this rough guarantee of the Vulkan spec:
So unless wgpu is violating the valid usages of the API (and thus triggered undefined behavior), calling So it seems fair to say this is at least partly a driver bug. |
@ids1024 The scenario cited is about what should happen during a device loss, which is different from what happens here. I don't know this for sure, but my current understanding is that the spec doesn't guarantee when the fence should be signaled, because the presentation engine might opt to hold onto the swapchain image for as long as it wants to, which here seems to be until a new frame is submitted or presented. Android apparently does something like that so that it can use the swapchain image for things between render calls. At least that is my conclusion from a careful read of the spec regarding the relevant functions. That doesn't mean Nvidia might not still be interested in fixing it. That being said, the vast majority of applications do what I've proposed in #4967, so we probably just want to do that as well to avoid problems. |
Ah, I guess the line above that says the "return in finite time" guarantee is about device loss, so there's no mentioned guarantee it won't block indefinitely in other circumstances. |
I reported this issue directly to an Nvidia Linux driver dev
for the record: |
Hi all! This is supposedly fixed in Nvidia driver 550.67 as can be seen in the driver release notes, and I can confirm that it works with my personal project using wgpu. |
I just checked (Gnome 46, Wayland, and Nvidia 550.67) and the problem is gone for me! |
sounds great! closing this as fixed then until we have new reason to believe otherwise |
Repro steps
Running anything which tries to use wgpu Vulkan, like:
The window starts and renders at least one frame, but becomes completely non-interactive (windows can't be interacted with or moved) and you receive a "hanging" prompt from GNOME:
Note that I think this might legitimately be a platform issue, however:
vkcube (X11) and vkcube-wayland which reports and runs (see below).
Screencast.from.2023-11-25.18-23-21.webm
So wgpu is currently the lowest level of abstraction I've chased down.
Platform
Log output from running the example:
uname -r: