vulkan validation error: VUID-vkDestroyFramebuffer-framebuffer-00892 #785

Closed
FlorianUekermann opened this issue Jul 12, 2020 · 21 comments
Labels
type: bug Something isn't working


@FlorianUekermann

Occasionally (every few hundred frames on average) I get VUID-vkDestroyFramebuffer-framebuffer-00892 validation errors. The timing seems irregular, so I guess something is racing under the hood.

My render loop is quite straightforward (error handling removed):

let frame = swap_chain.lock().await.get_next_frame().unwrap();
...
let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
...
let mut rpass = geometry_pass.begin(&mut encoder, &frame.output.view);
...
let mut rpass = lighting_pass.begin(&mut encoder, &frame.output.view);
...
queue.submit(Some(encoder.finish()));

I played around a little and adding:

std::thread::sleep(std::time::Duration::from_millis(10));
drop(frame);

at the end prevents the validation errors.
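
For context, VUID-vkDestroyFramebuffer-framebuffer-00892 is the Vulkan rule that all submitted commands referring to a framebuffer must have completed execution before the framebuffer is destroyed. The sketch below is a toy model of that rule, not wgpu's implementation: `Framebuffer`, `DeferredDestroy`, and the submission indices are hypothetical names made up for illustration. It shows one common way to respect the rule, which is to tag each resource with the last submission that used it and destroy it only once that submission is known to have finished. The sleep before `drop(frame)` plausibly has a similar effect by giving the GPU time to finish first.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a GPU framebuffer (hypothetical type, not a wgpu type).
#[derive(Debug, PartialEq)]
struct Framebuffer(u32);

/// Resources waiting to be destroyed, each tagged with the index of the
/// last submission that referenced it.
struct DeferredDestroy {
    pending: VecDeque<(u64, Framebuffer)>,
}

impl DeferredDestroy {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Schedule `fb` for destruction once submission `last_used` has completed.
    fn schedule(&mut self, last_used: u64, fb: Framebuffer) {
        self.pending.push_back((last_used, fb));
    }

    /// Destroy every resource whose last use is at or before `completed`
    /// and return what was destroyed. Newer resources stay alive.
    fn collect(&mut self, completed: u64) -> Vec<Framebuffer> {
        let mut destroyed = Vec::new();
        let mut kept = VecDeque::new();
        while let Some((last_used, fb)) = self.pending.pop_front() {
            if last_used <= completed {
                destroyed.push(fb);
            } else {
                kept.push_back((last_used, fb));
            }
        }
        self.pending = kept;
        destroyed
    }
}

fn main() {
    let mut queue = DeferredDestroy::new();
    queue.schedule(1, Framebuffer(10));
    queue.schedule(2, Framebuffer(20));

    // Pretend the GPU has only finished submission 1 so far.
    println!("after submission 1: {:?}", queue.collect(1));
    // Submission 2 finishes later; only now may the second framebuffer go.
    println!("after submission 2: {:?}", queue.collect(2));
}
```

Destroying `Framebuffer(20)` while submission 2 is still running is the situation the validation layer reports in this model.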

FlorianUekermann added the type: bug label Jul 12, 2020
@kvark
Member

kvark commented Jul 12, 2020

Would you have a repro case?

@FlorianUekermann
Author

I can try to record a trace. Does that help?

@kvark
Member

kvark commented Jul 12, 2020

Yes, an API trace would be wonderful, plus your adapter info for the reference (we should include it in the trace).

@FlorianUekermann
Author

I attached the vulkaninfo output and the trace recorded with current master (I had to compress it twice, because GitHub doesn't accept .xz and using zip alone makes it 20x larger).

lspci says: Advanced Micro Devices, Inc. [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] (rev c5)
trace.zip
vulkaninfo.txt

@kvark
Member

kvark commented Jul 14, 2020

@FlorianUekermann thank you! What wgpu-rs (or wgpu) revision was that from?

@FlorianUekermann
Author

wgpu-rs master at time of posting. I updated right before recording. I can't check right now.

@kvark
Member

kvark commented Jul 22, 2020

I'm having trouble replaying this API trace due to API mismatches. Could you specify the exact revision (of either wgpu-rs or wgpu) this was taken from?

@kvark
Member

kvark commented Jul 22, 2020

After a bit more wrangling, I'm able to replay this API trace. It's panicking on old.is_none() in the HUB registry for buffers. No validation errors before that.

@FlorianUekermann
Author

I'll play around with some traces on the weekend. Is it possible that a race during recording has a different outcome when being replayed?

@kvark
Member

kvark commented Jul 22, 2020

I don't currently see how this would be possible with regard to the buffer IDs. We record Action::DestroyBuffer right before we free the corresponding ID in the hub, and until we do that nothing can technically allocate this ID for a new buffer.
This is still fairly fresh tech though, so I might be missing something.
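
The ordering described here can be illustrated with a toy registry. This is only a sketch under assumptions: `Hub`, `create_buffer`, `destroy_buffer`, and `Action::CreateBuffer` are made-up names that do not mirror wgpu's actual types, and only the name `Action::DestroyBuffer` comes from the comment above. The point it shows is that if the destroy action is recorded before the ID returns to the free list, a recorded trace can never contain a create for an ID that precedes the matching destroy.

```rust
/// Hypothetical trace actions for the toy model.
#[derive(Debug, PartialEq)]
enum Action {
    CreateBuffer(u32),
    DestroyBuffer(u32),
}

/// Toy ID registry: an ID becomes reusable only after it has been freed.
struct Hub {
    free: Vec<u32>,
    next: u32,
    trace: Vec<Action>,
}

impl Hub {
    fn new() -> Self {
        Hub { free: Vec::new(), next: 0, trace: Vec::new() }
    }

    /// Allocate an ID, preferring a previously freed one, and record it.
    fn create_buffer(&mut self) -> u32 {
        let id = match self.free.pop() {
            Some(id) => id,
            None => {
                let id = self.next;
                self.next += 1;
                id
            }
        };
        self.trace.push(Action::CreateBuffer(id));
        id
    }

    /// Record the destroy first, then free the ID. Until the second line
    /// runs, `create_buffer` cannot hand this ID out again.
    fn destroy_buffer(&mut self, id: u32) {
        self.trace.push(Action::DestroyBuffer(id));
        self.free.push(id);
    }
}

fn main() {
    let mut hub = Hub::new();
    let a = hub.create_buffer();
    hub.destroy_buffer(a);
    let b = hub.create_buffer(); // reuses the freed ID
    println!("a = {}, b = {}, trace = {:?}", a, b, hub.trace);
}
```

In this single-threaded model the trace always lists the destroy before the reuse. Whether the same holds under concurrent access depends on locking details that the sketch does not capture.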

@FlorianUekermann
Author

FlorianUekermann commented Aug 2, 2020

This still happens on latest wgpu-rs master.

I fixed my panic on shutdown and tried again on wgpu-rs master, which uses wgpu b352093 right now. Unfortunately I can't play back the traces with that revision:

$ RUST_LOG=info cargo run --features winit -- ../../../trace/
[0.009707 INFO]()(no module): Loading trace '"../../../trace/"'
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parser(Eof, Position { col: 1, line: 4780556 })', player/src/bin/play.rs:42:70
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Am I using the player wrong? It's not really a problem right now, so I can try again in a few weeks.

Btw, when shutting down I can also produce the occasional segfault and a few other validation errors. Maybe it is worth mentioning that I am rendering in a separate task, not the winit event loop.

Edit: Another observation - The validation error happens a lot more frequently when I am not recording a trace.

@kvark
Member

kvark commented Aug 2, 2020

The trace can't be replayed because it wasn't finished gracefully, since the app has panicked. All you need to finish it is to edit the file by appending "]" to it.
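
For anyone who would rather not open a multi-million-line file in an editor, the fix can be scripted. The following is a minimal sketch, assuming the trace is a text file whose top-level list simply lacks its closing `]`. The helper name `close_trace` and the demo file name are made up, and checking whether the file already ends with `]` is only a heuristic, since a truncated trace could happen to end on a nested `]`.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append a closing `]` to a trace file unless it already ends with one.
/// Returns `true` if the file was modified.
fn close_trace(path: &Path) -> io::Result<bool> {
    // Reads the whole file for simplicity; a large trace may be better
    // served by inspecting only its last few bytes.
    let contents = fs::read_to_string(path)?;
    if contents.trim_end().ends_with(']') {
        return Ok(false);
    }
    let mut file = OpenOptions::new().append(true).open(path)?;
    file.write_all(b"\n]\n")?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // Demo on a throwaway file with made-up, truncated contents.
    let path = std::env::temp_dir().join("truncated_trace_demo.ron");
    fs::write(&path, "[\n    SomeAction,\n")?;

    let changed = close_trace(&path)?;
    println!("modified: {}", changed);

    fs::remove_file(&path)?;
    Ok(())
}
```

To use it on a real recording, point `close_trace` at the trace file inside the trace directory instead of the demo file.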

@kvark
Member

kvark commented Aug 2, 2020

I wonder if this is just gfx-rs/gfx#3184

@FlorianUekermann
Author

> All you need to finish it is to edit the file by appending "]" to it.

Ah thanks, now it works. I've tried with a few traces now and I don't get the validation error when playing the trace, even though I get multiple on the original run...

I've been playing around a little more and noticed that I get very noticeable stalls (~1s) right before the validation error prints. Introducing a sleep before acquiring a new frame gets rid of the validation error as well as the stalls.

> I wonder if this is just gfx-rs/gfx#3184

I hadn't seen that. Seems likely.
There aren't really any examples that are heavy enough to get frames in flight for long. I'll see if I can patch one to reproduce this issue.

@kvark
Member

kvark commented Aug 2, 2020

Filed #861 to make this less painful

@FlorianUekermann
Author

FlorianUekermann commented Aug 2, 2020

I modified the hello-triangle example to have a trivial reproducer for this. If you think a stress test example is generally useful, I can send a pull request:
https://github.com/FlorianUekermann/wgpu-rs
I get the error every few ms with: RUST_LOG=info cargo run --features subscriber --example hello-triangle -- --stress

Btw, it took me longer than I care to admit to figure out how to run the example with logs. Maybe that should be documented in a very obvious place (README?).

@kvark
Member

kvark commented Aug 2, 2020

> Btw, it took me longer than I care to admit to figure out how to run the example with logs. Maybe that should be documented in a very obvious place (README?).

Sorry about that! I agree, we should definitely put it into README. @cwfitzgerald could you make a small PR with this to wgpu-rs, please?

@benfrankel
Contributor

benfrankel commented Aug 3, 2020

I'm running into this validation error as well. I see the validation error go away after applying one of the following changes:

  1. Simplify the scene (by rendering fewer instances, or a mesh with fewer vertices).
  2. Add a sleep after queue.submit, as mentioned in #785 (comment). 10ms was too short for me, but 30ms worked. EDIT: Adding the sleep before queue.submit also works for me, so the location of the sleep doesn't seem to matter much. It might also be relevant that device.poll takes about 30ms for me, or maybe that's just a coincidence.
  3. Call device.poll(wgpu::Maintain::Wait) before queue.submit.

I also tried calling device.poll after queue.submit, which did get rid of the validation error mentioned in this issue, but triggered a different one instead: UNASSIGNED-CoreValidation-DrawState-QueueForwardProgress. The result was the same when calling device.poll in a loop in a separate thread. I could make a separate issue for that?

kvark pushed a commit to kvark/wgpu that referenced this issue Jun 3, 2021
785: Update to latest web-sys bindings r=kvark a=grovesNL

We can wait for rustwasm/wasm-bindgen#2482 first

Co-authored-by: Joshua Groves <josh@joshgroves.com>
@AndiHofi

AndiHofi commented Jul 10, 2021

I can reproduce this by running the mipmap example.

wgpu on master (e5142b3)

Pop_OS 20.04; Kernel 5.12.12; Navi10 - Radeon RX 5600XT; Vulkan SDK 1.2.182.0; MESA 21

edit: It may be related to the Vulkan driver in MESA.
It is easy to reproduce on my PC that uses Vulkan Instance Version: 1.2.131.

On my notebook I never get those validation errors. It has a Vulkan Instance Version of 1.2.162.

I will try different Vulkan versions to test this assumption.

@kvark
Member

kvark commented Jul 12, 2021

I'm on VVL 1.2.176.0, and not seeing this validation error.
It could be that the error is back on 1.2.182.0, i.e. a regression in VVL?

@cwfitzgerald
Member

Closing as out of date
