How not to panic #171
Good point! For the time being, things should be pretty safe as long as all your shaders are correct (and you don't run out of GPU memory). For dynamic use these panics are indeed not productive at all. Eventually wgpu will be used from the browser, which is also very dynamic, and I'm pretty sure they won't kill the tab whenever there is a shader error :) So in the long run I think there won't be any panics ever. For the time being we'll have to live with them, and during that time wgpu-py will be less suited for interactively writing shaders. |
I've asked upstream to get some more info: gfx-rs/wgpu-native#113 |
So to clearly answer this question in particular:
It seems very very unlikely! However, full disclosure, we, the maintainers of wgpu-py, have to delegate your question upstream, so please keep an eye on the thread @almarklein mentioned. |
I would love to know the progress on this. From the linked issue it looks like there has been some progress.

Stderr from the kernel process:

```
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: [Error { kind: SemanticError("Unknown function 'sdBox'"), meta: Span { start: 2051, end: 2090 } }]', src\conv.rs:461:56
```
We plan to update wgpu-py soon(ish) so it can make use of the new wgpu-native that has implemented the push/pop error scopes. From what I know there are still a few situations where wgpu can get into a state where it will panic, but these cases will get more rare over time. |
Will this be a weeks, months or years timeline? If it's unlikely to be done this year, I will have to look for a different solution for my project. |
Taking into account that it's the holiday season, my guess would be that we start working on this 5-7 weeks from now. Nothing has been done yet, so there is no dev branch to check. |
I tried my implementation with v0.10.0 but still ran into a panic. I will run the debugger a bit more and see how the new ErrorHandler changes work, since the debugger now jumps farther and actually gets there. This time I remain optimistic and might even find a solution. |
Are you able to reduce your use-case into a minimal standalone example that you could post here, and that we can then run? That might help get more insight or ideas on how to resolve it. |
I have taken a look at the examples and simply took the default Shadertoy shader and made an intentional typo in line 10 (the line after `// Output to screen`):

```diff
- fragColor = vec4(col,1.0);
+ fragColor = vec4(coll,1.0);
```

This is, I guess, the minimal script to reproduce this in wgpu-py (it works without the typo, as expected):

```python
from wgpu.utils.shadertoy import Shadertoy

shader_code = """
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // Normalized pixel coordinates (from 0 to 1)
    vec2 uv = fragCoord/iResolution.xy;

    // Time varying pixel color
    vec3 col = 0.5 + 0.5*cos(iTime+uv.xyx+vec3(0,2,4));

    // Output to screen
    fragColor = vec4(coll,1.0);
}
"""

shader = Shadertoy(shader_code, resolution=(800, 450))

if __name__ == "__main__":
    shader.show()
```

Instead of raising a ValidationError with a message telling me "unknown identifier" (as a similar typo does in the provided examples), you get this panic and the program exits:

```
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: [Error { kind: UnknownVariable("coll"), meta: Span { start: 636, end: 640 } }]', src\conv.rs:557:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

It might be specific to the GLSL support used for code taken directly from shadertoy.com: wgpu-py/wgpu/backends/rs_helpers.py, line 407 (at f2b7709).

A helpful first breakpoint is here: wgpu-py/wgpu/utils/shadertoy.py, line 320 (at f2b7709).
I really hope this helps. I just looked at this on my home PC with an Intel Arc GPU, but had the same issue using an external machine (with Intel GPUs as well). Will try this on my laptop shortly (NVIDIA GPU). |
To chime in here: "incorrect shaders" (including syntactical and semantic typos which could be caught by some SPIRV-ish toolchain, but also incorrect and mismatched buffer layouts, where the shader is correct in and of itself but wrong in context) are an extremely common condition in any creative coding application. My shaders are likely "correct" in a professional sense only at some unspecified point before the audience walks into the hall; at my desk they spend more time being incorrect than not. |
The whole use case for me is as part of an evaluation benchmark for code-generation language models. One idea I am following is to run the generated code and then evaluate whether frames match; another will look at the rendered frames with a CLIP model to see whether the rendered frame semantically matches the input prompt. |
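As a rough sketch of the frame-matching idea (my own helper, not wgpu-py API; it assumes the reference and the generated shader have each already been rendered offscreen into a numpy array):

```python
import numpy as np

def frames_match(reference: np.ndarray, candidate: np.ndarray, tolerance: float = 2.0) -> bool:
    """Pixel-wise comparison with a small tolerance for GPU/driver differences."""
    if reference.shape != candidate.shape:
        return False
    diff = np.abs(reference.astype(np.float32) - candidate.astype(np.float32))
    return float(diff.mean()) <= tolerance
```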
I played around a bit with a few shaders, and I think that errors in shaders written in WGSL result in Python exceptions, whereas errors in GLSL result in a panic. This is unfortunate, but there is not much we can do here. It's simply that they've not made this work correctly for GLSL yet. I presume that WGSL support is of higher priority to the Naga devs than GLSL, but I expect it will be fixed eventually. You could check the repo to see if there is an issue that already tracks this, and create one if not. (If you know a bit of Rust you might be able to help implement it.) |
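For reference, a minimal sketch of the WGSL path that already raises a catchable exception. It assumes the wgpu-py helpers `wgpu.utils.get_default_device()` and `device.create_shader_module(code=...)`; the concrete exception class isn't pinned down in this thread, so the example just catches `Exception`:

```python
from wgpu.utils import get_default_device

device = get_default_device()

# WGSL with the same kind of typo as above: 'coll' is never declared.
wgsl_with_typo = """
@fragment
fn fs_main() -> @location(0) vec4<f32> {
    let col = vec3<f32>(0.5, 0.5, 0.5);
    return vec4<f32>(coll, 1.0);
}
"""

try:
    device.create_shader_module(code=wgsl_with_typo)
except Exception as err:  # exact exception type depends on the wgpu-py version
    print("shader error caught as a Python exception:", err)
```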
To be clear, we're not questioning your need to be able to handle exceptions. We want it too! The simple fact is that wgpu-py is a Python library with Python wrappers for the Rust library wgpu-native. We don't develop wgpu-native. We develop wgpu-py here. The reality is that if a Rust library panics, the Python wrappers can't "undo the panic" and raise a Python exception instead. There's just nothing we can do in that situation. We depend on the Rust library developers to improve wgpu-native, to prevent it from panicking and provide a runtime mechanism to handle the errors that we can utilize in Python wrappers. |
I will dig around for the next few weeks to hopefully find a solution. But I already found essentially the same issue being tracked here: gfx-rs/wgpu#2545 |
For now I am using a janky solution that works to convince my supervisors:

```python
import subprocess
import tempfile

# frag_shader_code holds the GLSL fragment source from the Shadertoy input
with tempfile.NamedTemporaryFile(suffix=".frag", mode="w") as f:
    f.write(frag_shader_code)
    f.flush()
    try:
        subprocess.run(["naga", f.name], check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        raise e
```

It's pasted about here: wgpu-py/wgpu/utils/shadertoy.py, line 316 (at 2ffacd9). Perhaps there is a better solution eventually, but it bypasses the panic for me. |
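A small follow-up on that sketch: re-raising the bare `CalledProcessError` hides naga's message, so one variant (still just a workaround, not wgpu-py behavior) is to surface the CLI output in the exception:

```python
import subprocess
import tempfile

def validate_glsl(frag_shader_code: str) -> None:
    """Run the naga CLI on the fragment source; raise with its output on failure."""
    with tempfile.NamedTemporaryFile(suffix=".frag", mode="w") as f:
        f.write(frag_shader_code)
        f.flush()
        result = subprocess.run(["naga", f.name], capture_output=True, text=True)
        if result.returncode != 0:
            raise ValueError(f"naga validation failed:\n{result.stderr or result.stdout}")
```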
We've had code like that in wgpu-py itself in its very early days 😝, before WGSL existed and SPIR-V was the only target. I hope it gets fixed upstream fast! |
Getting back to the original issue. The current situation is as follows:
I think that's enough to close this issue. |
Not to revive this issue, but I have run into at least one more shader that panics in a similar way. And this one passes the naga validation from the workaround above.

I made a minimal script of it in this gist: https://gist.github.com/Vipitis/0981436d889deb4d330768fa5843b9c8

I haven't tried to minimize the shader code to figure out what causes it, but I have seen at least one other panic which I will look into in the coming days. The panic happens at the same place as above; the backtrace looks as follows:

```
thread '<unnamed>' panicked at C:\Users\runneradmin\.cargo\git\checkouts\naga-dbb2b19faed49210\bac2d82\src\back\spv\block.rs:2066:56:
internal error: entered unreachable code: Expression [53] is not cached!
stack backtrace:
0: 0x7ff83a804cea - wgpu_render_pass_execute_bundles
1: 0x7ff83a81bb1b - wgpu_render_pass_execute_bundles
2: 0x7ff83a8020c1 - wgpu_render_pass_execute_bundles
3: 0x7ff83a804a6a - wgpu_render_pass_execute_bundles
4: 0x7ff83a806e1a - wgpu_render_pass_execute_bundles
5: 0x7ff83a806a88 - wgpu_render_pass_execute_bundles
6: 0x7ff83a8074ce - wgpu_render_pass_execute_bundles
7: 0x7ff83a8073bd - wgpu_render_pass_execute_bundles
8: 0x7ff83a8056d9 - wgpu_render_pass_execute_bundles
9: 0x7ff83a8070c0 - wgpu_render_pass_execute_bundles
10: 0x7ff83a847f15 - wgpu_render_pass_execute_bundles
11: 0x7ff83a78e57d - wgpu_render_pass_execute_bundles
12: 0x7ff83a79fb91 - wgpu_render_pass_execute_bundles
13: 0x7ff83a7a958f - wgpu_render_pass_execute_bundles
14: 0x7ff83a7ae3e2 - wgpu_render_pass_execute_bundles
15: 0x7ff83a7afe04 - wgpu_render_pass_execute_bundles
16: 0x7ff83a64ee19 - wgpu_render_pass_execute_bundles
17: 0x7ff83a656957 - wgpu_render_pass_execute_bundles
18: 0x7ff83a3c6fbf - wgpuDeviceDestroy
19: 0x7ff83a326da7 - wgpuDeviceDestroy
20: 0x7ff83a4cc7e8 - wgpuDeviceCreateRenderPipeline
21: 0x7ff85bc010f3 - <unknown>
22: 0x7ff85bc1ac90 - PyInit__cffi_backend
23: 0x7ff85bc074af - <unknown>
24: 0x7ff85926d44b - PyObject_Call
25: 0x7ff85924c65f - PyEval_EvalFrameDefault
26: 0x7ff859243963 - PyObject_GC_Del
27: 0x7ff8592455a7 - PyFunction_Vectorcall
28: 0x7ff85924823d - PyEval_EvalFrameDefault
29: 0x7ff859243963 - PyObject_GC_Del
30: 0x7ff85924b182 - PyEval_EvalFrameDefault
31: 0x7ff85924812f - PyEval_EvalFrameDefault
32: 0x7ff859243963 - PyObject_GC_Del
33: 0x7ff8592455a7 - PyFunction_Vectorcall
34: 0x7ff859298a6a - PyArg_ParseTuple_SizeT
35: 0x7ff85929699d - PyType_GenericNew
36: 0x7ff8592f30ac - PyObject_MakeTpCall
37: 0x7ff85924e073 - PyEval_EvalFrameDefault
38: 0x7ff859243963 - PyObject_GC_Del
39: 0x7ff859308bdd - PyEval_EvalCodeWithName
40: 0x7ff859308b1f - PyEval_EvalCodeEx
41: 0x7ff859308a7d - PyEval_EvalCode
42: 0x7ff85923587a - PyBytes_DecodeEscape
43: 0x7ff8592357fa - PyBytes_DecodeEscape
44: 0x7ff8592d8083 - Py_wfopen
45: 0x7ff8592d6434 - Py_stat
46: 0x7ff8592aa7ef - PyRun_SimpleFileExFlags
47: 0x7ff8593a2e04 - PyRun_AnyFileExFlags
48: 0x7ff8592d8dd2 - PyList_GetItem
49: 0x7ff8592da32c - Py_RunMain
50: 0x7ff8592da1b5 - Py_RunMain
51: 0x7ff8592d9e85 - Py_Main
52: 0x7ff795d81258 - <unknown>
53: 0x7ff8db707344 - BaseThreadInitThunk
54: 0x7ff8dbee26b1 - RtlUserThreadStart
```

The error originates from naga itself, not from inside wgpu as the Unknown Variable errors did. So I suspect it's actually an issue with naga, but I haven't yet understood it well enough or found a similar issue. |
One more idea for a workaround is to run python in a subprocess with |
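The exact invocation got cut off above, but a minimal sketch of that subprocess idea (the script name here is hypothetical) could look like the following: a wgpu-native panic then only takes down the child interpreter, and the parent sees a non-zero return code plus the panic text on stderr.

```python
import subprocess
import sys

# Render one shader in a separate Python process so a Rust panic cannot kill
# the main benchmark loop.
result = subprocess.run(
    [sys.executable, "render_one_shader.py"],  # hypothetical per-shader script
    capture_output=True,
    text=True,
    timeout=60,
)
if result.returncode != 0:
    print("shader run failed:", result.stderr)
```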
If you manage to minimize the shader code to expose the place where the error originates, it might be worth reporting this at the Naga repo. |
I bypass it now by not just using
I will open an issue on naga, as it's clearly not intended that validation succeeds and translation then panics. While looking a bit deeper I found one or two open issues that mention a similar error message. |
Now tracked in gfx-rs/wgpu#4549 |
Thanks for diving into this! |
wgpu-native just received a fix (gfx-rs/wgpu-native#313) that makes errors in GLSL also result in nice exceptions (in wgpu-py) instead of a panic. That's one part of the problem gone. |
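To illustrate what that fix enables (a sketch only; the concrete exception class raised for GLSL errors isn't stated in this thread, so it catches `Exception`): the broken Shadertoy example from earlier should now fail with a catchable error instead of killing the process.

```python
from wgpu.utils.shadertoy import Shadertoy

broken_glsl = """
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec3 col = vec3(0.5);
    fragColor = vec4(coll, 1.0);  // typo: 'coll' is undefined
}
"""

try:
    shader = Shadertoy(broken_glsl, resolution=(800, 450))
    shader.show()
except Exception as err:  # with the fixed wgpu-native, this path is reachable
    print("GLSL error surfaced as a Python exception:", err)
```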
Great to hear. I will see how complicated it is to try wgpu-native built from source; perhaps that makes everything a lot easier, as I have since discovered more panics which don't get caught by naga validation or naga translation. |
I've asked a very similar question before, here: gfx-rs/naga#108. That one is rightfully closed: error messages are vastly better than they were a year ago. However, stepping back a bit, I see that wgpu still happily delivers these messages while issuing a Rust-level panic, that is, printing something and then swiftly killing the whole process.

Has this become the long-term plan? Or is there a way to recover from these panics? Is the intention that our wgpu-py renderers are separate, easily re-initializable, disposable processes? Is this what
triangle_subprocess.py
is aiming towards? Is wgpu[-py] for cattle rather than pets?

I'm asking this question here and not upstream because, while this behavior is very Rust, it's really not very Python at all: imagine writing wgpu-py code from the REPL, or live-coding something built on wgpu-py from a Jupyter kernel, etc. Now that we have WGSL, a shader language that we can pass along to wgpu as strings (without recourse to an external shader compiler), we dare not really use it dynamically, because a typo in a shader quits our process with:

It speaks to which long-term applications of wgpu-py are viable inside the Python ecosystem. And it's also a completely different set of rules from, say, OpenGL and WebGL, which are perfectly recoverable from an error state.