Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How not to panic #171

Closed
marcdownie opened this issue Jun 14, 2021 · 26 comments
Closed

How not to panic #171

marcdownie opened this issue Jun 14, 2021 · 26 comments
Labels
question Further information is requested

Comments

@marcdownie
Copy link
Contributor

I've asked a very similar question before — here gfx-rs/naga#108 . This is rightfully closed: error messages are vastly better than they were a year ago. However, stepping back a bit, I see the wgpu still happily delivers these messages while issuing a Rust-level panic — that is, printing something and swiftly killing the whole process.

Has this become the long term plan? or is there a way to recover from these panics? is the intention that our wgpu-py renderers are separate, easily re-initializable disposable processes? Is this what triangle_subprocess.py is aiming towards? Is wgpu[-py] for cattle rather than pets?

I'm asking this question here and not upstream because, while this behavior very Rust, it's really not very Python at all — imagine writing wgpu-py from the REPL, or live-coding something built on wgpu-py from Jupyter kernel etc. Now that we have a WSL --- a shader language that we can pass along to wgpu as strings (without recourse to an external shader compiler) --- we dare not really use it dynamically, because a typo in a shader quits our process with:

Failed to parse WGSL code for Some(""): unknown type: `vec`
thread '<unnamed>' panicked at 'Parsing', src/device.rs:112:5

It speaks to what long-term applications of wgpu-py apps that are viable inside the python ecosystem. And it's also completely different set of rules from, say both OpenGL and WebGL --- which are perfectly recoverable from an error state.

@almarklein
Copy link
Member

Good point! For the time being, things should be pretty safe from the moment that all your shaders are correct (and you don't dun out of GPU memory). For dynamic use these panics are indeed not productive at all.

Eventually wgpu will be used from the browser, which is also very dynamic - and I'm pretty sure they won't kill the tab whenever there is a shader error :) So in the long run I think there won't be any panics ever. For the time being we'll have to live with them, and during that time wgpu-py will be less suited for interactively writing shaders.

@almarklein
Copy link
Member

I've asked upstream to get some more info: gfx-rs/wgpu-native#113

@Korijn
Copy link
Collaborator

Korijn commented Jun 15, 2021

So to clearly answer this question in particular:

Has this become the long term plan?

It seems very very unlikely!

However, full disclosure, we, the maintainers of wgpu-py, have to delegate your question upstream, so please keep an eye on the thread @almarklein mentioned.

@almarklein almarklein added the question Further information is requested label Dec 14, 2021
@Vipitis
Copy link
Contributor

Vipitis commented Aug 15, 2023

I would love to know the progress on this. From the linked issue it looks like there has been some progress.
A python try: can't catch these panics so crashes instead. But the error messages look really helpful for my use case, so if there is a way to get these into python as exception for example.

StdErr from Kernel Process thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: [Error { kind: SemanticError("Unknown function 'sdBox'"), meta: Span { start: 2051, end: 2090 } }]', src\conv.rs:461:56

@almarklein
Copy link
Member

We plan to update wgpu-py soon(ish) so it can make use of the new wgpu-native that has implemented the push/pop error scopes. From what I know there are still few situations where wgpu can get into a state where it will panic, but these cases will get more rare over time.

@Vipitis
Copy link
Contributor

Vipitis commented Aug 15, 2023

Will this be a weeks, months or years timeline? If it's unlikely to be done this year, I will have to look for a different solution for my project.
By any chance is there a way to preview those features, perhaps a dev branch?

@almarklein
Copy link
Member

Taking into account that it's the holiday season my guess would be that we start working on 5-7 weeks from now. Nothing has been done yet, so no dev branch to check.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 9, 2023

I tried my implementation with v0.10.0 but still got into a panic. I will try to run the debugger a bit more and see how the new changes of ErrorHandler work. Since the debugger now jumps farther and gets there. this time I remain optimistic and might even find a solution

@almarklein
Copy link
Member

Are you able to reduce your use-case into a minimal standalone example that you could post here, and that we can then run? That might help get more insight or ideas on how to resolve it.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 10, 2023

I have taken a look at the examples and simply took the default Shadertoy shader and made an intentional typo in line 10, writing coll instead of col

    // Output to screen
--- fragColor = vec4(col,1.0);
+++ fragColor = vec4(coll,1.0);

This is I guess the minimal script to run this in wgpu-py. (It works without the typo as execpected)

from wgpu.utils.shadertoy import Shadertoy

shader_code = """
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // Normalized pixel coordinates (from 0 to 1)
    vec2 uv = fragCoord/iResolution.xy;

    // Time varying pixel color
    vec3 col = 0.5 + 0.5*cos(iTime+uv.xyx+vec3(0,2,4));

    // Output to screen
    fragColor = vec4(coll,1.0);
}
"""
shader = Shadertoy(shader_code, resolution=(800, 450))

if __name__ == "__main__":
    shader.show()

instead of raising a ValidationError and getting a message telling me "unknown identifier" (as a similar typo does in the provided examples), you get this panic and the program exists

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: [Error { kind: UnknownVariable("coll"), meta: Span { start: 636, end: 640 } }]', src\conv.rs:557:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace`

it might be specific to GLSL support using code directly from shadertoy.com
I have chased the debugger all the way to this specific line (debugger crashes when stepping into here):

result = ob(*args)

a helpful first breakpoint is here:
frag_shader_program = self._device.create_shader_module(


I really hope this helps, I just looked at this on my home PC with an Intel Arc GPU - but had the same issue using an external machine (with Intel GPUs as well). Will try this on my laptop shortly (nvidia gpu).

@marcdownie
Copy link
Contributor Author

marcdownie commented Oct 10, 2023

To chime in here, "incorrect shaders" (including syntactical and semantic typos which could be caught by some SPIRV-ish toolchain, but also incorrect and mismatched layouts of buffers where the shader is correct in and of itself but wrong in context) is an extremely common condition in any creative coding applications. My shaders are likely "correct" in a professional sense only at some unspecified point before the audience walks into the hall; at my desk they spend more time being incorrect than not.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 10, 2023

the whole use case for me is as part of an evaluation benchmark for code generation language models. One idea I am following is to run the generated code and then evaluate if frames match, another will look at the rendered frames with a CLIP model to see if the rendered frame matches semantically to the input prompt.
Understanding what 'kind' of error will be helpful as you can either grade a generation as wrong or simply incomplete, if it is for example calling a function that doesn't exist (which you could then ask the model etc) until it ends up with something working. It is currently not clear if I manage to get to that 2nd part during my bachelor thesis, but it's definitely something very interesting I want to look at in the next year.
I have previously been told to look at naga for validation and I still might, but the lazy python developer I am first looked at various solutions to easily run shadertoy shaders directly with python, and this project had the most useful implementation.

@almarklein
Copy link
Member

I played around a bit with a few shaders, and I think that errors in shaders written in WGSL result in Python exceptions, whereas errors in GLSL result in a panic. This is unfortunate, but there is not much we can do here. It's simply that they've not made this work correctly yet for GLSL. I presume that WGSL support is of higher priority to the Naga devs than GLSL, but I expect things it will be fixed eventually.

You could check the repo to see if there is an issue that already tracks this, and create one if not. (If you know a bit of Rust you might be able to help implement it.)

@Korijn
Copy link
Collaborator

Korijn commented Oct 10, 2023

To chime in here, "incorrect shaders" (including syntactical and semantic typos which could be caught by some SPIRV-ish toolchain, but also incorrect and mismatched layouts of buffers where the shader is correct in and of itself but wrong in context) is an extremely common condition in any creative coding applications. My shaders are likely "correct" in a professional sense only at some unspecified point before the audience walks into the hall; at my desk they spend more time being incorrect than not.

To be clear, we're not questioning your need to be able to handle exceptions. We want it too!

The simple fact is that wgpu-py is a Python library with Python wrappers for the Rust library wgpu-native. We don't develop wgpu-native. We develop wgpu-py here.

The reality is that if a Rust library panics, the Python wrappers can't "undo the panic" and raise a Python exception instead. There's just nothing we can do in that situation.

We depend on the Rust library developers to improve wgpu-native, to prevent it from panicking and provide a runtime mechanism to handle the errors that we can utilize in Python wrappers.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 10, 2023

You could check the repo to see if there is an issue that already tracks this, and create one if not. (If you know a bit of Rust you might be able to help implement it.)

I will dig around for the next few weeks to hopefully find a solution. But already found essentially the same issue tracking here: gfx-rs/wgpu#2545

@Vipitis
Copy link
Contributor

Vipitis commented Oct 12, 2023

for now I am using a janky solution that works to convince my supervisors.

with tempfile.NamedTemporaryFile(suffix=".frag", mode="w") as f:
    f.write(frag_shader_code)
    f.flush()
    try:
        subprocess.run(["naga", f.name], check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        raise e

it's pasted about here:

perhaps there is a better solution eventually, but it bypasses the panic for me

@almarklein
Copy link
Member

We've had code like that in wgpu-py itself in its very early days 😝, before WGSL existsted and spirv was the only target. I hope it gets fixed upstream fast!

@almarklein
Copy link
Member

Getting back to the original issue. The current situation is as follows:

  • wgpy-native has a mechanism to communicate errors via the C API.
  • In wgpu-py we make use of it, and raise Python errors (at the appropriate C call) in most cases, and log the error otherwise.
  • In wgpu-core, most errors now get communicated via this route.
  • There may be Rust panics in some specific cases (e.g. syntax errors in GLSL), but they will most certainly be communicated via the same mechanism eventually.

I think that's enough to close this issue.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 19, 2023

not to revive this issue. But I have run into at least one more shader that panics in a similar way. And this one validates with naga just fine(the frag_shader_code.frag mentioned above) - it panics with naga when translating to SPIR-V, not when translating to wgsl. It's also working on shadertoy. here is the source: https://www.shadertoy.com/view/NsffD2

I made a minimal script of it in this gist: https://gist.github.com/Vipitis/0981436d889deb4d330768fa5843b9c8

Haven't tried to minimize the shadercode to figure out what causes it. But I have seen at least one other panic which I will look into in the coming days.

the panic happens at the same place as above, the result = ob(**args).

backtrace looks as follows

thread '<unnamed>' panicked at C:\Users\runneradmin\.cargo\git\checkouts\naga-dbb2b19faed49210\bac2d82\src\back\spv\block.rs:2066:56:
internal error: entered unreachable code: Expression [53] is not cached!
stack backtrace:
   0:     0x7ff83a804cea - wgpu_render_pass_execute_bundles
   1:     0x7ff83a81bb1b - wgpu_render_pass_execute_bundles
   2:     0x7ff83a8020c1 - wgpu_render_pass_execute_bundles
   3:     0x7ff83a804a6a - wgpu_render_pass_execute_bundles
   4:     0x7ff83a806e1a - wgpu_render_pass_execute_bundles
   5:     0x7ff83a806a88 - wgpu_render_pass_execute_bundles
   6:     0x7ff83a8074ce - wgpu_render_pass_execute_bundles
   7:     0x7ff83a8073bd - wgpu_render_pass_execute_bundles
   8:     0x7ff83a8056d9 - wgpu_render_pass_execute_bundles
   9:     0x7ff83a8070c0 - wgpu_render_pass_execute_bundles
  10:     0x7ff83a847f15 - wgpu_render_pass_execute_bundles
  11:     0x7ff83a78e57d - wgpu_render_pass_execute_bundles
  12:     0x7ff83a79fb91 - wgpu_render_pass_execute_bundles
  13:     0x7ff83a7a958f - wgpu_render_pass_execute_bundles
  14:     0x7ff83a7ae3e2 - wgpu_render_pass_execute_bundles
  15:     0x7ff83a7afe04 - wgpu_render_pass_execute_bundles
  16:     0x7ff83a64ee19 - wgpu_render_pass_execute_bundles
  17:     0x7ff83a656957 - wgpu_render_pass_execute_bundles
  18:     0x7ff83a3c6fbf - wgpuDeviceDestroy
  19:     0x7ff83a326da7 - wgpuDeviceDestroy
  20:     0x7ff83a4cc7e8 - wgpuDeviceCreateRenderPipeline
  21:     0x7ff85bc010f3 - <unknown>
  22:     0x7ff85bc1ac90 - PyInit__cffi_backend
  23:     0x7ff85bc074af - <unknown>
  24:     0x7ff85926d44b - PyObject_Call
  25:     0x7ff85924c65f - PyEval_EvalFrameDefault
  26:     0x7ff859243963 - PyObject_GC_Del
  27:     0x7ff8592455a7 - PyFunction_Vectorcall
  28:     0x7ff85924823d - PyEval_EvalFrameDefault
  29:     0x7ff859243963 - PyObject_GC_Del
  30:     0x7ff85924b182 - PyEval_EvalFrameDefault
  31:     0x7ff85924812f - PyEval_EvalFrameDefault
  32:     0x7ff859243963 - PyObject_GC_Del
  33:     0x7ff8592455a7 - PyFunction_Vectorcall
  34:     0x7ff859298a6a - PyArg_ParseTuple_SizeT
  35:     0x7ff85929699d - PyType_GenericNew
  36:     0x7ff8592f30ac - PyObject_MakeTpCall
  37:     0x7ff85924e073 - PyEval_EvalFrameDefault
  38:     0x7ff859243963 - PyObject_GC_Del
  39:     0x7ff859308bdd - PyEval_EvalCodeWithName
  40:     0x7ff859308b1f - PyEval_EvalCodeEx
  41:     0x7ff859308a7d - PyEval_EvalCode
  42:     0x7ff85923587a - PyBytes_DecodeEscape
  43:     0x7ff8592357fa - PyBytes_DecodeEscape
  44:     0x7ff8592d8083 - Py_wfopen
  45:     0x7ff8592d6434 - Py_stat
  46:     0x7ff8592aa7ef - PyRun_SimpleFileExFlags
  47:     0x7ff8593a2e04 - PyRun_AnyFileExFlags
  48:     0x7ff8592d8dd2 - PyList_GetItem
  49:     0x7ff8592da32c - Py_RunMain
  50:     0x7ff8592da1b5 - Py_RunMain
  51:     0x7ff8592d9e85 - Py_Main
  52:     0x7ff795d81258 - <unknown>
  53:     0x7ff8db707344 - BaseThreadInitThunk
  54:     0x7ff8dbee26b1 - RtlUserThreadStart

the error originates from naga and not inside wgpu as was with the Unknown Variables. So I suspect it's actually an issue with naga - but I haven't yet understand enough or found a similar issue

@Korijn
Copy link
Collaborator

Korijn commented Oct 20, 2023

for now I am using a janky solution that works to convince my supervisors.

with tempfile.NamedTemporaryFile(suffix=".frag", mode="w") as f:
    f.write(frag_shader_code)
    f.flush()
    try:
        subprocess.run(["naga", f.name], check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        raise e

it's pasted about here:

perhaps there is a better solution eventually, but it bypasses the panic for me

One more idea for a workaround is to run python in a subprocess with sys.executable, to see if some code will panic or not. You can avoid the dependency on the naga binary that way.

@almarklein
Copy link
Member

If you manage to minimize the shader code to expose the place where the error originates, it might be worth reporting this at the Naga repo.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 20, 2023

for now I am using a janky solution that works to convince my supervisors.

with tempfile.NamedTemporaryFile(suffix=".frag", mode="w") as f:
    f.write(frag_shader_code)
    f.flush()
    try:
        subprocess.run(["naga", f.name], check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        raise e

it's pasted about here:


perhaps there is a better solution eventually, but it bypasses the panic for me

One more idea for a workaround is to run python in a subprocess with sys.executable, to see if some code will panic or not. You can avoid the dependency on the naga binary that way.

I bypass it now by not just using naga file.frag but naga file.frag file.spv since this translation does panic and hence throw an exception I can work with. However this doesn't make semantic sense in my implementation as I label those as 'code_error'. But that's all on me and trivial to solve.

If you manage to minimize the shader code to expose the place where the error originates, it might be worth reporting this at the Naga repo.

I will open an issue on naga as it's clearly not intended to have validation be successful and then translation panic. While looking a bit deeper I found one or two open issues that mention a similar error message.

@Vipitis
Copy link
Contributor

Vipitis commented Oct 24, 2023

now tracking in gfx-rs/wgpu#4549
I have also been able to construct a new minimal example, so the gist linked above as been updated to include that.

@almarklein
Copy link
Member

Thanks for diving into this!

@almarklein
Copy link
Member

The wgpu-native just received a fix (gfx-rs/wgpu-native#313) that makes errors in GLSL also result in nice exceptions (in wgpu-py) instead of a panic. That's one part of the problem gone.

@Vipitis
Copy link
Contributor

Vipitis commented Nov 3, 2023

great to hear. I will see how complicated it is to try wgpu-native from source and perhaps that makes everything a lot easier. As I have since discovered more panics which don't get caught by naga validation or naga translation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants