
Print stack trace for panics that originate in model code #1569

Closed
hannobraun opened this issue Feb 7, 2023 · 7 comments · Fixed by #1721
Labels
good first issue Good for newcomers type: feature New features and improvements to existing features

Comments

@hannobraun (Owner)

Executing a todo!() in the code of a model results in this error message:
[Screenshot: the error message shown when a model executes todo!()]

This is not ideal, as it lacks some critical information:

  • It doesn't explain that a panic occurred.
  • It doesn't show where that panic occurred.

Ideally, the error message should clearly explain that there was a panic while loading the model, originating from the model code, and print a stack trace for the panic. Even more ideally, this stack trace would leave out any irrelevant parts from outside of the model code, and only show the stack trace of the panic starting from the model function.

Panic handling for models centers around the on_panic function, and I think that any necessary changes will either be completely contained within that function, or at least center around it (if changing the function's signature is necessary, that will require changes in the code using that function).
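For reference, the basic shape of catching a panic at a boundary like this can be sketched with `std::panic::catch_unwind`. This is a minimal stand-in, not Fornjot's actual `on_panic` code; the `call_model` helper is hypothetical.

```rust
use std::panic;

// Run a closure that may panic (standing in for a model call) and
// convert the panic payload into a readable message. The payload is
// usually a &str or String; anything else is reported generically.
fn call_model<F: FnOnce() -> i32 + panic::UnwindSafe>(f: F) -> Result<i32, String> {
    panic::catch_unwind(f).map_err(|payload| {
        if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "model code panicked (non-string payload)".to_string()
        }
    })
}

fn main() {
    // Silence the default hook so only our own reporting is printed.
    panic::set_hook(Box::new(|_| {}));

    assert_eq!(call_model(|| 42), Ok(42));
    let err = call_model(|| todo!()).unwrap_err();
    println!("panic message: {err}");
}
```

Note that, as the discussion below shows, the payload alone carries no location or backtrace information; that has to come from a panic hook.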

Labeling as https://github.com/hannobraun/Fornjot/labels/good%20first%20issue, as this is a fairly localized change. It will require some knowledge of Rust panic and error handling to address, but shouldn't require much knowledge about Fornjot in particular.

@IamTheCarl (Contributor)

I've started to take a crack at implementing this and have run into problems.
catch_unwind will not give you the information you are looking for, period.
I'm not sure why. I think they just don't want you doing that.

Implementing a custom panic handler lets me get that information, but oddly it does not handle panics caught by catch_unwind. I'll look into that more later.
It also adds this nasty ugliness of needing a global variable to pass that panic information back to the rest of the application.
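The "global variable" approach described above can be sketched as follows; this is a simplified illustration of the pattern, not the branch's actual code, and `LAST_PANIC`/`install_hook` are made-up names.

```rust
use std::panic;
use std::sync::Mutex;

// Global slot for the last panic's location. A custom hook sees every
// panic, even ones later caught by `catch_unwind`, but it has no way to
// return data to the caller, hence the (admittedly ugly) global.
static LAST_PANIC: Mutex<Option<String>> = Mutex::new(None);

fn install_hook() {
    panic::set_hook(Box::new(|info| {
        let location = info
            .location()
            .map(|l| format!("{}:{}:{}", l.file(), l.line(), l.column()))
            .unwrap_or_else(|| "unknown location".to_string());
        *LAST_PANIC.lock().unwrap() = Some(location);
    }));
}

fn main() {
    install_hook();
    let _ = panic::catch_unwind(|| panic!("boom"));
    let location = LAST_PANIC.lock().unwrap().clone().unwrap();
    println!("panic at {location}");
}
```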

To top it off, official documentation suggests catching panics is a bad practice.
It looks to me like the client library is designed to be serialized. Maybe instead of compiling to a lib and linking that in at runtime, we should build an executable and print that data to stdout?

At that point, if the client application panics, we can just capture stderr to get the stack trace.
This would also be much more in line with how the Rust documentation suggests using its libraries.

@IamTheCarl (Contributor)

Looking into the issue with the panic handler not intercepting panics caught by catch_unwind: it turns out the panic hook is a global variable, and each module we load has installed its own panic hook, which is why this isn't working. We'd have to change the client library to fix that, or take a completely different approach to models.

@IamTheCarl (Contributor)

In case you're curious, here's my branch implementing what I have. Most of the changes are here.

@hannobraun (Owner, Author)

Thank you for looking into this, @IamTheCarl!

I've started to take a crack at implementing this and have run into problems.
catch_unwind will not give you the information you are looking for, period.
I'm not sure why. I think they just don't want you doing that.

That's too bad.

Maybe the issue is that the backtrace isn't captured? You have to run a Rust application with RUST_BACKTRACE=1 to get one for panic messages, after all. If that's the case, I wonder if we can enable that unconditionally for model code somehow?
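One possible way to sidestep RUST_BACKTRACE entirely: `std::backtrace::Backtrace::force_capture` (stable since Rust 1.65) captures a trace regardless of environment variables, so a hook could record one unconditionally. A minimal sketch, with a made-up `TRACE` global:

```rust
use std::backtrace::Backtrace;
use std::panic;
use std::sync::Mutex;

// `Backtrace::force_capture` ignores RUST_BACKTRACE, so a panic hook
// using it can record a trace without asking users to set the
// environment variable before loading a model.
static TRACE: Mutex<Option<String>> = Mutex::new(None);

fn main() {
    panic::set_hook(Box::new(|_info| {
        *TRACE.lock().unwrap() = Some(Backtrace::force_capture().to_string());
    }));
    let _ = panic::catch_unwind(|| panic!("in model"));
    let trace = TRACE.lock().unwrap().take().unwrap();
    println!("captured {} bytes of backtrace", trace.len());
}
```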

To top it off, official documentation suggests catching panics is a bad practice.

It is bad practice to use it as a replacement for Result. But I think it's appropriate for a case like this, where you have foreign code that you don't control, and you have a clear boundary where you want panics not to propagate any more.

On top of that, it's also necessary in this case. There's an FFI boundary between the Fornjot application and models (even though it's Rust on both sides), and panicking over an FFI boundary is undefined behavior. That must be avoided.
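The FFI-boundary case can be illustrated like this; a hypothetical `extern "C"` model entry point catches the panic before it can unwind across the ABI and signals it with an error code instead.

```rust
use std::panic;

// A model entry point exposed over a C ABI. Letting a panic unwind
// through `extern "C"` is undefined behavior, so the panic is caught
// at the boundary and converted into an error code.
extern "C" fn model_entry(fail: bool) -> i32 {
    let result = panic::catch_unwind(|| {
        if fail {
            panic!("model panicked");
        }
        0
    });
    result.unwrap_or(-1) // -1 signals "model panicked" to the host
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // keep output quiet
    assert_eq!(model_entry(false), 0);
    assert_eq!(model_entry(true), -1);
    println!("both calls returned safely across the C ABI");
}
```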

It looks to me like the client library is designed to be serialized. Maybe instead of compiling to a lib and linking that in at runtime, we should build an executable and print that data to stdout?

That's an interesting idea!

However, I'm not sure how well that concept will hold up in the future. Please note that the plan is to migrate from the current system to WASM (#71), and also to be able to run in browsers.

We'd have to make some kind of change to the client library to fix that, or use a completely different approach to models.

It's probably worth looking into how the transition to WASM would affect this. I don't think it would be worth going through a lot of trouble for a solution that might not work anymore later.

@IamTheCarl (Contributor)

I just read through #71 very quickly to try and catch myself up.
It seems you have a few issues there that don't have a great solution yet.

1st: An interface binding between Fornjot and its WASM applications is hard to create and maintain.
2nd: Multiple execution "backends" will be needed for different platforms. I imagine these backends may eventually translate into other languages even.
3rd: You can't pack a Rust compiler into the web browser.

I think the serialization idea I mentioned could be a solution to all of those problems.

1st: To get the serialized form from the WASM runtime to Fornjot, you just need to drop it somewhere on the heap and pass a pointer to that back to Fornjot. No need for enum support or Rust ABI versions as that'll all be covered by the serde serialization format we choose. This could be completely implemented on a C ABI, and that would likely be a good first prototype.

2nd: The execution backend doesn't matter as long as it outputs that serialized data to somewhere Fornjot can read it, be that stdout, a block of memory, a file, a TCP socket, or something else that can transfer a linear block of bytes.

3rd: While I can't pack a Rust compiler into a web browser, I can copy what the Rust Playground does: it sends your code off to a remote Docker container, compiles and runs it there, and sends the result back. For us, that result would be the serialized format.
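The pointer-passing idea from the 1st point can be sketched like this. It's a toy illustration only: the "serialized model" is a hand-rolled byte string standing in for a real serde payload, and `model_export`/`host_import` are invented names.

```rust
// The model leaks a buffer and hands (pointer, length) across the C ABI;
// the host reclaims the bytes. Only a linear block of bytes crosses the
// boundary, so no enum support or Rust ABI versioning is needed.
fn model_export() -> (*mut u8, usize) {
    let payload = b"cube(1.0)".to_vec(); // pretend-serialized model
    let len = payload.len();
    let ptr = Box::into_raw(payload.into_boxed_slice()) as *mut u8;
    (ptr, len)
}

fn host_import(ptr: *mut u8, len: usize) -> Vec<u8> {
    // Safety: ptr/len came from `model_export`, which leaked a boxed slice
    // of exactly this length, so reconstructing the Box is sound.
    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len)).into_vec() }
}

fn main() {
    let (ptr, len) = model_export();
    let bytes = host_import(ptr, len);
    assert_eq!(bytes, b"cube(1.0)");
    println!("received {} bytes from the model", bytes.len());
}
```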

One last thing to mention is that this serialized format would be language agnostic. If another language just generated this format, Fornjot would be able to accept it. I do realize this is a bit of a step away from the "code first" approach since this would technically be a file format. Think of it like LLVM's intermediate representation, for CAD models. Someone could even build a more traditional visual editor using this architecture.

If that all sounds good to you, I'm cool moving this conversation over to #71 and toying with an experimental implementation.

@IamTheCarl (Contributor)

After taking a second look at this I discovered that I critically misunderstood where the panic was being caught.
It's caught in the model's code, not from within Fornjot itself.

With that in mind I moved the location of my call to fj::abi::initialize_panic_handling(); from within Fornjot to within the model's code.

It works pretty well now. I just need to figure out how to make the #[fj::macro] call automatically insert that.
I am very new to proc macros so that will take a little time.
[Screenshot: panic output now showing the panic message and its location in the model code]
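The shape of what the attribute macro would need to generate can be sketched with a declarative macro; the real proc macro would emit equivalent code, and `initialize_panic_handling` is stubbed here rather than being Fornjot's actual function.

```rust
// Stub standing in for fj::abi::initialize_panic_handling: install a
// hook that reports panics originating in the model.
fn initialize_panic_handling() {
    std::panic::set_hook(Box::new(|info| {
        eprintln!("model panicked: {info}");
    }));
}

// The macro wraps the user's model function so panic handling is
// installed before the model body runs, without the user writing it.
macro_rules! model {
    (fn $name:ident() -> $ret:ty $body:block) => {
        fn $name() -> $ret {
            initialize_panic_handling(); // inserted automatically
            $body
        }
    };
}

model! {
    fn my_model() -> u32 {
        42
    }
}

fn main() {
    assert_eq!(my_model(), 42);
    println!("model ran with panic handling installed");
}
```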

@hannobraun (Owner, Author)

Thank you for your comments, @IamTheCarl!

I think you've convinced me about the serialized format (barring any new insights which might or might not arise). I have some concerns about overhead, but given that the CAD kernel has to do a lot of number-crunching, and WASM's overhead might not be any better, I would assume it doesn't really matter as far as performance goes. Code complexity also shouldn't be too bad, given what we're already dealing with in regards to the FFI boundary (and I'm unclear how much better that would end up being with WASM, if at all).

Some specific comments:

Multiple execution "backends" will be needed for different platforms. I imagine these backends may eventually translate into other languages even.

I think the current consensus is that Wasmer should work everywhere. (Although there was a lot of discussion about various solutions in that issue.) But that doesn't invalidate your argument. I'm just noting that this item doesn't fit my current understanding.

One last thing to mention is that this serialized format would be language agnostic. If another language just generated this format, Fornjot would be able to accept it.

👍

One suggestion for the implementation, though: I think it might be best to initially stick with something easy and well-understood like serde (maybe using Postcard as the serialization format). That would be low-maintenance (compared to anything more advanced or custom), and we can always switch to something else later.

I do realize this is a bit of a step away from the "code first" approach since this would technically be a file format. Think of it like LLVM's intermediate representation, for CAD models. Someone could even build a more traditional visual editor using this architecture.

Well, this serialized format would be generated on-the-fly by code, and later this would include back-and-forth communication with the kernel. So while this would be a departure from code-first in implementation, it's not really a departure in spirit. It's just a different calling convention, as far as model code is concerned.

If that all sounds good to you, I'm cool moving this conversation over to #71 and toying with an experimental implementation.

Sounds great!

As I mentioned above, I'm pretty much bought-in at this point, but I'm interested in what the folks over in #71 have to say about this new development.


Regarding the issue at hand (the panics), I've seen your pull request and will take a look next.
