Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic safety (or lack thereof) #1543

Open
nyurik opened this issue Feb 7, 2024 · 7 comments
Open

Panic safety (or lack thereof) #1543

nyurik opened this issue Feb 7, 2024 · 7 comments
Labels
docs Improvements or additions to documentation error-handling ffi-safety question Further information is requested

Comments

@nyurik
Copy link
Contributor

nyurik commented Feb 7, 2024

Would it be possible to refactor the code to place all unsafe code within one crate? Similar to how other crates keep FFI bindings in -sys, where as the rest of complex code is in pure safe Rust, without any safety concerns? This is more of a hypothetical question at this stage, but if this goal is set early on, it might be possible to move towards that.

The second aspect is panics - seems panics is the default method for most PGRX operations, whereas typically .unwrap() is a sign that an error is not being properly handled, and may result in a runtime error. Should gradual migration towards the Result<T,E> be preferred, and what would be needed for this?

I am a newcomer to the codebase, so please take it with a grain of salt. Thx!

@workingjubilee
Copy link
Member

workingjubilee commented Feb 7, 2024

Would it be possible to refactor the code to place all unsafe code within one crate? Similar to how other crates keep FFI bindings in -sys, where as the rest of complex code is in pure safe Rust, without any safety concerns? T

I'm not sure I understand this, because a safe function can be internally unsafe, but -sys crates are generally externally unsafe because they are pure bindings.

Consider pgrx::list::List: the main way to obtain one is to call an unsafe fn that then bears the safety assertion for the rest of the type's implementation. This is partially because rust-lang/unsafe-code-guidelines#256 hasn't been resolved decisively, but this is deeply inherent in safe wrappers.

@workingjubilee workingjubilee added question Further information is requested ffi-safety docs Improvements or additions to documentation labels Feb 7, 2024
@workingjubilee
Copy link
Member

I think the unsafe thing is not very actionable, but we could pervasively document what panics, for a start. I think we have many cases where we risk a panic, knowing it, and then don't have # Panic. There are cases where it's only known from accessing Postgres, though.

@workingjubilee workingjubilee changed the title Path to (code) safety Panic safety (or lack thereof) Feb 12, 2024
@nyurik
Copy link
Contributor Author

nyurik commented Feb 12, 2024

Thanks @workingjubilee ! Could you elaborate on your second point about accessing Postgress panics? Do you mean Postgres could cause a panic in C code, and the Rust code simply ignores it and passes it through until some code up the stack handles it? If so, we could move the panic catching logic to wrap the FFI code that calls C function - and only propagate a proper Result thereafter - again, assuming I understood the issue. And if so, the rest of the Rust-specific panics could be converted to the proper error result values as well.

@workingjubilee
Copy link
Member

workingjubilee commented Feb 12, 2024

Thanks @workingjubilee ! Could you elaborate on your second point about accessing Postgress panics? Do you mean Postgres could cause a panic in C code, and the Rust code simply ignores it and passes it through until some code up the stack handles it?

Postgres has an error-handling context, it doesn't use unwinding. Instead it "throws" using longjmp. Some functions are known to throw errors. In some cases, when we intercept these errors, they are things we should unconditionally panic on, rather than attempting to catch anything, because the error is ABORT or something like that.

You might find looking at this informative:

@nyurik
Copy link
Contributor Author

nyurik commented Feb 12, 2024

thx for the links, but i must still be misunderstanding the meaning of "we should unconditionally panic on" -- at the end of the day, panics and error results are nearly identical in the sense that both can be caught and handled by the calling function. The results are just "cleaner". So there are these cases:

  • PG calls a PGRX Rust callback function which calls a bunch of other Rust functions. Some Rust function panics, and that panic is caught by the callback function and is passed to PG via some proper error reporting API. In this case, I don't see any reason to use panics - but instead can all use standard Rust Results.
  • PG calls a PGRX Rust callback that calls Rust code that in turn calls internal PG functions via PRGX FFI again. In this case, PG could throw some longjump exception that PGRX FFI could probably unwind / re-panic / pass back to the initial PGRX callback function, which will convert that panic back to the error API(?).
  • I don't think there is a case when the app's main fn is written in Rust, that calls into PG code (e.g. reusing psql or other pg C library) - so there is no point in covering it just yet.

@workingjubilee
Copy link
Member

workingjubilee commented Feb 12, 2024

thx for the links, but i must still be misunderstanding the meaning of "we should unconditionally panic on" -- at the end of the day, panics and error results are nearly identical in the sense that both can be caught and handled by the calling function

One of the error levels that Postgres uses indicates that data corruption is occurring or could occur if the program proceeds further, rescue is not possible, and everything should terminate essentially immediately. It may still run certain termination handlers nonetheless (like unwinding would).

So, no. I agree that we may want to make it easier to handle certain possibly-panicking functions, but mostly because if we do things that encourage people to roll their own logic for catching unwinds in Rust, they will incorrectly try to "handle" an error that should never be handled.

@nyurik
Copy link
Contributor Author

nyurik commented Feb 12, 2024

I absolutely agree that there is a very small (?) class of errors that should bypass the Result structs using Rust's panics. My point is that this should only affect the second case: PG -> Rust -> PG. And I also agree that we should make it extra difficult for the users to capture such cases. For everything else, I think a standard Rust Result<T, E> should be used, which means there should be almost no .unwrap() or any other panic-creating calls in the PGRX. Is my understanding corect?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Improvements or additions to documentation error-handling ffi-safety question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants