-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait until 0MQ sockets are created before starting R #43
Conversation
583b58c
to
10749f3
Compare
crates/ark/src/shell.rs
Outdated
) -> Self { | ||
let iopub_tx = iopub_tx.clone(); | ||
spawn!("ark-r-main-thread", move || { | ||
// Block until 0MQ is initialised before starting R to avoid | ||
// thread-safety issues. See https://github.com/rstudio/positron/issues/720 | ||
let _ = conn_init_rx.recv().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should use recv_timeout()
so it doesn't have the potential to hang indefinitely -- after some reasonable timeout we should either log an error and exit or go ahead and let R start with a warning (the latter seems better to me but your call!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I agree it's better to let R start with a warning in that case.
But since this message passing is entirely in-process a hang should never happen right? Is the idea to be defensive about future potential programming bugs? After many years of coding in C, I've learned to happily ignore impossible situations that would otherwise crash or hang the program 😅
Also we're using blocking recv()
in other parts of the codebase, do you think we should generally be defensive like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just defensive programming. Since 0MQ is initialized on another thread there's all kinds of things that could happen (an exception? the initialize calls don't return or deadlock?).
Good question re: recv()
... In most places we use recv()
on a threaded message loop wherein we wait for a message, process it, then wait for the next message. That pattern doesn't use a timeout b/c a long delay between messages is normal.
(that said, very possible that there are a few places that use recv()
that should be recv_timeout()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks
Using a one-off channel that is activated after all 0MQ sockets are created.
Addresses posit-dev/positron#720. I could no longer reproduce the crash using this branch.