-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ark: Should Ark be split in two processes? #802
Comments
For what it's worth, the RStudio rsession works in a somewhat similar way, in that the main thread runs R, and the majority of the tooling around it relies on R's event loop (e.g. We later resolved this by having some subset of RPCs be handled by a separate thread which listens to RPC requests -- it can either handle those eagerly, or push those into a queue for the main R thread to handle when ready. Ideally, I think the R main thread and the LSP should live in the same process (or, potentially, run on the same thread), since the LSP will want to introspect R state -- e.g. the debugger might need to peek into the R context stack, and (if R and the LSP were in different processes) serializing + deserializing that could be expensive + more cumbersome to work with.
FWIW from https://linux.die.net/man/7/pthreads:
We'd need to be cognizant about how we handle signals in the various threads we're spawning. There are some different conventions, but some people recommend having a dedicated signal-handling thread, and to block signals in all other threads. See e.g. https://stackoverflow.com/a/22005820/1342082. See also: |
Haven't seen any more instability since then. I think we can close this for now and reopen the issue if it comes up again in the future. |
This issue summarises a discussion we had this week at the Ark-Sync meeting about splitting Ark into two processes:
One process would be fully single threaded and run R. In this process, Ark would only run synchronously with R when called by frontend methods.
The other process would be multi-threaded and manage the sockets and parts of LSP that require constant work.
The two processes would communicate via an RPC protocol over IPC. The Ark-R process would pull tasks from the Ark-host process and return results using this protocol.
Advantages of a single threaded process
This reorganisation would have costs in terms of overall complexity and heavier message-passing. However it would have several advantages:
It would ensure thread-safety, and alleviate any concerns caused by the crash documented in ark: Weird crash when restarting R 4.1 repeatedly #720 and fixed in Wait until 0MQ sockets are created before starting R ark#43. The investigation of this crash indicated that
localeconv_l()
, called byprintf()
is not thread-safe on macOS.That
printf()
might not be thread safe is scary but also very surprising, and it's possible a third cause escaping the investigation caused the crash instead. Nevertheless, running R in parallel to other threads always carries a risk because R manipulates global state such as environment variables or locales, and this could cause functions reading this state from other threads to crash.Kevin mentioned signal handling to be another area that is made more difficult by having R running in a multithreaded context.
The introspection routines of Ark can't run in parallel to R and so they need to get a lock at yield time (interrupt/read-console). This causes safety issue because we can't currently guarantee that the R API is only called under this lock. Also the details of how the lock works caused various issues in the past (reentrency, fairness). Davis worked on this in Use a
ReentrantMutex
for the R lock ark#48, POC forr_lock!
enforcement on the R API ark#26, and Usebump()
to yield the lock in a fair manner ark#49. These lock issues would be solved if our introspection routines were called from R at yield time instead of trying to get a lock.Routines that need to run on the multithreaded process
Some sockets necessarily need their own thread, for instance the control socket. In general socket handling is made easier with a multithreaded design (though it makes it harder to deal with certain message passing races).
The LSP maintains in-memory models of the opened documents. It is constantly receiving document changes events from the frontend and processing them. This work can't realistically be done at interrupt time and also we don't want to pass all this data across another IPC layer.
Conclusion
The main reason to make that change is thread safety. But overall it doesn't seem worth it to make this change just to be defensive. The crash mentioned above was fixed by ensuring R only starts at a safer point, after initialisation of 0MQ state. But we should be on the lookout for random crashes in the wild that could be explained by thread unsafety.
The text was updated successfully, but these errors were encountered: