-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eager watch mode for exec #6507
Conversation
bin/exec.ml
Outdated
killed process stops. The pid of the killed process will be reaped. *) | ||
let kill_and_reap_process pid = | ||
let pid_int = Pid.to_int pid in | ||
Unix.kill pid_int Sys.sigkill; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigkill
is a bit aggressive. sigstop
is gentler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean sigterm
? sigstop
will "pause" the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially used sigkill
here because on windows it's the only signal emulated. On unix I think we should send sigterm
and then if the process is still running after some period, send sigkill. Scheduler.wait_for_process
waits for a process with a timeout and sends sigkill
if the timeout expires so that might be helpful, but it looks like it raises Build_cancelled
if the build is restarted while it's waiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially used sigkill here because on windows it's the only signal emulated
You can keep using Sygkill on Windows.
Scheduler.wait_for_process waits for a process with a timeout and sends sigkill if the timeout expires so that might be helpful, but it looks like it raises Build_cancelled if the build is restarted while it's waiting.
The behavior that we would like would be:
- Send sigterm
- Wait for a timeout
- Send sigkill if the process isn't dead
It would of course be nice if we could sure that with the rest of the code somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to wait send sigterm and then wait for a timeout before sending sigkill if the process is still running. I implemented this with periodic calls to wait [ WNOHANG ]
because it seems simpler than a blocking call to wait
in a separate thread.
bin/exec.ml
Outdated
let kill_and_reap_process pid = | ||
let pid_int = Pid.to_int pid in | ||
Unix.kill pid_int Sys.sigkill; | ||
let _stopped_pid, _process_status = Unix.waitpid [] pid_int in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the Scheduler primitives for launching and waiting for processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Scheduler
provide a way of starting a process? The only relevant function I see is:
val wait_for_process :
?timeout:float
-> ?is_process_group_leader:bool
-> Pid.t
-> Proc.Process_info.t Fiber.t
for waiting on a process. My goal is to start a process which runs concurrently to dune, and when a file is changed, to kill the child process if it's still running and then wait on it so its pid can be reaped. I don't see a way of doing this with the Scheduler
module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's Process
for launching the process. But actually, you don't need that and you're free to use your existing function. You can definitely use Scheduler.wait_for_process
though.
The other operations you've mentioned can be done outside of Scheduler.
let build_and_run_in_child_process | ||
{ get_path_and_build_if_necessary; args; env } = | ||
get_path_and_build_if_necessary () | ||
|> Fiber.map ~f:(Result.map ~f:(spawn_process ~args ~env)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should release the global lock before running the process and then acquire it again. So that the user can start builds while their program is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to take the lock around running the build system
46bbcd1
to
4290314
Compare
7fb56ed
to
7b48f1c
Compare
@rgrinberg do you see any more changes that need to be made to this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks pretty cool! I'll let Rudi approve once his comments are addressed.
990b614
to
488e5cc
Compare
Looks like something is hanging on macos. I'll dig deeper tomorrow. |
On macos it looks like we miss filesystem events if they happen to close together in time. The tests wait until the program has run before modifying its source code to trigger a rebuild, and on macos sometimes a rebuild is not triggered by the change. Adding a delay fixes the problem but I don't want to depend on that in a test so I've disabled the tests on macos. I'll make an issue for this once this change is merged. |
Yeah, the issue on macos is well known. I think we have an assortment of hacks to deal with it in the test suite. I added some cosmetic changes to make it easier to review the code. The old code nested to the right a little too much. I removed releasing the lock. Unfortunately, it's not correct because we aren't unloading and reloading the in memory databases. |
bin/exec.ml
Outdated
restore_cwd_and_execve common prog argv env | ||
(* This will prevent status line printing from interfering with the output of | ||
the running program. *) | ||
Console.Backend.set Console.Backend.dumb; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is right because we aren't calling finish
on the backend. I'll remove it for now and this fix can come independently since it also affects normal dune exec
.
Okay, the PR looks good to me. There's still some issues, but they can be addressed separately. |
Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
@rgrinberg thanks for the fixes :) I've squashed all the commits into a single commit and rebased |
This reverts commit 5ff9a4f. This was causing occasional segfaults on macos when running `dune exec` so reverting this until we figure out what's causing that.
This reverts commit 5ff9a4f. This was causing occasional segfaults on macos when running `dune exec` so reverting this until we figure out what's causing that. Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
Fixes #2934
Signed-off-by: Stephen Sherratt stephen@sherra.tt