Ensure child creation time is after parent creation time #1
Conversation
```go
if err != nil {
	return
}
```
Should there be a `defer CloseHandle(handle)` here? I think this might leak otherwise.
Thanks for catching this! I added a call to `closeHandle(HANDLE(handle))` to reuse the existing function. I also noticed that `windows.CloseHandle` is available with the new import, though it can also return an error. I'm not sure whether that way would be better or not.
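As a minimal sketch of the defer pattern under discussion, assuming `golang.org/x/sys/windows` is the import in question (the `printCreationHandle` function and its use of `OpenProcess` are made up for the example, not code from this PR):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/windows"
)

// printCreationHandle is a hypothetical example of the defer pattern:
// the handle is released on every return path, including early returns.
func printCreationHandle(pid uint32) error {
	handle, err := windows.OpenProcess(windows.PROCESS_QUERY_LIMITED_INFORMATION, false, pid)
	if err != nil {
		return err // nothing to close; OpenProcess failed
	}
	// windows.CloseHandle does return an error, but for a valid handle a
	// failure during cleanup is unexpected, so it is commonly ignored here.
	defer windows.CloseHandle(handle)

	fmt.Printf("opened handle %v for pid %d\n", handle, pid)
	return nil
}

func main() {
	if err := printCreationHandle(windows.GetCurrentProcessId()); err != nil {
		fmt.Println("error:", err)
	}
}
```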
I'm still not even a beginner with Go, so I'm not qualified to comment on best practice :) The only thing that stands out to me is that you never return anything explicitly, while the rest of the code does. I assume that's a stylistic choice and Go does something sensible with the returns (editing my previous comment: does it create local variables with the names in the trailing part of the function definition and return those?)
> I assume that's a stylistic choice and Go does something sensible with the returns (editing my previous comment: does it create local variables with the names in the trailing part of the function definition and return those?)
Yes, I was also surprised at how this works, but the names before the types in the function's return type are local variables which are used as the return values when doing a bare return. It seems to be commonly used to avoid having to supply the full (ok, err) tuple at each return, and is used by `GetProcs` in this code, for example. But I'm certainly not qualified to comment on best practices either.
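A minimal illustration of that behaviour (the `parse` function is invented for the example, not taken from this repo):

```go
package main

import "fmt"

// parse demonstrates named return values: ok and err are declared by the
// signature and zero-initialized on entry. A bare "return" returns whatever
// they currently hold.
func parse(s string) (ok bool, err error) {
	if s == "" {
		err = fmt.Errorf("empty input")
		return // returns (false, err)
	}
	ok = true
	return // returns (true, nil)
}

func main() {
	ok, err := parse("")
	fmt.Println(ok, err) // false empty input
}
```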
For the record, I'm not a fan of naked returns (where you specify the variable names as part of the function signature) because they confuse everybody who comes across them, myself included. I am aware this is my repo, so it's strange that the GetProcs function uses naked returns, but in truth I hadn't formed this opinion before the PR was originally made (it wasn't made by me).

I'm not much of a Windows guy (as you can tell by the fact that the existing code kills random processes), but what are your thoughts on the approach taken here? https://gist.github.com/hallazzang/76f3970bfc949831808bbebc8ca15209?permalink_comment_id=2948162#gistcomment-2948162
Using job objects does seem a lot cleaner: children are added to the job by default if the parent is, and you don't have to walk through the processes yourself. I had a go (ha!) at doing something like this, but I kept looking for a way to use cmd.process.handle, which is not accessible (I see the gist code "cheats" with unsafe operations to use it, while @nebel takes a more API-appropriate path and obtains the handle again).

It's a bit unfortunate that there needs to be some co-operation between the CreateProcess code and the killing code to support this, though. With the spawning and killing handled in separate modules, it seems a bit ugly. I also wonder if it's possible to have a race condition this way: if a child is spawned before the new process gets added to the job, it might never get terminated. Given the time it takes to spawn a process, this is probably not a huge concern, though.

My thought was to start the process suspended until the job object was created and the new process added, and also to use the PID to generate a job name (e.g. jduffield_kill_[pid]) so that at least the kill interface didn't need to change: it could try to open a job object with that name, and if that failed, fall back on the existing behaviour.
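Roughly, the sequence being discussed looks like the following sketch, assuming `golang.org/x/sys/windows` (this is not the gist's exact code, and `killOnCloseJob` is a made-up name):

```go
import (
	"unsafe"

	"golang.org/x/sys/windows"
)

// killOnCloseJob creates a job whose processes are all terminated when the
// last handle to the job is closed, then adds the given process to it.
func killOnCloseJob(pid uint32) (windows.Handle, error) {
	job, err := windows.CreateJobObject(nil, nil) // anonymous job
	if err != nil {
		return 0, err
	}

	// KILL_ON_JOB_CLOSE: closing the job handle terminates every process
	// in the job, so killing the tree is just windows.CloseHandle(job).
	info := windows.JOBOBJECT_EXTENDED_LIMIT_INFORMATION{}
	info.BasicLimitInformation.LimitFlags = windows.JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
	if _, err := windows.SetInformationJobObject(
		job,
		windows.JobObjectExtendedLimitInformation,
		uintptr(unsafe.Pointer(&info)),
		uint32(unsafe.Sizeof(info)),
	); err != nil {
		windows.CloseHandle(job)
		return 0, err
	}

	// Re-open the process by PID rather than touching cmd.Process's
	// unexported handle (the "API-appropriate path" mentioned above).
	proc, err := windows.OpenProcess(windows.PROCESS_SET_QUOTA|windows.PROCESS_TERMINATE, false, pid)
	if err != nil {
		windows.CloseHandle(job)
		return 0, err
	}
	defer windows.CloseHandle(proc)

	if err := windows.AssignProcessToJobObject(job, proc); err != nil {
		windows.CloseHandle(job)
		return 0, err
	}
	return job, nil // caller keeps the handle; closing it kills the tree
}
```

With `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` set, terminating the tree later is just `windows.CloseHandle(job)`.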
@Jim-with-a-J pretty much summed up my thoughts as well. I actually did test the code in the gist you linked, @jesseduffield, and confirmed that it does work--with the rather large caveat that to kill the process tree we don't actually invoke a kill on the process at all; instead we close the job handle.

In addition to the race condition Jim mentioned (which I was also concerned about, however unlikely it seems), there is also the fact that in Windows 7 and earlier a process can't be in multiple jobs. I don't know if we're concerned about supporting Windows 7, but if, for example, lazygit itself was in a job (which is actually quite possible, as the terminal could be in a job), trying to add an additional job to a child process of lazygit would fail. So in that case we might need to fall back to the old kill method.

But my main concern is just the overhead in additional code of doing things the correct way, as Jim mentioned. Probably we would need to wrap the mechanisms for both launching (to set the job) and killing (to close the job handle) in OS-specific code, and pass around e.g. a wrapper struct holding both the command and its job handle (see the sketch below).

That said, I could try to implement it if you're interested. Being a novice at Go I don't know if my design would be the best, though, and it's also possible we'd need to keep this time-checking tree-diving version in here as a fallback for when job creation fails for some reason. Let me know what you think.
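For concreteness, a hypothetical shape for such a wrapper (the names are mine, not from the gist, and `killTreeFallback` stands in for the existing kill path):

```go
// WrappedCmd keeps the command and its job handle together so that the
// OS-specific launch and kill logic can live in one place.
type WrappedCmd struct {
	Cmd *exec.Cmd      // started by the wrapper, which assigns it to the job
	job windows.Handle // 0 if job creation failed; use the fallback instead
}

// Kill terminates the tree by closing the job handle (relying on
// JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE), or falls back to the old approach.
func (w *WrappedCmd) Kill() error {
	if w.job != 0 {
		return windows.CloseHandle(w.job)
	}
	return killTreeFallback(w.Cmd.Process.Pid) // hypothetical fallback
}
```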
Is there any way to know how many other projects are using the kill library? Since co-operation is needed from the calling code (and there's still the potential for the job-object approach to fail), I think this approach using the creation time should still be added as the default behaviour. Any client that doesn't want to (or can't) immediately switch to using job objects doesn't have to, but they still benefit from not killing unrelated processes, and some extra time can be taken to figure out the best way to introduce job objects for Windows.
Here's an example of what an API for using jobs to kill the process tree might look like: https://gist.github.com/nebel/ff407aa88e5a7b261f5d3c673340a884

It will fall back to the current kill approach if job creation or adding processes to a job fails. Running the test will run cmd.exe, from which you can run notepad.exe manually to check how things are being killed. And you can uncomment the various kill strategies here:

```go
//go waitKill(wrapped.cmd)
//go waitKillTree(wrapped.cmd)
go waitTerminate(&wrapped)
```

From my testing, both `waitKillTree` and `waitTerminate` kill the whole tree. I'm not sure what the non-Windows alternative would do; should we set up an equivalent there?

I will say though that I still don't particularly like how unwieldy this approach is, but if it's preferable then I think it could be used. Again, though, we'd still need to fix the existing kill method, as we probably need it as a fallback.
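For comparison (not part of this PR), the usual non-Windows pattern is to start the child in its own process group and signal the whole group; a sketch:

```go
package main

import (
	"log"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	// Start the child in its own process group so the whole group can be
	// signalled at once, including any grandchildren it spawns.
	cmd := exec.Command("sh", "-c", "sleep 1000 & sleep 1000")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	time.Sleep(time.Second)

	// A negative PID targets the process group rather than one process.
	if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
		log.Fatal(err)
	}
}
```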
Great progress! Sorry, I haven't had much time to dive into the details here, but I have just now done some more googling to see if we've missed anything. Some assorted links:

- Go issue about supporting Windows job groups
- How Buildkite handles killing child processes on Windows
- Pulumi dealing with the same issue
- A situation in lazygit where we failed to kill processes due to a race condition

Also, for the record, I'm happy to ignore Windows 7; I doubt many people are still using it. I'm also willing to require a change in lazygit in order to have the best outcome, though it is regrettable.
Thanks for researching. I took a look at those issues. At first the Buildkite method seemed good, since it works based on the PID alone (no need for wrapping), but after testing it, it seems to rely on all subprocesses reacting to the break and shutting themselves down, which may work but I don't know if it's correct to rely on. It's really more about broadcasting a signal to so-called console processes than actually killing them. The Pulumi situation is also a little different in that they control their subprocesses, so they just ended up polling them via gRPC, I guess.

At this point I feel like the job solution, with a fallback of killing the tree "manually" (but checking process creation time), is probably still the most robust solution, even though it does mean having a wrapper for cmd. I also saw the library https://github.com/kolesnikovae/go-winjob, which has examples that take the extra step of starting a process suspended and then resuming it after the job is set, in order to guarantee it can't exit or spawn new processes before the job is assigned. This is probably an even more robust approach, but it also increases the complexity further (not so much in the suspend part, but certainly in the resume part).
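A sketch of that suspend/assign/resume sequence, using raw `CreateProcess` since `exec.Cmd` doesn't expose the main thread handle needed for `ResumeThread` (`assignToJob` is a hypothetical stand-in for the job-assignment step):

```go
import (
	"unsafe"

	"golang.org/x/sys/windows"
)

// startSuspendedInJob sketches the sequence used in go-winjob's examples:
// create suspended -> assign to job -> resume.
func startSuspendedInJob(exePath string) error {
	var si windows.StartupInfo
	si.Cb = uint32(unsafe.Sizeof(si))
	var pi windows.ProcessInformation

	cmdline, err := windows.UTF16PtrFromString(exePath)
	if err != nil {
		return err
	}
	// CREATE_SUSPENDED: the process exists but its main thread hasn't run
	// a single instruction, so it can't exit or spawn children yet.
	if err := windows.CreateProcess(nil, cmdline, nil, nil, false,
		windows.CREATE_SUSPENDED, nil, nil, &si, &pi); err != nil {
		return err
	}
	defer windows.CloseHandle(pi.Thread)
	defer windows.CloseHandle(pi.Process)

	if err := assignToJob(pi.Process); err != nil { // hypothetical helper
		return err
	}
	// Only now let the process start executing, already inside the job.
	_, err = windows.ResumeThread(pi.Thread)
	return err
}
```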
I would also like to add that this PR in its current state, while not a perfect solution, should dramatically improve upon the current situation. At least it should make the "killing unrelated long-running processes" problem extremely unlikely--without making any other existing problems any worse.
Any news on this?
Note that I have been using a custom build including these changes on Windows for several months without issue. |
Ensure child creation time is after parent creation time to avoid linking unrelated processes into the kill chain (caused by reused PPids).
See jesseduffield/lazygit#3008
I've marked this as a draft because I've never used Go before and wasn't able to test with lazygit, but running this code in isolation seemed to fix the issue for me.
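For reference, the core check this PR adds amounts to something like the following sketch (function and variable names here are illustrative, not the PR's actual code):

```go
import "golang.org/x/sys/windows"

// isLikelyChild only trusts a PPID link if the supposed child was created
// after its supposed parent, since PIDs are reused once a process exits.
func isLikelyChild(parentPid, childPid uint32) (bool, error) {
	creation := func(pid uint32) (windows.Filetime, error) {
		h, err := windows.OpenProcess(windows.PROCESS_QUERY_LIMITED_INFORMATION, false, pid)
		if err != nil {
			return windows.Filetime{}, err
		}
		defer windows.CloseHandle(h)
		var created, exit, kernel, user windows.Filetime
		err = windows.GetProcessTimes(h, &created, &exit, &kernel, &user)
		return created, err
	}

	pt, err := creation(parentPid)
	if err != nil {
		return false, err
	}
	ct, err := creation(childPid)
	if err != nil {
		return false, err
	}
	// If the "child" predates the "parent", the parent's PID was reused
	// and the processes are unrelated; leave it out of the kill chain.
	return ct.Nanoseconds() >= pt.Nanoseconds(), nil
}
```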