Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic when reading/writing to closed pipe #636

Closed
sourishkrout opened this issue Jul 23, 2024 · 6 comments
Closed

Panic when reading/writing to closed pipe #636

sourishkrout opened this issue Jul 23, 2024 · 6 comments
Assignees
Labels
bug Something isn't working runnerv2

Comments

@sourishkrout
Copy link
Member

sourishkrout commented Jul 23, 2024

This is super easy to reproduce from the extension with a non-interactive cell. Once started, you click the stop button (see video), the runme kernel server dies due to a panic with the stack:

io: read/write on closed pipe
github.com/stateful/runme/v3/internal/runnerv2service.(*execution).Write
	/Users/sourishkrout/Projects/stateful/oss/runme/internal/runnerv2service/execution.go:215
github.com/stateful/runme/v3/internal/runnerv2service.(*runnerService).Execute.func1
	/Users/sourishkrout/Projects/stateful/oss/runme/internal/runnerv2service/service_execute.go:90
runtime.goexit
	/opt/homebrew/Cellar/go/1.22.5/libexec/src/runtime/asm_arm64.s:1222

Here's a log: https://gist.github.com/sourishkrout/ba1d6fac2fcdde464ec5a0bf4cbcb5e3

Screen.Recording.2024-07-23.at.3.35.10.PM.mov
@sourishkrout sourishkrout added bug Something isn't working runnerv2 labels Jul 23, 2024
@adambabik
Copy link
Collaborator

@sourishkrout how does the extension implement the "stop" button? We don't have a separate call so I assume that the gRPC stream does something to the existing client->server stream.

@adambabik
Copy link
Collaborator

From the logs it's clear that something like this happens:

  1. The client sends a ExecuteRequest message with the stop field.
  2. The client kills the existing client->server stream immediately.
  3. Server receives the request and attempts to stop the command but the server exits.

I was able to reproduce it and it seems that the problem is with the fact that the signal is sent to the whole process group. The server process and its subprocesses (commands) are in the same process group. I will send a PR to disable sending a signal to the group. The drawback is that when the command has its own subprocesses, they may leak and become zombie.

Also, we likely should fix the communication between the client and the server. The client should await a confirmation that the stop request was handled and close the connection only after that, or after a timeout. Otherwise, some command output might be missed.

@adambabik
Copy link
Collaborator

Consider the following solutions:

  • Create a new process group for the command
  • Handle SIGCHLD using SIG_IGN
  • Set the SA_NOCLDWAIT flag

@sourishkrout
Copy link
Member Author

@sourishkrout how does the extension implement the "stop" button? We don't have a separate call so I assume that the gRPC stream does something to the existing client->server stream.

Here's how. Btw, this the same for Runner v1 vs v2 (not saying it's correct):

https://github.com/stateful/vscode-runme/blob/main/src/extension/runner/index.ts#L635-L651

@sourishkrout
Copy link
Member Author

Also, we likely should fix the communication between the client and the server. The client should await a confirmation that the stop request was handled and close the connection only after that, or after a timeout. Otherwise, some command output might be missed.

Runner v2 is still behind a experiment flag so we could make APIs changes any time. In fact, now is the time if API changes are required to make this work.

sourishkrout pushed a commit that referenced this issue Jul 29, 2024
This change also fixes a regression that broken running interactive
commands, like runme in runme, run from the CLI in
da2c38c.

It also disables FIFO which should mitigate
#635 but a proper fix is needed.

Relates to #636
@sourishkrout
Copy link
Member Author

Fixed in #637.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working runnerv2
Projects
None yet
Development

No branches or pull requests

2 participants