-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Cmd IO handling #1937
Update Cmd IO handling #1937
Conversation
I'm having trouble verifying what you're fixing here. Can you give more detail on the bug you're addressing? From what I can see, job containers use an |
thats a typo on my side, sorry ( Ultimately, that means waiting on a |
// Terminate the IO if the copy does not complete in the requested time. | ||
timeoutErrCh := make(chan error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a separate channel for timeout error, can we just have a separate error variable that will be set only in the <-t.C:
block? That way we don't need to worry about closing the channel and the following code block:
if tErr := <-timeoutErrCh; ioErr == nil {
ioErr = tErr
}
can just be
if ioErr == nil {
ioErr = tErr
}
This approach seems easier to read/understand for me at least. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to block on timeoutErrCh
regardless, so we need some mechanism to have Wait
actually wait for the timeout
the alternatives (imo) would be to add a function that selects
on <-allDoneCh
and ctx.Done()
, and pass it a context with the CopyAfterExitTimeout
or something similar, which i would rather save for when we refactor Cmd
wholesale to properly thread contexts everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on some testing I'm a little confused about the error handling behavior you're fixing.
I created a test that opens a connected Reader
/Writer
, then closes the Writer
, and attempts a read from the Reader
. I tried this in cases where they were backed by os.Pipe
, and by a named pipe (created via winio
). In both cases, the read returned io.EOF
.
Do you recall some more context on what you were seeing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment (about using variables instead of channels) that might help simplify the (c *Cmd) Start()
function. But don't want to hold the PR for that. LGTM otherwise.
If you run the host process IO tests from #1964 (specifically (test src) |
I dug into this a bit. I learned that After investigating, it seems you've actually found a bug in how job containers handle IO pipes :D So these test cases like The real problem here is that job containers are not following the contract expected by
In the case of job containers (or anything using The proper fix here is to not close the parent end of the pipes when the process exits, and let it be handled by |
Here's the PR for the fix I mentioned: #2021 |
#2021 has been merged. You should be unblocked now. |
Update `Cmd.Wait` to return a known error value if it times out waiting on IO copy after the command exits (and update `TestCmdStuckIo` to check for that error). Prior, the test checked for an `io.ErrClosedPipe`, which: 1. is not the best indicator that IO is stuck; and 2. is now ignored as an error value raised during IO relay. Update `stuckIOProcess` logic in `cmd_test.go` to mirror logic in `interal/exec.Exec`, using `os.Pipe` for std io that returns an `io.EOF` (instead of `io.Pipe`, which does not). Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Update `Cmd.Wait` to return a known error value if it times out waiting on IO copy after the command exits (and update `TestCmdStuckIo` to check for that error). Prior, the test checked for an `io.ErrClosedPipe`, which: 1. is not the best indicator that IO is stuck; and 2. is now ignored as an error value raised during IO relay. Update `stuckIOProcess` logic in `cmd_test.go` to mirror logic in `interal/exec.Exec`, using `os.Pipe` for std io that returns an `io.EOF` (instead of `io.Pipe`, which does not). Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Have
internal\cmd.Cmd
ignore relay and close errors from the underlying IO channel being closed, since not allio.Reader
/io.Writer
s return anio.EOF
if the are closed during an IO operation.This standardizes behavior between an
hcs
/gcs
Process
(which use ago-winio.win32File
for their IO channels, and returnio.EOF
for read and write operations on a closed handle) andJobProcess
(which uses anos.Pipe
that instead return anos.ErrClosed
).Additionally, ignore errors from closing an already-closed std IO channel.
Update
Cmd.Wait
to return a known error value if it times out waiting on IO copy after the command exits (and updateTestCmdStuckIo
to check for that error).Prior, the test checked for an
io.ErrClosedPipe
, which: