Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drain stdin on attach #991

Merged
merged 1 commit into from
Jan 11, 2023

Conversation

saschagrunert
Copy link
Member

@saschagrunert saschagrunert commented Jan 10, 2023

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

The Kubernetes e2e test flakes because of a token cancellation race between stdin and attach read:

[FAIL] [sig-cli] Kubectl client Simple pod [It] should support inline execution and attach
…
  << Timeline

  [FAILED] Expected
      <string>: read:stdin closed

  to contain substring
      <string>: read:value
  In [It] at: test/e2e/kubectl/kubectl.go:764 @ 01/04/23 15:09:48.618

https://github.com/kubernetes/kubernetes/blob/5cbd6960c805c22e1c741dde3ce9f2e10f869bce/test/e2e/kubectl/kubectl.go#L757-L766

The issue is that we cancel the token immediately which stops reading from attach. But the container command echo -n read: && cat && echo 'stdin closed' within the test provides additional data, which gets no time to be delivered.

We now drain stdin accordingly and wait for all data to be passed down to the receiver.

Which issue(s) this PR fixes:

Refers to #736

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

Correctly drain stdin on attach.

Copy link
Member

@sohankunkerkar sohankunkerkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saschagrunert, sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Copy link
Contributor

Doesn't this just make flakes less likely? I am not familiar with all the details here, just a general suspicion of fixing race conditions with sleep(). Offhand wouldn't we need to read from the fd to end?

@saschagrunert
Copy link
Member Author

Doesn't this just make flakes less likely?

That's true. One way to fix it would be to increase the complexity by having cancellation tokens for input and output streams. I can follow-up on that, but I'm not sure if the effort is comparable to the test fix.

@rphillips
Copy link
Collaborator

+1 to not sleeping.

The cancellation routine could loop on the data read to drain the buffer, and return on error or EOF.

@haircommander
Copy link
Collaborator

+1 to not sleeping.

The cancellation routine could loop on the data read to drain the buffer, and return on error or EOF.

I think that's basically what current conmon does

The Kubernetes e2e test flakes because of a token cancellation race
between stdin and attach read:

```
[FAIL] [sig-cli] Kubectl client Simple pod [It] should support inline execution and attach
…
  << Timeline

  [FAILED] Expected
      <string>: read:stdin closed

  to contain substring
      <string>: read:value
  In [It] at: test/e2e/kubectl/kubectl.go:764 @ 01/04/23 15:09:48.618
```

The issue is that we cancel the token immediately which stops reading
from attach. But the container command `echo -n read: && cat && echo
'stdin closed'` within the test provides additional data, which gets no
time to be delivered.

We now drain stdin accordingly and wait for all data to be passed down
to the receiver.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
@saschagrunert saschagrunert changed the title Wait up to 1s for additional attach data Drain stdin on attach Jan 11, 2023
@saschagrunert
Copy link
Member Author

I think we can simply attach.read() and wait blocking because we already drop the read half in case of EOF. PTAL again @rphillips @haircommander @cgwalters

@rphillips
Copy link
Collaborator

lgtm

@haircommander
Copy link
Collaborator

/lgtm

good idea!

@openshift-ci openshift-ci bot added the lgtm label Jan 11, 2023
@openshift-merge-robot openshift-merge-robot merged commit 1a2248a into containers:main Jan 11, 2023
@saschagrunert saschagrunert deleted the attach-wait branch January 11, 2023 15:35
@saschagrunert saschagrunert restored the attach-wait branch January 12, 2023 14:23
@saschagrunert saschagrunert deleted the attach-wait branch January 12, 2023 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants