
Extended.[k8s.io] Kubectl client [k8s.io] Kubectl expose should create services for rc [Conformance] #9444

Closed
csrwng opened this issue Jun 20, 2016 · 18 comments
Labels
area/tests component/kubernetes kind/test-flake Categorizes issue or PR as related to test flakes. priority/P2

Comments


csrwng commented Jun 20, 2016

Test flake: kubectl expose test fails while waiting for redis container

Version

v1.3.0-alpha.1-380-g4965f56

Steps To Reproduce
  1. Run the core extended tests
Current Result

Extended test failure:

• Failure [111.412 seconds]
[k8s.io] Kubectl client
/data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/test/e2e/framework/framework.go:505
  [k8s.io] Kubectl expose
  /data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/test/e2e/framework/framework.go:505
    should create services for rc [Conformance] [It]
    /data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/test/e2e/kubectl.go:798

    Jun 20 15:08:44.916: No pods matched the filter.
Expected Result

Test passes

Additional Information

Times out waiting for redis RC:

STEP: creating Redis RC
Jun 20 15:07:14.515: INFO: namespace e2e-tests-kubectl-h1cxa
Jun 20 15:07:14.515: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://172.18.14.213:8443 --kubeconfig=/tmp/openshift/openshift/test-extended/core/openshift.local.config/master/admin.kubeconfig create -f /data/src/github.com/openshift/origin/Godeps/_workspace/src/k8s.io/kubernetes/test/e2e/testing-manifests/kubectl/redis-master-controller.json --namespace=e2e-tests-kubectl-h1cxa'
Jun 20 15:07:14.902: INFO: stderr: ""
Jun 20 15:07:14.902: INFO: stdout: "replicationcontroller \"redis-master\" created\n"
STEP: Waiting for Redis master to start.
Jun 20 15:07:15.968: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:07:15.968: INFO: Found 0 / 1
Jun 20 15:08:38.908: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:38.908: INFO: Found 0 / 1
Jun 20 15:08:39.942: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:39.942: INFO: Found 0 / 1
Jun 20 15:08:40.905: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:40.905: INFO: Found 0 / 1
Jun 20 15:08:41.905: INFO: Selector matched 1 pods for map[app:redis]
...
Jun 20 15:08:41.905: INFO: Found 0 / 1
Jun 20 15:08:42.905: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:42.905: INFO: Found 0 / 1
Jun 20 15:08:43.905: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:43.905: INFO: Found 0 / 1
Jun 20 15:08:44.905: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:44.905: INFO: Found 0 / 1
Jun 20 15:08:44.910: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:44.910: INFO: Found 0 / 1
Jun 20 15:08:44.910: INFO: WaitFor completed with timeout 1m30s.  Pods found = 0 out of 1
Jun 20 15:08:44.916: INFO: Selector matched 1 pods for map[app:redis]
Jun 20 15:08:44.916: INFO: No pods matched the filter.

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/2306/


ncdc commented Jun 21, 2016

The openshift log was truncated at 20MB and doesn't contain anything related to the failing test case. The Jenkins console has this as the last event related to the pod:

Jun 20 15:08:44.921: INFO: At {2016-06-20 15:07:35 -0400 EDT} - event for redis-master-h8d1a: {kubelet 172.18.14.213} Pulling: pulling image "gcr.io/google_containers/redis:e2e"

I'm guessing this is a gcr.io flake but without logs, I can't say for sure. Please reopen if this happens again, and maybe we'll have more useful logs.

@ncdc ncdc closed this as completed Jun 21, 2016

ncdc commented Jun 24, 2016

This is what I suspected - pulling the image is just taking too long.

Jun 24 05:02:01.070: INFO: stdout: "replicationcontroller \"redis-master\" created\n"
STEP: Waiting for Redis master to start.
[...]
Jun 24 05:03:31.077: INFO: WaitFor completed with timeout 1m30s.  Pods found = 0 out of 1
Jun 24 05:03:31.084: INFO: At {2016-06-24 05:02:01 -0400 EDT} - event for redis-master: {replication-controller } SuccessfulCreate: Created pod: redis-master-ukphz
Jun 24 05:03:31.084: INFO: At {2016-06-24 05:02:01 -0400 EDT} - event for redis-master-ukphz: {default-scheduler } Scheduled: Successfully assigned redis-master-ukphz to 172.18.12.236
Jun 24 05:03:31.084: INFO: At {2016-06-24 05:02:16 -0400 EDT} - event for redis-master-ukphz: {kubelet 172.18.12.236} Pulling: pulling image "gcr.io/google_containers/redis:e2e"
Jun 24 05:03:31.084: INFO: At {2016-06-24 05:03:29 -0400 EDT} - event for redis-master-ukphz: {kubelet 172.18.12.236} Pulled: Successfully pulled image "gcr.io/google_containers/redis:e2e"
Jun 24 05:03:31.084: INFO: At {2016-06-24 05:03:30 -0400 EDT} - event for redis-master-ukphz: {kubelet 172.18.12.236} Created: Created container with docker id 707e37a9807e
Jun 24 05:03:31.085: INFO: At {2016-06-24 05:03:30 -0400 EDT} - event for redis-master-ukphz: {kubelet 172.18.12.236} Started: Started container with docker id 707e37a9807e

Upstream has taken to pre-pulling all the images used in tests to avoid situations like this.
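As a rough sanity check on the event timeline above (timestamps copied from the log), the pull alone consumed most of the 1m30s WaitFor window, and the container only came up about a second before the wait expired:

```shell
# Timestamps copied from the events above; pure arithmetic, no cluster access.
to_secs() { date -u -d "1970-01-01 $1" +%s; }

pull=$(( $(to_secs "05:03:29") - $(to_secs "05:02:16") ))     # Pulled - Pulling
elapsed=$(( $(to_secs "05:03:30") - $(to_secs "05:02:01") ))  # Started - Scheduled

echo "image pull took ${pull}s"                   # 73s just to pull the image
echo "container up ${elapsed}s after scheduling"  # 89s, vs. a 90s WaitFor timeout
```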


ncdc assigned aveshagarwal and unassigned ncdc Jun 24, 2016

ncdc commented Jun 24, 2016

@derekwaynecarr do you think we should just try to run the upstream image puller manifest in origin's e2e test?

@derekwaynecarr

👍 on pre-pulling e2e images


ncdc commented Jun 24, 2016

Ok, we can't reuse the upstream e2e-image-puller pod as is, at least not on Fedora 24, because our /usr/bin/docker is dynamically linked, and trying to run it bind-mounted into a busybox container results in unresolved shared libraries. I modified the manifest to use fedora:24 and everything is pulling as it should be. We just need to decide how we want to approach this. I assume we don't have the e2e-image-puller.manifest file available by default, so maybe we'll need to download it, run sed to change the image, and then create it.

It's also pulling the images serially, which might not be optimal from a timing perspective. It took about 6.5 minutes on my 50Mbps FIOS connection.
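The download-and-sed workflow described above could look roughly like the sketch below. The manifest content and the fetch/create steps are illustrative assumptions, not taken from this thread; only the busybox-to-fedora:24 image swap is what the comment actually describes.

```shell
# Sketch only. In a real run you would first fetch the upstream manifest and,
# after editing, create it in the cluster -- shown here as comments since both
# steps need network/cluster access:
#   curl -sSL "$MANIFEST_URL" -o e2e-image-puller.manifest   # MANIFEST_URL is hypothetical
#   kubectl create -f e2e-image-puller.manifest              # after the sed below

# Stand-in for the downloaded manifest (contents are a hypothetical skeleton).
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: e2e-image-puller
spec:
  containers:
  - name: image-puller
    image: busybox
EOF

# Swap the busybox base for fedora:24 so the bind-mounted, dynamically linked
# /usr/bin/docker can resolve its shared libraries.
sed -i 's|image: busybox|image: fedora:24|' "$manifest"
grep 'image:' "$manifest"   # prints: image: fedora:24
```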


ncdc commented Jun 24, 2016

It also looks like it's over 5GB of image data, according to docker info.

bparees commented Jun 29, 2016

hit again here:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/2775/consoleFull


ncdc commented Jun 29, 2016

We have a WIP PR to pre-pull.

On Wednesday, June 29, 2016, Ben Parees notifications@github.com wrote:

hit again here:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/2775/consoleFull


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#9444 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAABYskRtOkCxMGVe0BsztILQDGKySBgks5qQu6TgaJpZM4I6HLk
.


mfojtik commented Jun 30, 2016


bparees commented Jun 30, 2016


ncdc commented Jun 30, 2016

No need to keep linking 😄


bparees commented Jun 30, 2016

@ncdc sorry for the spam, i forgot this one was actually understood/being fixed. :)


bparees commented Jul 2, 2016

We have a wip pr to pre pull

is that wip PR linked to this issue anywhere? or it's still upstream?


ncdc commented Jul 2, 2016

#9622



bparees commented Jul 2, 2016

@ncdc thanks. i took the liberty of updating that PR to indicate it'll fix this issue.

@bparees bparees changed the title extended test flake: kubectl expose conformance test fails with a timeout waiting for redis Extended.[k8s.io] Kubectl client [k8s.io] Kubectl expose should create services for rc [Conformance] Dec 14, 2016