Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test flake in test/cmd/images.sh #6259

Closed
0xmichalis opened this issue Dec 10, 2015 · 16 comments
Closed

Test flake in test/cmd/images.sh #6259

0xmichalis opened this issue Dec 10, 2015 · 16 comments
Assignees
Labels
kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1

Comments

@0xmichalis
Copy link
Contributor

!!! Error in test/cmd/../../hack/cmd_util.sh:409
'return 1' exited with status 1
Call stack:
1: test/cmd/../../hack/cmd_util.sh:409 os::cmd::try_until_success(...)
2: test/cmd/images.sh:49 main(...)
Exiting with status 1
!!! Error in hack/test-cmd.sh:284

FAILURE after 60.332s: test/cmd/images.sh:49: executing 'oc get imagestreamtags wildfly:latest' expecting success; re-trying every 0.2s until completion or 60.000s: the command returned the wrong error code

https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/4284/console

@bparees
Copy link
Contributor

bparees commented Dec 14, 2015

This is an image import problem (or just a dockerhub unavailability problem, but at the end of the day the failure is that the imagestream did not import despite being created, whatever the reason). Routing to @pweil- to delegate.

@bparees bparees assigned pweil- and unassigned bparees Dec 14, 2015
@pweil- pweil- assigned soltysh and unassigned pweil- Dec 14, 2015
@stevekuznetsov
Copy link
Contributor

Similar issue here:

FAILURE after 60.011s: hack/../test/cmd/newapp.sh:21: executing 'oc get imagestreamtags mysql:latest' expecting success; re-trying every 0.2s until completion or 60.000s: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
Error from server: imageStreamTag "mysql:latest" not found

@deads2k
Copy link
Contributor

deads2k commented Dec 17, 2015

@stevekuznetsov volunteered and @soltysh is doing a rebase so, I'll reassign for now.

@stevekuznetsov
Copy link
Contributor

We never see the ImageImportController import the mysql image, but we see the rest of them before mysql in the create file in the log:

I1217 11:45:58.286003   31579 controller.go:61] Importing stream cmd-newapp/ruby...
I1217 11:45:58.983780   31579 controller.go:61] Importing stream cmd-newapp/nodejs...
I1217 11:45:59.329155   31579 controller.go:61] Importing stream cmd-newapp/perl...
I1217 11:46:00.400930   31579 controller.go:61] Importing stream cmd-newapp/php...
I1217 11:46:01.037134   31579 controller.go:61] Importing stream cmd-newapp/python...
I1217 11:46:02.023489   31579 controller.go:61] Importing stream cmd-newapp/wildfly...

The sever gets the shutdown signal milliseconds after the wilfdly image is imported, so maybe it would have also imported the rest of the images given enough time? We don't have timestamps on the oc create so I can't see why the minute wait is too short.

Also, we do not clean up well enough, it looks like. A build from test-cmd/admin is constantly being re-tried...

@bparees
Copy link
Contributor

bparees commented Dec 17, 2015

sounds like it's worth bumping it then, make it 2 minutes. that should future proof us for a while, anyway.

@knobunc
Copy link
Contributor

knobunc commented Dec 17, 2015

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7886/

Running hack/../test/cmd/newapp.sh:21: executing 'oc get imagestreamtags mysql:latest' expecting success;FAILURE after 59.405s: hack/../test/cmd/newapp.sh:21: executing 'oc get imagestreamtags mysql:latest' expecting success; re-trying every 0.2s until completion or 60.000s: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
Error from server: imageStreamTag "mysql:latest" not found

@liggitt
Copy link
Contributor

liggitt commented Dec 18, 2015

The image import controller is hanging attempting to import tags from the internal registry, which is not running in test-cmd.sh (no node or registry is running).

The controller attempts twice, each time taking 30 seconds to time out.

Detailed logs at
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7943/consoleFull
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/7943/artifact/origin/artifacts/cmd/cVhl/logs/test-cmd-master.log

The failed test is in the cmd-newapp namespace:

Running hack/../test/cmd/newapp.sh:14: executing 'oc create -f examples/image-streams/image-streams-centos7.json' expecting success...
SUCCESS after 0.193s: hack/../test/cmd/newapp.sh:14: executing 'oc create -f examples/image-streams/image-streams-centos7.json' expecting success
Running hack/../test/cmd/newapp.sh:16: executing 'oc new-app library/php mysql -o yaml' expecting success and text '3306'...
SUCCESS after 0.845s: hack/../test/cmd/newapp.sh:16: executing 'oc new-app library/php mysql -o yaml' expecting success and text '3306'
Running hack/../test/cmd/newapp.sh:17: executing 'oc new-app unknownhubimage -o yaml' expecting failure...
SUCCESS after 0.376s: hack/../test/cmd/newapp.sh:17: executing 'oc new-app unknownhubimage -o yaml' expecting failure
Running hack/../test/cmd/newapp.sh:19: executing 'oc new-app mongo -o yaml' expecting success and text 'library/mongo'...
SUCCESS after 0.643s: hack/../test/cmd/newapp.sh:19: executing 'oc new-app mongo -o yaml' expecting success and text 'library/mongo'
Running hack/../test/cmd/newapp.sh:21: executing 'oc get imagestreamtags mysql:latest' expecting success; re-trying every 0.2s until completion or 60.000s...
FAILURE after 59.806s: hack/../test/cmd/newapp.sh:21: executing 'oc get imagestreamtags mysql:latest' expecting success; re-trying every 0.2s until completion or 60.000s: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
Error from server: imageStreamTag "mysql:latest" not found

server logs referencing image streams start around I1218 01:36:39

I1218 01:36:39.283548   31508 controller.go:84] Updated all DeploymentConfigs for trigger on ImageStream cmd-newapp/ruby

Image import controller is still busy with other namespaces. You can see 30-second periods of inactivity followed by an i/o timeout trying to hit the internal registry (which is not running):

I1218 01:36:36.559221   31508 controller.go:61] Importing stream cmd-images/test-stream...
I1218 01:36:36.559258   31508 controller.go:130] Connecting to 172.30.181.223:5000...
I1218 01:36:36.559271   31508 controller.go:137] Fetching tags for test/test-stream...
...
I1218 01:37:06.559642   31508 controller.go:145] Error fetching tags for test/test-stream: error checking for V2 registry at https://172.30.181.223:5000/v2/: Get https://172.30.181.223:5000/v2/: dial tcp 172.30.181.223:5000: i/o timeout
I1218 01:37:06.559681   31508 controller.go:202] Importing tag installable from cmd-images/test-stream...
I1218 01:37:06.570774   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
E1218 01:37:07.127888   31508 factory.go:49] ImageStreamMapping "test-stream" is forbidden: namespace cmd-images does not exist
I1218 01:37:07.127919   31508 controller.go:61] Importing stream cmd-images/test-stream...
I1218 01:37:07.127934   31508 controller.go:130] Connecting to 172.30.181.223:5000...
I1218 01:37:07.127949   31508 controller.go:137] Fetching tags for test/test-stream...
...
I1218 01:37:37.128084   31508 controller.go:145] Error fetching tags for test/test-stream: error checking for V2 registry at https://172.30.181.223:5000/v2/: Get https://172.30.181.223:5000/v2/: dial tcp 172.30.181.223:5000: i/o timeout
I1218 01:37:37.128112   31508 controller.go:202] Importing tag installable from cmd-images/test-stream...
I1218 01:37:37.135108   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
E1218 01:37:37.395336   31508 factory.go:49] ImageStreamMapping "test-stream" is forbidden: namespace cmd-images does not exist

Image import controller finally starts importing from the cmd-newapp image streams almost a full minute after they are created, but by then it is too late:

I1218 01:37:37.395369   31508 controller.go:61] Importing stream cmd-newapp/nodejs...
I1218 01:37:37.395381   31508 controller.go:202] Importing tag 0.10 from cmd-newapp/nodejs...
I1218 01:37:37.400991   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
...
I1218 01:37:38.865441   31508 controller.go:61] Importing stream cmd-newapp/wildfly...
I1218 01:37:38.865455   31508 controller.go:202] Importing tag 8.1 from cmd-newapp/wildfly...
I1218 01:37:38.870064   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
...
I1218 01:37:39.355216   31508 controller.go:61] Importing stream cmd-newapp/jenkins...
I1218 01:37:39.355234   31508 controller.go:202] Importing tag 1 from cmd-newapp/jenkins...
I1218 01:37:39.361699   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
...
I1218 01:37:39.728955   31508 controller.go:61] Importing stream cmd-newapp/mongodb...
I1218 01:37:39.728975   31508 controller.go:202] Importing tag 2.6 from cmd-newapp/mongodb...
I1218 01:37:39.734687   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
...
I1218 01:37:40.158180   31508 controller.go:202] Importing tag 2.4 from cmd-newapp/mongodb...
...
I1218 01:37:40.469460   31508 controller.go:61] Importing stream cmd-newapp/php...
I1218 01:37:40.469474   31508 controller.go:202] Importing tag 5.5 from cmd-newapp/php...
I1218 01:37:40.474684   31508 client.go:321] Found registry v2 API at https://registry-1.docker.io/v2/
...
2015-12-18 01:37:41.062295 N | osutil: received terminated signal, shutting down...

@stevekuznetsov
Copy link
Contributor

If the reason for the flakes is that we hit a 30s timeout twice only because of the way the test environment is set up here, do we need to do anything other than changing the wait time? Is the logic for re-trying the import sound?

@liggitt
Copy link
Contributor

liggitt commented Dec 18, 2015

changing the wait time is not the right solution. I would like to do the following things:

  • shorten the timeout to 5-10 seconds (we retry import failures due to connect failures anyway)
  • parallelize the image import (up to some sane number, like 5-10)
  • stop the import controller from retrying import for things that have been deleted

@stevekuznetsov
Copy link
Contributor

stop the import controller from retrying import for things that have been deleted

There seem to be a lot of problems with this sort of behavior - the logs for test-cmd show builds, deployments, etc, still being re-tried after their project is deleted.

@deads2k
Copy link
Contributor

deads2k commented Dec 18, 2015

I'll parallelize and operate against live data after the cheap check

@stevekuznetsov
Copy link
Contributor

Looks like we sometimes import the stream but not the tag?

flake on getting an image stream tag here:

FAILURE after 59.775s: hack/../test/cmd/builds.sh:76: executing 'oc get is ruby-22-centos7' expecting any result and text 'latest'; re-trying every 0.2s until completion or 60.000s: the command timed out
Standard output from the command:
NAME              DOCKER REPO              TAGS      UPDATED
ruby-22-centos7   centos/ruby-22-centos7
... repeated 161 times
There was no error output from the command.

@eparis
Copy link
Member

eparis commented Jan 14, 2016

@danmcp
Copy link

danmcp commented Jan 21, 2016

Increasing priority due to prevalence:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/bfa-proj-graphs/detailedgraphs

(scroll down)

@smarterclayton
Copy link
Contributor

The new flakes (caused by docker hub timing out) are covered in other issues, closing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1
Projects
None yet
Development

No branches or pull requests