new-build test flake #6818

Closed
0xmichalis opened this issue Jan 25, 2016 · 20 comments
Labels
component/build kind/test-flake Categorizes issue or PR as related to test flakes. priority/P3

Comments

@0xmichalis
Contributor

FAILURE after 30.239s: hack/../test/cmd/builds.sh:68: executing 'oc new-build -D "FROM centos:7" -o json | python -m json.tool' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
error: only a partial match was found for "centos:7": "cmd-builds/centos"
No JSON object could be decoded
!!! Error in hack/../test/cmd/../../hack/cmd_util.sh:195
    'return 1' exited with status 1
Call stack:
    1: hack/../test/cmd/../../hack/cmd_util.sh:195 os::cmd::expect_success(...)
    2: hack/../test/cmd/builds.sh:68 main(...)
Exiting with status 1
!!! Error in hack/test-cmd.sh:289
    '${test}' exited with status 1
Call stack:
    1: hack/test-cmd.sh:289 main(...)

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin/9382/consoleFull


@bparees
Contributor

bparees commented Jan 25, 2016

that doesn't seem like a flake. i would expect that to always fail if it's going to fail.

@bparees
Contributor

bparees commented Jan 25, 2016

alternatively it flaked because dockerhub didn't respond so we didn't find the "centos:7" exact match on dockerhub i guess.

@bparees
Contributor

bparees commented Jan 25, 2016

yeah only conclusion i can reach here is dockerhub flake.

@mfojtik
Contributor

mfojtik commented Jan 26, 2016

maybe we should improve the error message in that case?

@bparees
Contributor

bparees commented Jan 26, 2016

@mfojtik how do you propose doing that?

the only improvement I can see is that if we fail to reach dockerhub, we ought to report that as a warning on the output. (arguably it ought to even be an error because if we exact-match something else when we would have exact matched on dockerhub but we couldn't reach it, we're going to get unpredictable behavior. The downside to making it an error is a customer who has no access to dockerhub would always just get that error).

@smarterclayton is this behavior (when dockerhub is unreachable) changing w/ all your refactoring?

@mfojtik
Contributor

mfojtik commented Jan 26, 2016

@bparees can we check if you have access to dockerhub by pinging it? if that fails, we don't consider this as an error. If dockerhub is available and we fail to pull the image, we report that as an error to the user.

@bparees
Contributor

bparees commented Jan 26, 2016

@mfojtik if dockerhub is unavailable that doesn't mean it's always unavailable. so it could still be an error.

@smarterclayton
Contributor

My refactoring should take it into account, but is not complete. Basically it needs to be possible to find, retrieve, and list outcomes even in the presence of failures. But, the presence of a failure should not result in a materially different outcome - i.e. a refused connection to the registry is not the same error as getting a not found on an endpoint or an "unauthorized".

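
The distinction drawn in this comment can be sketched roughly as follows. This is an illustration in Python, not the project's Go code; `FailureKind` and `classify` are hypothetical names, and only standard-library exception types and well-known HTTP status codes are assumed.

```python
import enum
from typing import Optional


class FailureKind(enum.Enum):
    UNREACHABLE = "unreachable"    # connection refused or timed out
    NOT_FOUND = "not found"        # the registry answered: no such image
    UNAUTHORIZED = "unauthorized"  # the registry answered: access denied
    OTHER = "other"


def classify(exc: Optional[BaseException] = None,
             http_status: Optional[int] = None) -> FailureKind:
    """Map a low-level failure to a kind the caller can act on (hypothetical helper)."""
    # ConnectionRefusedError is a subclass of ConnectionError.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return FailureKind.UNREACHABLE
    if http_status == 404:
        return FailureKind.NOT_FOUND
    if http_status in (401, 403):
        return FailureKind.UNAUTHORIZED
    return FailureKind.OTHER
```

Keeping the kinds separate lets a caller treat "the registry said no" differently from "the registry could not be asked".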

@bparees
Contributor

bparees commented Jan 26, 2016

"But, the presence of a failure should not result in a materially different outcome"

I don't follow that statement.

if i have a "centos" imagestream and there is a "centos" dockerhub image, then under normal circumstances (i can reach dockerhub) i'd expect a "multiple matches" error (there are two exact matches).

if dockerhub is unreachable because my network is temporarily down, there are two possible outcomes:

  1. we match the imagestream and move on (with or without a warning)
  2. we error out saying "we couldn't reach dockerhub so we can't guarantee we've searched all options. if you want to ignore this error, specify --no-dockerhub or specify the imagestream to use explicitly"

if dockerhub is unreachable because we got a permission denied error we also have the same possible outcomes, though we could make more guesses about the "right" thing to do since we can assume it's a permanent failure to reach dockerhub, not a temporary one.
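
The two outcomes can be sketched as follows. This is an illustration only, in Python rather than the project's Go; `SourceResult`, `resolve`, and the `strict` parameter are hypothetical names, and the real new-app/new-build matching logic is not shown in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SourceResult:
    name: str                                   # e.g. "imagestream" or "dockerhub"
    matches: List[str] = field(default_factory=list)
    reachable: bool = True


class ResolveError(Exception):
    pass


def resolve(term: str, results: List[SourceResult],
            strict: bool) -> Tuple[str, List[str]]:
    """Return (chosen match, warnings) or raise ResolveError."""
    unreachable = [r.name for r in results if not r.reachable]
    matches = [m for r in results if r.reachable for m in r.matches]

    if unreachable and strict:
        # Outcome 2: refuse to guess when the search was incomplete.
        raise ResolveError(
            f"could not reach {', '.join(unreachable)}; "
            f"search for {term!r} is incomplete")
    if len(matches) > 1:
        raise ResolveError(f"multiple matches for {term!r}: {matches}")
    if not matches:
        raise ResolveError(f"no match for {term!r}")
    # Outcome 1: use what was found, but report what could not be searched.
    warnings = [f"could not reach {name}" for name in unreachable]
    return matches[0], warnings
```

With `strict=False` an unreachable dockerhub turns the usual "multiple matches" error into a silent imagestream match plus a warning, which is exactly the unpredictability described above.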

@danmcp danmcp added component/build kind/test-flake Categorizes issue or PR as related to test flakes. priority/P2 labels Jan 26, 2016
@bparees
Contributor

bparees commented Jan 26, 2016

p3 because (I believe) it's a dockerhub flake and tests should/will always fail if we can't reach dockerhub. leaving this issue open to address the question of new-app behavior when dockerhub is unreachable.

@smarterclayton
Contributor

"materially different outcome" = we should not create an app that we would not normally create if a transient error occurs

The other statement is that if one or more sources is permafail, it should always be possible to create an app on at least one match (which I think the flags would allow today, but we probably need more test cases in that path).
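
One way to read this rule, as a hedged sketch in Python: a transient failure blocks creation, while a permanently failing source is skipped as long as the remaining sources yield a match. `Status` and `may_create` are hypothetical names, and how the real code would tell transient from permanent failures is not specified in this thread.

```python
import enum
from typing import Dict


class Status(enum.Enum):
    OK = 1
    TRANSIENT = 2   # e.g. the network is temporarily down
    PERMANENT = 3   # e.g. the source is never reachable from this cluster


def may_create(statuses: Dict[str, Status], match_count: int) -> bool:
    """Decide whether creating from the matches found so far is safe."""
    if any(s is Status.TRANSIENT for s in statuses.values()):
        # A retry might change the result set, so creating now could
        # produce something that would not normally be created.
        return False
    # Permanently failing sources are ignored; at least one match from the
    # remaining sources is enough (ambiguity is assumed to be handled elsewhere).
    return match_count >= 1
```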

@smarterclayton
Contributor

Ugh, it gets even worse:

FAILURE after 15.406s: hack/../test/cmd/builds.sh:64: executing 'oc new-build -D $'FROM centos:7' --no-output' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:
error: only a partial match was found for "centos:7": "centos:7"

Looks like it thinks an exact match is a partial match? Probably another bug caused by a flake, but in the other direction.

@bparees
Contributor

bparees commented Jan 27, 2016

i don't get why it would think that one is a partial match. we need to print out more match information on partial matches, like we do for multiple matches.

@stevekuznetsov
Contributor

More nonsense matching:

error: only a partial match was found for "centos:7": "cmd-builds/centos"

@bparees
Contributor

bparees commented Jan 27, 2016

@stevekuznetsov that's the original error reported on this issue....

@bparees
Contributor

bparees commented Jan 27, 2016

@smarterclayton can you include more information for partial matches in the error handling changes you're making? i'd expect it to print something more like "partial match for foo: [imagestream/image/template] [namespace]/foo:bar"
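
A small sketch of the message shape suggested here, in Python for brevity. The `Match` fields and `partial_match_error` are hypothetical; the actual wording would be decided by the error-handling changes under discussion.

```python
from dataclasses import dataclass


@dataclass
class Match:
    kind: str        # "imagestream", "image", or "template"
    namespace: str   # may be empty, e.g. for an image with no namespace
    name: str
    tag: str = ""


def partial_match_error(term: str, m: Match) -> str:
    """Build an error line that says what kind of object matched and where."""
    ref = f"{m.namespace}/{m.name}" if m.namespace else m.name
    if m.tag:
        ref += f":{m.tag}"
    return f"partial match for {term}: {m.kind} {ref}"
```

For the failure at the top of this issue, such a message would make it clear that the partial match was an imagestream in the `cmd-builds` namespace rather than a Docker Hub image.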

@smarterclayton
Contributor

smarterclayton commented Jan 27, 2016 via email

@bparees bparees assigned smarterclayton and unassigned bparees Jan 27, 2016
@bparees
Contributor

bparees commented Jan 27, 2016

@smarterclayton is heavily reworking error handling right now so assigning to him in hopes he'll address how we handle dockerhub failures. Note that we should probably be reporting failures to interrogate dockerhub, also. (as a warning).

@smarterclayton
Contributor

This hasn't happened since the last failure, and is basically either the etcd flake or the image import flake. Simplifying down to one.


6 participants