Confusing (and broken?) deployment behavior #18293
It should wait for the image to be injected on the first deployment. In addition, we now have an explicit check not to deploy when the image is empty (" "). So if it was really caused by that, it's not intentional. I am out today, but when I am back I will check the master logs to see what caused those deployments.
this happened again here:
- build starts
- build completes / frontend-1 deployment starts
- frontend deployment 2 starts a few seconds later and cancels deployment 1
Finally managed to download the logs from the first case (conference wifi sucks). Both deployments are caused by a config change and happen around the same time.
@mfojtik this reminds me of the case of the registry decorator: once it injects the image with the registry IP, and the second time with the registry DNS name. WDYT?
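To illustrate (the registry hosts, namespace, and image name below are made up, not taken from the logs): the same imagestreamtag resolved once with the registry's service IP and once with its DNS name, which the image change trigger would see as two different images and deploy twice.

```json
{
  "hypotheticalPullSpecs": {
    "injectedWithRegistryIP": "172.30.1.1:5000/test/frontend@sha256:abc123",
    "injectedWithRegistryDNS": "docker-registry.default.svc:5000/test/frontend@sha256:abc123"
  }
}
```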
It does look like the registry url is not configured:
i'm not sure why that is; it seems like the extended test job is not deploying the registry properly (i.e. with a registry url configured). Which is odd, since I thought it used the ansible installer to set up the cluster, and the ansible installer definitely sets that value. i will dig into this some more from my side as well.
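For illustration only (the stream name and namespace here are made up), this is where a missing registry url shows up: with no url configured, the master leaves status.dockerImageRepository on the imagestream empty, so there is no stable pull spec to inject into the DC.

```json
{
  "kind": "ImageStream",
  "metadata": { "name": "frontend", "namespace": "test" },
  "status": {
    "dockerImageRepository": ""
  }
}
```

On a correctly configured cluster that field would instead carry the registry host, e.g. docker-registry.default.svc:5000/test/frontend (hostname assumed).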
Though the ansible installer does appear to have set it:
it looks like there is an issue with the registry not respecting that value. I've opened a PR to temporarily revert a recent ansible change, as well as an issue against the registry, since it should be respecting it: openshift/openshift-ansible#6913. Assigning this to myself until i see evidence that these multiple deploys still happen after the reversion. Thanks for the triage, @tnozicka
@bparees double-check what version of the registry we run in the CI; didn't you consolidate the variables recently? We can perhaps fix the create call for the image stream to automatically rewrite the pull spec, similar to the decorator... If you change the DNS for the registry later you will be broken anyway, but we can fix the case of the double-triggered DC.
I fixed ansible to use what is supposed to be the "current" variable for setting the url, but the registry doesn't appear to actually respect that variable (hence the issue i opened above, and the revert of the change in ansible for now).
you mean rewrite it based on the hostname in the push request, like we discussed the other day? yeah, that would be ideal, but for now i think the short-term solution is still to make sure the master and registry are aligned on what the url is, and at the moment i think the problem is that the registry isn't getting the right url set / reading the variable correctly (per the issue i opened).
This looks to be fixed by my reverting the ansible changes, so we'll chase it from there.
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18278/test_pull_request_origin_extended_image_ecosystem/450/
specifically the "[image_ecosystem][ruby][Slow] hot deploy for openshift ruby image Rails example should work with hot deploy [Suite:openshift]" test (10m29s).
In this test we create a buildconfig and a DC that is triggered by that build. The DC also has a config change trigger.
We expect exactly 1 deployment to occur (when the build completes), but we appear to be getting two. This did not used to happen.
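For reference, a minimal sketch of the trigger stanza on such a DC (the container and tag names here are assumed; the real definition is in the template linked below): a config change trigger plus an automatic image change trigger pointing at the build's output tag.

```json
{
  "triggers": [
    { "type": "ConfigChange" },
    {
      "type": "ImageChange",
      "imageChangeParams": {
        "automatic": true,
        "containerNames": ["frontend"],
        "from": { "kind": "ImageStreamTag", "name": "frontend:latest" }
      }
    }
  ]
}
```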
Looking at the events, we see:
- the build starts at 23:09:25
- roughly simultaneously, the build completes and deployment 2 is created (not sure where the event for deployment 1 is??), which cancels deployment 1

Presumably deployment 2 is being triggered by the newly built/pushed image.
So my questions are:
- Assuming deployment 2 was caused by the build completion, what caused deployment 1? Was it the config change trigger (despite the imagestreamtag not existing yet)?
- Should deployment 1 have been created? It didn't use to work that way.
you can see the template w/ the DC in question here:
https://github.com/openshift/rails-ex/blob/6a59aa15bf863fde71e0bbfa43c5344290eed8f6/openshift/templates/rails-postgresql.json#L146
And note that this test just ran, so it should have included the informer stale cache fix.
My impression is that deployment 1 got created by the configchangetrigger, and then hung waiting for the imagestreamtag to resolve. When the build updates the imagestreamtag, deployment 1 starts to proceed, but gets canceled because deployment 2 is triggered.
@tnozicka @mfojtik @Kargakis
(marking as bug because it appears to be a change in behavior. if it's expected we can live with it, but i need to know it's intentional)