LimitRange defaults flake #11094
@derekwaynecarr are you working on this? If not, I would like to take a stab at it.
Please do.
Bumping the priority, as this can be seen more and more often.
So far, from my debugging, I have found the following:
And yet the conformance test logs print:
This makes it look like the 5-minute timeout is sometimes not enough for the namespace to be deleted. I am thinking of increasing the namespace deletion timeout just for this e2e test case.
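For illustration only (not from the thread): a minimal sketch of how an e2e helper might wait for namespace deletion with a larger timeout, using client-go's polling utility. The function name, poll interval, and any concrete timeout value are assumptions, not the actual test code.

```go
package e2eutil

import (
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForNamespaceDeletion (hypothetical helper) polls until the namespace is
// gone or the timeout expires. Passing a timeout larger than the default
// 5 minutes is the kind of change discussed above.
func waitForNamespaceDeletion(c kubernetes.Interface, name string, timeout time.Duration) error {
	return wait.Poll(2*time.Second, timeout, func() (bool, error) {
		_, err := c.CoreV1().Namespaces().Get(name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // namespace is fully deleted
		}
		if err != nil {
			return false, err // unexpected API error: stop polling
		}
		return false, nil // namespace still terminating: keep polling
	})
}
```

A call like `waitForNamespaceDeletion(client, ns.Name, 10*time.Minute)` would stand in for the default 5-minute wait in a test like this one.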
I have opened a PR in the upstream repo: kubernetes/kubernetes#34614
So I ran a comparison of the time taken to delete a namespace in OpenShift vs. k8s. For OpenShift:
For k8s:
This is running on the same hardware, btw. @liggitt what do you think? Can we merge the flake PR?
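The thread does not show how these numbers were collected. As a hedged sketch only, one way to measure namespace deletion time with client-go might look like this (the function name and structure are made up, not the benchmark actually used):

```go
package benchmark

import (
	"time"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// timeNamespaceDeletion creates a throwaway namespace, deletes it, and returns
// how long it took to disappear from the API.
func timeNamespaceDeletion(c kubernetes.Interface, name string) (time.Duration, error) {
	ns := &v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}}
	if _, err := c.CoreV1().Namespaces().Create(ns); err != nil {
		return 0, err
	}
	start := time.Now()
	if err := c.CoreV1().Namespaces().Delete(name, &metav1.DeleteOptions{}); err != nil {
		return 0, err
	}
	for {
		_, err := c.CoreV1().Namespaces().Get(name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return time.Since(start), nil // fully deleted
		}
		if err != nil {
			return 0, err
		}
		time.Sleep(2 * time.Second) // still terminating; poll again
	}
}
```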
Those times are in seconds? Is this test creating any OpenShift resources, or just kube resources?
The times are in seconds, yes. The tests are only creating kube resources. In fact, the e2e test file (limit_range.go) in kube and origin are identical. But when I did
@derekwaynecarr do you know why the origin namespace finalizer would add 1-2 minutes to clean up a namespace containing no origin resources? Seems like it would be a bunch of list calls returning nothing, then a single finalize call to remove the origin finalizer. Both controllers already retry the finalize call on conflict errors.
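A hedged aside on the retry-on-conflict point: the controllers' actual retry code is not shown in this thread, but the general pattern in client-go looks roughly like this (the wrapper name is made up):

```go
package finalizer

import (
	"k8s.io/client-go/util/retry"
)

// finalizeWithRetry re-runs the finalize call whenever it fails with a
// 409 Conflict (i.e. another writer updated the namespace concurrently);
// other errors are returned immediately without retrying.
func finalizeWithRetry(finalize func() error) error {
	return retry.RetryOnConflict(retry.DefaultRetry, finalize)
}
```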
Just to make it clearer: please take the above benchmarks with a pinch of salt. I only posted the highest values I saw. The time taken varies between 20s and 240s in the case of origin, and between 20s and 205s in the case of k8s. I am looking at the bits of code that get invoked when a namespace is deleted.
Right, but I'd expect 2-3 seconds of overhead for the origin namespace, max.
@liggitt @derekwaynecarr So I did some more debugging around this and found that the origin finalizer is actually running multiple times when a namespace is deleted. And it isn't running multiple times because a previous invocation of the finalizer failed or something; it runs even if the previous invocation deleted all the origin resources and was successful. So basically this code, https://github.com/openshift/origin/blob/master/pkg/project/admission/lifecycle/admission.go#L52, runs again and again (like 5-10 times) while the namespace is being deleted, and it adds back the previously deleted OpenShift finalizer. I tried printing the admission attributes when this happens.
Caused by the project lifecycle admission plugin re-adding the origin finalizer when admitting creations to a terminating namespace. The plugin used to check if the namespace was terminating and reject out of hand, but that meant a rapid (delete ns, create ns, create resource) sequence would either fail (if the plugin rejected when it thought the ns was terminating) or incorrectly not add the origin finalizer (if the plugin skipped adding the finalizer when it thought the ns was terminating). Must-fix for 1.4.
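To make the trade-off concrete, here is a minimal sketch (not the actual origin plugin, and not by itself the fix) of the "skip when terminating" option described above; as the comment notes, with a stale view of the namespace this can incorrectly skip the finalizer on a rapidly re-created namespace. The finalizer constant and function name are assumptions.

```go
package lifecycle

import (
	v1 "k8s.io/api/core/v1"
)

// originFinalizer stands in for the origin finalizer the plugin manages
// (assumed value for illustration).
const originFinalizer v1.FinalizerName = "openshift.io/origin"

// ensureOriginFinalizer adds the origin finalizer unless the namespace is
// already terminating. Unconditionally re-adding it during termination is what
// keeps deletion from completing and causes the flake, while skipping based on
// a stale phase can miss a namespace that was just deleted and re-created.
func ensureOriginFinalizer(ns *v1.Namespace) {
	if ns.Status.Phase == v1.NamespaceTerminating {
		return // do not re-add the finalizer to a namespace being deleted
	}
	for _, f := range ns.Spec.Finalizers {
		if f == originFinalizer {
			return // already present
		}
	}
	ns.Spec.Finalizers = append(ns.Spec.Finalizers, originFinalizer)
}
```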
Debugged back to before the rebase onto kube 1.4 and still encountered the issue, as seen in:
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/6466/consoleFull