Respect volume name when reusing PVCs #1122
Conversation
I'm OK with the label fix, maybe not that much with the e2e test approach.
At first I thought the storageClass patch was not the right way to solve the problem: patching the default GKE storageClass in our e2e tests does not help ECK users avoid this problem.
But looking more at the docs around PVCs, it looks like using PVs without volumeBindingMode: WaitForFirstConsumer is kind of broken by design: there is no other way to deal with zonal volumes and affinity rules.
I think maybe we should stick to using GKE 1.11 by default (it's still the GCP default?), skip the multi-PV test on k8s < 1.12, create a new storageClass with a custom name and volumeBindingMode: WaitForFirstConsumer only in the test that requires it, and use it only there.
This way the other tests still rely on the GKE defaults and help us notice anything wrong with a "default" usage?
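As a rough illustration of that suggestion (not code from this PR), a test-only StorageClass with late binding could be created directly from the e2e test via client-go; the class name, the GCE PD provisioner, and the recent client-go call signatures below are assumptions made for the sake of the example.

```go
// Hypothetical sketch: create a test-only StorageClass with late volume binding.
package e2e // assumed package name

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func createLateBindingStorageClass(client kubernetes.Interface) error {
	mode := storagev1.VolumeBindingWaitForFirstConsumer
	sc := &storagev1.StorageClass{
		ObjectMeta:        metav1.ObjectMeta{Name: "e2e-wait-for-first-consumer"}, // assumed name
		Provisioner:       "kubernetes.io/gce-pd",                                 // assumed GKE provisioner
		VolumeBindingMode: &mode,
	}
	_, err := client.StorageV1().StorageClasses().Create(context.Background(), sc, metav1.CreateOptions{})
	return err
}
```

A test that needs late binding would then reference this class by name in its volume claim templates, while all other tests keep using the cluster default.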
operators/test/e2e/failure_test.go (Outdated)
}

func TestKillCorrectPVReuse(t *testing.T) {
	s := stack.NewStackBuilder("test-failure-pvc").
I'm wondering if we should maybe patch the storageClass for this test only (like: in the test).
Also skip this test if the k8s version we are testing against is <1.12.
I think we should take into consideration the pod name label when retrieving several PVCs for a single pod.
Do you think we should try to address this in this PR?
I'm fine with doing it in a follow-up PR, but I think #877 cannot be considered fixed until we do it (or another issue is needed).
@sebgl I think I have a slight preference for doing another PR. I changed this PR's description so that it does not auto-close the issue.
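To illustrate the pod-name-label idea discussed above: the label key below is an assumption, not necessarily the one ECK sets on its PVCs, and the sketch reuses the imports from the earlier example plus corev1 "k8s.io/api/core/v1".

```go
// Hypothetical: list only the PVCs labeled with a given pod's name,
// so that the several PVCs of a single pod can be matched unambiguously.
func pvcsForPod(client kubernetes.Interface, namespace, podName string) ([]corev1.PersistentVolumeClaim, error) {
	list, err := client.CoreV1().PersistentVolumeClaims(namespace).List(
		context.Background(),
		metav1.ListOptions{LabelSelector: "elasticsearch.k8s.elastic.co/pod-name=" + podName}, // assumed label key
	)
	if err != nil {
		return nil, err
	}
	return list.Items, nil
}
```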
@sebgl can you take another look at this PR? I tried to address your feedback.
@sebgl I removed the external definition of a provider-specific storage class and the corresponding flag. Instead I am using the existing default storage class as a template to create a derivative with late volume binding, as discussed. 👍 for the idea.
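A rough sketch of that template approach (assumed for illustration, not necessarily the PR's actual implementation; imports as in the first sketch plus the standard errors package):

```go
// Hypothetical: derive a late-binding StorageClass from the cluster default,
// found via its well-known is-default-class annotation.
func defaultStorageClassWithLateBinding(client kubernetes.Interface) (*storagev1.StorageClass, error) {
	list, err := client.StorageV1().StorageClasses().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, sc := range list.Items {
		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
			derived := sc.DeepCopy()
			// Reset metadata so the copy can be created as a new, non-default object.
			derived.ObjectMeta = metav1.ObjectMeta{Name: sc.Name + "-wait-first-consumer"} // assumed naming
			mode := storagev1.VolumeBindingWaitForFirstConsumer
			derived.VolumeBindingMode = &mode
			return derived, nil
		}
	}
	return nil, errors.New("no default StorageClass found")
}
```

The derived class keeps the provisioner and parameters of the default class, which is what keeps the test independent of any particular cloud provider.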
Thanks for making the changes 👍 it's nice to keep these tests independent of any cloud provider.
I left 2 minor comments, and one I think is very important (hence "changes requested"). Otherwise LGTM.
}
for _, pod := range pods {
	if stringsutil.StringInSlice(pod.Name, survivingPodNames) {
		continue
At this point, chances are the deleted pod is not back in the cluster yet, so we only iterate over pods we don't care about, continue on each one, then return nil, which skips the test entirely?
Should we run WithSteps(stack.CheckStackSteps(s, k)...) first, then this test? Also, it's probably simpler to get the expected pod through its name directly instead of filtering out pods we don't care about?
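For illustration only (function and variable names are placeholders, not the test's actual helpers), fetching the expected pod by name turns "pod not back yet" into an error the test step can retry, instead of a silent pass:

```go
// Hypothetical sketch: fetch the recreated pod directly by name; a NotFound
// error while the pod is still being recreated fails the step so it can be
// retried, rather than returning nil and skipping the checks.
func checkExpectedPodIsBack(client kubernetes.Interface, namespace, expectedPodName string) error {
	_, err := client.CoreV1().Pods(namespace).Get(context.Background(), expectedPodName, metav1.GetOptions{})
	return err
}
```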
This comment was pure gold, you should get a 🥇 for that. It surfaced that the actual fix no longer fixed anything since I merged master into this branch 😞 which was hidden by this flaw in the test ...
LGTM
Part of #877
I ran into an issue with scheduling pods that have more than one PVC: because we use volumeBindingMode: Immediate, more often than not the volumes would be allocated in two different zones, which meant the pod became unschedulable as it cannot be in two zones at once ... The solution I found was to switch to volumeBindingMode: WaitForFirstConsumer, which creates the volumes only once the pod has been scheduled to a node. The only problem is that this does not work in k8s 1.11, because volumes need to be pre-provisioned there. In 1.12, DynamicProvisioningScheduling is no longer feature-gated and volume provisioning with late binding works as expected. So this PR also switches the affected e2e test to a storage class that uses WaitForFirstConsumer.
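Since the fix depends on Kubernetes 1.12+, here is a hedged sketch of how an e2e test could skip itself on older clusters using the discovery client; the helper and package names are assumptions, not the repository's actual code.

```go
package e2e // assumed package name

import (
	"testing"

	"k8s.io/apimachinery/pkg/util/version"
	"k8s.io/client-go/kubernetes"
)

// skipIfPre112 skips the calling test on Kubernetes < 1.12, where dynamic
// provisioning with late binding (DynamicProvisioningScheduling) is still
// feature-gated.
func skipIfPre112(t *testing.T, client kubernetes.Interface) {
	info, err := client.Discovery().ServerVersion()
	if err != nil {
		t.Fatalf("could not fetch server version: %v", err)
	}
	if version.MustParseGeneric(info.GitVersion).LessThan(version.MustParseGeneric("1.12.0")) {
		t.Skipf("volumeBindingMode: WaitForFirstConsumer needs Kubernetes 1.12+, got %s", info.GitVersion)
	}
}
```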