Fix single rule deletion for NodePortLocal on Linux #6284
LGTM
}

// If conditionFn is nil, we will assume you are looking for a non-existing annotation.
// If you want to match all, conditionMatchAll as the conditionFn.
Suggested change:
- // If you want to match all, conditionMatchAll as the conditionFn.
+ // If you want to match all, use conditionMatchAll as the conditionFn.
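The convention described in that comment (a nil conditionFn means "expect the annotation to be absent", while conditionMatchAll accepts any value) can be sketched as follows. This is only an illustration: the `conditionFunc` type, the `annotationSatisfies` helper, the signature given to `conditionMatchAll`, and the annotation key are all assumptions made for the example, not the signatures used in the Antrea test code.

```go
package main

import "fmt"

// conditionFunc is a hypothetical predicate over an annotation value.
type conditionFunc func(value string) bool

// conditionMatchAll accepts any annotation value (signature assumed here).
func conditionMatchAll(string) bool { return true }

// annotationSatisfies is a hypothetical helper. A nil conditionFn means the
// caller expects the annotation NOT to exist; otherwise the annotation must
// exist and its value must satisfy conditionFn.
func annotationSatisfies(annotations map[string]string, key string, conditionFn conditionFunc) bool {
	value, exists := annotations[key]
	if conditionFn == nil {
		return !exists
	}
	return exists && conditionFn(value)
}

func main() {
	const key = "example.io/npl" // hypothetical annotation key
	withAnnotation := map[string]string{key: "[]"}
	withoutAnnotation := map[string]string{}

	fmt.Println(annotationSatisfies(withAnnotation, key, conditionMatchAll))
	fmt.Println(annotationSatisfies(withAnnotation, key, nil))
	fmt.Println(annotationSatisfies(withoutAnnotation, key, nil))
}
```

Giving nil a distinct meaning keeps call sites short, but it also makes it easy to pass nil by accident, which is why the comment spells out both cases.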
The logic for deleting an individual NPL mapping was broken. It incorrectly believed that the protocol socket was still in use, and the mapping could never be deleted, putting the NPL controller in an endless error loop.

The State field in ProtocolSocketData was left over from before Antrea v1.7, back when we would always use the same port number for multiple protocols, for a given Pod IP + port. With the current version of the NPL implementation, this field is not needed and should be removed. By removing the field, we avoid the deletion issue.

This patch also ensures that if a rule is only partially cleaned up, we can attempt to delete it again, by making DeleteRule idempotent. To identify that a prior deletion attempt failed, we introduce a "defunct" field in the NPL rule data. If this field is set, the controller knows that the rule has been partially deleted and deletion needs to be attempted again. Without this, it would be possible for the controller (with the right sequence of updates) to assume that a partially-deleted rule is still valid, which would break the datapath.

I plan on improving the NPL code further with a follow-up patch, but in order to keep this patch small (for back-porting), I went with the simplest solution I could think of.

Fixes antrea-io#6281

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
86ca00c to 6386d2c
// This function will be executed synchronously when DeleteRule is called for the first time
// and we simulate a failure. It restores the second target port for the Service, which was
// deleted previously, and waits for the change to be reflected in the informer's
// store. After that, we know that the next time the NPL controller processes the test Pod,
// it will need to ensure that both NPL mappings are configured correctly. Because one of
// the rules will be marked as "defunct", it will first need to delete the rule properly
// before adding it back.
restoreServiceTargetPorts := func() {
	testSvc.Spec.Ports = ports
	_, err := testData.k8sClient.CoreV1().Services(defaultNS).Update(context.TODO(), testSvc, metav1.UpdateOptions{})
	if !assert.NoError(t, err) {
		return
	}
	assert.EventuallyWithT(t, func(c *assert.CollectT) {
		// Assert on c (not t) so that a failed attempt is retried on the next tick
		// instead of failing the test immediately.
		obj, exists, err := testData.svcInformer.GetIndexer().GetByKey(testSvc.Namespace + "/" + testSvc.Name)
		if !assert.NoError(c, err) || !assert.True(c, exists) {
			return
		}
		svc := obj.(*corev1.Service)
		assert.Len(c, svc.Spec.Ports, 2)
	}, 2*time.Second, 50*time.Millisecond)
}
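The pattern used by this test helper (a mocked DeleteRule that fails on its first call and, while failing, synchronously runs a hook that changes the desired state and waits for a cache to catch up) can be sketched without any Kubernetes or mocking dependency. Everything below is a simplified assumption for illustration: `store` stands in for the informer's store, `hookedDeleter` for the mocked rule interface, and `eventually` for the polling assertion.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// store is a hypothetical stand-in for an informer cache that is updated
// asynchronously after an API write.
type store struct {
	mu    sync.Mutex
	ports []int
}

func (s *store) set(ports []int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ports = ports
}

func (s *store) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.ports)
}

// eventually polls cond every tick until it returns true or timeout elapses.
func eventually(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// hookedDeleter fails its first DeleteRule call. Before returning the error,
// it runs onFirstFailure synchronously, so the hook's effects are guaranteed
// to be visible by the time the caller retries.
type hookedDeleter struct {
	calls          int
	onFirstFailure func()
}

func (d *hookedDeleter) DeleteRule() error {
	d.calls++
	if d.calls == 1 {
		if d.onFirstFailure != nil {
			d.onFirstFailure()
		}
		return errors.New("simulated deletion failure")
	}
	return nil
}

// newScenario wires the hook: it "restores" the second port asynchronously
// (as an API update would) and blocks until the store reflects it.
func newScenario() (*store, *hookedDeleter) {
	s := &store{ports: []int{80}}
	d := &hookedDeleter{}
	d.onFirstFailure = func() {
		go func() {
			time.Sleep(20 * time.Millisecond)
			s.set([]int{80, 8080})
		}()
		if !eventually(func() bool { return s.count() == 2 }, 2*time.Second, 10*time.Millisecond) {
			panic("store never observed the restored ports")
		}
	}
	return s, d
}

func main() {
	s, d := newScenario()
	fmt.Println("first delete:", d.DeleteRule())
	fmt.Println("ports visible to the next sync:", s.count())
	fmt.Println("second delete:", d.DeleteRule())
}
```

Running the hook inside the failing call removes the race between the test changing the desired state and the controller retrying, at the cost of coupling the test to the order of calls made by the controller.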
@tnqn do you think this is ok? I couldn't think of an alternative, besides moving these tests to pkg/agent/nodeportlocal/k8s and calling handleAddUpdatePod directly, which would have been a bigger change and I wanted to keep this patch small. I am planning to re-organize some of the NPL code in a subsequent PR (that won't be backported), so maybe I can improve this test as part of the re-org PR.
The current test works for me. The only way I can think of to make it simpler is to construct a defunct entry in advance, then verify that it won't be reused and that it will eventually be deleted once iptables succeeds.
That's a good idea, but it would require a test-specific function (exported) to mark the entry as defunct. I'll keep that in mind for the refactoring.
/test-all
LGTM
The logic for deleting an individual NPL mapping was broken. It incorrectly believed that the protocol socket was still in use, and the mapping could never be deleted, putting the NPL controller in an endless error loop.
The State field in ProtocolSocketData was left over from before Antrea v1.7, back when we would always use the same port number for multiple protocols, for a given Pod IP + port. With the current version of the NPL implementation, this field is not needed and should be removed. By removing the field, we avoid the deletion issue.
This patch also ensures that if a rule is only partially cleaned-up, we can attempt to delete it again, by making DeleteRule idempotent. To identify that a prior deletion attempt failed, we introduce a "defunct" field in the NPL rule data. If this field is set, the controller knows that the rule has been partially deleted and deletion needs to be attempted again. Without this, it would be possible for the controller (with the right sequence of updates) to assume that a partially-deleted rule is still valid, which would break the datapath. I plan on improving the NPL code further with a follow-up patch, but in order to keep this patch small (for back-porting), I went with the simplest solution I could think of.
Fixes #6281
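To illustrate the idea described above, here is a minimal sketch of an idempotent deletion guarded by a "defunct" flag. It is not the code from this patch: `rule`, `portTable`, `ruleDeleter`, `validRule`, and `failOnce` are hypothetical, simplified names, and the real NPL rule data and datapath interface carry more state.

```go
package main

import (
	"errors"
	"fmt"
)

// rule is a hypothetical, simplified stand-in for the NPL rule data.
type rule struct {
	nodePort int
	podIP    string
	podPort  int
	protocol string
	// defunct is set as soon as deletion starts. It stays set if deletion
	// fails part-way, so the rule is never mistaken for a valid one.
	defunct bool
}

// ruleDeleter abstracts the datapath programming (assumed interface).
type ruleDeleter interface {
	DeleteRule(nodePort int, podIP string, podPort int, protocol string) error
}

// portTable is a hypothetical cache of rules keyed by "podIP:podPort:protocol".
type portTable struct {
	rules   map[string]*rule
	deleter ruleDeleter
}

func key(podIP string, podPort int, protocol string) string {
	return fmt.Sprintf("%s:%d:%s", podIP, podPort, protocol)
}

// validRule returns the rule only if it exists and is not defunct.
func (t *portTable) validRule(k string) (*rule, bool) {
	r, ok := t.rules[k]
	if !ok || r.defunct {
		return nil, false
	}
	return r, true
}

// deleteRule is idempotent: deleting a missing rule succeeds, and a rule
// whose earlier deletion failed can simply be deleted again.
func (t *portTable) deleteRule(k string) error {
	r, ok := t.rules[k]
	if !ok {
		return nil
	}
	r.defunct = true
	if err := t.deleter.DeleteRule(r.nodePort, r.podIP, r.podPort, r.protocol); err != nil {
		return fmt.Errorf("deleting rule %s: %w", k, err)
	}
	delete(t.rules, k)
	return nil
}

// failOnce is a test double whose first DeleteRule call fails.
type failOnce struct{ failed bool }

func (f *failOnce) DeleteRule(int, string, int, string) error {
	if !f.failed {
		f.failed = true
		return errors.New("simulated iptables failure")
	}
	return nil
}

func main() {
	k := key("10.0.0.5", 8080, "tcp")
	t := &portTable{
		rules:   map[string]*rule{k: {nodePort: 61000, podIP: "10.0.0.5", podPort: 8080, protocol: "tcp"}},
		deleter: &failOnce{},
	}
	fmt.Println("first attempt:", t.deleteRule(k))
	_, valid := t.validRule(k)
	fmt.Println("still treated as valid:", valid)
	fmt.Println("second attempt:", t.deleteRule(k))
	fmt.Println("rules left:", len(t.rules))
}
```

In this sketch a caller that needs the mapping again after a failed deletion would see `validRule` report false, retry `deleteRule` until it succeeds, and only then add a fresh rule, which matches the "delete properly before adding it back" behavior described in the test comment.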