Add IPAllocated and IPAssigned conditions to Egress status #5282

Merged
merged 2 commits into from
Oct 23, 2023

Conversation

AJPL88
Contributor

@AJPL88 AJPL88 commented Jul 20, 2023

When an Egress IP is successfully allocated by antrea-controller, the IPAllocated condition in the Egress status is updated.

When an Egress IP is assigned to a Node by antrea-agent, the IPAssigned condition in the Egress status is updated.

For #4614

@luolanzone luolanzone added this to the Antrea v1.14 release milestone Jul 27, 2023
@AJPL88 AJPL88 force-pushed the add-egress-condition branch from 8b85374 to 1270a43 Compare July 31, 2023 06:37
Member

@tnqn tnqn left a comment

The handling of IPAssigned looks almost good to me. We could do something similar for IPAllocated to make it more robust.

@@ -766,3 +766,81 @@ func checkExternalIPPoolUsed(t *testing.T, controller *egressController, poolNam
})
assert.NoError(t, err)
}

func TestEgressStatus(t *testing.T) {
Member

After making the above change, we could just have a unit test for updateEgressAllocatedCondition, which focuses on validating the condition based on the given Egress and error.

tnqn previously approved these changes Aug 3, 2023
Member

@tnqn tnqn left a comment

LGTM, but let's see if all tests pass

@tnqn
Member

tnqn commented Aug 3, 2023

/test-all

@tnqn
Member

tnqn commented Aug 3, 2023

unit test is failing

@AJPL88
Contributor Author

AJPL88 commented Aug 3, 2023

I’ll take a look and try to resolve the issues.

Member

@tnqn tnqn left a comment

Code LGTM, tests could be simpler and more focused.

assert.NoError(t, err)

if tt.expectedUpdate {
assert.Eventually(t, func() bool {
Member

syncEgress is synchronous, so there is no need for an eventual check; the result can be asserted directly after the call.

@@ -766,3 +766,100 @@ func checkExternalIPPoolUsed(t *testing.T, controller *egressController, poolNam
})
assert.NoError(t, err)
}

func TestEgressAllocatedStatus(t *testing.T) {
Member

It's a bit redundant to call other functions such as addEgress and syncEgress in this test. Since we added updateEgressAllocatedCondition, the test could focus on it:

func TestUpdateEgressAllocatedCondition(t *testing.T) {
	tests := []struct {
		name           string
		inputEgress    *v1beta1.Egress
		inputErr       error
		expectedStatus v1beta1.EgressStatus
	}{
		...
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			stopCh := make(chan struct{})
			defer close(stopCh)
			controller := newController(nil, []runtime.Object{tt.inputEgress})
			controller.updateEgressAllocatedCondition(tt.inputEgress, tt.inputErr)
			gotEgress, err := controller.crdClient.CrdV1beta1().Egresses().Get(context.TODO(), tt.inputEgress.Name, metav1.GetOptions{})
			require.NoError(t, err)
			assert.True(t, semanticIgnoringTime.DeepEqual(tt.expectedStatus, gotEgress.Status), "Expected %v, got %v", tt.expectedStatus, gotEgress.Status)
		})
	}
}

Contributor

@luolanzone luolanzone left a comment

@AJPL88, please try to squash the 12 commits into one. Thanks.

type: string
status:
type: string
lastTransitionTime:
Contributor

You can add format: date-time for this field. You can check here https://speakeasyapi.dev/post/openapi-tips-data-type-formats/ if you'd like to learn more.

Member

Since the existing APIs don't have it, I think we can keep them consistent for now and update them all together in a separate change.

type EgressConditionType string

const (
IPAllocated EgressConditionType = "IPAllocated"
Contributor

Better to add comments explaining what these types mean.

Member

done
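
For illustration, documented condition types could look like the sketch below. The comment wording is my own, derived from the PR description, and the `IPAssigned` constant value is assumed from the PR title; the comments actually merged may read differently.

```go
package main

import "fmt"

// EgressConditionType identifies a kind of condition reported in an Egress status.
type EgressConditionType string

const (
	// IPAllocated reports whether the Egress IP has been allocated by
	// antrea-controller. (Illustrative wording.)
	IPAllocated EgressConditionType = "IPAllocated"
	// IPAssigned reports whether the Egress IP has been assigned to a Node by
	// antrea-agent. (Illustrative wording; value assumed from the PR title.)
	IPAssigned EgressConditionType = "IPAssigned"
)

func main() {
	fmt.Println(IPAllocated, IPAssigned)
}
```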

@tnqn tnqn force-pushed the add-egress-condition branch 2 times, most recently from 335b354 to d93924d Compare October 16, 2023 11:04
@luolanzone luolanzone mentioned this pull request Oct 18, 2023
3 tasks
When EgressIP is successfully allocated by antrea-controller,
IPAllocated condition in Egress status is updated.

When EgressIP is assigned to a Node by antrea-agent, IPAssigned
condition in Egress status is updated.

Signed-off-by: Alan Jiang <accelerator5460@gmail.com>
@tnqn tnqn force-pushed the add-egress-condition branch from d93924d to df8a2f7 Compare October 18, 2023 12:24
@tnqn tnqn changed the title Add Egress IPAllocated, IPAssigned conditions Add IPAllocated and IPAssigned conditions to Egress status Oct 18, 2023
@tnqn tnqn force-pushed the add-egress-condition branch from df8a2f7 to 5c0319c Compare October 18, 2023 14:44
@tnqn tnqn assigned xliuxu and luolanzone and unassigned xliuxu and luolanzone Oct 18, 2023
@tnqn tnqn requested a review from xliuxu October 18, 2023 14:44
@tnqn tnqn requested review from luolanzone and antoninbas October 18, 2023 14:44
@tnqn tnqn added api-review Categorizes an issue or PR as actively needing an API review. area/transit/egress Issues or PRs related to Egress (SNAT for traffic egressing the cluster). action/release-note Indicates a PR that should be included in release notes. labels Oct 18, 2023
@@ -23,6 +23,7 @@ import (
"sync"
"time"

v1 "k8s.io/api/core/v1"
Contributor

nit: I'd prefer corev1 as the name for consistency with other imports (coreinformers, metav1).

Member

done

}
} else if egressIP == "" {
// Select one Node to update false status among all Nodes.
nodeToUpdateStatus, err := c.cluster.SelectNodeForIP(egress.Spec.EgressIP, "")
Contributor

Is there a guarantee here that egress.Spec.EgressIP is not empty (I know that addEgress has a check for that)? If so, it would be nice to have a comment to that effect.

Member

It's not guaranteed, but we don't care about the value of egress.Spec.EgressIP; we just use it to reach consensus among all agents about which one should do the update.
Added a comment for it.
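
The consensus idea in this reply can be illustrated with a minimal sketch. It assumes every agent sees the same set of alive Nodes and uses rendezvous (highest-random-weight) hashing as a stand-in; this is not necessarily what `SelectNodeForIP` does internally, and `selectNode` and `shouldUpdateStatus` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// selectNode deterministically picks one Node for a key. Every caller that
// passes the same key and the same set of Nodes gets the same answer,
// regardless of the order of the slice. Hypothetical helper for illustration.
func selectNode(key string, nodes []string) string {
	var best string
	var bestScore uint64
	for _, n := range nodes {
		h := fnv.New64a()
		h.Write([]byte(key))
		h.Write([]byte{0}) // separator between key and Node name
		h.Write([]byte(n))
		score := h.Sum64()
		// Ties are broken by name so the result never depends on slice order.
		if best == "" || score > bestScore || (score == bestScore && n < best) {
			best, bestScore = n, score
		}
	}
	return best
}

// shouldUpdateStatus reports whether the local Node is the one elected to
// write the status. The key only needs to be identical on all agents; its
// value (even an empty string) carries no other meaning.
func shouldUpdateStatus(localNode, key string, nodes []string) bool {
	return selectNode(key, nodes) == localNode
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	for _, local := range nodes {
		fmt.Printf("%s updates status: %v\n", local, shouldUpdateStatus(local, "10.0.0.5", nodes))
	}
}
```

Because each agent evaluates the same pure function over the same inputs, exactly one of them elects itself without any coordination, which is why the key's value does not matter.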

}
}
} else {
return nil
Contributor

It would be nice to have a comment here to explain which case this is (a static Egress IP not assigned to this Node?).

Member

added "The Egress IP is assigned to a Node (egressIP != "") but it's not this Node (isLocal == false), do nothing."

desiredStatus.EgressIP = ""
// If the error is nil, it means the Egress hasn't been processed yet. Therefore, we only set IPAssigned
// condition to false when there is an error.
if scheduleErr != nil {
Contributor

Even if scheduleErr is nil shouldn't we have the condition anyway with a message explaining that the Egress hasn't been processed yet?

Member

The interval will be just a few ms, and setting the condition in that window could have some bad side effects, so I intentionally added the check:

  1. An intermediate state would be generated and overwritten very soon.
  2. Two agents may try to update it and cause some backoff retries, which may delay the update of the right condition. One agent is selected by the fallback mechanism (when the Egress is not scheduled to any Node because it's not processed yet), and the other is the right owner (after it's processed).

The state in which an Egress is not yet processed lasts only a very short time. This is because the scheduler may receive the event at the same time as the controller, so the controller may process it before the scheduler gets a result. But as long as the Egress is there, the scheduler will have a result for it very soon.

Also added comments for it.

@@ -1026,3 +1072,19 @@ func (c *EgressController) GetEgress(ns, podName string) (string, string, error)
func isEgressSchedulable(egress *crdv1b1.Egress) bool {
return egress.Spec.EgressIP != "" && egress.Spec.ExternalIPPool != ""
}

// compareEgressStatus compares two Egress Statuses, ignoring LastTransitionTime and conditions other than IPAssigned, returns true if they are equal.
func compareEgressStatus(currentStatus, desiredStatus crdv1b1.EgressStatus) bool {
Contributor

nit: any reason not to use pointer parameters for this function?

Member

changed to pointer

if compareEgressStatus(toUpdate.Status, *desiredStatus) {
return nil
}
statusToUpdate := desiredStatus.DeepCopy()
Contributor

why do we make yet another copy here?

Member

Added comment:

// Must make a copy here as we will append more conditions. If it's appended to desiredStatus directly, there
// would be duplicate conditions when the function retries.
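
The reason for the copy can be reproduced with a small, self-contained sketch. The `Status` type is a simplified stand-in and both `buildUpdate` functions are hypothetical; the loop simulates a retried update that reuses the same desired status.

```go
package main

import "fmt"

// Status is a simplified stand-in holding only condition names.
type Status struct {
	Conditions []string
}

// DeepCopy returns a copy whose Conditions slice shares no memory with s.
func (s *Status) DeepCopy() *Status {
	return &Status{Conditions: append([]string(nil), s.Conditions...)}
}

// buildUpdate merges the desired status with conditions owned by another
// component. It works on a copy, so calling it again on retry with the same
// desired status yields the same result.
func buildUpdate(desired *Status, others []string) *Status {
	toUpdate := desired.DeepCopy()
	toUpdate.Conditions = append(toUpdate.Conditions, others...)
	return toUpdate
}

// buildUpdateInPlace shows the bug: it appends to desired directly, so every
// retry adds the other conditions again.
func buildUpdateInPlace(desired *Status, others []string) *Status {
	desired.Conditions = append(desired.Conditions, others...)
	return desired
}

func main() {
	others := []string{"IPAllocated"}

	safe := &Status{Conditions: []string{"IPAssigned"}}
	buggy := &Status{Conditions: []string{"IPAssigned"}}
	for attempt := 0; attempt < 3; attempt++ { // simulate retries
		fmt.Println("with copy:   ", buildUpdate(safe, others).Conditions)
		fmt.Println("without copy:", buildUpdateInPlace(buggy, others).Conditions)
	}
}
```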

Contributor

makes sense

@@ -205,6 +205,27 @@ type EgressStatus struct {
// EgressIP indicates the effective Egress IP for the selected workloads. It could be empty if the Egress IP in spec
// is not assigned to any Node. It's also useful when there are more than one Egress IP specified in spec.
EgressIP string `json:"egressIP"`

Conditions []EgressCondition `json:"conditions,omitempty"`
Contributor

@tnqn is it required to also add this to the v1alpha2 version of the API?

Contributor

I guess there is no drawback in doing it...

Member

When I noticed it was added to v1alpha2, I thought it was harmless, so I just left it as is.

@tnqn tnqn force-pushed the add-egress-condition branch from 5c0319c to c1c2928 Compare October 19, 2023 17:54
@antoninbas
Contributor

@tnqn It doesn't look like you pushed your latest changes?

@tnqn tnqn force-pushed the add-egress-condition branch from c1c2928 to ce92b04 Compare October 19, 2023 23:41
@tnqn
Member

tnqn commented Oct 19, 2023

> @tnqn It doesn't look like you pushed your latest changes?

done, sorry for the confusion.

Contributor

@antoninbas antoninbas left a comment

LGTM
One follow-up question: if the actual IP assignment fails (call to c.ipAssigner.AssignIP in the syncEgress function), will this ever be reflected in the status?

}
} else if egressIP == "" {
// Select one Node to update false status among all Nodes.
// We don't care the value of egress.Spec.EgressIP, just use it to reach a consensus among all agents about
Contributor

We don't care about

Member

Fixed, hope it’s the last time I made the same mistake :)

@tnqn
Member

tnqn commented Oct 20, 2023

> One follow-up question: if the actual IP assignment fails (call to c.ipAssigner.AssignIP in the syncEgress function), will this ever be reflected in the status?

Not yet, the error shouldn't happen in practice. As of now I've never seen it. We can add it when it turns out to be possible.

@tnqn tnqn force-pushed the add-egress-condition branch from ce92b04 to 3a03a93 Compare October 20, 2023 13:44
@tnqn
Member

tnqn commented Oct 20, 2023

/test-all
/test-ipv6-all
/test-ipv6-only-all

antoninbas previously approved these changes Oct 20, 2023
1. Avoid generating a transient IPAssigned failure by differentiating
scheduling failure from unprocessed case.
2. Fix duplicate IPAllocated conditions.
3. Add/update unit tests and e2e tests.

Signed-off-by: Quan Tian <qtian@vmware.com>
@tnqn
Member

tnqn commented Oct 23, 2023

/test-all

Contributor

@xliuxu xliuxu left a comment

LGTM

@tnqn tnqn merged commit 0ff5d8c into antrea-io:main Oct 23, 2023
43 of 44 checks passed