Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix redundant reconciles in FQDN controller #5893

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

Dyanngg
Copy link
Contributor

@Dyanngg Dyanngg commented Jan 18, 2024

Recently an issue was raised in the Kubernetes Antrea channel, where @krobbo sees very frequent reconciliations of the same FQDN rule in very short succession. It was also disclosed that the specific rule in questions contains several FQDN selectors, and the FQDNs are S3 endpoints which have dynamic IPs and relatively short TTL.

From the logs excerpts, I found that for some DNS response handling, the FQDN controller actually reconciles the same rule immediately after it finishes syncing in datapath:

4914 I0106 10:43:51.785381       1 networkpolicy_controller.go:663] "Rule realization was done" ruleID="e7b262670e58c0ee"
4915 I0106 10:43:51.785408       1 networkpolicy_controller.go:619] "Finished syncing rule" ruleID="e7b262670e58c0ee" duration="4.593564ms"
4916 I0106 10:43:51.785430       1 reconciler.go:298] "Reconciling NetworkPolicy rule" rule="e7b262670e58c0ee" policy="AntreaNetworkPolicy:lab/egress-to-external-services"
4917 I0106 10:43:51.785449       1 reconciler.go:427] "Assigning OFPriority to rule" rule="e7b262670e58c0ee" priority=14900
4918 I0106 10:43:51.785464       1 reconciler.go:715] "Updating existing rule" rule="e7b262670e58c0ee (Direction: Out, Targets: 4, ToAddressGroups: 0, ToIPBlocks: 0, ToAddresses: 0, Services: 1, PolicyPriority:0xc000c073f0, RulePriority: 0)"

Upon closer look at the current code, I suspect it is due to cases where some FQDN matches several selectors in the same rule, and in the sync logic the rule ID is not deduplicated.

This PR fixes this specific issue by deduplicating the rule IDs before syncing.

@Dyanngg Dyanngg changed the title [WIP] Fix redundant reconciles in FQDN controller Fix redundant reconciles in FQDN controller Jan 18, 2024
@Dyanngg Dyanngg requested review from GraysonWu and tnqn January 18, 2024 23:05
@Dyanngg Dyanngg added this to the Antrea v1.15 release milestone Jan 18, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queue.Add() could deduplicate keys when they are added before the worker gets them out of queue, and reconciling a rule should be much slower than queue.Add(), so the previous code should only be responsible for two consecutive reconcilation. If there is 3rd reconcilation right after the 2nd one, it should be something else. But the patch is definitely one of the right things.

pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved
Signed-off-by: Dyanngg <dingyang@vmware.com>
@Dyanngg Dyanngg force-pushed the fix-fqdn-redundant-reconcile branch from b3a4f88 to 01a7a8c Compare January 19, 2024 18:21
@Dyanngg Dyanngg requested a review from tnqn January 19, 2024 18:42
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Jan 22, 2024

/test-all

@tnqn tnqn merged commit 6128753 into antrea-io:main Jan 22, 2024
49 of 54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants