Fix redundant reconciles in FQDN controller #5893

Dyanngg · 2024-01-18T18:19:02Z

Recently an issue was raised in the Kubernetes Antrea channel, where @krobbo sees very frequent reconciliations of the same FQDN rule in very short succession. It was also disclosed that the specific rule in questions contains several FQDN selectors, and the FQDNs are S3 endpoints which have dynamic IPs and relatively short TTL.

From the logs excerpts, I found that for some DNS response handling, the FQDN controller actually reconciles the same rule immediately after it finishes syncing in datapath:

4914 I0106 10:43:51.785381       1 networkpolicy_controller.go:663] "Rule realization was done" ruleID="e7b262670e58c0ee"
4915 I0106 10:43:51.785408       1 networkpolicy_controller.go:619] "Finished syncing rule" ruleID="e7b262670e58c0ee" duration="4.593564ms"
4916 I0106 10:43:51.785430       1 reconciler.go:298] "Reconciling NetworkPolicy rule" rule="e7b262670e58c0ee" policy="AntreaNetworkPolicy:lab/egress-to-external-services"
4917 I0106 10:43:51.785449       1 reconciler.go:427] "Assigning OFPriority to rule" rule="e7b262670e58c0ee" priority=14900
4918 I0106 10:43:51.785464       1 reconciler.go:715] "Updating existing rule" rule="e7b262670e58c0ee (Direction: Out, Targets: 4, ToAddressGroups: 0, ToIPBlocks: 0, ToAddresses: 0, Services: 1, PolicyPriority:0xc000c073f0, RulePriority: 0)"

Upon closer look at the current code, I suspect it is due to cases where some FQDN matches several selectors in the same rule, and in the sync logic the rule ID is not deduplicated.

This PR fixes this specific issue by deduplicating the rule IDs before syncing.

tnqn

queue.Add() could deduplicate keys when they are added before the worker gets them out of queue, and reconciling a rule should be much slower than queue.Add(), so the previous code should only be responsible for two consecutive reconcilation. If there is 3rd reconcilation right after the 2nd one, it should be something else. But the patch is definitely one of the right things.

pkg/agent/controller/networkpolicy/fqdn.go

Signed-off-by: Dyanngg <dingyang@vmware.com>

tnqn

LGTM

tnqn · 2024-01-22T05:58:52Z

/test-all

Dyanngg changed the title ~~[WIP] Fix redundant reconciles in FQDN controller~~ Fix redundant reconciles in FQDN controller Jan 18, 2024

Dyanngg requested review from GraysonWu and tnqn January 18, 2024 23:05

Dyanngg added this to the Antrea v1.15 release milestone Jan 18, 2024

tnqn reviewed Jan 19, 2024

View reviewed changes

pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved

Fix redundant reconciles in FQDN controller

01a7a8c

Signed-off-by: Dyanngg <dingyang@vmware.com>

Dyanngg force-pushed the fix-fqdn-redundant-reconcile branch from b3a4f88 to 01a7a8c Compare January 19, 2024 18:21

Dyanngg requested a review from tnqn January 19, 2024 18:42

tnqn approved these changes Jan 22, 2024

View reviewed changes

tnqn merged commit 6128753 into antrea-io:main Jan 22, 2024
49 of 54 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix redundant reconciles in FQDN controller #5893

Fix redundant reconciles in FQDN controller #5893

Dyanngg commented Jan 18, 2024 •

edited

Loading

tnqn left a comment

tnqn left a comment

tnqn commented Jan 22, 2024

Fix redundant reconciles in FQDN controller #5893

Fix redundant reconciles in FQDN controller #5893

Conversation

Dyanngg commented Jan 18, 2024 • edited Loading

tnqn left a comment

Choose a reason for hiding this comment

tnqn left a comment

Choose a reason for hiding this comment

tnqn commented Jan 22, 2024

Dyanngg commented Jan 18, 2024 •

edited

Loading