Unify TCP and UDP DNS interception flows #5392
Conversation
Force-pushed from 48051c3 to e7ff6aa
Realized I shouldn't completely remove the tcp_flags match, because we don't want SYN and FIN packets also being sent to the agent. Will add flags that look like:
I wouldn't assume that SYN / FIN packets cannot carry data. I believe that the server is allowed to set the FIN flag on the last data segment. And this RFC mentions TCP Fast Open for DNS, so I think the SYN + ACK from the server could also theoretically carry data.
Makes sense. Then I will keep it for review.
I think we should be adding an e2e test for a multi-packet TCP DNS response. It doesn't have to be as part of this PR, but we should add one before the 1.14 release.
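To make the suggested test concrete, here is a minimal sketch of the core check such an e2e test could perform, using the miekg/dns client to force the query over TCP so that a large answer has to span multiple segments. This is not Antrea's e2e framework code; the queried name and the kube-dns ClusterIP address are placeholder assumptions.

package main

import (
	"fmt"
	"log"

	"github.com/miekg/dns"
)

func main() {
	// Force DNS over TCP so that a large response (e.g. many TXT records)
	// has to be split across several TCP segments.
	c := &dns.Client{Net: "tcp"}
	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn("example.com"), dns.TypeTXT)

	// "10.96.0.10:53" is an assumed kube-dns ClusterIP, not a value from this PR.
	r, _, err := c.Exchange(m, "10.96.0.10:53")
	if err != nil {
		log.Fatalf("TCP DNS query failed: %v", err)
	}
	fmt.Printf("rcode=%d answers=%d size=%d bytes\n", r.Rcode, len(r.Answer), r.Len())
}

The assertion in a real test would presumably be that the query still succeeds, and that the FQDN rule sees the complete response even though it arrives in multiple packets.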
I will add @wenyingd for review as well
Force-pushed from 4c16e5e to 65cf562
Force-pushed from 65cf562 to 2e9a0e8
LGTM, but other people should review as well
// Meter Entry Rate. It is represented as number of events per second.
// Packets which exceed the rate will be dropped.
PacketInMeterRateNP = 100
PacketInMeterRateTF = 100
Any plan to make the meter rate configurable? A concern is that DNS response packets may be dropped by the meter if a burst of Pod traffic happens.
We should update #5358 when this is merged.
Note that based on some recent experiments I did, CPU usage increases very quickly when the rate is increased. There are potentially some improvements that can be made in the agent to process packets more efficiently.
Force-pushed from 88f8609 to a321c66
/test-all
/test-ipv6-e2e
LGTM overall
/test-all
PacketInMeterRateNP = 100
PacketInMeterRateTF = 100
PacketInMeterRateDNS = 100
I'm concerned that this could affect normal traffic. A node could run 110 Pods; if all of them use FQDN rules, each Pod is only allowed to get less than 1 DNS response per second. But even for in-cluster DNS UDP queries, it usually takes 3~5 rounds to get a valid result due to the search domains. If it's a DNS TCP query, it could also involve several responses. With the rate-limiting, I think it's very easy to hit the limit and cause applications to fail to resolve domains. Even the networkpolicy e2e test could generate tens of DNS responses per second according to my experience.
I wonder which part is more expensive: the packet-in itself, or the userspace code that processes the packet. If it's the latter, I wonder if we should use the previous approach that uses a channel to rate-limit the packets, and resume the transmission of the ones that fail to be queued.
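As an illustration of the channel-based approach described above, here is a minimal sketch under assumptions: the packetIn and interceptor types, the function names, and the queue capacity are hypothetical, and "resuming" the packet is only simulated with a log line rather than real datapath handling.

package main

import "fmt"

// packetIn is a stand-in for a packet punted to userspace (hypothetical type).
type packetIn struct {
	data []byte
}

// interceptor holds a bounded queue feeding the userspace DNS handler.
type interceptor struct {
	queue chan packetIn
}

// submit tries to hand the packet to the userspace handler without blocking.
// If the queue is full, the packet's transmission would be resumed in the
// datapath instead of being dropped (simulated here with a log line).
func (i *interceptor) submit(p packetIn) {
	select {
	case i.queue <- p:
		// Queued for FQDN policy processing.
	default:
		fmt.Println("queue full: resume packet transmission instead of dropping")
	}
}

func main() {
	// Capacity of 200 mirrors the queue size discussed later in this thread.
	i := &interceptor{queue: make(chan packetIn, 200)}
	for n := 0; n < 300; n++ {
		i.submit(packetIn{data: []byte{byte(n)}})
	}
	fmt.Printf("queued=%d\n", len(i.queue))
}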
Sorry, I didn't fully understand the channel idea. I know we already have a rate-limited channel for packet-in. What channel do you propose?
I have a similar concern.
I feel like we need more investigation and experimenting. Just getting the packets to the antrea-agent handling code is quite expensive in terms of CPU time, so I am hoping we can improve that.
We could use a larger rate. However, I am pretty sure that this patch has no effect on the maximum rate of DNS packets we can handle. The reason is that we also have a rate-limited queue in the antrea-agent process, and each PacketIn category is rate-limited to 100 pps:
antrea/pkg/agent/openflow/packetin.go, lines 113 to 118 in e04c95c:

func newFeatureStartPacketIn(category uint8, stopCh <-chan struct{}) *featureStartPacketIn {
	featurePacketIn := featureStartPacketIn{category: category, stopCh: stopCh}
	featurePacketIn.packetInQueue = openflow.NewPacketInQueue(PacketInQueueSize, rate.Limit(PacketInQueueRate))
	return &featurePacketIn
}
DNS responses use the PacketInCategoryDNS category.
The OVS meter should prevent high CPU usage in ovs-vswitchd and antrea-agent, but there should be no change in the effective rate of DNS responses we can process.
I suggest that after merging it, we revisit these parameters. We could have a higher OVS rate for DNS packets, or at least a higher tolerance for bursts. At the moment I think the OVS meter has a burst size of 200 packets, and the software queue also has capacity 200.
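For reference, here is a minimal, self-contained sketch of how a token bucket with a rate of 100 pps and a burst of 200 behaves when a burst of packet-ins arrives. This is not Antrea's actual PacketInQueue implementation (which queues packets rather than simply rejecting them); it only illustrates the rate and burst parameters discussed here.

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// 100 events per second, with a burst capacity of 200 tokens.
	limiter := rate.NewLimiter(rate.Limit(100), 200)

	accepted, rejected := 0, 0
	// Simulate 500 DNS response packet-ins arriving at effectively the same time.
	for i := 0; i < 500; i++ {
		if limiter.Allow() {
			accepted++
		} else {
			rejected++
		}
	}
	// Roughly the first 200 (the burst) are admitted immediately; the rest
	// would have to wait for tokens to refill at 100 per second.
	fmt.Printf("accepted=%d rejected=%d\n", accepted, rejected)
}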
It seems 100 or 200 pps is far from enough. I gathered some statistics when analyzing a pcap captured from a production cluster; even a single Pod's DNS responses could exceed 100 per second:
# tcpdump -r dns.pcap -n src port 53 | awk -F "." '{print $1}' | uniq -c |sort -r | head
reading from file dns.pcap, link-type EN10MB (Ethernet)
101 15:38:43
100 15:38:56
100 15:38:40
100 15:34:01
99 15:38:44
99 15:34:03
99 15:33:58
98 15:38:58
98 15:38:04
98 15:37:48
The above pcap has only UDP DNS traffic; the maximum value hovering around 100 makes me wonder whether it was already capped by the userspace rate limiting.
Given that the userspace rate-limiting was always there, let's merge this PR and work on increasing the limit as a follow-up. @GraysonWu could you also work on this? I think setting it to a default of 500pps for DNS packets and making the rate configurable would be a good place to start. I think you should also run some experiments on a cluster to see how much CPU is consumed at 500pps. Based on some previous experiments I did, 500pps can consume one full vCPU (ovs-vswitchd + antrea-agent). Maybe there is some room to improve the implementation. Of course, in the long term, we could always consider a different implementation than packet-in (e.g., similar to what we do with Suricata for IDS).
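A sketch of what the proposed follow-up could look like; the config field name, YAML tag, and default value below are assumptions for illustration only, not Antrea's actual configuration:

package config

// AgentConfig is a pared-down, hypothetical stand-in for the agent
// configuration; only the field relevant to this discussion is shown.
type AgentConfig struct {
	// DNSPacketInRate is the maximum number of DNS response packets per
	// second punted to the agent for FQDN policy processing. Zero means
	// "use the default".
	DNSPacketInRate int `yaml:"dnsPacketInRate,omitempty"`
}

// Assumed default, matching the 500 pps suggested above.
const defaultDNSPacketInRate = 500

// EffectiveDNSPacketInRate resolves the configured value to the rate that
// would be programmed for DNS packet-in handling.
func EffectiveDNSPacketInRate(c *AgentConfig) int {
	if c.DNSPacketInRate <= 0 {
		return defaultDNSPacketInRate
	}
	return c.DNSPacketInRate
}

The resolved value would presumably be applied to both the OVS meter rate and the userspace rate-limited queue, so that the two limits stay consistent.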
Sure, I will work on that and open a PR ASAP.
I did an experiment with 500 pps; it takes around 70% of 1 vCPU in my env.
LGTM
LGTM, but I want to wait and see if @tnqn has any more comments regarding #5392 (comment)
@GraysonWu can you fix the conflict?
1. Change to use ct_state to match the DNS responses that need further interception.
2. Add OF meter for DNS interception.

Signed-off-by: graysonwu <wgrayson@vmware.com>
Force-pushed from 2d131de to 9e846db
/test-all
@GraysonWu please backport to 1.13
Fixes #5387