
Unify TCP and UDP DNS interception flows #5392

Merged: 1 commit, Aug 23, 2023

Conversation

GraysonWu (Contributor)

Fixes #5387

  1. Change to use ct_state to match the DNS responses that need further interception.
  2. Add OF meter for DNS interception.

GraysonWu force-pushed the fqdn-uniform branch 3 times, most recently from 48051c3 to e7ff6aa, on August 16, 2023 02:32.
GraysonWu (Contributor, Author)

Realized I shouldn't completely remove the tcp_flags match, because we don't want SYN and FIN packets also being sent to the agent. I will add flags that look like tcp_flags=+ack-syn-fin-rst. Just letting the reviewers know; I will commit this change in a few hours. I'm having some internet issues right now.

antoninbas (Contributor)

> Realized I shouldn't completely remove the tcp_flags match, because we don't want SYN and FIN packets also being sent to the agent. I will add flags that look like tcp_flags=+ack-syn-fin-rst. Just letting the reviewers know; I will commit this change in a few hours. I'm having some internet issues right now.

I wouldn't assume that SYN / FIN packets cannot carry data. I believe that the server is allowed to set the FIN flag on the last data segment. And this RFC mentions TCP Fast Open for DNS, so I think the SYN + ACK from the server could also theoretically carry data.

GraysonWu (Contributor, Author)

> > Realized I shouldn't completely remove the tcp_flags match, because we don't want SYN and FIN packets also being sent to the agent. I will add flags that look like tcp_flags=+ack-syn-fin-rst. Just letting the reviewers know; I will commit this change in a few hours. I'm having some internet issues right now.
>
> I wouldn't assume that SYN / FIN packets cannot carry data. I believe that the server is allowed to set the FIN flag on the last data segment. And this RFC mentions TCP Fast Open for DNS, so I think the SYN + ACK from the server could also theoretically carry data.

Makes sense. Then I will keep it as-is for review.
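
For readers skimming the thread, here is a minimal, hypothetical sketch of the kind of interception flow being discussed (illustrative only; the table number, priority, and bridge name are assumptions, not the exact flow added by this PR). It matches reply-direction, conntrack-tracked TCP traffic from port 53 and punts it to the controller (the antrea-agent); per the discussion above, no tcp_flags match is used, so data-carrying SYN or FIN segments are not missed:

# Hypothetical example: intercept TCP DNS responses on a tracked connection.
# "+trk+rpl" means the packet is conntrack-tracked and flows in the reply direction.
ovs-ofctl -O OpenFlow15 add-flow br-int \
  "table=101, priority=200, ct_state=+trk+rpl, tcp, tp_src=53, actions=controller"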

antoninbas (Contributor) left a comment

I think we should be adding an e2e test for a multi-packet TCP DNS response. It doesn't have to be part of this PR, but we should add one before the 1.14 release.

Review thread on pkg/agent/openflow/network_policy.go (outdated, resolved)
antoninbas (Contributor) left a comment

I will add @wenyingd for review as well

Review thread on pkg/agent/openflow/pipeline.go (outdated, resolved)
antoninbas (Contributor) left a comment

LGTM, but other people should review as well

Review thread on pkg/agent/openflow/network_policy.go (outdated, resolved)
// Meter Entry Rate. It is represented as number of events per second.
// Packets which exceed the rate will be dropped.
PacketInMeterRateNP  = 100
PacketInMeterRateTF  = 100
PacketInMeterRateDNS = 100
Contributor

Any plan to make the meter rate configurable? A concern is that DNS response packets may be dropped by the meter if a burst of Pod traffic happens.
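
For illustration, an OpenFlow meter of this kind can be expressed with ovs-ofctl roughly as follows (a sketch with a hypothetical meter id, bridge, and flow; Antrea installs its meters and flows programmatically rather than via ovs-ofctl):

# Hypothetical example: a packets-per-second meter with a drop band (rate 100, burst 200),
# referenced by a flow before the packet is sent to the controller.
ovs-ofctl -O OpenFlow15 add-meter br-int "meter=5,pktps,burst,bands=type=drop,rate=100,burst_size=200"
ovs-ofctl -O OpenFlow15 add-flow br-int "table=101, priority=200, ct_state=+trk+rpl, udp, tp_src=53, actions=meter:5,controller"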

Contributor

We should update #5358 when this is merged.

Note that based on some recent experiments I did, CPU usage increases very quickly when the rate is increased. There are potentially some improvements that can be made in the agent to process packets more efficiently.

Review thread on pkg/agent/openflow/network_policy.go (outdated, resolved)
GraysonWu force-pushed the fqdn-uniform branch 2 times, most recently from 88f8609 to a321c66, on August 17, 2023 05:32.
GraysonWu (Contributor, Author)

/test-all

GraysonWu (Contributor, Author)

/test-ipv6-e2e

Dyanngg (Contributor) left a comment

LGTM overall

Review thread on pkg/agent/openflow/network_policy.go (resolved)
GraysonWu (Contributor, Author)

/test-all
/test-ipv6-e2e

PacketInMeterRateNP  = 100
PacketInMeterRateTF  = 100
PacketInMeterRateDNS = 100
tnqn (Member), Aug 18, 2023

I'm concerned that this could affect normal traffic. A Node could run 110 Pods; if all of them use FQDN rules, each Pod is only allowed to get fewer than one DNS response per second. But even for in-cluster UDP DNS queries, it usually takes 3~5 rounds to get a valid result due to the search names. If it's a DNS TCP query, it could also involve several responses. With the rate limiting, I think it's very easy to hit the limit and cause applications to fail to resolve domains. Even the networkpolicy e2e test could generate tens of DNS responses per second, in my experience.
I wonder which process is more expensive: the packet-in itself, or the userspace code that processes the packet. If it's the latter, I wonder if we should use the previous approach that uses a channel to rate-limit the packets, and resume the transmission of the ones that fail to be queued.
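
(For context on the "3~5 rounds": with the default Kubernetes Pod DNS configuration, a single application lookup expands into several queries. An illustrative Pod /etc/resolv.conf, with values that vary per cluster, looks like:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

so resolving an external name such as example.com is tried against each search suffix before the literal name, i.e. roughly four queries, and four responses, per address family.)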

GraysonWu (Contributor, Author), Aug 18, 2023

Sorry, I didn't follow the channel suggestion. I know we already have a rate-limit channel for packet-in; what channel do you propose?

Contributor

I have a similar concern.
I feel like we need more investigation and experimentation. Just getting the packets to the antrea-agent handling code is quite expensive in terms of CPU time, so I am hoping we can improve that.
We could use a larger rate. However, I am pretty sure that this patch has no effect on the maximum rate of DNS packets we can handle, because we also have a rate-limited queue in the antrea-agent process, and each PacketIn category is rate-limited to 100 pps:

func newFeatureStartPacketIn(category uint8, stopCh <-chan struct{}) *featureStartPacketIn {
	featurePacketIn := featureStartPacketIn{category: category, stopCh: stopCh}
	// Each PacketIn category gets its own queue, rate-limited to PacketInQueueRate (100 pps).
	featurePacketIn.packetInQueue = openflow.NewPacketInQueue(PacketInQueueSize, rate.Limit(PacketInQueueRate))
	return &featurePacketIn
}

DNS responses use the PacketInCategoryDNS category.
The OVS meter should prevent high CPU usage in ovs-vswitchd and antrea-agent, but there should be no change in the effective rate of DNS responses we can process.

I suggest that we revisit these parameters after this is merged. We could use a higher OVS rate for DNS packets, or at least a higher tolerance for bursts. At the moment, I think the OVS meter has a burst size of 200 packets, and the software queue also has a capacity of 200.
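
To make the rate/burst interplay concrete, here is a minimal standalone sketch (not Antrea code) using golang.org/x/time/rate, presumably the same token-bucket limiter behind the rate.Limit seen in the snippet above: with a rate of 100 events/s and a burst of 200, roughly 200 + 100 packets are admitted in the first second, after which sustained throughput is capped at about 100 pps regardless of the OVS meter in front of it.

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Hypothetical values mirroring the discussion: 100 packets/s with a burst of 200.
	limiter := rate.NewLimiter(rate.Limit(100), 200)

	admitted := 0
	start := time.Now()
	// Tight loop for illustration only: count how many "packets" the limiter admits in one second.
	for time.Since(start) < time.Second {
		if limiter.Allow() {
			admitted++
		}
	}
	fmt.Printf("packets admitted in 1s: %d\n", admitted) // expect roughly burst (200) + rate (100)
}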

Member

It seems 100 or 200 pps is far from enough. I gathered some statistics when analyzing a pcap captured from a production cluster; even a single Pod's DNS responses could exceed 100 per second (the command below counts packets from source port 53, grouped by second-granularity timestamp):

# tcpdump -r dns.pcap -n src port 53  | awk -F "." '{print $1}' | uniq -c |sort -r | head
reading from file dns.pcap, link-type EN10MB (Ethernet)
    101 15:38:43
    100 15:38:56
    100 15:38:40
    100 15:34:01
     99 15:38:44
     99 15:34:03
     99 15:33:58
     98 15:38:58
     98 15:38:04
     98 15:37:48

The above pcap contains only UDP DNS traffic. The fact that the maximum value hovers around 100 makes me wonder whether it was already capped by the userspace rate limiting.

Contributor

Given that the userspace rate limiting was always there, let's merge this PR and work on increasing the limit as a follow-up. @GraysonWu could you also work on this? I think setting it to a default of 500 pps for DNS packets and making the rate configurable would be a good place to start. You should also run some experiments on a cluster to see how much CPU is consumed at 500 pps. Based on some previous experiments I did, 500 pps can consume one full vCPU (ovs-vswitchd + antrea-agent). Maybe there is some room to improve the implementation. Of course, in the long term, we could always consider a different implementation than packet-in (e.g., similar to what we do with Suricata for IDS).

GraysonWu (Contributor, Author)

Sure, I will work on that and open a PR ASAP.
In an experiment with 500 pps, it took around 70% of one vCPU in my environment.

wenyingd previously approved these changes Aug 18, 2023
wenyingd (Contributor) left a comment

LGTM

GraysonWu requested review from tnqn and antoninbas on August 21, 2023 21:11.
antoninbas previously approved these changes Aug 21, 2023
antoninbas (Contributor) left a comment

LGTM, but I want to wait and see if @tnqn has any more comments regarding #5392 (comment)

antoninbas (Contributor)

@GraysonWu can you fix the conflict?

1. Change to use ct_state to match the DNS responses that need
further interception.
2. Add OF meter for DNS interception.

Signed-off-by: graysonwu <wgrayson@vmware.com>
antoninbas (Contributor)

/test-all

antoninbas added the action/release-note (Indicates a PR that should be included in release notes.) and action/backport (Indicates a PR that requires backports.) labels on Aug 23, 2023.
antoninbas merged commit 72bc791 into antrea-io:main on Aug 23, 2023.
antoninbas (Contributor)

@GraysonWu please backport to 1.13

luolanzone mentioned this pull request on Aug 24, 2023.
jianjuns changed the title from "Uniform TCP and UDP DNS interception flows" to "Unify TCP and UDP DNS interception flows" on Sep 1, 2023.
Successfully merging this pull request may close these issues:

Flow matching for DNS responses is invalid (#5387)

5 participants