L7 Network Policy logs for allowed network policy rules #5982
@Edward033 If your main / only goal is to inspect HTTP requests, defining a L7NetworkPolicy may not be the best way to go about it. That being said, my understanding is that if you enable logging correctly for your policy rule, then there should be logs for all requests, not just rejected ones. I will give it a try and get back to you.
Other alternatives that may work for you:
Thanks @antoninbas for the quick reply! Please keep me posted about your tests if you don't mind! So far, I've been unable to obtain logs for permitted traffic running Antrea version 1.13.1. Here is my Network Policy at a high level:
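At a high level it follows the shape from the Antrea L7NetworkPolicy docs; the sketch below uses placeholder names and selectors rather than our real ones, and the API version may differ across releases:

```yaml
apiVersion: crd.antrea.io/v1beta1      # may be v1alpha1 on older releases
kind: NetworkPolicy
metadata:
  name: log-http-egress                # placeholder name
spec:
  priority: 5
  tier: application
  appliedTo:
    - podSelector:
        matchLabels:
          app: web                     # placeholder selector
  egress:
    - name: allow-and-log-http
      enableLogging: true              # request logs for this rule
      l7Protocols:
        - http: {}                     # match any HTTP request
      action: Allow
```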
Logs in /var/log/antrea/networkpolicy/l7engine/eve-YYYY-MM-DD.json only show rejected connections for me. I might be missing something. Thanks for the additional options. Given we're running 1.13.1, those don't apply to us, but I will keep them in mind as we progress with our Antrea releases.
Looks like my understanding was not correct. I tested with Antrea v1.15.0, and I could only see logs for rejected connections:
It seems that the only effect of setting
The second option of running your own Suricata does not require the most recent Antrea version.
Assigning to @tnqn to get his comments on logging.
Hey @antoninbas, thanks! You're correct. I'll explore the Suricata option! Do you have any documentation, or do you mind elaborating on the implications of disabling checksum offloading? We want to initially inspect the HTTP traffic, but at some point, if the feature is feasible, we want to enforce it as well. Thanks again!
AFAIK the only implication would be an impact on throughput. I am not sure we have any data available. In the end, it's also likely to depend on your setup. This requirement is documented in https://github.com/antrea-io/antrea/blob/main/docs/antrea-l7-network-policy.md#prerequisites, and there is a bit more context in #4231:
Hi @tnqn, @antoninbas, I've been trying to wrap my head around TX checksum offloading and how it relates to our L7 network policy deployment. We have TX checksum offloading enabled on our NIC:
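As a sketch, the current offload state can be inspected with ethtool (the interface name "eth0" is an assumption; substitute your Node's uplink interface):

```shell
# Check the current TX checksum offload state on the NIC.
# "eth0" is a placeholder; substitute your Node's uplink interface.
IFACE="${IFACE:-eth0}"
if command -v ethtool >/dev/null 2>&1; then
    STATUS=$(ethtool -k "$IFACE" 2>/dev/null | grep tx-checksumming || echo "unknown interface")
else
    STATUS="ethtool not installed"
fi
echo "tx-checksum offload state: $STATUS"
```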
What I understand from this sentence is that L7 Network Policy (Suricata) prevents TX checksum offloading. Is the kernel or OVS unable to handle checksum calculations then? Or can the kernel or OVS still handle the checksum calculations, just potentially more slowly?
What is the benefit of enabling this parameter in the antrea-agent config? I'm guessing OVS now has the ability to delegate checksum calculations to the kernel, instead of "assuming" the NIC would perform the offload, which could cause the potential problems you stated above?
With regards to the description of the "disableTXChecksumOffload" parameter in the antrea-agent, I'm not clear about the last part of this sentence. What causes packets to be dropped due to a bad checksum? Disabling TX checksum offloading? Thanks!
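For reference, the knob in question lives in the antrea-agent configuration (the antrea-agent.conf key of Antrea's ConfigMap); per the L7 NetworkPolicy prerequisites doc it is set like:

```yaml
# antrea-agent.conf (excerpt)
# Ask the agent to disable TX checksum offload so checksums are
# computed in software before packets are redirected to Suricata.
disableTXChecksumOffload: true
```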
@Edward033 Setting
No, disabling checksum offloading prevents packets from being dropped due to bad checksum. Me providing more details may just create more confusion, but here we go, assuming that checksum offloading is enabled: when packets are redirected to Suricata for IPS, a tap interface is used to send and receive packets to the userspace process (Suricata). When packets are sent from the Antrea OVS datapath to Suricata, the sk_buff data structure is aware that the checksum has not been computed yet (computation is deferred because checksum offloading is supported). However, after we send these packets to userspace and then get them back to reinject them into the Antrea OVS datapath, this information has been lost (obviously the sk_buff structure ceased to exist when we sent packets to userspace). So at that stage we have an invalid / missing checksum (it's never been calculated), and the kernel is also no longer aware that the checksum is missing. Because of the latter, the checksum never ends up being calculated and we send packets with an invalid checksum.
Thanks for the detailed explanation!
Glad to hear this. You did a great job explaining the offloading disabled scenario when forwarding traffic to Suricata. I'm now curious what happens if checksum offloading is indeed enabled in Antrea and Suricata is not in use, and why you think this is likely to impact throughput.
What could be performing this hardware acceleration? Isn't everything software based at this point? I'm just not seeing the impact of computing the checksum sooner, or the benefits of deferring checksum computation to a later stage, unless you have something performing hardware acceleration. I understand that you need to disable checksum offloading when Suricata is used, for the reasons you stated, but I feel I don't know enough about the other scenario (offloading enabled) to fully appreciate why you think this impacts throughput. Thanks again for the help!
Below you will see the impact of this setting.

ethtool output - default

ethtool output - checksum offload disabled
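For anyone reproducing this, a sketch of how the offload setting can be toggled and then inspected (this assumes root privileges and a real NIC; "eth0" is a placeholder):

```shell
# Sketch: disable TX checksum offload in software and confirm the change.
# Requires root and a real NIC; "eth0" is a placeholder interface name.
IFACE="${IFACE:-eth0}"
if command -v ethtool >/dev/null 2>&1 && ethtool -K "$IFACE" tx off 2>/dev/null; then
    RESULT=$(ethtool -k "$IFACE" | grep tx-checksumming)
else
    RESULT="skipped: needs root and a real interface named $IFACE"
fi
echo "$RESULT"
```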
If you want to confirm the impact on throughput, the easiest thing to do would be to run iperf benchmarks for intra-Node and inter-Node traffic, in both configurations. I did a quick experiment for inter-Node traffic, but it was a Kind cluster, so it's not really representative of what would happen for a bare-metal cluster or a vSphere-based cluster:
So about a 6-7x decrease in throughput (again, it would be better to try on your actual setup).
Just wanted to emphasize that this is only needed when traffic is redirected, not when traffic is mirrored. In Suricata terms, this is only used for IPS (intrusion prevention), not for IDS (intrusion detection). This is why I suggested doing your own mirroring to Suricata (while L7NetworkPolicy requires redirection, as it is an "IPS feature").
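The suggested benchmark can be sketched as follows (the server pod IP and the 30-second duration are placeholders, not values from the experiments above):

```shell
# Sketch of the intra-/inter-Node iperf3 benchmark described above.
SERVER_POD_IP="${SERVER_POD_IP:-10.10.1.2}"     # hypothetical server pod IP
SERVER_CMD="iperf3 -s"                          # run inside the server pod
CLIENT_CMD="iperf3 -c ${SERVER_POD_IP} -t 30"   # run inside the client pod
# Repeat with the client on the same Node (intra-Node) and on a different
# Node (inter-Node), once with default offload settings and once with
# TX checksum offload disabled, then compare the reported throughput.
echo "server pod: ${SERVER_CMD}"
echo "client pod: ${CLIENT_CMD}"
```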
Appreciate the information! On my current setup, iperf3 reaches ~3.7 Gbits/s (3700 Mbps) with the default configuration. With checksum offload disabled I'm getting ~1.09 Gbits/s, so approximately a 3x decrease for me. I've only tested external traffic (from a pod to an external server), but I will perform intra-Node and inter-Node tests as well. Thanks!
It shouldn't be the intended behavior, otherwise there would be no point to set
The solution adds logs with event type http for allowed traffic in L7 NetworkPolicy. It also adds log support for TLS as it was later supported by L7NP. Fixes antrea-io#5982 Signed-off-by: Qiyue Yao <yaoq@vmware.com>
Hi all,
I'm trying to inspect the HTTP traffic of our pods with L7 Network Policy. I want to understand the HTTP hosts (domains) being hit by our workloads.
Based on my tests, L7 network policy logs only show the HTTP host when traffic is rejected, in '/var/log/antrea/networkpolicy/l7engine/eve-YYYY-MM-DD.json'.
The allowed logs go to '/var/log/antrea/networkpolicy/np.log' with the rest of the network policy logs. Is there any setting I can change that would show me the http host for permitted traffic? Or is exposing these logs on the roadmap?
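For completeness, this is the kind of filtering I run on the Suricata log to pull out hostnames from the entries that do appear; it is a sketch using standard eve.json fields (event_type, http.hostname), and the date-suffixed path is the one mentioned above:

```shell
# Sketch: extract HTTP hostnames from the eve.json log written by the L7 engine.
LOG="/var/log/antrea/networkpolicy/l7engine/eve-$(date +%Y-%m-%d).json"
if [ -f "$LOG" ] && command -v jq >/dev/null 2>&1; then
    # Keep only HTTP events and list the unique hostnames they contain.
    HOSTS=$(jq -r 'select(.event_type == "http") | .http.hostname' "$LOG" | sort -u)
else
    HOSTS="(log file or jq not available on this machine)"
fi
echo "$HOSTS"
```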