Tracking "not allowed"/"denied" connections at Antrea Agent #1004

Closed
srikartati opened this issue Jul 30, 2020 · 2 comments · Fixed by #2112
Labels
area/monitoring/traffic-analysis Issues or PRs related to traffic analysis. kind/design Categorizes issue or PR as related to design.

Comments

@srikartati
Member

srikartati commented Jul 30, 2020

Describe what you are trying to solve
The connection tracking table holds only "allowed" connections, that is, those matched by network policy rules. We have no record of which connections are dropped or denied, nor of the statistics associated with them. Tracking these connections would be useful for the flow visibility feature, giving the user an account of denied connections; specifically, the flow exporter could export these flows along with the ones in the conntrack table.

In addition, this can be used by network policy metrics (Issue #985) to track metrics of "not allowed"/"denied" connections. Even with a specific "deny" rule in ClusterNetworkPolicy, the number of denied connections/sessions would be equal to the number of packets, which is inaccurate.

Describe the solution you have in mind
We will add the "packetIn" action to the OVS flow rules where packets are dropped. For K8s network policies, this action will not be tied to any specific rule, whereas for cluster network policies it will be specific to the rule when a deny rule is present. packetIn infra: https://github.com/vmware-tanzu/antrea/blob/a858c9ad25c215418003e4cb92a68ec62c7fddbf/pkg/agent/openflow/packetin.go

With the help of packetIn, we maintain a map/store to keep track of the "not allowed"/"denied" connections.
More details are below.

Describe how your solution impacts user flows
This solution does not add any user-facing feature. The flow exporter and other features could benefit from it.

Describe the main design/architecture of your solution
We will have a map/store to hold these "not allowed"/"denied" connections. We will create a new handler function in packetIn, in which a new connection is added or an existing connection is updated. The key in the map will be the 5-tuple of the connection. We will make use of the Connection and ConnectionKey constructs defined as part of the flow exporter feature:
https://github.com/vmware-tanzu/antrea/blob/a858c9ad25c215418003e4cb92a68ec62c7fddbf/pkg/agent/flowexporter/types.go

We have to flush these connections periodically and have a limit on how many "not allowed"/"denied" connections are stored.
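
To make the add-or-update step and the size limit concrete, here is a minimal Go sketch of such a store. The names `connKey`, `deniedConn`, `deniedStore`, and `mustKey` are hypothetical and exist only for illustration; the real Connection and ConnectionKey types in the flow exporter may look different, and the packetIn handler wiring is omitted.

```go
package main

import (
	"net/netip"
	"sync"
)

// connKey is a hypothetical 5-tuple key; Antrea's ConnectionKey may differ.
type connKey struct {
	src, dst         netip.Addr
	srcPort, dstPort uint16
	proto            uint8
}

// deniedConn holds per-connection counters for dropped traffic.
type deniedConn struct {
	packets uint64
	bytes   uint64
}

// deniedStore is a bounded, concurrency-safe map of denied connections.
type deniedStore struct {
	mu      sync.Mutex
	maxSize int
	conns   map[connKey]*deniedConn
}

func newDeniedStore(maxSize int) *deniedStore {
	return &deniedStore{maxSize: maxSize, conns: make(map[connKey]*deniedConn)}
}

// addOrUpdate records one dropped packet. It returns false when the key is
// new and the store is already full, so the caller can count the miss.
func (s *deniedStore) addOrUpdate(k connKey, packetLen uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.conns[k]
	if !ok {
		if len(s.conns) >= s.maxSize {
			return false
		}
		c = &deniedConn{}
		s.conns[k] = c
	}
	c.packets++
	c.bytes += packetLen
	return true
}

// get returns a copy of the counters stored for k.
func (s *deniedStore) get(k connKey) (deniedConn, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.conns[k]
	if !ok {
		return deniedConn{}, false
	}
	return *c, true
}

// mustKey builds a connKey from textual addresses; it panics on bad input.
func mustKey(src, dst string, srcPort, dstPort uint16, proto uint8) connKey {
	return connKey{
		src:     netip.MustParseAddr(src),
		dst:     netip.MustParseAddr(dst),
		srcPort: srcPort,
		dstPort: dstPort,
		proto:   proto,
	}
}
```

A handler would call `addOrUpdate` once per packetIn. Rejecting new keys when the store is full is only one possible policy; evicting the oldest entry would be another.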

Test plan
Unit tests. Enhance the integration and e2e tests where applicable.

Additional context
Open questions:

  • What should the timeout period for flushing the connections be?
    Initial thinking is that we should have the same timeout period for all connections, and that it should be short: these "not allowed"/"denied" connections are typically short-lived, and a short timeout will help track more connections.
    Or should we base the timeout on the last-updated time of each connection?
    Is 10s good to begin with?
  • We are not planning to maintain any state for each connection. Will there be any unforeseen issues?
@srikartati srikartati added kind/design Categorizes issue or PR as related to design. area/monitoring/traffic-analysis Issues or PRs related to traffic analysis. labels Jul 30, 2020
@antoninbas
Contributor

I think you have to put a safeguard in place to make sure that the CPU doesn't get flooded by packetins. Otherwise a compromised Pod could generate denied connections specifically to trigger packetins and flood the Antrea Agent. That means that some packetins could be dropped and the Agent would miss some "connection denied" events. Maybe we should also have a way to prioritize Traceflow packetins over these ones.

@srikartati
Member Author

srikartati commented Aug 25, 2020

Thanks for the comment @antoninbas
Looked at the code more carefully.
We would need a new packetInHandler with a reason other than ofprAction to process denied/rejected connections. Otherwise, packets with the Traceflow action would need to be consumed by the handler for denied/rejected connections, and vice versa. With a separate reason, we do not have to coexist with the Traceflow packet handler, and prioritization would not be necessary.

We could use a single workqueue, as Traceflow does, but rate-limit the packets that are consumed to avoid flooding and high CPU usage.
