Support NetworkPolicy statistics #985

Closed
3 tasks done
tnqn opened this issue Jul 27, 2020 · 20 comments · Fixed by #1172
Labels
kind/design Categorizes issue or PR as related to design.

Comments

@tnqn
Member

tnqn commented Jul 27, 2020

Describe what you are trying to solve
This proposal is to collect and expose the statistical data of NetworkPolicy. antrea-controller collects NetworkPolicy metrics from antrea-agent, aggregates the data, and exposes them through the antrea metrics API.

Monitoring solutions and users can access the data via the metrics API. It can also be accessed by antctl get metrics networkpolicy, making it easier to view.

The metrics data includes the total number of packets, bytes, and sessions for a given NetworkPolicy. The metrics are collected asynchronously and periodically, hence the data obtained from the metrics API is not real-time and may have a delay of up to the collection interval (configurable, 1 min by default).

Describe the solution you have in mind

Scalability consideration

Assuming we want to support 100,000 NetworkPolicies and 1,000 Nodes, with 1,000 NetworkPolicies applying to each Node, and the metrics data being collected every minute, this means:

For collection:

  1. Each agent reports 1,000 metrics per minute
  2. antrea-controller sums up 1,000 * 1,000 = 1,000,000 metrics (for 100,000 individual NetworkPolicies) per minute
    There should be no performance issue when collecting and aggregating the data at the above scale.

For storage:
Kubernetes apiserver persists resources, including CRDs, in the KV store etcd. If we want to persist the data to Kubernetes, it means 100,000 / 60 = 1,666 API writes per second on average (166 API writes per second even if persisting them every 10 minutes), which may cause considerable load on the apiserver and the storage. On the other hand, the metrics data will only be lost when the controller itself is restarted, and monitoring solutions can persist the data by themselves, so storing it in memory should be reasonable.
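
For illustration, a minimal sketch of such an in-memory aggregation store in antrea-controller, assuming a simple map protected by a mutex (the type and field names here are assumptions for this sketch, not the proposed API; the real types are defined in the APIs section below):

import "sync"

// PolicyKey identifies a NetworkPolicy; a simplified stand-in for the
// NetworkPolicyReference type defined later in this proposal.
type PolicyKey struct {
	Type      string
	Namespace string
	Name      string
}

// ruleTotals holds the cluster-wide cumulative counters for one NetworkPolicy.
type ruleTotals struct {
	Packets  int64
	Bytes    int64
	Sessions int64
}

// statsStore accumulates the increments reported by agents, in memory only,
// so the data is lost on controller restart by design.
type statsStore struct {
	mu    sync.Mutex
	stats map[PolicyKey]ruleTotals
}

// Add sums an increment reported by one agent into the cluster-wide totals.
func (s *statsStore) Add(key PolicyKey, packets, bytes, sessions int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := s.stats[key]
	cur.Packets += packets
	cur.Bytes += bytes
	cur.Sessions += sessions
	s.stats[key] = cur
}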

Metrics collection by antrea-agent

antrea-agent is responsible for collecting metrics from the openflow stats.
In the NetworkPolicy implementation, each NetworkPolicy rule gets a unique conjunction ID, and n_packets and n_bytes give the packets and bytes hit by this rule.

table=50, n_packets=0, n_bytes=0, priority=200,ip,nw_src=172.60.0.5 actions=conjunction(1,1/2)
table=50, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=172.60.1.4 actions=conjunction(1,2/2)
table=50, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=172.60.0.5 actions=conjunction(1,2/2)
table=50, n_packets=0, n_bytes=0, priority=190,conj_id=1,ip actions=load:0x1->NXM_NX_REG5[],resubmit(,70)

However, the current flow stats only count the first packet of each session, as the conjunction match flows only match packets in one direction, and the subsequent packets are matched by a flow that allows all packets of established sessions.

table=50, n_packets=0, n_bytes=0, priority=210,ct_state=-new+est,ip actions=resubmit(,70)

One possible solution is to use ct_nw_src, ct_nw_dst, and ct_tp_dst, which match the source address, destination address, and destination port of the conntrack original-direction tuple.

table=50, n_packets=0, n_bytes=0, priority=200,ip,ct_nw_src=172.60.0.5 actions=conjunction(1,1/2)
table=50, n_packets=0, n_bytes=0, priority=200,ip,ct_nw_dst=172.60.1.4 actions=conjunction(1,2/2)
table=50, n_packets=0, n_bytes=0, priority=200,ip,ct_nw_dst=172.60.0.5 actions=conjunction(1,2/2)
table=50, n_packets=0, n_bytes=0, priority=190,conj_id=1,ip actions=load:0x1->NXM_NX_REG5[],resubmit(,70)

But we also see a few drawbacks in this solution:

  1. ct_nw_src requires Open vSwitch > 2.8 (some distros don't have it yet?).
  2. In the ingress table, we use the destination ofport instead of the destination address, to account for multiple IPs per Pod:
table=90, n_packets=0, n_bytes=0, priority=200,ip,ct_nw_src=172.60.0.5 actions=conjunction(2,1/2)
table=90, n_packets=0, n_bytes=0, priority=200,ip,reg1=0x3 actions=conjunction(2,2/2)
table=90, n_packets=0, n_bytes=0, priority=190,conj_id=2,ip actions=load:0x2->NXM_NX_REG6[],resubmit(,105)
  3. The first reply packet would have to match the conjunction match flows again (raised by @srikartati).

@wenyingd proposed persisting the conjunction ID in a conntrack label, and adding dedicated metrics collection flows that match the conntrack label. The flows would be:

# All allowed packets will be resubmitted to the metrics table
table=50, n_packets=181551, n_bytes=27050226, priority=210,ct_state=-new+est,ip actions=resubmit(,61)
# Load the conjunction ID to ct_label and jump to the metrics table
table=50, n_packets=16202, n_bytes=1198948, priority=190,conj_id=1,ip actions=load:0x1->NXM_NX_REG5[],ct(commit,table=61,zone=65520,exec(load:0x1->NXM_NX_CT_LABEL[32..63]))
# Collect "+new" and "-new" packets separately for each rule
table=61, n_packets=16202, n_bytes=1198948, priority=200,ct_state=+new,ct_label=0x100000000/0xffffffff00000000,ip actions=goto_table:70
table=61, n_packets=181551, n_bytes=27050226, priority=200,ct_state=-new,ct_label=0x100000000/0xffffffff00000000,ip actions=goto_table:70

In the above example, 16202 is the session count, 16202 + 181551 is the packet count, and 1198948 + 27050226 is the byte count.
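
To make the bookkeeping explicit, here is a minimal sketch of how the counters of the two metrics-table flows for one rule map to the reported stats (FlowStats is a hypothetical helper type for this sketch, not an existing Antrea type):

// FlowStats holds the n_packets/n_bytes counters read from one OVS flow.
type FlowStats struct {
	Packets int64
	Bytes   int64
}

// ruleStats derives the per-rule counters from the two metrics flows:
// newFlow matches ct_state=+new packets, estFlow matches ct_state=-new packets.
func ruleStats(newFlow, estFlow FlowStats) (sessions, packets, bytes int64) {
	// Each allowed connection contributes exactly one +new packet.
	sessions = newFlow.Packets
	packets = newFlow.Packets + estFlow.Packets
	bytes = newFlow.Bytes + estFlow.Bytes
	return sessions, packets, bytes
}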

Communication between antrea-agent and antrea-controller

Although antrea-agent now has its own API (for the Prometheus API and the support bundle), it's difficult to enable server authentication for the agent API, as it would require certificate generation and distribution for each agent.
In this proposal, instead of antrea-controller pulling data from antrea-agents, antrea-agents push metrics data to an internal metrics collection API exposed by antrea-controller. In this way, the same authentication and authorization mechanism, and even the TCP connection, used for the internal NetworkPolicy API can be reused.

Each antrea-agent could restart, and the openflow stats could be reset after that. If antrea-agent sent the whole stats to antrea-controller, it would not be easy to aggregate them, given that each agent could reset its portion individually.
In this proposal, antrea-agent is responsible for calculating the incremental stats and reporting them to antrea-controller. antrea-controller can then simply sum up the increments from all agents.
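
A minimal sketch of the agent-side increment calculation (the helper name and the counter-reset handling are assumptions for illustration, not the actual implementation):

// delta computes the increment to report for one counter since the last
// collection. If the current value is smaller than the last observed value,
// the OVS flows were likely reinstalled and their counters reset, so the
// current value itself is treated as the increment.
func delta(current, last int64) int64 {
	if current < last {
		return current
	}
	return current - last
}

For each rule, the agent would report delta(cur.Packets, prev.Packets), delta(cur.Bytes, prev.Bytes), and delta(cur.Sessions, prev.Sessions), then remember the current values for the next round.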

APIs

Collection API (internal)

The data structure representing the collected metrics is shown below.

type NetworkPolicyType string

const (
	K8sNetworkPolicy     NetworkPolicyType = "K8sNetworkPolicy"
	ClusterNetworkPolicy NetworkPolicyType = "ClusterNetworkPolicy"
	AntreaNetworkPolicy  NetworkPolicyType = "AntreaNetworkPolicy"
)

// NetworkPolicyReference represents a NetworkPolicy Reference.
type NetworkPolicyReference struct {
	// The type of this NetworkPolicy.
	Type NetworkPolicyType `json:"type"`
	// The name of this NetworkPolicy.
	Name string `json:"name"`
	// The namespace of this NetworkPolicy.
	Namespace string `json:"namespace"`
}

type NetworkPolicyStats struct {
	NetworkPolicy NetworkPolicyReference `json:"networkPolicy"`

	Packets  int64 `json:"packets"`
	Bytes    int64 `json:"bytes"`
	Sessions int64 `json:"sessions"`
}

The API endpoint is /stats/networkpolicy.
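
For illustration only, an agent report to this endpoint could carry a list of incremental NetworkPolicyStats; a sketch of building such a payload (the JSON-encoded list, the wrapper function, and the example policy name are assumptions for this sketch, not a settled wire format):

import "encoding/json"

// buildReport is a hypothetical sketch of an agent assembling its periodic
// report for POST /stats/networkpolicy; the policy reference and numbers are
// examples taken from the flows above, not real data.
func buildReport() ([]byte, error) {
	report := []NetworkPolicyStats{
		{
			NetworkPolicy: NetworkPolicyReference{
				Type:      K8sNetworkPolicy,
				Namespace: "default",
				Name:      "allow-web",
			},
			Sessions: 16202,
			Packets:  16202 + 181551,
			Bytes:    1198948 + 27050226,
		},
	}
	return json.Marshal(report)
}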

Metrics API (public)

The metrics API must follow the K8s convention so that it can be registered as an APIService and accessed by antctl get metrics and kubectl get.

type NetworkPolicyMetric struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Packets  int64
	Bytes    int64
	Sessions int64
}

type NetworkPolicyMetricList struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	// List of NetworkPolicy metrics.
	Items []NetworkPolicyMetric
}

The API group is metrics.antrea.tanzu.vmware.com and the endpoint is /apis/metrics.antrea.tanzu.vmware.com/v1alpha1/networkpolicies.
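
Once registered as an APIService, the aggregated data could be retrieved directly through the Kubernetes apiserver, for example (illustrative, assuming the resource name above):

kubectl get --raw /apis/metrics.antrea.tanzu.vmware.com/v1alpha1/networkpolicies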

Open questions:
Should metrics of all types (K8sNetworkPolicy, ClusterNetworkPolicy, AntreaNetworkPolicy) be exposed via a single endpoint or separate ones?

Describe how your solution impacts user flows
Users can access the NetworkPolicy metrics via the metrics API and antctl get metrics networkpolicy.

Describe the main design/architecture of your solution

Alternative solutions that you considered

Test plan
TBD

Additional context
PRs for this feature:

@tnqn tnqn added the kind/design Categorizes issue or PR as related to design. label Jul 27, 2020
@jianjuns
Contributor

Question - for controller API to expose aggregated policy stats, do we need to follow some format to facilitate Prometheus consumption too?

@tnqn
Member Author

tnqn commented Jul 28, 2020

Question - for controller API to expose aggregated policy stats, do we need to follow some format to facilitate Prometheus consumption too?

I don't know whether Prometheus is suitable for collecting metrics data for dynamically created/destroyed resources; I didn't see the Kubernetes metrics API take Prometheus into consideration: https://github.com/kubernetes/metrics#apis.

@ksamoray do you know of a similar case among Prometheus-integrated apps, or whether Prometheus supports metrics for frequently created/destroyed resources?

If we really want to expose the data to Prometheus, I assume we need to expose it via the "/metrics" API instead, and follow the Prometheus format. In that case, do you think we should have two APIs, or let users/other monitoring solutions access the "/metrics" API?

@srikartati
Member

Will these statistics support denied/not allowed sessions, packets, and bytes too?

@ksamoray
Contributor

@tnqn I haven't really run into such a use case, where Prometheus manages metrics with a short lifespan. I can look around and conduct a couple of experiments. I believe that Prometheus will store these metrics even after they're no longer reported at /metrics.
It is possible to expose these via /metrics and consume them via other apps (there's a parser for the Prometheus output). But then, if Prometheus won't age them out of its DB, wouldn't it bloat after a while with stale NetworkPolicy metrics?

@tnqn
Member Author

tnqn commented Jul 28, 2020

Will these statistics support denied/not allowed sessions, packets, and bytes too?

For Antrea-specific NetworkPolicies that have a DROP action, the sessions, packets, and bytes will be for denied traffic, though sessions and packets should be the same in this case.

@tnqn
Member Author

tnqn commented Jul 28, 2020

@tnqn I haven't really run into such a use case, where Prometheus manages metrics with a short lifespan. I can look around and conduct a couple of experiments. I believe that Prometheus will store these metrics even after they're no longer reported at /metrics.
It is possible to expose these via /metrics and consume them via other apps (there's a parser for the Prometheus output). But then, if Prometheus won't age them out of its DB, wouldn't it bloat after a while with stale NetworkPolicy metrics?

Thanks @ksamoray for your quick answer. The cleanup of stale items is also my concern. Good to know about the parser approach, which would allow antrea to maintain a single API even if users want the data in their Prometheus server; is it an official Prometheus mechanism or a 3rd-party tool?

@srikartati
Member

Regarding network policy metrics at the Prometheus server: should deleted network policies really be considered stale? There may be a use case where a user wants to see network policy metrics from, say, 12 hours ago for a duration of one hour. In this scenario, deleted network policy data would still be useful. I am presuming the Prometheus server has a configurable time period for keeping the metrics data in its DB. This is true for non-deleted metrics resources as well, right? Thanks.

@ksamoray
Contributor

Any metric which is exposed on the Agent's /metrics endpoint can be defined as a Prometheus metric and will be scraped by the Prometheus server. The Prometheus server stores scraped metrics in its tsdb for a predefined retention period (defaults to 15 days), even if the metric has been unregistered on the agent.
This is not a problem as tsdb storage is fairly "cheap", and as @srikartati suggested, could be useful as well. However, 15d is a lot, and retention cannot be set per metric.
As for the controller side, metrics can be pulled from the agents via HTTPS in the same manner that the Prometheus server scrapes them. However, each request will pull all the metrics from the agent, including irrelevant ones.
Parsing the Prometheus format is fairly easy; the Prometheus codebase has a parser which I've used in the e2e tests.
Another option is to gather the relevant metrics from the Prometheus server itself - that would require having one though. The Prometheus server has a fairly strong query API backed by a golang client. A query can filter out stale metrics easily. We could query Prometheus either from the controller or even directly from antctl. However, that would require having a Prometheus server... So there's a decision to make about creating a dependency here for NetworkPolicy metrics.
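
For reference, a minimal sketch of consuming an agent's /metrics output with the text parser from the Prometheus codebase (the endpoint URL and the metric name below are illustrative only):

import (
	"fmt"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func scrapeAgentMetrics() error {
	// Scrape the agent's /metrics endpoint (URL is illustrative).
	resp, err := http.Get("http://agent.example:8080/metrics")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Parse the Prometheus text exposition format into metric families.
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return err
	}

	// Pick out one hypothetical metric family and print its samples.
	if mf, ok := families["some_networkpolicy_metric"]; ok {
		for _, m := range mf.GetMetric() {
			fmt.Printf("%s %v = %v\n", mf.GetName(), m.GetLabel(), m.GetCounter().GetValue())
		}
	}
	return nil
}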

@suwang48404
Contributor

@tnqn @jianjuns a couple questions:

  1. Metrics are defined per NP; can/should they include per-endpoint data? For instance, if an ANP is applied to all Pods, for live debugging a user may want to know on which Pod(s) the NP is currently active.
  2. The current design calls for: a) agents push metrics to the controller via CRD, b) Prometheus scrapes the controller for NP metrics, right?
  3. What timeframe is this work slated for?

Thx, Su

@jianjuns
Contributor

Yes, per-endpoint stats are useful in some cases, but maybe that can be a next step, after we have cluster-level stats?

@antoninbas
Contributor

@suwang48404 the current plan is to have an internal API for agents to push metrics to the controller, and a public API exposed by the controller with aggregated data (see Metrics API (public) section in Quan's post). We haven't looked at what kind of Prometheus integration we can do for this yet. @srikartati has started looking into it IIRC, but it is unclear whether Prometheus is a good fit to expose statistics for a large number of "ephemeral" objects (object == NP here). Maybe we will need yet another level of statistic aggregation for the Prometheus metrics (i.e. to avoid exposing stats for each NP).

@tnqn
Member Author

tnqn commented Aug 18, 2020

@jianjuns @antoninbas thanks for answering the questions.

@suwang48404 it is targeted for the coming release 0.10.

@suwang48404
Contributor

@antoninbas @jianjuns thx all for replying.

@srikartati
Member

@srikartati has started looking into it IIRC, but it is unclear whether Prometheus is a good fit to expose statistics for a large number of "ephemeral" objects (object == NP here). Maybe we will need yet another level of statistic aggregation for the Prometheus metrics (i.e. to avoid exposing stats for each NP).

Hi,
Thanks for the offline discussion @antoninbas
Here the basic issue is having a Prometheus metric with the network policy name/UUID as a label, which makes it a high-cardinality label. This is not recommended in Prometheus because of the explosion of time series. I tried to see if a ConstMetric can help by sending standalone metrics periodically with a specific set of labels; this does not seem to be a recommended best practice either.
Maintaining the network policy stats as logs/events, and storing and visualizing them (e.g., with an ELK collector), seems to be the right best practice.

@suwang48404
Contributor

@srikartati , thank you, that was very informative.

@ceclinux
Contributor

I think antctl support is not implemented for this feature. Could you explain why? @tnqn

@tnqn
Member Author

tnqn commented Feb 10, 2022

I think antctl support is not implemented for this feature. Could you explain why? @tnqn

The API follows K8s style and the data can be retrieved via kubectl get networkpolicystats/antreanetworkpolicystats/antreaclusternetworkpolicystats, so there is no reason to repeat it in antctl.

@ceclinux
Contributor

I think antctl support is not implemented for this feature. Could you explain why? @tnqn

The API follows K8s style and the data can be retrieved via kubectl get networkpolicystats/antreanetworkpolicystats/antreaclusternetworkpolicystats, so there is no reason to repeat it in antctl.

I think you were referring to aggregated NetworkPolicy stats for the Antrea cluster. Implementing something like antctl get networkpolicystats in the Antrea agent could expose NetworkPolicy stats for each Node. So I am assuming "node-level NetworkPolicy stats" is out of the proposal's scope. Please correct me if I understand it incorrectly.

@tnqn
Member Author

tnqn commented Feb 10, 2022

Yes, I didn't plan node level stats, but it sounds like a good idea to me, maybe helpful for troubleshooting.

@ceclinux
Contributor

Yes, I didn't plan node level stats, but it sounds like a good idea to me, maybe helpful for troubleshooting.

Thank you for your prompt reply.
