Support NetworkPolicy statistics #1172
Thanks for your PR. The following commands are available:
Force-pushed the branch from 8bbb806 to b2023b0.
Codecov Report
@@            Coverage Diff             @@
##           master    #1172      +/-   ##
==========================================
- Coverage   54.40%   54.37%   -0.03%
==========================================
  Files         115      119       +4
  Lines       10821    11213     +392
==========================================
+ Hits         5887     6097     +210
- Misses       4363     4527     +164
- Partials      571      589      +18
Flags with carried forward coverage won't be shown.
Force-pushed the branch from 9cb99bd to 2647d81.
/test-all
/test-all
@antoninbas @jianjuns @abhiraut Sorry for changing the API name from metrics to stats at the last minute. I changed it because I see most similar functions like
/test-e2e |
/test-networkpolicy |
/test-e2e |
LGTM
// TODO: The following process is not atomic, there's a chance that the ofID is released and reused by another
// NetworkPolicy rule in-between, leading to incorrect metrics. We should return relevant NetworkPolicy references
// along with metrics to avoid it.
ruleStatsMap := m.ofClient.NetworkPolicyMetrics()
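The TODO suggests returning NetworkPolicy references together with the metrics. A minimal sketch of that idea, assuming a hypothetical RuleStats type in which the owning policy UID is captured at the same moment as the counters (the type, field and function names are illustrative, not Antrea's actual API): per-policy totals are then computed from the reference carried in each entry, with no second ofID-to-rule lookup, so a release and reuse of the ofID in between cannot misattribute the counters.

```go
package main

// TrafficStats holds flow counters (illustrative field names).
type TrafficStats struct {
	Packets  uint64
	Bytes    uint64
	Sessions uint64
}

// RuleStats is a hypothetical per-rule result: the OpenFlow client reports the
// owning NetworkPolicy together with the counters it read for an ofID.
type RuleStats struct {
	OfID      uint32
	PolicyUID string // owner captured at the same time as the counters
	Stats     TrafficStats
}

// statsByPolicy sums per-rule counters into per-policy totals using only the
// policy reference carried in each entry, so no lookup of the ofID happens
// after the counters were read.
func statsByPolicy(rules []RuleStats) map[string]TrafficStats {
	totals := make(map[string]TrafficStats)
	for _, r := range rules {
		t := totals[r.PolicyUID]
		t.Packets += r.Stats.Packets
		t.Bytes += r.Stats.Bytes
		t.Sessions += r.Stats.Sessions
		totals[r.PolicyUID] = t
	}
	return totals
}
```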
Should this be renamed from metrics to stats for consistency?
I think we should, but since it doesn't affect the user-facing API and the method was not added by this PR, I plan to change it when addressing the TODOs and ensuring its efficiency. Does that make sense?
Sounds good
metrics -> stats sounds good to me. For controller/agent restarts, I feel we should at least handle controller restarts, since a controller restart loses all previous stats, which could be rediscovered from the agents. But let us probably consider how to handle these in the next release.
Even if the controller keeps a copy of the agent stats, it cannot handle a scenario such as an agent restart followed by a controller restart. Unless the controller or the agent persists the stats somewhere, the stats might be far from the actual values.
Sure. One solution I have thought of is to persist the agent's stats in the local run dir periodically and when it receives a kill signal, then reload them on start.
For an Agent restart (not an OVS restart), the Agent should report the current counters, and the Controller should know about the Agent restart so that it can compute the diff based on the cached counters. Is anything wrong in my assumptions?
@jianjuns In Antrea's case, the agent and OVS are always restarted together unless one of them is killed by its liveness probe. Even if only the agent restarts and OVS doesn't, the agent flushes all flows on restart, and even on reconnection to OVS. Do you mean reading stats from the stale flows before flushing them?
OK, I got what you mean. The Agent and OVS do not always restart together. But if we assume the Agent always flushes all flows, then we can make it simpler and just report the current counters as the diff.
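Under that assumption the agent only needs an in-memory baseline of the counters it last reported, starting empty, so the first report after a restart carries the full current counters. A minimal sketch in the spirit of the agent-side stats.Collector, using hypothetical names (deltaCollector, TrafficStats) rather than the PR's actual code; as an extra safeguard it also treats any counter that went backwards as a reset.

```go
package main

// TrafficStats holds flow counters (illustrative field names).
type TrafficStats struct {
	Packets  uint64
	Bytes    uint64
	Sessions uint64
}

// deltaCollector remembers the counters it last reported, keyed by policy UID,
// and reports only the increase since then.
type deltaCollector struct {
	lastReported map[string]TrafficStats
}

// newDeltaCollector starts with an empty baseline: after an agent restart the
// first report is the full current counters, which is correct only if all
// flows (and hence their counters) were flushed on restart.
func newDeltaCollector() *deltaCollector {
	return &deltaCollector{lastReported: map[string]TrafficStats{}}
}

// collect returns current minus last-reported counters for each policy and
// makes a copy of current the new baseline. Policies absent from current drop
// out of the baseline; policies with no change are omitted from the result.
func (c *deltaCollector) collect(current map[string]TrafficStats) map[string]TrafficStats {
	delta := make(map[string]TrafficStats, len(current))
	baseline := make(map[string]TrafficStats, len(current))
	for uid, cur := range current {
		last := c.lastReported[uid]
		if cur.Packets < last.Packets || cur.Bytes < last.Bytes || cur.Sessions < last.Sessions {
			last = TrafficStats{} // counters were reset, e.g. flows reinstalled
		}
		d := TrafficStats{
			Packets:  cur.Packets - last.Packets,
			Bytes:    cur.Bytes - last.Bytes,
			Sessions: cur.Sessions - last.Sessions,
		}
		if d != (TrafficStats{}) {
			delta[uid] = d
		}
		baseline[uid] = cur
	}
	c.lastReported = baseline
	return delta
}
```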
LGTM. Maybe resolve the nits in a follow-up so we don't have merge conflicts on this PR.
documentation comments
Force-pushed the branch from 5bc1874 to 7172387.
@antoninbas @jianjuns @abhiraut Thanks for the review. I have addressed all comments.
/test-all
LGTM
I think the points raised by @jianjuns can be discussed after the release and addressed in a follow-up PR if needed.
Force-pushed the branch from 7172387 to fa5eada.
@antoninbas Sorry, I corrected another word in the feature gate (metrics -> statistics). Could you re-approve?
/test-all |
This PR supports collecting and querying the NetworkPolicy statistics for K8s NetworkPolicy and Antrea policies. It does the following:
- Introduce a feature gate called "NetworkPolicyStats" to manage its enablement.
- Introduce a structure called (stats.)Collector that collects stats from the OpenFlow client, calculates the delta compared with the last reported stats, and reports it to the antrea-controller via the controlplane NodeStatsSummary API.
- Introduce a structure called (stats.)Aggregator that collects the stats from the antrea-agents, aggregates them, caches the result, and provides interfaces for the Stats API handlers to query them.
- Aggregate the Stats API group to the Kubernetes API.
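On the controller side, aggregation of per-Node reports reduces to summing per-policy deltas into a cached total that query handlers read. A minimal sketch of that idea, loosely modeled on the (stats.)Aggregator described in this PR; nodeReport is a simplified hypothetical stand-in for the controlplane NodeStatsSummary type, and every type and method name here is illustrative.

```go
package main

import "sync"

// TrafficStats holds flow counters (illustrative field names).
type TrafficStats struct {
	Packets  uint64
	Bytes    uint64
	Sessions uint64
}

// nodeReport is a hypothetical, simplified agent report: per-policy deltas
// since that agent's previous report, keyed by policy UID.
type nodeReport struct {
	NodeName string
	Deltas   map[string]TrafficStats
}

// aggregator accumulates deltas from all agents into cluster-wide totals and
// serves them to query handlers. It is safe for concurrent use.
type aggregator struct {
	mu     sync.RWMutex
	totals map[string]TrafficStats
}

func newAggregator() *aggregator {
	return &aggregator{totals: map[string]TrafficStats{}}
}

// collect adds one agent report to the cached totals.
func (a *aggregator) collect(r nodeReport) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for uid, d := range r.Deltas {
		t := a.totals[uid]
		t.Packets += d.Packets
		t.Bytes += d.Bytes
		t.Sessions += d.Sessions
		a.totals[uid] = t
	}
}

// get returns the cached totals for one policy, as a query handler would.
func (a *aggregator) get(uid string) (TrafficStats, bool) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	t, ok := a.totals[uid]
	return t, ok
}

// deletePolicy drops the totals of a policy that no longer exists.
func (a *aggregator) deletePolicy(uid string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.totals, uid)
}
```

Because agents send deltas rather than absolute counters, the controller does not need to track which Node contributed what; the trade-off, raised in the review discussion, is that totals cached only in controller memory are lost when the controller restarts.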
The stats can be queried via kubectl get networkpolicystats and kubectl get clusternetworkpolicystats, for example:
Closes #985
Depends on #1140 and #1221