Categorize Prometheus metrics and add Connection metrics #1143

srikartati · 2020-08-24T22:42:24Z

This change does the following:

- Categorizes Antrea Agent Prometheus metrics and provides a way to flexibly configure them.
- Removes the host/node name metric. I changed antrea-prometheus.yaml to add the node name to the instance label instead of IP:port, which makes PromQL queries easier. I do not know if there is any other benefit to the host/node name metric.
- Adds connection metrics when the flow exporter feature is enabled. Specifically, the total connection count in the conntrack table and the connection count in the Antrea connection store.
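
The instance-label change described above could look roughly like the following Prometheus relabeling fragment. This is a sketch only, not the actual contents of antrea-prometheus.yaml: the job name is hypothetical, and it assumes Pod-based Kubernetes service discovery, where Prometheus exposes the `__meta_kubernetes_pod_node_name` meta label.

```yaml
scrape_configs:
  - job_name: antrea-agents          # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Overwrite the default instance label (IP:port) with the node name.
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: instance
```

With a rule like this, PromQL selectors can match on the node name directly through the instance label rather than on an IP:port pair.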

antrea-bot · 2020-08-24T22:42:37Z

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

/test-e2e: to trigger e2e tests.
/skip-e2e: to skip e2e tests.
/test-conformance: to trigger conformance tests.
/skip-conformance: to skip conformance tests.
/test-whole-conformance: to trigger all conformance tests on linux.
/skip-whole-conformance: to skip all conformance tests on linux.
/test-networkpolicy: to trigger networkpolicy tests.
/skip-networkpolicy: to skip networkpolicy tests.
/test-windows-conformance: to trigger windows conformance tests.
/skip-windows-conformance: to skip windows conformance tests.
/test-windows-networkpolicy: to trigger windows networkpolicy tests.
/skip-windows-networkpolicy: to skip windows networkpolicy tests.
/test-hw-offload: to trigger ovs hardware offload test.
/skip-hw-offload: to skip ovs hardware offload test.
/test-all: to trigger all tests (except whole conformance).
/skip-all: to skip all tests (except whole conformance).


antoninbas

Just a general comment for now before I review further: what's the rationale for having individual configuration switches for different categories of metrics?

antoninbas · 2020-08-27T19:40:41Z

build/yamls/base/conf/antrea-agent.conf

+# EnablePrometheusMetrics is a map of metric categories to bool flags that enables or disables those metrics exposure.
+enablePrometheusMetrics:
+# Metrics in all below categories can be enabled or disabled through AllMetrics.
+#  AllMetrics: false
+# Metrics related to Pods. They can be enabled or disabled through PodMetrics.
+#  PodMetrics: false
+# Metrics related to Network Policies. They can be enabled or disabled through NetworkPolicyMetrics.
+#  NetworkPolicyMetrics: false
+# Metrics related to OVS switch. They can be enabled or disabled through OVSMetrics.
+#  OVSMetrics: false
+# Metrics related to connections when FlowExporter feature is enabled. They can be enabled or disabled through ConnectionMetrics.
+#  ConnectionMetrics: false


what's the advantage of exposing all these configuration switches?

unless this change is driven by some specific use case, I think we should look into addressing #723 first.

srikartati · 2020-08-27T20:45:48Z

> Just a general comment for now before I review further: what's the rationale for having individual configuration switches for different categories of metrics?

Hi Antonin, the idea is that a user might be interested in specific metrics only. Say they only want to track the number of network policies at the Antrea Agent and are not interested in other metrics. When we have a large number of metrics (not now, but as we keep adding them), categorizing them would help optimize resources, both when tracking at the agent and during scraping by the Prometheus server.

Agree that the functionality can be enhanced once we can change the ConfigMap dynamically without needing a restart (#723).
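
As an illustration of the use case described above, a user who only cares about network policy metrics might set the proposed map as follows. This is a sketch based on the keys shown in the antrea-agent.conf diff; how AllMetrics interacts with the per-category flags is an assumption here, since the thread does not spell it out.

```yaml
# Sketch of the proposed antrea-agent.conf option (never merged).
enablePrometheusMetrics:
  # Assumption: leaving AllMetrics off lets the per-category flags decide.
  AllMetrics: false
  PodMetrics: false
  NetworkPolicyMetrics: true   # the only category this user wants exposed
  OVSMetrics: false
  ConnectionMetrics: false
```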

antoninbas · 2020-08-27T22:01:28Z

@srikartati I haven't checked but do other projects (e.g. k8s) support conditionally enabling some Prometheus metrics? I am not sure the optimization on the Agent side is worth introducing new configuration dimensions. On the Prometheus server side, can't scraping be configured on a per-metric basis?

srikartati · 2020-08-28T17:08:51Z

> @srikartati I haven't checked but do other projects (e.g. k8s) support conditionally enabling some Prometheus metrics? I am not sure the optimization on the Agent side is worth introducing new configuration dimensions. On the Prometheus server side, can't scraping be configured on a per-metric basis?

@antoninbas Hubble from Cilium supports conditionally enabling metrics, but the metric list is passed through cobra command flags and not through a ConfigMap.
https://github.com/cilium/hubble/blob/v0.5/pkg/metrics/metrics.go#L68
https://github.com/cilium/hubble/blob/v0.5/cmd/serve/serve.go#L137

I know that on the Prometheus server side we can drop some metrics before ingestion by relabeling them with the drop action. However, scraping will still happen. I am not aware of any other way to exclude a subset of metrics.
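
For reference, the server-side drop mentioned above is typically expressed with `metric_relabel_configs`. The fragment below is a minimal sketch; the job name and the metric name prefix in the regex are hypothetical placeholders, not names confirmed by this PR.

```yaml
scrape_configs:
  - job_name: antrea-agents                # hypothetical job name
    metric_relabel_configs:
      # Applied after the scrape and before ingestion: the target is still
      # scraped in full, but matching series are not stored.
      - source_labels: [__name__]
        regex: "antrea_agent_ovs_.*"       # hypothetical metric name prefix
        action: drop
```

This saves storage on the Prometheus server but not the cost of collecting and exposing the metrics on the agent, which is the limitation noted above.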

srikartati · 2020-09-02T23:18:11Z

Had an offline discussion with @antoninbas. As there is a significant change in the ConfigMap format, this may need more discussion in a wider forum. Connection metrics will be handled separately, so I am closing this PR.

vmwclabot added the cla-not-required label Aug 24, 2020

srikartati force-pushed the flow_metrics branch from 615571f to f5722d9 Compare August 24, 2020 22:59

srikartati changed the title ~~WIP: Categorize Prometheus metrics and add connection metrics~~ WIP: Categorize Prometheus metrics and add Connection metrics Aug 24, 2020

srikartati force-pushed the flow_metrics branch from f5722d9 to 777a4b7 Compare August 25, 2020 04:42

srikartati changed the title ~~WIP: Categorize Prometheus metrics and add Connection metrics~~ [WIP] Categorize Prometheus metrics and add Connection metrics Aug 25, 2020

srikartati changed the title ~~[WIP] Categorize Prometheus metrics and add Connection metrics~~ Categorize Prometheus metrics and add Connection metrics Aug 26, 2020

srikartati requested review from ksamoray and antoninbas August 26, 2020 23:23

srikartati force-pushed the flow_metrics branch from 777a4b7 to 01896e8 Compare August 26, 2020 23:27

antoninbas mentioned this pull request Aug 27, 2020

Connection Tracking Metrics #1033

Closed

antoninbas reviewed Aug 27, 2020

srikartati closed this Sep 2, 2020

srikartati deleted the flow_metrics branch September 2, 2020 23:19
