Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorize Prometheus metrics and add Connection metrics #1143

Closed
wants to merge 1 commit into from

Conversation

srikartati
Copy link
Member

@srikartati srikartati commented Aug 24, 2020

This change does the following:

  • Categorizes Antrea Agent prometheus metrics and provides a way to flexibly configure them.
  • Remove host/node name metric. I changed antrea-prometheus.yaml to add node name to instance label instead of
    IP:port, which make promql queries easier.
    I do not know if there is any other benefit for the host/node name metric.
  • Add connection metrics with the flow exporter feature enabled. Specifically total connection count
    in conntrack table and connection count in Antrea connection store.

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@srikartati srikartati changed the title WIP: Categorize Prometheus metrics and add connection metrics WIP: Categorize Prometheus metrics and add Connection metrics Aug 24, 2020
@srikartati srikartati changed the title WIP: Categorize Prometheus metrics and add Connection metrics [WIP] Categorize Prometheus metrics and add Connection metrics Aug 25, 2020
@srikartati srikartati changed the title [WIP] Categorize Prometheus metrics and add Connection metrics Categorize Prometheus metrics and add Connection metrics Aug 26, 2020
This change does the following:
- Categorize Antrea Agent prometheus metrics and provides a way to flexibly
  configure them.
- Remove host/node name metric. I changed the antrea-prometheus.yaml to add nodename
  to instance lable instead of IP:port, which makes promql queries easier.
  I do not know if there is any other benefit for the host/node name metric.
- Add connection metrics with flow exporter feature enabled. Specifically total connection count
  in conntrack table and connection count in Antrea connetion store.
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general comment for now before I review further: what's the rationale for having individual configuration switches for different categories of metrics?

Comment on lines +67 to +78
# EnablePrometheusMetrics is a map of metric categories to bool flags that enables or disables those metrics exposure.
enablePrometheusMetrics:
# Metrics in all below categories can be enabled or disabled through AllMetrics.
# AllMetrics: false
# Metrics related to Pods. They can be enabled or disabled through PodMetrics.
# PodMetrics: false
# Metrics related to Network Policies. They can be enabled or disabled through NetworkPolicyMetrics.
# NetworkPolicyMetrics: false
# Metrics related to OVS switch. They can be enabled or disabled through OVSMetrics.
# OVSMetrics: false
# Metrics related to connections when FlowExporter feature is enabled. They can be enabled or disabled through ConnectionMetrics.
# ConnectionMetrics: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the advantage of exposing all these configuration switches?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless this change is driven by some specific use case, I think we should look into addressing #723 first.

@srikartati
Copy link
Member Author

Just a general comment for now before I review further: what's the rationale for having individual configuration switches for different categories of metrics?

Hi Antonin, The idea is that the user might be interested in specific metrics only, say he only wants to track the number of network policies at Antrea agent and not in other metrics. When we have a large number of metrics (not now but as we keep adding), categorizing them would help in optimizing resources both when tracking at the agent and also during scraping by the Prometheus server.

Agree that the functionality can be enhanced when we can change the configmap dynamically without needing a restart (#723 ).

@antoninbas
Copy link
Contributor

@srikartati I haven't checked but do other projects (e.g. k8s) support conditionally enabling some Prometheus metrics? I am not sure the optimization on the Agent side is worth introducing new configuration dimensions. On the Prometheus server side, can't scraping be configured on a per-metric basis?

@srikartati
Copy link
Member Author

@srikartati I haven't checked but do other projects (e.g. k8s) support conditionally enabling some Prometheus metrics? I am not sure the optimization on the Agent side is worth introducing new configuration dimensions. On the Prometheus server side, can't scraping be configured on a per-metric basis?

@antoninbas Hubble from Cilium does the conditional supporting, but the metric list is sent through cobra command flags and not the configMap.
https://github.com/cilium/hubble/blob/v0.5/pkg/metrics/metrics.go#L68
https://github.com/cilium/hubble/blob/v0.5/cmd/serve/serve.go#L137

I know that on the Prometheus server side we can drop some metrics before ingestion by relabeling them with action drop. However, scraping will still happen. I am not aware if there is some other way of not configuring a subset of metrics.

@srikartati
Copy link
Member Author

Had an offline discussion with @antoninbas. As there is a significant change in the configMap format. This may need some more discussion in wider forum. Connection metrics will be handled separately.
So closing this PR.

@srikartati srikartati closed this Sep 2, 2020
@srikartati srikartati deleted the flow_metrics branch September 2, 2020 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants