
[receiver/dockerstatsreceiver] Improvements proposal for dockerstatsreceiver #9794

Closed
jamesmoessis opened this issue May 8, 2022 · 14 comments · Fixed by #18449
@jamesmoessis
Contributor

jamesmoessis commented May 8, 2022

Looking to invest some time in the dockerstatsreceiver, I noticed a few improvements we could make to the component. I'm happy to implement the improvements below once they've been discussed here.

EDIT: The list below is the initial list; see my comment below for a refined list.

  1. Add the ability to turn individual metrics on/off in configuration, similar to hostmetricsreceiver: a sensible set of metrics is enabled by default, and the user can turn each metric on/off as they please. This is achieved by onboarding to mdatagen (see the config sketch after this list).
  2. Add a new metric, memory.usage.total_cache. We need this to get a better picture of the container's memory usage. This metric can be off by default, assuming item (1) is implemented.
  3. Documentation improvements. This can also be achieved by onboarding the component to mdatagen, since it auto-generates the documentation for each metric.
  4. (Breaking change) Rename configuration options to use attributes, not labels. For example, container_labels_to_metric_labels would be renamed to container_attributes_to_metric_attributes.
  5. Once these items are implemented, declare the component as alpha with the aim to eventually move it into beta.
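To make (1), (2), and (4) concrete, the receiver config could end up looking something like this. Metric names, the exact shape of the `metrics:` section, and the renamed option are illustrative only, pending the mdatagen work:

```yaml
receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock
    collection_interval: 10s
    # (4) hypothetical renamed form of container_labels_to_metric_labels
    container_attributes_to_metric_attributes:
      my.docker.label: my.metric.attribute
    # (1) per-metric toggles generated by mdatagen
    metrics:
      memory.usage.total_cache:
        enabled: true    # (2) proposed new metric, otherwise off by default
      memory.usage.limit:
        enabled: false   # example of switching off a metric you don't want
```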
@jamesmoessis
Contributor Author

@rmfitzpatrick as the codeowner of this component, I would really appreciate your feedback and thoughts on this :)

@jamesmoessis
Contributor Author

I've edited the issue description to include onboarding to mdatagen, in order to align with other similar scrapers in contrib.

@rmfitzpatrick
Contributor

rmfitzpatrick commented May 23, 2022

Thanks @jamesmoessis for these suggestions (and your patience!). I'm a supporter of all of them, though I think we may want an opt-in/grace period using redundant fields/feature flags where possible.

  1. Configuration ability to turn certain metrics on/off. Similar to hostmetricsreceiver. A set of default metrics will be on by default, and then the user has the ability to turn each metric on/off as they please. This is achieved by using mdatagen.

This is similar to my initial implementation, but I was asked to defer to a processor instead. The processor-based approach doesn't appear to be the de facto standard at this point*, and metric groups or similar config options are used in other components, so I'm still for it. The complexity of filtering can also make it very undesirable as-is.
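For context, the processor-based alternative means scraping everything and dropping unwanted metrics downstream, e.g. with the filter processor (metric names illustrative):

```yaml
processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - container.memory.active_anon
          - container.memory.inactive_anon
          # ...one entry per unwanted metric, which gets unwieldy quickly
```

Per-metric toggles in the receiver avoid both the extra config and the cost of collecting data that is immediately discarded.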

  1. Documentation improvements. This also can be achieved by onboarding the component to mdatagen since it auto generates the documentation for each metric.

I would really like this to happen.

@jamesmoessis
Contributor Author

Thanks @rmfitzpatrick! Yes, I agree about the opt-in/grace period.

It sounds like we're on the same page. I'll put up some PRs in the coming weeks to gradually implement these :)

@jamesmoessis
Contributor Author

@rmfitzpatrick another thing I recently noticed about this receiver: it records memory metrics as gauges, while the OTel spec says to record these as sums. Since I'm unsure how heavily this receiver is used, how much of a pain would it be to change? Ideally they would be sums.

@rmfitzpatrick
Contributor

rmfitzpatrick commented Jun 28, 2022

@jamesmoessis I'm not 100% on the spec, but I think the difference between gauges and non-monotonic* sums/UpDownCounters is the implicit validity of adding their values across attributes, so there should be few if any side effects: https://github.com/open-telemetry/opentelemetry-specification/blob/bff2cc1365bb81b03f78a732dec822fff787e288/specification/metrics/supplementary-guidelines.md#guidelines-for-instrumentation-library-authors

(Anecdotally the SignalFx exporter converts non-monotonic sums to SFx gauges.)
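In mdatagen terms, that switch would roughly mean declaring the memory metrics as non-monotonic cumulative sums instead of gauges in metadata.yaml, something like the sketch below (metric name and field values illustrative):

```yaml
metrics:
  container.memory.usage.total:
    enabled: true
    description: Memory usage of the container.
    unit: By
    # previously declared as: gauge: { value_type: int }
    sum:
      value_type: int
      monotonic: false
      aggregation: cumulative
```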

@jamesmoessis
Contributor Author

jamesmoessis commented Jul 13, 2022

After working on the receiver more, here is an updated list of further (possibly breaking) changes that we should make to uplift the receiver.

  • Change various units so they more accurately represent the measurements.
  • Fix incorrect units and other minor semantic mistakes.
  • Change the blockIO metrics so that the operation is an attribute, which divides the number of those metrics by six. This simplifies the code and greatly reduces the amount of generated docs/code (see the metadata sketch after this list).
  • Change the memory metrics from gauge to sum.
  • All metrics are currently enabled by default. We should make it so a sensible set of metrics is enabled by default.
    • Most of the memory stats can be disabled, as they are very specific.
  • Emit the system-wide CPU usage only once (not per container, and with a different resource).
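As a sketch of the blockIO change (exact names and attributes to be confirmed in the PR), the per-operation variants of each blockIO metric would collapse into a single metric with an `operation` attribute:

```yaml
attributes:
  device_major:
    description: Device major number for block IO operations.
  device_minor:
    description: Device minor number for block IO operations.
  operation:
    description: Type of BlockIO operation, e.g. read, write, sync, async.

metrics:
  container.blockio.io_service_bytes_recursive:
    enabled: true
    description: Number of bytes transferred to/from the disk by the container.
    unit: By
    sum:
      value_type: int
      monotonic: true
      aggregation: cumulative
    attributes: [device_major, device_minor, operation]
```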

@jamesmoessis
Contributor Author

jamesmoessis commented Sep 28, 2022

Update: I've now raised most of the changes I wanted to make. In the coming weeks we will start using the featuregate-enabled scraper more heavily, building production-level confidence.

Once I'm happy with the stability of it, I will slowly:

  • Transition the featuregate to be the default, deprecating the v1 scraper (which is the main breaking change for users)
  • Remove the v1 scraper.
  • Transition the component into beta.

@dmitryax
Member

dmitryax commented Oct 15, 2022

@jamesmoessis JFYI, I created #13848 some time ago. It looks like it covers item 4 in your list.

@dmitryax
Member

Transition the featuregate to be the default, deprecating the v1 scraper (which is the main breaking change for users)

I believe we should add a warning saying that it'll be enabled by default soon, so it doesn't happen unexpectedly.

@jamesmoessis
Contributor Author

@dmitryax sounds good, I commented on that issue.

I believe we should add a warning saying that it'll be enabled by default soon, so it doesn't happen unexpectedly.

I will raise a PR for this next week 👍

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Jan 23, 2023
@jamesmoessis
Contributor Author

This is not stale; aiming for it to be closed by the 0.71.0 release.

@jamesmoessis
Contributor Author

#18019 closes this ticket out once it's merged.

The next step for this component would be to move it to beta, but I think that can be another ticket.
