Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/dockerstats] Revisit configuration for setting attributes from container labels and env vars #13848

Closed
dmitryax opened this issue Sep 2, 2022 · 18 comments

Comments

@dmitryax
Copy link
Member

dmitryax commented Sep 2, 2022

  1. container_labels_to_metric_labels and env_vars_to_metric_labels names are confusing. The container labels and env var key/values are added as Resource attributes not data point attributes, which is right, but configuration names are misleading.

  2. container_labels_to_metric_labels and env_vars_to_metric_labels allows users to set only known predefined set of labels or env vars. For example, if users want to set all container labels as resource attributes, they cannot do that with current configuration. I would suggest to make the configuration interface more flexible. We can align it with existing functionality of k8s attributes processor with does a similar thing:

    Labels []FieldExtractConfig `mapstructure:"labels"`

@dmitryax dmitryax changed the title [receiver/dockerstats] Revisit configuration interface for setting attributes from container labels and env vars [receiver/dockerstats] Revisit configuration for setting attributes from container labels and env vars Sep 2, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Pinging code owners: @rmfitzpatrick. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jamesmoessis
Copy link
Contributor

@dmitryax For item 1, I would say container_labels_to_resource_attributes would be a suitable name.

I'd be happy to work on these some time soon. If I was codeowner of this component I would get pinged for stuff like this too, I'd be happy to put my hand up for that.

@dmitryax
Copy link
Member Author

@jamesmoessis we already have a PR with a more generic approach #14537

It would be great if we can have a consistent config interface for such use cases, in the docker receiver, k8s attributes processor and others

@jamesmoessis
Copy link
Contributor

Looks like that PR went stale.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 23, 2023
@paologallinaharbur
Copy link
Member

paologallinaharbur commented Apr 19, 2023

Hello! We are interested as well in this feature and I am ready to contribute if we decide which is the desired behavior and design/config. 😄

Context of the issue

From my point of view, the source of the complexity is that container_labels_to_metric_labels covers two use-cases:

  • selecting which labels to use as attributes
  • renaming the labels while saving them as attributes (should this be handled by a processor?)

Introducing regexes to select labels we would need a mechanism to rename the selected attributes (that are not a fixed list). Unfortunately, this is not straightforward.

For example, the k8sAttributesProcessor allows to do that, but the configuration is a bit complex and difficult to document (having at the same time Key, KeyRegex, Regex)

I went through the code and the old PR and I see two possible paths forward.

OPTION 1 Drop renaming feature, embrace filterset to select the attributes/envVars

My favorite approach would be to have a simple configuration for the dockerstatreceiver and to leave the renaming functionality to the attributeprocessor:

type Config struct {          (names can be discussed directly in the PR)
...
	includeLabels  labelMatchConfig  `mapstructure:"include_labels"`
	includeEnvVars envVarMatchConfig `mapstructure:"include_env_vars"`
...
}
type labelMatchConfig struct {
	filterset.Config `mapstructure:",squash"`
	labelsNames      []string `mapstructure:"keys"`
}

We would merely select which labels/env to include with regexes, no renaming will be possible in this case.

PROS:

  • easy to be implemented, used, and configured.
  • the same implementation of includeFS,IncludeFSTypes,IncludeDevices,.. in hostReceiver and based on the common package filterset therefore having cache available by default
  • the receiver is not "processing" data anymore

CONS:

  • the user will need to rely on a processor to modify the attribute names

In this case, we would not align the implementation of the k8sattributes processor and the dockerstat receiver since they would serve different use-cases.

OPTION 2 Support regexes when extracting and renaming attributes, align k8s attribute processor

The other approach would be to implement a configuration as the one suggested in the old PR. It would be feature complete, but more complex to configure and develop.

In that case, I think we should break down the duties of such a configuration:

  • into a filter in internal/filter/newfilter capable of returning the matching groups
  • a package under internal using such filter and allowing extracting and renaming a series of attributes from a map

WDYT? 🤔

@dmitryax dmitryax removed the Stale label Apr 20, 2023
@jamesmoessis
Copy link
Contributor

@paologallinaharbur I like option 2, with something similar to what @dmitryax suggested in his comment on #14537. Ideally the same interface or struct is used across dockerstats/podman/k8s components.

Option 1 is nice and simple, but I wouldn't want to take the ability to name the attribute keys away from the user.

@paologallinaharbur
Copy link
Member

paologallinaharbur commented Apr 24, 2023

In option 1 we would not take away such a possibility. We would delegate it to another component, for example, the attributeProcessor is already available and working for all receivers.
The user could do something like the following if needed (most users will not need that anyway):

processors:
  resource:
    attributes:
      - key: "new.name"
        from_attribute: "metric.name.to.be.modified"
        action: upsert
      - key: "metric.name.to.be.modified"
        action: delete

In this way:

  • users that will need to punctually modify attribute names could rely on the different processors that are way more powerful and "standard"
  • the configuration will be based on an already existing component
  • the configuration for most users will be simple
receivers:
  docker_stats:
    include_labels:
      match_type: <strict|regexp>
      labels: ["l1","l2"]

I am worried that to cover an unusual use case we would cause the implementation and the configuration to be more complex for everyone.
Are we sure that this would not be enough?
Is there a drawback I do not see with delegating it to the processors?


However, as I stated before mine is just a preference and I am quite new to the project.
Therefore, if you and also @dmitryax vouch for Option2 I'll be happy to work on it defining the configuration and then moving to the implementation 😃

@jamesmoessis
Copy link
Contributor

jamesmoessis commented May 15, 2023

I'm still not super convinced that there's much problem with how it works right now to justify doing anything except maybe making an alias for the name so it says resource_attributes instead of metric_labels.

I see where you are coming from @paologallinaharbur but I still think option 1 is a little bit inconvenient for people using the current feature.

Would like to hear @dmitryax's thoughts on this too.

@dmitryax
Copy link
Member Author

The biggest problem with the option 1 is that it's not immune to the attribute conflicts while the k8sattributes approach allows you to do the defensive in-place renames like k8s.namespace.labels.$$1.

@paologallinaharbur there is a big effort lead by @TylerHelmuth to migrate the existing configurations for components like filter to OTTL. I think we should look into utilizing OTTL here as well. I'm not sure how it'll allow you to do the k8s attributes regexp filtering/renaming tho, but we need to consider that use case in the OTTL.

@paologallinaharbur
Copy link
Member

paologallinaharbur commented May 16, 2023

I'm still not super convinced that there's much problem with how it works right now [...]

The biggest issue I see right now is that you cannot include "all" labels and you need to specify them one by one in the config.
It could be inconvenient to be forced to restart the collector with a new config whenever you have a new label (especially in big environments)

There is a big effort lead by @TylerHelmuth to migrate the existing configurations for components like filter to OTTL.

Thanks for the heads up, I'll check it out and try to come up with a different proposal leveraging it

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 17, 2023
@dmitryax dmitryax removed the Stale label Aug 9, 2023
@dmitryax
Copy link
Member Author

dmitryax commented Aug 9, 2023

Given that we have replace_pattern and replace_match OOTL functions, I don't think we need fields to handle values like source_value and attribute_value as suggested in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/14537/files#r993965330. We can strip it to the following:

type FieldExtractConfig struct {
	// MatchType specifies whether the regex must be applied on SourceKey and SourceValue,
	// can be set to `"strict"` (default) or `"regex"`.
	MatchType MatchType `mapstructure:"match_type"`

	// SourceKey specifies name of the source key that will be fetched, regex can be used 
	// if MatchType="regex". SourceKey is the only required field in FieldExtractConfig.
	SourceKey string `mapstructure:"source_key"`

	// AttributeKey specifies name of the resource attribute that will be added in case of 
	// SourceKey match. Matching groups from SourceKey regex can be used in the provided
	// string if MatchType="regex". If not specified, a value matched by SourceKey will be used.
	AttributeKey string `mapstructure:"attribute_key"`
}

But it's still a pretty complicated API. I'll submit another issue where we can come up with a unified filtering interface that can be applied in all collector configuration parts

@TylerHelmuth
Copy link
Member

I'll submit another issue where we can come up with a unified filtering interface that can be applied in all collector configuration parts

@dmitryax FYI we have the building blocks for this as it is something I needed for #18642. All components can use internal/filterottl to implement conditional logic using OTTL. Ultimately I'd like to remove the other internal/filter* packages.

@dmitryax
Copy link
Member Author

dmitryax commented Aug 10, 2023

@TylerHelmuth, the problem is that we need a way to filter out external data sets (slices and maps), not OTLP objects. Do you think it's something OTTL should be used for?

@TylerHelmuth
Copy link
Member

I need to comprehend this issue more, but in general OTTL the language is written to treat common.Slice and []any the same and pcommon.Map and map[string]any the same. Most (If not all) functions that depend on accessing these types could be given either.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 16, 2023
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants