[exporter/awsemf] Add StorageResolution config to support high-resolution metrics #29506

Closed
bviolier opened this issue Nov 26, 2023 · 21 comments · Fixed by #36057

@bviolier
Component(s)

exporter/awsemf

Is your feature request related to a problem? Please describe.

We want to be able to push high-resolution metrics from the opentelemetry-collector, which requires being able to set StorageResolution to 1.

Describe the solution you'd like

Have the option in the AWSEMF config to turn on high-resolution metrics through the StorageResolution config.
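For context, the CloudWatch Embedded Metric Format allows an optional `StorageResolution` field on each metric definition. A minimal EMF payload with 1-second resolution might look like the sketch below (namespace, dimension, and metric names are illustrative, not from this exporter):

```json
{
  "_aws": {
    "Timestamp": 1700000000000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApp",
        "Dimensions": [["ServiceName"]],
        "Metrics": [
          { "Name": "Latency", "Unit": "Milliseconds", "StorageResolution": 1 }
        ]
      }
    ]
  },
  "ServiceName": "checkout",
  "Latency": 12.5
}
```

Omitting `StorageResolution` (or setting it to 60) gives standard 60-second resolution.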

Describe alternatives you've considered

There doesn't seem to be another way of doing this, short of creating a completely new exporter that either sets this field itself or calls PutMetricData directly.

Additional context

I will be creating a PR soon.

@bviolier bviolier added enhancement New feature or request needs triage New item requiring triage labels Nov 26, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bryan-aguilar (Contributor)

More information on storage resolution can be found here for anyone interested.

@bviolier what is your idea for the configuration changes necessary to support this? We would need to be able to specify metrics by their name and then give them an associated storage resolution right?

I could see how this could be a useful feature but understanding the configuration required for it first would be a good start.

@bviolier (Author)

bviolier commented Nov 27, 2023

Thanks for the quick reply @bryan-aguilar. I created a PR for it. I chose to put it at the same level as things like dimension_rollup_option, detailed_metrics and log_retention as that seemed like things on the same configuration level.

Also, I am assuming (as was done before, I believe) that people can create multiple exporters if they want different settings for different metrics.

If you think it should look different, let me know. I'll try to devise a different way of setting this configuration value.

@bryan-aguilar (Contributor)

I had thought about how we could adopt storage resolution in the past, and I thought that a global storageresolution option would lead to more user pain. With a global storage resolution flag like the one proposed in the PR, users would have to configure a high- and a low-resolution pipeline in their collector config. Then, users would have to drop high-resolution metrics from the low-resolution pipelines and vice versa. Is my understanding here correct? If you did not drop them, you could end up with duplicated metrics, right?

I think it would behoove us to consider other options for this. Here are two thoughts that come to mind:

  1. A configuration option where you can supply a list of metric names which will have high storage resolution applied. A list of regex strings should be accepted.

     awsemf:
       high_resolution:
         - metric_name_foo
         - metric_name_bar_selector*

  2. We could also look for a high resolution, or storage resolution, attribute. If it is set, we would add the high or low storage resolution to the metric. The benefit is that this could be done at the collector level using available processors, requiring no configuration change, or at the point where the metric is generated. The downside is that relying on a metric attribute name starts to move into the territory of semantic conventions, and we may want to iron out the naming first.
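The name-selector idea in option 1 could be sketched roughly as follows. This is an illustrative Go snippet, not the exporter's actual matching code; the function name and the exact regex semantics (anchored vs. unanchored) are assumptions:

```go
package main

import (
	"fmt"
	"regexp"
)

// isHighResolution reports whether a metric name matches any of the
// configured high_resolution selectors, treating each selector as a
// regular expression. Selectors that fail to compile are skipped.
func isHighResolution(name string, selectors []string) bool {
	for _, s := range selectors {
		if re, err := regexp.Compile(s); err == nil && re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	selectors := []string{"^metric_name_foo$", "^metric_name_bar_selector"}
	fmt.Println(isHighResolution("metric_name_foo", selectors))            // true
	fmt.Println(isHighResolution("metric_name_bar_selector_x", selectors)) // true
	fmt.Println(isHighResolution("other_metric", selectors))               // false
}
```

Metrics that match would get `StorageResolution: 1`; everything else would keep the 60-second default.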

@bviolier (Author)

You replied on the PR saying you think it shouldn't be a top-level configuration (like I suggested above).

The other option I can think of is making it part of metric_declarations, with a new attribute called storage_resolution; that gives you the freedom to define it for every metric you want.

The downside is that it does force you to define the metrics in metric_declarations if you want to use the storage_resolution.

An example would then look like this:

exporters:
  awsemf:
    log_group_name: '/containers/{ClusterName}/performance'
    log_stream_name: '{TaskId}'
    dimension_rollup_option: NoDimensionRollup
    metric_declarations:
      - dimensions: [ [ ClusterName, TaskDefinitionFamily ] ]
        storage_resolution: 1
        metric_name_selectors:
          - MemoryUtilized
          - MemoryReserved
          - CpuUtilized
          - CpuReserved

@bviolier (Author)

bviolier commented Nov 27, 2023

I think doing an attribute might get hairy pretty quickly indeed. It does give you the power on the client to force specific metrics into high-resolution, but in our case, we specifically wouldn't want the client to have that power (that being said, that can easily be fixed with filters also).

The high_resolution attribute with a list seems like a good option, but I didn't think that would be acceptable, as the value on AWS is 1 or 60 (and specifically not true/false). And I assumed there was a reason behind that (to allow more values in the future, maybe?).

@bryan-aguilar (Contributor)

The other option I can think of is making it part of metric_declarations with as a new attribute called storage_resolution, that gives you the freedom to define it for every metric you want.

Putting it as part of metric declarations may be a good idea too. It may even be more appropriate than my suggestion: putting it within the metric declarations object would give more flexibility.

The downside is that it does force you to define the metrics in metric_declarations if you want to use the storage_resolution.

Agreed, but I think this is an acceptable tradeoff compared with the user experience of a top-level storageresolution configuration option.

I think doing an attribute might get hairy pretty quickly indeed. It does give you the power on the client to force specific metrics into high-resolution, but in our case, we specifically wouldn't want the client to have that power (that being said, that can easily be fixed with filters also)

I think it's fine to strike that idea, especially if it does not fit your use case. I have not heard a request for this either and was just throwing out ideas. The good thing is that this can always be added in the future if we do get a feature request for this functionality.

I didn't think that would be acceptable as the value on AWS is 1 or 60 (and specifically not true/false). And I assumed there was a reason behind it (to add more values in the future maybe?).

This is a good call out and I don't know the exact reasoning for that. I can see if I can find out though. We shouldn't back ourselves into a corner right away if there's the chance that there could be additional values in the future.

@bviolier (Author)

I am up for updating the PR with either the array option (assuming true/false is ok) or going the metric_declarations way! Let me know what you think will be the best option.

On our side, we are now testing this on sandbox (with the current PR) and our Alarm for auto-scaling ECS containers went from 220+ seconds to between 40 and 50 seconds after load increase.

@bryan-aguilar (Contributor)

I may need to think on this a bit more but I'm leaning toward making this a configurable part of the metric_declarations object. I think the flexibility of being able to tie storage resolution to a metric directive object is what we would want.

There is also another option we hadn't considered. There is a metrics descriptor configuration option. It seems appropriate that storage resolution should be included in that.

@bryan-aguilar bryan-aguilar added priority:p2 Medium and removed needs triage New item requiring triage labels Nov 27, 2023
@bviolier (Author)

The Metrics Descriptor is interesting indeed! I was thinking though that the metric-directive also gives you the option to store the same metric in different resolutions when using different dimensions - which is also an interesting case.

Just let me know when you have come to a conclusion and I will pick this up again. No hurry from my side, as we can already use it by building our own collector.

@bryan-aguilar (Contributor)

I think both capabilities can work together. There may be times when, using the metric descriptor field, you need to overwrite the resolution of a metric across all dimension sets; having that option in the metric descriptor is convenient.

But having the ability to configure it per metric/dimension set is useful as well.

Also, I think if a storage resolution for a metric is configured in both the directive and descriptor then the descriptor would take priority. Would you tend to agree with that order of precedence?

@bryan-aguilar bryan-aguilar added priority:p2 Medium and removed priority:p2 Medium labels Dec 1, 2023
@bviolier (Author)

bviolier commented Dec 4, 2023

It sounds like a good plan to implement in both, indeed.

I would, however, have it the other way around. The directive is the most precise one, you can create "new metrics" out of a set of metrics + dimensions. You might want to have different storage-resolutions for different sets of dimensions. I could see myself setting a storage-resolution with the descriptor that says all CPU resources should be high-resolution, but then having a set of dimensions set to low-resolution through the directive.
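The precedence order agreed on here (directive over descriptor, with a 60-second fallback) can be sketched as a small Go helper. The function and its parameters are illustrative, not the exporter's actual API; zero stands in for "not configured":

```go
package main

import "fmt"

const defaultResolution = 60 // CloudWatch's implicit default, in seconds

// resolveStorageResolution picks the storage resolution for a metric.
// The metric-directive value takes precedence over the descriptor
// value; if neither is set (0), the 60-second default applies.
func resolveStorageResolution(directiveRes, descriptorRes int) int {
	if directiveRes != 0 {
		return directiveRes
	}
	if descriptorRes != 0 {
		return descriptorRes
	}
	return defaultResolution
}

func main() {
	fmt.Println(resolveStorageResolution(1, 60)) // directive wins: 1
	fmt.Println(resolveStorageResolution(0, 1))  // descriptor applies: 1
	fmt.Println(resolveStorageResolution(0, 0))  // default: 60
}
```

With this ordering, a descriptor can set a broad policy (e.g. all CPU metrics high-resolution) while a directive carves out exceptions for specific dimension sets.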

@bryan-aguilar (Contributor)

I see your point, and I agree that it would make more sense to have the directive field take precedence. Thank you for the feedback!

@bviolier (Author)

I did a first pass at moving the storage resolution. Please let me know what you think; I had to dig a bit deep to make the declaration work, but I got something working. Open to improvements, of course.

I didn't test it locally, but I updated the tests.

@bviolier (Author)

@bryan-aguilar it seems the PR has been closed due to inactivity. Do you know when you have some time to review and get this merged? Thanks!

@zehsor

zehsor commented Feb 29, 2024

@bryan-aguilar any updates on this? I'd love to help if there's anything needed, we're desperate for this functionality!

@zehsor

zehsor commented Apr 23, 2024

@Aneurysm9 @shaochengwang @mxiamxia @bryan-aguilar
any updates on this?

@zehsor

zehsor commented Jun 13, 2024

@Aneurysm9 @shaochengwang @mxiamxia @bryan-aguilar
It feels like this is not maintained anymore, after not receiving any answer for almost half a year.


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 13, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 12, 2024
@jpbarto (Contributor)

jpbarto commented Oct 21, 2024

@bryan-aguilar this feature is still needed by users of opentelemetry. Is there any update on its implementation?

andrzej-stencel pushed a commit that referenced this issue Nov 27, 2024
…36057)

This change implements the capability for users of the AWS EMF Exporter
to specify which metrics they would like to have sent to CloudWatch with
a 1 second Storage Resolution. The EMF Exporter now explicitly states
the Storage Resolution for each metric as 60 seconds, the previous
implicit default, so there is no behavior change. If the user specifies
a metric to have 1 second resolution it will be sent to CloudWatch EMF
with the Storage Resolution set accordingly.

#### Description
Previously the AWS EMF Exporter sent metric data into CloudWatch without
specifying the storage resolution. CloudWatch would then default to a 60
second storage resolution, even if metrics are sent more frequently than
every 60 seconds. This would confuse users when they tried to apply
functions like AVG, SUM, MAX, or MIN to their metrics with a period of 5
seconds: the function would be applied by CloudWatch to 60 seconds' worth
of data, producing unexpected results and confusion for the user. This
commit makes this 60 second resolution explicit in the messages sent to
CloudWatch by the EMF Exporter and also gives the user the option to
specify a more granular 1 second resolution per metric in the
configuration file of the AWS EMF Exporter.

#### Link to tracking issue
Fixes #29506 

#### Testing
Added tests to verify that config file parsing validates a metric
descriptor that specifies either a valid unit, valid storage resolution,
or both and rejects other invalid metric descriptors.
Added tests that the translation from metric data to CW EMF carries a
storage resolution with it, defaulting to a value of 60 (the current
behavior) if no storage resolution value is explicitly set in the
configuration.

#### Documentation
Comments added in the code but have not updated the README.md pending
acceptance of the PR.
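Based on the PR description above, a configuration enabling 1-second resolution through the metric descriptor might look like the sketch below. The exact key names are inferred from this thread; check the exporter's README for the released syntax:

```yaml
exporters:
  awsemf:
    log_group_name: '/my/log/group'
    metric_descriptors:
      - metric_name: CpuUtilized
        unit: Percent
        storage_resolution: 1   # 1-second (high) resolution; the default remains 60
```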
shivanthzen pushed a commit to shivanthzen/opentelemetry-collector-contrib that referenced this issue Dec 5, 2024
ZenoCC-Peng pushed a commit to ZenoCC-Peng/opentelemetry-collector-contrib that referenced this issue Dec 6, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024