Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[otlp] Rename key for enabling retries during transient failures #5495

Conversation

vishweshbankwar
Copy link
Member

@vishweshbankwar vishweshbankwar commented Apr 1, 2024

Fixes #
Design discussion issue #

Changes

Previously, OTEL_DOTNET_EXPERIMENTAL_OTLP_ENABLE_INMEMORY_RETRY was introduced to offer an in-memory retry strategy as an experimental feature.

For adding support for a persistent storage-based retry strategy, we need an additional setting.

To avoid having multiple settings for enabling different retry strategies, introducing a single setting, OTEL_DOTNET_EXPERIMENTAL_OTLP_RETRY. When OTEL_DOTNET_EXPERIMENTAL_OTLP_RETRY is set to in_memory, it will enable the in-memory-based retry strategy. When the support for persistent storage-based retry is added in future, it could be set to disk.

This is a temporary setting to offer the experimental feature and will be removed in the future when the features are marked as stable. At present, OTLP exporters does not retry requests during transient failures by default.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

Copy link

codecov bot commented Apr 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.56%. Comparing base (6250307) to head (02b9fc1).
Report is 156 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5495      +/-   ##
==========================================
+ Coverage   83.38%   85.56%   +2.17%     
==========================================
  Files         297      289       -8     
  Lines       12531    12591      +60     
==========================================
+ Hits        10449    10773     +324     
+ Misses       2082     1818     -264     
Flag Coverage Δ
unittests ?
unittests-Solution-Experimental 85.49% <100.00%> (?)
unittests-Solution-Stable 85.53% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...etryProtocol/Implementation/ExperimentalOptions.cs 100.00% <100.00%> (ø)

... and 65 files with indirect coverage changes

@vishweshbankwar vishweshbankwar marked this pull request as ready for review April 1, 2024 18:52
@vishweshbankwar vishweshbankwar requested a review from a team April 1, 2024 18:52
@CodeBlanch CodeBlanch added the pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package label Apr 1, 2024
Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@reyang
Copy link
Member

reyang commented Apr 1, 2024

When progressing to stable:

  • OTEL_DOTNET_EXPERIMENTAL_OTLP_RETRY will be removed and the exporter will default to in_memory retry strategy.
  • OTEL_DOTNET_OTLP_RETRY will be added with possible values as in_memory, disk, none (in_memory set as default). Allowing users to switch the strategy as needed.
  • If needed the setting can be further drilled down at signal level as OTEL_DOTNET_OTLP_LOGS_RETRY, OTEL_DOTNET_OTLP_METRICS_RETRY and OTEL_DOTNET_OTLP_TRACES_RETRY

I have couple questions:

  1. Why would environment variables be used instead of strongly typed configuration? Or do we anticipate both to coexist?
  2. What's the plan for configuring the next level of details - for example, when "in_memory" is used, how to control the memory upper bound, when "disk" is used, how to control the retention policy and maximum disk usage?

@vishweshbankwar
Copy link
Member Author

vishweshbankwar commented Apr 1, 2024

When progressing to stable:

  • OTEL_DOTNET_EXPERIMENTAL_OTLP_RETRY will be removed and the exporter will default to in_memory retry strategy.
  • OTEL_DOTNET_OTLP_RETRY will be added with possible values as in_memory, disk, none (in_memory set as default). Allowing users to switch the strategy as needed.
  • If needed the setting can be further drilled down at signal level as OTEL_DOTNET_OTLP_LOGS_RETRY, OTEL_DOTNET_OTLP_METRICS_RETRY and OTEL_DOTNET_OTLP_TRACES_RETRY

I have couple questions:

  1. Why would environment variables be used instead of strongly typed configuration? Or do we anticipate both to coexist?
  2. What's the plan for configuring the next level of details - for example, when "in_memory" is used, how to control the memory upper bound, when "disk" is used, how to control the retention policy and maximum disk usage?
  1. Why would environment variables be used instead of strongly typed configuration? Or do we anticipate both to coexist?

    I anticipate the strongly typed configuration and environment variables to co-exist with more control over configuration given to users via strongly typed configuration options. This is inline with how we offer other configurations as well.

  2. What's the plan for configuring the next level of details - for example, when "in_memory" is used, how to control the memory upper bound, when "disk" is used, how to control the retention policy and maximum disk usage?

    I'd consider these options to be provided via strongly typed configuration options.

I edited the description to remove progression to stability part as it is not detailed view of end goal. I will lay out a detailed plan separately.

@reyang
Copy link
Member

reyang commented Apr 1, 2024

  1. Why would environment variables be used instead of strongly typed configuration? Or do we anticipate both to coexist?
    I anticipate the strongly typed configuration and environment variables to co-exist with more control over configuration given to users via strongly typed configuration options. This is inline with how we offer other configurations as well.
  2. What's the plan for configuring the next level of details - for example, when "in_memory" is used, how to control the memory upper bound, when "disk" is used, how to control the retention policy and maximum disk usage?
    I'd consider these options to be provided via strongly typed configuration options.

I'm concerned about the direction here. It sounds like we'll end up with two configuration mechanisms that provide different sets of functionalities.

@vishweshbankwar
Copy link
Member Author

  1. Why would environment variables be used instead of strongly typed configuration? Or do we anticipate both to coexist?
    I anticipate the strongly typed configuration and environment variables to co-exist with more control over configuration given to users via strongly typed configuration options. This is inline with how we offer other configurations as well.
  2. What's the plan for configuring the next level of details - for example, when "in_memory" is used, how to control the memory upper bound, when "disk" is used, how to control the retention policy and maximum disk usage?
    I'd consider these options to be provided via strongly typed configuration options.

I'm concerned about the direction here. It sounds like we'll end up with two configuration mechanisms that provide different sets of functionalities.

That sounds reasonable. To avoid confusion arising from multiple configuration methods, we could initially focus on implementing strongly typed options. If the specifications around environment variables become clearer in the future, we can consider adding them as additional configuration methods.

@reyang
Copy link
Member

reyang commented Apr 1, 2024

To avoid confusion arising from multiple configuration methods, we could initially focus on implementing strongly typed options. If the specifications around environment variables become clearer in the future, we can consider adding them as additional configuration methods.

It sounds like we're not going to expose any environment variables related to retry during the initial stable release. If the answer is yes, how do we think about this this PR?

@vishweshbankwar
Copy link
Member Author

To avoid confusion arising from multiple configuration methods, we could initially focus on implementing strongly typed options. If the specifications around environment variables become clearer in the future, we can consider adding them as additional configuration methods.

It sounds like we're not going to expose any environment variables related to retry during the initial stable release. If the answer is yes, how do we think about this this PR?

I think this PR is a good change as it streamlines the experimental retry strategies settings.

  1. This PR combines two experimental features into a single setting, eliminating the need for separate configurations for in_memory and disk retry.
  2. Allows users targeting stable version of the package to onboard.

Introducing two distinct settings as I had initially planned (OTEL_DOTNET_EXPERIMENTAL_OTLP_ENABLE_INMEMORY_RETRY and OTEL_DOTNET_EXPERIMENTAL_OTLP_ENABLE_DISK_RETRY) could cause confusion regarding possible value combinations and their precedence relative to each other.

Co-authored-by: Reiley Yang <reyang@microsoft.com>
Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CodeBlanch CodeBlanch merged commit dbec6f8 into open-telemetry:main Apr 1, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants