Create a strict mode for feature gates #7804

Open
atoulme opened this issue Jun 1, 2023 · 7 comments

Comments

@atoulme
Contributor

atoulme commented Jun 1, 2023

Is your feature request related to a problem? Please describe.
The current behavior of the collector lets users assume sane defaults, yet it introduces breaking changes without warning on version upgrades. The collector relies on feature gates, which act as feature toggles that enable a breaking change explicitly.
Initially, feature gates are considered alpha and are disabled by default. After a period of time, they are turned on by default and declared beta. Finally, they are declared stable and removed altogether.

The system doesn't allow proactive customers to prepare for and execute feature gate changes well. Instead, it forces customers to read through changelogs and make sure they understand the implications of each feature gate. It requires paying attention to new features, changes in feature gate status, and granular updates by constantly keeping up with collector changes.

Describe the solution you'd like
I would like to introduce a strict mode that forces users of the collector to explicitly opt into feature gates without any ambiguity.

  • If the feature gate is freshly introduced and considered alpha, the strict mode must make the collector fail to start if the configuration doesn't explicitly list a decision about the feature gate, whether it is enabled or disabled. This forces the customer to adopt the feature gate explicitly.
  • If the feature gate is considered beta and therefore enabled by default, the strict mode must fail if the gate is not adopted by the customer, that is, if the flag is not explicitly set to enable the feature gate. The customer can decide to override this for a specific feature gate, but this must result in loud warnings. This ensures that the customer cannot be surprised when the feature gate becomes enabled by default.
  • If the feature gate is stable and removed, the strict mode must fail if the feature gate flag is used. This is to ensure that customers understand that the feature gate can no longer be changed.
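The three rules above can be sketched in Go roughly as follows. This is only an illustration, not the collector's actual API: the `Stage` type, the `CheckStrict` function, and the gate IDs are all hypothetical. It also assumes one reading of the beta rule, namely that an unset beta gate is a failure while an explicit disable is the "override" that produces a loud warning.

```go
package main

import (
	"fmt"
	"sort"
)

// Stage is a hypothetical model of a feature gate's lifecycle stage.
type Stage int

const (
	Alpha  Stage = iota // disabled by default
	Beta                // enabled by default
	Stable              // always on; the gate can no longer be changed
)

// CheckStrict applies the three proposed rules. registered maps gate IDs to
// their stage; explicit holds the values the user set in configuration.
// Any failure means the collector should refuse to start.
func CheckStrict(registered map[string]Stage, explicit map[string]bool) (failures, warnings []string) {
	// Sort the IDs so the output is deterministic.
	ids := make([]string, 0, len(registered))
	for id := range registered {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	for _, id := range ids {
		enabled, set := explicit[id]
		switch registered[id] {
		case Alpha:
			if !set {
				failures = append(failures, fmt.Sprintf("%s: alpha gate needs an explicit decision", id))
			}
		case Beta:
			if !set {
				failures = append(failures, fmt.Sprintf("%s: beta gate must be explicitly enabled", id))
			} else if !enabled {
				// Assumed interpretation: an explicit disable is the override.
				warnings = append(warnings, fmt.Sprintf("%s: beta gate explicitly disabled", id))
			}
		case Stable:
			if set {
				failures = append(failures, fmt.Sprintf("%s: stable gate can no longer be set", id))
			}
		}
	}
	return failures, warnings
}

func main() {
	// Hypothetical gate IDs, for illustration only.
	registered := map[string]Stage{"example.alpha": Alpha, "example.beta": Beta, "example.stable": Stable}
	explicit := map[string]bool{"example.beta": false, "example.stable": true}

	failures, warnings := CheckStrict(registered, explicit)
	for _, w := range warnings {
		fmt.Println("WARNING:", w)
	}
	for _, f := range failures {
		fmt.Println("ERROR:", f)
	}
}
```

Returning failures and warnings separately keeps the decision to abort startup with the caller, which is one way a downstream distribution could choose its own policy.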

The strict mode can be used in testing environments of customers to test and validate configurations and help smooth over migrations between versions of the collector.

@TylerHelmuth
Member

@atoulme are you imagining this mode would be a separate flag to pass into the collector, not a new stability level?

When the collector fails, what information should be provided to the user to ensure they understand that the failure is because of strict mode and a feature gate?

Could this problem be solved if the --help flag shared more information about the available feature gates, such as the description, FromVersion, and reference URL? Would a more explicit list of feature gates, other than the changelog, help (k8s has this)?

@atoulme
Contributor Author

atoulme commented Jun 1, 2023

It would be a separate flag. It can also be set in the collector, I hope, for downstream distributions.

The failure message should state which feature gates the end user has not addressed properly, along with the resolution steps (set the feature gate value, change the value, or remove the feature gate).
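As a rough sketch of what such a failure message could look like, the Go snippet below builds a single error listing each offending gate with its resolution step. The `Violation` type, the `StrictModeError` function, the message wording, and the gate IDs are assumptions made for illustration; nothing here is an existing collector API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Violation is a hypothetical record of one gate that strict mode rejected.
type Violation struct {
	GateID     string
	Problem    string
	Resolution string
}

// StrictModeError folds all violations into one error so the user sees every
// problem at once. It returns nil when there is nothing to report.
func StrictModeError(vs []Violation) error {
	if len(vs) == 0 {
		return nil
	}
	var b strings.Builder
	fmt.Fprintf(&b, "strict mode: %d feature gate(s) need attention", len(vs))
	for _, v := range vs {
		fmt.Fprintf(&b, "\n  - %s: %s; to resolve: %s", v.GateID, v.Problem, v.Resolution)
	}
	return errors.New(b.String())
}

func main() {
	err := StrictModeError([]Violation{
		{"example.alpha", "no explicit decision", "set the feature gate value"},
		{"example.beta", "explicitly disabled", "change the value"},
		{"example.stable", "gate is stable", "remove the feature gate"},
	})
	if err != nil {
		fmt.Println(err)
	}
}
```

Prefixing the message with "strict mode" addresses the question of how a user would know the startup failure came from this mode rather than from an ordinary configuration error.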

The strict mode would also implicitly remove any warnings related to feature gates at startup time.

@kentquirk
Member

The use case you describe is not wrong, but it's also not the only use case.

I am concerned about collector operators who would prefer not to be bothered with the alpha-level flags that never rise to beta. I'm pretty sure there are a lot of operators who don't want to know about features until they are reasonably final -- so would prefer a strictness flag that allows them to say "don't make me care about it until it's solid, and then give me some time to convert."

In that sense, I think there may be a variant of this flag that is:

  • Nothing happens in alpha
  • In beta, you must accept or reject -- but are strongly warned if you reject
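A minimal Go sketch of this relaxed variant, under stated assumptions: the `Stage`, `Decision`, and `Outcome` types and the `RelaxedCheck` function are hypothetical names, and the treatment of stable gates (not covered by the two bullets above) is assumed to be "no action".

```go
package main

import "fmt"

// Hypothetical models of a gate's stage, the operator's decision, and the result.
type Stage int

const (
	Alpha Stage = iota
	Beta
	Stable
)

type Decision int

const (
	Unset Decision = iota
	Accepted
	Rejected
)

type Outcome int

const (
	OK Outcome = iota
	Warn
	Fail
)

// RelaxedCheck ignores alpha gates entirely and only forces a decision in beta.
func RelaxedCheck(stage Stage, d Decision) Outcome {
	switch {
	case stage == Beta && d == Unset:
		return Fail // the operator must accept or reject
	case stage == Beta && d == Rejected:
		return Warn // rejecting is allowed, but strongly warned about
	default:
		return OK // alpha: nothing happens; stable: assumed out of scope here
	}
}

func main() {
	fmt.Println(RelaxedCheck(Alpha, Unset) == OK)    // alpha gates never block startup
	fmt.Println(RelaxedCheck(Beta, Unset) == Fail)   // beta gates need a decision
	fmt.Println(RelaxedCheck(Beta, Rejected) == Warn) // rejection is loud but permitted
}
```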

@atoulme
Contributor Author

atoulme commented Jun 7, 2023

@kentquirk I don't see this as being different from the current behavior, or not different enough to warrant a variant.

@mx-psi
Member

mx-psi commented Jun 28, 2023

> The system doesn't allow proactive customers to prepare for and execute feature gate changes well. Instead, it forces customers to read through changelogs and make sure they understand the implications of each feature gate. It requires paying attention to new features, changes in feature gate status, and granular updates by constantly keeping up with collector changes.

Some work from users seems unavoidable with this; if feature gates introduce changes that affect users they necessarily need to be aware of what is changing from version to version. I don't see how the strict mode would help alleviate that need: users still need to be aware of changes.

I think if the main concern is lack of awareness/documentation we should focus on improving our documentation and UX by

  1. providing some way to get a human-readable list of the available feature flags from the CLI, as @TylerHelmuth mentions in #7804 (comment)
  2. improving our documentation and component metadata to list all available feature gates on each component's documentation (see [cmd/mdatagen] Add support for declaring feature gates in metadata.yaml file opentelemetry-collector-contrib#21801)
  3. having a way to query (maybe via the zpages extension?) whether each feature gate is enabled/disabled at runtime.
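To make the first suggestion concrete, here is a hedged Go sketch of a human-readable gate listing built with the standard library's `text/tabwriter`. The `GateInfo` struct, its fields, the `RenderGates` function, and the sample gates are hypothetical; a real implementation would read this data from the collector's feature gate registry instead.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
)

// GateInfo is a hypothetical, flattened view of one feature gate.
type GateInfo struct {
	ID           string
	Stage        string
	Enabled      bool
	FromVersion  string
	Description  string
	ReferenceURL string
}

// RenderGates formats the gates as an aligned table, one gate per line.
func RenderGates(gates []GateInfo) string {
	var b strings.Builder
	tw := tabwriter.NewWriter(&b, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "ID\tSTAGE\tENABLED\tSINCE\tDESCRIPTION\tREFERENCE")
	for _, g := range gates {
		fmt.Fprintf(tw, "%s\t%s\t%t\t%s\t%s\t%s\n",
			g.ID, g.Stage, g.Enabled, g.FromVersion, g.Description, g.ReferenceURL)
	}
	// Writes to a strings.Builder do not fail, so the Flush error is ignored.
	_ = tw.Flush()
	return b.String()
}

func main() {
	// Sample data for illustration only.
	fmt.Print(RenderGates([]GateInfo{
		{"example.alpha", "alpha", false, "v0.1.0", "Hypothetical new behavior", "https://example.com/1"},
		{"example.beta", "beta", true, "v0.2.0", "Hypothetical default change", "https://example.com/2"},
	}))
}
```

The same rendering function could back both a CLI listing and a runtime status page, since it only depends on a slice of plain structs.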

@atoulme
Contributor Author

atoulme commented Jun 30, 2023

The use case I have in mind is relatively narrow: I want to run an integration test with all the feature gates we use, and make sure it fails on upgrade whenever they change. This way, instead of relying on release notes, we can use the CI as a gating factor to embrace upgrades, and have to perform explicit remediation to fix the CI, which we can use to document our findings, enable/disable feature gates for ourselves, and stay on top of changes.
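One way to picture this CI check is as a diff between a pinned snapshot of gate stages and the stages reported by the upgraded collector. The Go sketch below assumes both are available as maps from gate ID to stage name; the `DiffGates` function and the gate IDs are hypothetical, and how the snapshot is produced is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// DiffGates compares a pinned snapshot (gate ID -> stage) with the current
// one and returns one message per added, removed, or changed gate, sorted by ID.
func DiffGates(pinned, current map[string]string) []string {
	// Collect the union of gate IDs from both snapshots.
	seen := map[string]struct{}{}
	for id := range pinned {
		seen[id] = struct{}{}
	}
	for id := range current {
		seen[id] = struct{}{}
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	var changes []string
	for _, id := range ids {
		old, wasPinned := pinned[id]
		now, exists := current[id]
		switch {
		case !wasPinned:
			changes = append(changes, fmt.Sprintf("%s: added (%s)", id, now))
		case !exists:
			changes = append(changes, fmt.Sprintf("%s: removed (was %s)", id, old))
		case old != now:
			changes = append(changes, fmt.Sprintf("%s: stage changed from %s to %s", id, old, now))
		}
	}
	return changes
}

func main() {
	// Hypothetical snapshots taken before and after an upgrade.
	pinned := map[string]string{"example.a": "alpha", "example.b": "beta"}
	current := map[string]string{"example.a": "beta", "example.c": "alpha"}

	for _, c := range DiffGates(pinned, current) {
		fmt.Println(c)
	}
}
```

In a CI job, a non-empty result would fail the build, and updating the pinned snapshot becomes the explicit remediation step that documents the team's decision.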

@mx-psi
Member

mx-psi commented Jun 30, 2023

Alright, thanks for explaining :) I think that use case makes sense to me.


4 participants