Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix featuregate late initialization #8478

Merged

Conversation

frzifus
Copy link
Member

@frzifus frzifus commented Sep 16, 2023

Description:
The factories for the components are created before the CLI flags are evaluated. Therefore, featuregate flags are ignored during the early startup phase.
This PR simply shifts the time of creation of the factory map after the CLI flag evaluation.

Link to tracking Issue:
Closes #4967

Testing:
Short manual testing via cli. Adding the following line in cobra.Command.RunE.

featuregate.GlobalRegistry().VisitAll(func(fg *featuregate.Gate) { fmt.Println(fg.ID(), fg.IsEnabled()) })

Changing + & -:

./bin/otelcorecol_linux_amd64 --config=config.yaml --feature-gates=+telemetry.disableHighCardinalityMetrics

@frzifus frzifus force-pushed the fix_featuregate_late_initialization branch from ebdc19f to a2ece5b Compare September 16, 2023 03:57
@codecov
Copy link

codecov bot commented Sep 16, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Files Coverage Δ
cmd/otelcorecol/main.go 0.00% <0.00%> (ø)
otelcol/command_components.go 92.10% <89.28%> (-3.73%) ⬇️
otelcol/collector.go 77.50% <60.00%> (-2.77%) ⬇️

📢 Thoughts on this report? Let us know!.

@frzifus frzifus force-pushed the fix_featuregate_late_initialization branch from a2ece5b to 16e1027 Compare September 16, 2023 04:04
@frzifus frzifus marked this pull request as ready for review September 16, 2023 08:45
@frzifus frzifus requested review from a team and jpkrohling September 16, 2023 08:45
otelcol/command.go Outdated Show resolved Hide resolved
@@ -11,8 +11,7 @@ import (
)

func main() {
factories, err := components()
if err != nil {
if _, err := components(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With components() called twice, it's possible the second call will swap out a component the second time around if a feature gate is set and end up error'ing differently at line 25 right? Would it be possible to move the initialization of the feature gates up instead of calling this twice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With components() called twice, it's possible the second call will swap out a component the second time around if a feature gate is set and end up error'ing differently at line 25 right?

I did not consider that. But yes, that might be the case. Making the first call somehow useless.

Would it be possible to move the initialization of the feature gates up instead of calling this twice?

Defiantly, that is what I tried first. Any recommendations how to do that? What I had in mind was creating another root command and parse only the featuregate flags. This only worked, when I ignored the config flag using spf13/cobra#662.

But since I moved the featureflag evaluation out, the original root command started to fail.

@frzifus frzifus force-pushed the fix_featuregate_late_initialization branch from 16e1027 to 5436454 Compare September 20, 2023 07:22
@@ -12,6 +12,14 @@ import (
"go.opentelemetry.io/collector/featuregate"
)

// NewCommandFeatureGate constructs a new cobra.Command used to parse given FeatureGates.
func NewCommandFeatureGate() *cobra.Command {
flagSet := flags(featuregate.GlobalRegistry())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess here we could add something like flagFeatureGate() instead and add FParseErrWhitelist: cobra.FParseErrWhitelist{UnknownFlags: false} to the root command and add a No-op FeatureGate flag to the actual root command? Or simply parse them twice. Wdyt?

@frzifus
Copy link
Member Author

frzifus commented Sep 26, 2023

Ping @codeboten @jpkrohling

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seem to me as a big hack, I would prefer the version that I think @dashpole proposed, to provide a func that returns the factories to the settings instead.

But still not sure what factories need this to justify this change. I understand that it is a problem/limitation but I don't understand why is important.

@dashpole
Copy link
Contributor

But still not sure what factories need this to justify this change. I understand that it is a problem/limitation but I don't understand why is important.

My memory is fuzzy, but from what I recall, I wanted to use a feature gate for a refactor that entirely replaced the factory implementation. The natural place to do that was in NewFactory(), but it took a long time to realize that the feature gate wasn't being respected. The eventual workaround: open-telemetry/opentelemetry-collector-contrib#9116. I would be satisfied if there was an easy way to detect that case.

It is also definitely possible it isn't worth the effort if no one else has encountered this case. We could just document this shortcoming, and close the issue if that is our conclusion.

@frzifus
Copy link
Member Author

frzifus commented Sep 28, 2023

This seem to me as a big hack, I would prefer the version that I think @dashpole proposed, to provide a func that returns the factories to the settings instead.

@bogdandrutu I had chosen this way, because the first approach to pass a constructor func through changes the signature of the otelcon.NewCommand. This means that new collector versions can only be built with the upcoming builder version. (Was mentioned by @codeboten #8478 (comment))

But still not sure what factories need this to justify this change. I understand that it is a problem/limitation but I don't understand why is important.

From my point of view, it is somewhat misleading that the feature gates in the factory do not work. I have run into the same problems as @dashpole. We also wanted to use a feature gate for an entirely replaced factory implementation. (Jaeger Remote Sampling and Jaeger Receiver)

--

I think some kind of tradeoff has to be made. Leaving it as it is and documenting it is already one. It seems like other alternatives are a lot of refactor work, breaking downward compatibility or double parsing the feature gate flags.

@frzifus frzifus force-pushed the fix_featuregate_late_initialization branch 4 times, most recently from 2279213 to 81a0b34 Compare October 5, 2023 07:55
@frzifus frzifus requested a review from bogdandrutu October 5, 2023 13:11
@frzifus
Copy link
Member Author

frzifus commented Oct 5, 2023

@bogdandrutu based on yesterday's conversation, does it match the concept?

I checked the initialization of featuregates and found none inside a factory.

Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
frzifus and others added 6 commits November 3, 2023 11:31
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
Co-authored-by: Ben B. <bongartz@klimlive.de>
Signed-off-by: Alex Boten <aboten@lightstep.com>
Signed-off-by: Alex Boten <aboten@lightstep.com>
@codeboten codeboten force-pushed the fix_featuregate_late_initialization branch from c6d085c to 593700b Compare November 3, 2023 18:32
@codeboten codeboten merged commit ecd837e into open-telemetry:main Nov 3, 2023
30 of 31 checks passed
@github-actions github-actions bot added this to the next release milestone Nov 3, 2023
@frzifus frzifus deleted the fix_featuregate_late_initialization branch November 4, 2023 07:57
andrzej-stencel added a commit to SumoLogic/sumologic-otel-collector that referenced this pull request Nov 16, 2023
andrzej-stencel added a commit to SumoLogic/sumologic-otel-collector that referenced this pull request Nov 17, 2023
andrzej-stencel added a commit to SumoLogic/sumologic-otel-collector that referenced this pull request Nov 17, 2023
andrzej-stencel added a commit to SumoLogic/sumologic-otel-collector that referenced this pull request Nov 17, 2023
* feat(processor/remoteobserver)!: rename `remoteobserver` processor to `remotetap`

* chore: update otel core to `v0.89.0`

* fix(otelcolbuilder): adjust after API breaking change in otelcol.CollectorSettings

    open-telemetry/opentelemetry-collector#8478
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Featuregates don't work until late in initialization
6 participants