Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/solarwindsapmsettingsextension] Added part one of the concrete implementation of solarwindsapmsettingsextension #30788

Conversation

jerrytfleung
Copy link
Contributor

Description:

  • Adding part one of the concrete implementation of solarwindsapmsettingsextension
  • Adding github.com/solarwindscloud/apm-proto dependency in go.mod
  • Changed the datatype of interval configuration from string to time.Duration
  • Moved config validation logic from config to extension so as to print warning instead of shutting down the collector

Link to tracking Issue:
27668

Testing:

  • Added testdata to load config.yaml in config_test
  • Added parallel test in extension test

Documentation:

  • Updated README about the change of the datatype of interval and the default value

@@ -13,7 +14,7 @@ import (
)

const (
DefaultInterval = "1m"
DefaultInterval = time.Duration(10000000000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it 10*time.Second?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks!
Fixed.

Interval string `mapstructure:"interval"`
}

func (cfg *Config) Validate() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is validation the removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an error in Config.Validate() will cause collector not starting up.

While we are testing with the opentelemetry-lambda collector lambda layer, the collector error from this function can cause the lambda function (user's lambda function) timeout. i.e. A bad config in this extension can break user's lambda function.

While we think a bad config in this extension (e.g. a wrong endpoint causing no such host gRPC error or a missing / wrong key to get a setting should not be a fatal to the collector process.

So we moved the logic to the extension.

When there is a bad config, the extension will log messages and will be in noop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jerrytfleung ,

With regards to:

Returning an error in Config.Validate() will cause collector not starting up.

This was a purposeful decision considering that invalid config either means assuming defaults (like a default region, default ingestion endpoint) or letting it run in a broken state (which means it can silently fail)

From my perspective, converting into a noop component on a bad configuration is an anti pattern for a lot of users, a critical piece of infrastructure.

Moreover, without this functionality, the command otelcol validate will return fine which may also contribute to user confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added back otelcol validate function

Comment on lines 56 to 58
} else {
extension.logger.Warn("solarwindsapmsettingsextension is in noop. Please check config")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. Config.Validate() is called by the builder automatically and collector doesn't start with returned error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it can break user's application when it is used in collector lambda layer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it can break user's application when it is used in collector lambda layer.

The time couple of lambda to lambda extensions can be painful, I am not sure it justifies removing all protections. I would recommend that you raise that the lambda project as a feature in order to allow the lambda layer to run without error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will raise that to lambda project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MovieStoreGuy We are thinking to add back Config.Validate() and replace any invalid field of the configuration by default values. As we set default values, Config.validate() will always return nil. And the extension will not need to work in noop mode anymore.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back Config.Validate()

@dmitryax
Copy link
Member

Moved config validation logic from config to extension so as to print warning instead of shutting down the collector

This may answer my two comments. But why?

@jerrytfleung
Copy link
Contributor Author

Moved config validation logic from config to extension so as to print warning instead of shutting down the collector

This may answer my two comments. But why?

replied in the thread.

@jerrytfleung jerrytfleung requested a review from dmitryax February 6, 2024 19:45
@jerrytfleung
Copy link
Contributor Author

@atoulme @MovieStoreGuy I tried my best to address your review comments / concerns. Can you please kindly to elaborate and review this PR again? Thanks a lot.

Copy link
Contributor

@MovieStoreGuy MovieStoreGuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a comment from me and the rest appear fine.


func (cfg *Config) Validate() error {
// Endpoint
matched, _ := regexp.MatchString(`apm.collector.[a-z]{2,3}-[0-9]{2}.[a-z\-]*.solarwinds.com:443`, cfg.Endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you ever want this to be localhost for testing or other purposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We want the endpoint to be one of our APM collector endpoints listed in here
i.e.
apm.collector.na-01.cloud.solarwinds.com:443
apm.collector.na-02.cloud.solarwinds.com:443
apm.collector.eu-01.cloud.solarwinds.com:443
apm.collector.apj-01.cloud.solarwinds.com:443

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a comment from me and the rest appear fine.

@MovieStoreGuy I fixed the ordering. Can you kindly review again? Thanks!

return nil
}

func (extension *solarwindsapmSettingsExtension) Shutdown(_ context.Context) error {
extension.logger.Debug("Shutting down solarwinds apm settings extension")
if extension.conn != nil {
return extension.conn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flip the ordering and that should resolve the shutdown issue :)

@jerrytfleung
Copy link
Contributor Author

@atoulme @MovieStoreGuy I addressed the PR comment and updated the branch. Can you kindly review this PR again? Thank you!

@jerrytfleung
Copy link
Contributor Author

Hey @atoulme @MovieStoreGuy, may we know your feedback? We still want to proceed with this PR. Thanks!

@MovieStoreGuy MovieStoreGuy merged commit 6c2fc71 into open-telemetry:main May 22, 2024
162 checks passed
@github-actions github-actions bot added this to the next release milestone May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmd/otelcontribcol otelcontribcol command cmd/oteltestbedcol extension/solarwindsapmsettings never stale Issues marked with this label will be never staled and automatically removed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants