Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reloader: don't fail on envvar expansion errors #7429

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

rexagod
Copy link
Contributor

@rexagod rexagod commented Jun 12, 2024

Don't fail the reloader on environment variable expansion errors.

Refer: prometheus-operator/prometheus-operator#6136

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Reloader: do not fail on environment variable expansion errors.

Verification

  • Modified the tests to cover the changes.

@rexagod rexagod force-pushed the po-6136 branch 2 times, most recently from d8b54d9 to 8a16f7b Compare June 12, 2024 19:11
return err
}
r.configApplyErrors.Inc()
level.Error(r.logger).Log("msg", "expand environment variables", "err", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So by logging out, sidecar doesn't fail, but there would be errors on Prometheus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, due to unset variables? Referencing the parent issue, there's a need to allow the reloader to not crash on such instances, as is the case for all >1 file modifications (and reloads), and thus align environment expansion errors to that same behavior.

saswatamcode
saswatamcode previously approved these changes Jun 13, 2024
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe @MichaHoffmann also wants to take a look?

@simonpasquier
Copy link
Contributor

Maybe we can make the behavior configurable? FWIW most of the prometheus config reloader functionality comes from the pkg/reloader but as long as we have an option to create a Reloader instance that doesn't fail on env subst, it is also fine to keep the current behavior as the default.

@@ -701,7 +708,7 @@ func expandEnv(b []byte) (r []byte, err error) {

v, ok := os.LookupEnv(string(n))
if !ok {
err = errors.Errorf("found reference to unset environment variable %q", n)
err = errors.Wrapf(expandEnvError, "%s", n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the first failed expansion will stop the processing of the config. I'd rather see this as best-effort (e.g. do all possible replacements and record any error).

@rexagod
Copy link
Contributor Author

rexagod commented Jun 13, 2024

CI failures don't seem to be stemming from the patch.

@rexagod rexagod requested a review from simonpasquier June 13, 2024 20:25
CHANGELOG.md Outdated
@@ -25,6 +25,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Changed

- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't crash when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't crash when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't consider the operation to fail when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.

cfgFile string
cfgOutputFile string
cfgDirs []CfgDirOption
suppressEnvironmentVariablesExpansionErrors bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion)

Suggested change
suppressEnvironmentVariablesExpansionErrors bool
discardEnvVarExpansionErrors bool

cfgFile string
cfgOutputFile string
cfgDirs []CfgDirOption
suppressEnvironmentVariablesExpansionErrors bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be exposed in Options too?

@@ -692,16 +693,24 @@ func RuntimeInfoURLFromBase(u *url.URL) *url.URL {

var envRe = regexp.MustCompile(`\$\(([a-zA-Z_0-9]+)\)`)

func expandEnv(b []byte) (r []byte, err error) {
func expandEnv(logger log.Logger, b []byte, suppressEnvironmentVariablesExpansionErrors bool, configApplyErrorsPtr *prometheus.Counter) (r []byte, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than passing the additional parameters, why not make it a *Reloader function?

n = n[2 : len(n)-1]

v, ok := os.LookupEnv(string(n))
if !ok {
err = errors.Errorf("found reference to unset environment variable %q", n)
(*configApplyErrorsPtr).Inc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see a new metric to track this: either an expansion errors counter or a gauge (the latter probably makes more sense as a counter needs constant increases to trigger an alert).

(*configApplyErrorsPtr).Inc()
errStr := errors.Errorf("found reference to unset environment variable %q", n)
if suppressEnvironmentVariablesExpansionErrors {
level.Debug(logger).Log("msg", "expand environment variable", "err", errStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect a Warn

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jun 14, 2024
@rexagod rexagod force-pushed the po-6136 branch 3 times, most recently from 80de75d to 2d9509d Compare June 14, 2024 08:54
@rexagod
Copy link
Contributor Author

rexagod commented Jun 14, 2024

Test failure seems to indicate that the filesystem changes dropped significantly since the last commit, I don't see any changes in the reloader that may visibly warrant that though. I'll take another look.

Also, the documentation failure seems unrelated (I believe some URLs were moved)?

@simonpasquier
Copy link
Contributor

IIUC the failure on the docs job workflow is a glitch with https://outshift.cisco.com/blog/multi-cluster-monitoring

@rexagod
Copy link
Contributor Author

rexagod commented Jun 14, 2024

It seems it wasn't moved, but failed to render due to a recent change.

Couldn't find target container #hubspotForm-7f3330bd-39c5-4358-b56d-44b90ae62e8e for HubSpot Form 7f3330bd-39c5-4358-b56d-44b90ae62e8e. Not rendering form onto the page

My guess is they tried to integrate HubSpot which didn't go as planned, and crashed the DOM.

@rexagod rexagod force-pushed the po-6136 branch 2 times, most recently from a28d14e to 5bcbc33 Compare June 14, 2024 18:56
@rexagod rexagod requested a review from simonpasquier June 18, 2024 07:36
pkg/reloader/reloader.go Outdated Show resolved Hide resolved
n = n[2 : len(n)-1]

v, ok := os.LookupEnv(string(n))
if !ok {
err = errors.Errorf("found reference to unset environment variable %q", n)
r.configEnvVarExpansionErrors.Inc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd need to save the number of errors in a local variable and set the gauge before exiting expandEnv()

Copy link
Contributor Author

@rexagod rexagod Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right! It makes sense to do this on a per-config basis, so as to not take into account increments from a previous config expansion. I'll make the change.

level.Warn(r.logger).Log("msg", "expand environment variable", "err", errStr)
return m
}
level.Error(r.logger).Log("msg", "expand environment variable", "err", errStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm the error should be logged by the caller in this case?

@rexagod rexagod force-pushed the po-6136 branch 2 times, most recently from f3a17f7 to edf9a3f Compare June 18, 2024 11:39
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test similar changes in the past, I opened a PR against prometheus-operator/prometheus-operator replacing the upstream version by my fork. This way I made sure that it worked as expected (e.g. no need for back-and-forth changes between the 2 repos).

pkg/reloader/reloader.go Outdated Show resolved Hide resolved
@@ -348,8 +360,8 @@ func (r *Reloader) Watch(ctx context.Context) error {
}
}

func normalize(logger log.Logger, inputFile, outputFile string) error {
b, err := os.ReadFile(inputFile)
func (r *Reloader) normalize(input, output string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I preferred inputFile and outputFile as it makes it obvious that we talk about files and not data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, an overlook on my part. Reverted.

@rexagod
Copy link
Contributor Author

rexagod commented Jun 18, 2024

@simonpasquier Makes sense, I remember doing a similar thing in node-exporter a while back (I pointed it to a local directory for convenience (faster changes since the other PR was also open and I wanted to test and develop simultaneously), but it's better to use the fork's URL in this case, as we've reached a point where this PR looks good for now (and pointing to a local directory will fail the CI).

I'll raise a PR downstream and hold off on merging this before that's been verified.

rexagod added a commit to rexagod/prometheus-operator that referenced this pull request Jun 23, 2024
Refer: thanos-io/thanos#7429
Fixes: prometheus-operator#6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
docs/storage.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated
@@ -25,6 +25,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Changed

- [#7429](https://github.com/thanos-io/thanos/pull/7429): Reloader: introduce `suppressEnvironmentVariablesExpansionErrors` to allow suppressing errors when expanding environment variables in the configuration file. When set, this will ensure that the reloader won't consider the operation to fail when an unset environment variable is encountered. Note that all unset environment variables are left as is, whereas all set environment variables are expanded as usual.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name doesn't match the code. Also I'm not sure if the change should be listed in the changelog since it's not user-visible.
cc @saswatamcode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Also moved this to the Added section since we are adding to a, IIUC, user-visible API (reloader.Options)?

pkg/reloader/reloader.go Outdated Show resolved Hide resolved
@rexagod rexagod force-pushed the po-6136 branch 4 times, most recently from 509d346 to e2d158d Compare June 25, 2024 14:16
# 500 when requested my mdox in GH actions.
- regex: 'outshift\.cisco\.com'
- regex: 'outshift\.cisco\.com\/blog\/multi-cluster-monitoring'
Copy link
Contributor Author

@rexagod rexagod Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain works just fine, only /blog/multi-cluster-monitoring has a problem.

Allow suppressing environment variables expansion errors when unset, and
thus keep the reloader from crashing. Instead leave them as is.

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rexagod rexagod requested a review from saswatamcode July 2, 2024 08:08
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM!

@saswatamcode saswatamcode merged commit fcc88c0 into thanos-io:main Jul 2, 2024
20 checks passed
rexagod added a commit to rexagod/prometheus-operator that referenced this pull request Jul 2, 2024
Refer: thanos-io/thanos#7429
Fixes: prometheus-operator#6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
rexagod added a commit to rexagod/prometheus-operator that referenced this pull request Jul 2, 2024
Refer: thanos-io/thanos#7429
Fixes: prometheus-operator#6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
rexagod added a commit to rexagod/prometheus-operator that referenced this pull request Jul 2, 2024
Refer: thanos-io/thanos#7429
Fixes: prometheus-operator#6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
dongjiang1989 pushed a commit to kubeservice-stack/prometheus-operator that referenced this pull request Jul 4, 2024
Refer: thanos-io/thanos#7429
Fixes: prometheus-operator#6136
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
tizki pushed a commit to tizki/thanos that referenced this pull request Jul 4, 2024
Allow suppressing environment variables expansion errors when unset, and
thus keep the reloader from crashing. Instead leave them as is.

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Signed-off-by: Tidhar Klein Orbach <tidhar.o@taboola.com>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Oct 17, 2024
Allow suppressing environment variables expansion errors when unset, and
thus keep the reloader from crashing. Instead leave them as is.

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants