Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use absolute path of template to decide if we need to remove it #3604

Merged
merged 2 commits into from
Dec 8, 2022

Conversation

vaxvms
Copy link
Collaborator

@vaxvms vaxvms commented Dec 1, 2022

What this PR does

When alertmanager.data_dir is set to a relative path (as in default configuration), templates are deleted and recreated on a regular basis.
This PR fix the issue.

Which issue(s) this PR fixes or relates to

Fixes #3591

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@vaxvms vaxvms requested a review from a team as a code owner December 1, 2022 13:58
@vaxvms vaxvms force-pushed the relative_alertmanager_data_dir branch from ece87b4 to bf9b9ce Compare December 1, 2022 14:00
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this fix be accompanied by a test? I think that would be helpful in order to avoid a potential future regression :)

@aknuds1 aknuds1 requested a review from a team December 1, 2022 14:06
@aknuds1 aknuds1 added bug Something isn't working component/alertmanager labels Dec 1, 2022
@vaxvms
Copy link
Collaborator Author

vaxvms commented Dec 1, 2022

I'll need some guidance on how to write a test for this case.

I can trigger it in multitenant_test.go:TestMultitenantAlertmanager_loadAndSyncConfigs but only if user3Dir is a relative path. And user3Dir is an absolute path because of multitenant_test.go:mockAlertmanagerConfig using t.TempDir()

@pracucci
Copy link
Collaborator

pracucci commented Dec 1, 2022

When alertmanager.data_dir is set to a relative path (as in default configuration), templates are deleted and recreated on a regular basis.

Why are they? I think I need more guidance on what the root cause is, to better understand the fix.

@aknuds1
Copy link
Contributor

aknuds1 commented Dec 1, 2022

Why are they? I think I need more guidance on what the root cause is, to better understand the fix.

I was also hoping a test would make it clearer how this bug is happening.

@vaxvms
Copy link
Collaborator Author

vaxvms commented Dec 1, 2022

When alertmanager.data_dir is set to a relative path (as in default configuration), templates are deleted and recreated on a regular basis.

Why are they? I think I need more guidance on what the root cause is, to better understand the fix.

A list of all current file templates is built at https://github.com/grafana/mimir/blob/main/pkg/alertmanager/multitenant.go#L638

Then, at https://github.com/grafana/mimir/blob/main/pkg/alertmanager/multitenant.go#L649 they are removed from the list if they are still relevant (ie, in the received configuration).

But at theses two steps, they aren't built in the same way :

My PR built the two filenames the same way, fixing the bug.

@vaxvms vaxvms force-pushed the relative_alertmanager_data_dir branch 2 times, most recently from f97f7a6 to 0520bbb Compare December 6, 2022 10:17
@vaxvms
Copy link
Collaborator Author

vaxvms commented Dec 6, 2022

Why are they? I think I need more guidance on what the root cause is, to better understand the fix.

I was also hoping a test would make it clearer how this bug is happening.

I've added a testcase in the PR

Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test looks good to me! I'll wait for some other maintainers to take look before merging.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good as I can understand why you would need to make this fix (to mirror line 643) in light of your test.

Please review my language related suggestions :)

CHANGELOG.md Outdated Show resolved Hide resolved
pkg/alertmanager/multitenant_test.go Outdated Show resolved Hide resolved
@vaxvms vaxvms force-pushed the relative_alertmanager_data_dir branch from 0520bbb to 076b234 Compare December 7, 2022 19:49
@vaxvms
Copy link
Collaborator Author

vaxvms commented Dec 7, 2022

Thanks for your suggestions :) Here they are.

@aknuds1 aknuds1 self-requested a review December 8, 2022 13:39
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice fix!

@aknuds1 aknuds1 merged commit 6298ca6 into grafana:main Dec 8, 2022
@vaxvms vaxvms deleted the relative_alertmanager_data_dir branch December 8, 2022 14:54
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
…ana#3604)

* Test if templates are persisted across alertmanager configuration syncs
* Use absolute path of template to decide if we need to remove it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/alertmanager
Projects
None yet
Development

Successfully merging this pull request may close these issues.

alertmanager templating fail half of the time
4 participants