Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib/mixin: Generate rules, fix tests #13671

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

mrueg
Copy link
Contributor

@mrueg mrueg commented Feb 6, 2022

Primary intent of this PR is to have a rendered rules file that we can include in kube-prometheus-stack.
Additionally I added the following pieces:

  • Add Makefile
  • Make tests runnable
  • Add generated rule manifest file

After changing the time series (series only seems to support plain time series) in the test.yaml so the tests run, those two still fail:

promtool test rules test.yaml
Unit Testing:  test.yaml
  FAILED:
    alertname:etcdHighNumberOfLeaderChanges, time:10m, 
        exp:"[Labels:{alertname=\"etcdHighNumberOfLeaderChanges\", job=\"etcd\", severity=\"warning\"} Annotations:{description=\"etcd cluster \\\"etcd\\\": 4 leader changes within the last 15 minutes. Frequent elections may be a sign of insufficient resources, high network latency, or disruptions by other components and should be investigated.\", summary=\"etcd cluster has high number of leader changes.\"}]", 
        got:"[]"
    alertname:etcdExcessiveDatabaseGrowth, time:10m, 
        exp:"[Labels:{alertname=\"etcdExcessiveDatabaseGrowth\", job=\"etcd\", severity=\"warning\"} Annotations:{message=\"etcd cluster \\\"etcd\\\": Observed surge in etcd writes leading to 50% increase in database size over the past four hours, please check as it might be disruptive.\"}]", 
        got:"[]"

make: *** [Makefile:15: test] Error 1

Let me know if anyone has time to provide a fix (I tried to fix them, but for some reason my ideas didn't work).

@ptabor

@codecov-commenter
Copy link

codecov-commenter commented Feb 6, 2022

Codecov Report

Merging #13671 (f668026) into main (986a2b5) will decrease coverage by 0.22%.
The diff coverage is n/a.

❗ Current head f668026 differs from pull request most recent head 70f8524. Consider uploading reports for the commit 70f8524 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #13671      +/-   ##
==========================================
- Coverage   72.91%   72.68%   -0.23%     
==========================================
  Files         465      465              
  Lines       37947    37858      -89     
==========================================
- Hits        27668    27518     -150     
- Misses       8514     8562      +48     
- Partials     1765     1778      +13     
Flag Coverage Δ
all 72.68% <ø> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/proxy/grpcproxy/register.go 69.76% <0.00%> (-9.31%) ⬇️
client/v3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
raft/rafttest/node.go 95.00% <0.00%> (-5.00%) ⬇️
api/etcdserverpb/raft_internal_stringer.go 76.78% <0.00%> (-4.94%) ⬇️
client/v3/leasing/cache.go 87.77% <0.00%> (-3.89%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go 69.56% <0.00%> (-3.48%) ⬇️
server/etcdserver/api/v3rpc/member.go 93.54% <0.00%> (-3.23%) ⬇️
server/lease/leasehttp/http.go 62.77% <0.00%> (-2.92%) ⬇️
server/etcdserver/api/v3election/election.go 66.66% <0.00%> (-2.78%) ⬇️
client/v3/experimental/recipes/key.go 75.34% <0.00%> (-2.74%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 986a2b5...70f8524. Read the comment docs.

@@ -0,0 +1,147 @@
groups:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add generated files to repo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should add a test that ensures that this file is up to date, so that PRs that change rules always need to regenerate this file.

I would like a second opinion of fact that we are adding this file. @tomwilkie @lilic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @serathius here, you should rarely want to use a generic generated yaml and often override configs like namespaces and other fields. and @mrueg it might be better to sync on the non-generated files rather than yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree with not having rendered output in a repository (as it might run out of sync), this approach follows what https://github.com/prometheus-operator/kube-prometheus/tree/main/manifests and https://github.com/monitoring-mixins/website have been doing and had been practice in the etcd-3.4 and previous releases https://raw.githubusercontent.com/etcd-io/website/master/content/en/docs/v3.4/op-guide/etcd3_alert.rules.yml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brancz @paulfantom might be more up to date on this, but from what I remember there was talk about removing the generated manifests to discourage folks from applying those directly instead consuming the jsonnet.

Copy link
Contributor

@paulfantom paulfantom Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we still have that issue in kube-prometheus and we are looking for alternatives to remove generated files from the repository. However, this is not a top priority right now.
As for mixins, https://monitoring.mixins.dev/ is regenerating etcd mixin daily and allows for consumption in YAML format (YAML files available in here)

Copy link
Contributor

@paulfantom paulfantom Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for using it in https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack, why not add jsonnet execution there and customize mixin to the needs of the helm chart? This way you can even add more mixins and consume them directly.

Copy link
Contributor Author

@mrueg mrueg Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing additional context @paulfantom! I wasn't aware that you're looking into phasing out generated files. In general, I think from a documentation perspective a rendered file is more readable than the jsonnet. If the etcd maintainers would be okay, we could add it back to the website repo?

For https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack this currently relies on someone's own setup when running the script and providing the PR, thus I wanted to avoid to require additional tooling there. Eventually, it might be good to look into adding more mixins there if we need more and then have common tooling for it.

contrib/mixin/Makefile Outdated Show resolved Hide resolved
contrib/mixin/Makefile Outdated Show resolved Hide resolved
contrib/mixin/Makefile Outdated Show resolved Hide resolved
@serathius
Copy link
Member

Those tests were broken as they were not run automatically. Please integrate this tests in GitHub CI tests or there is no sense in fixing them.

@mrueg mrueg force-pushed the mixin-generate-manifests branch 5 times, most recently from 70f8524 to 6b78793 Compare February 7, 2022 22:53
@mrueg
Copy link
Contributor Author

mrueg commented Feb 7, 2022

Tests are included now in github workflow. Still failing on the two tests.

@serathius
Copy link
Member

Tests are included now in github workflow. Still failing on the two tests.

I would disable those two tests for now so we can at least have other tests running and preventing breakage. If you don't want to fix them in this PR, then please comment them out and file an issue for it so someone else can pick it up.

@mrueg mrueg force-pushed the mixin-generate-manifests branch 2 times, most recently from 082044b to a514863 Compare February 8, 2022 12:33
@mrueg
Copy link
Contributor Author

mrueg commented Feb 8, 2022

I fixed one test and will take a look at the other one later. Thanks for the quick feedback!

@serathius
Copy link
Member

Hey @mrueg, could you split the test fix into separate PR so we can merge it?

@tomwilkie
Copy link
Contributor

Hi! Sorry for the late reply. It may be easier to use mixtool generate to compile the mixin, it will output yaml etc: https://github.com/monitoring-mixins/mixtool

* Add Makefile
* Make tests runnable
* Add generated rule manifest file

Signed-off-by: Manuel Rüger <manuel@rueg.eu>
@@ -14,6 +14,7 @@
*.test
hack/tls-setup/certs
.idea
/contrib/mixin/manifests
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the folder to .gitignore here so it doesn't get added to the repo if someone generates it.

@@ -0,0 +1,147 @@
groups:
Copy link
Contributor Author

@mrueg mrueg Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing additional context @paulfantom! I wasn't aware that you're looking into phasing out generated files. In general, I think from a documentation perspective a rendered file is more readable than the jsonnet. If the etcd maintainers would be okay, we could add it back to the website repo?

For https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/hack this currently relies on someone's own setup when running the script and providing the PR, thus I wanted to avoid to require additional tooling there. Eventually, it might be good to look into adding more mixins there if we need more and then have common tooling for it.

@mrueg
Copy link
Contributor Author

mrueg commented Feb 10, 2022

@serathius I have removed the rendered file
@tomwilkie thanks, I'll take a look! Could it be used in a generic github action that renders and publishes artifacts?

@tomwilkie
Copy link
Contributor

@tomwilkie thanks, I'll take a look! Could it be used in a generic github action that renders and publishes artifacts?

I think that'd be a great idea - we're using it in a few places already (eg https://github.com/grafana/jsonnet-libs/blob/master/Makefile#L24). Its also got a linter that applied various lint rules to both the dashboards and the alerts that you might find useful.

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing tests. Great Job!

@serathius
Copy link
Member

cc @spzala @ptabor

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
Thanks @mrueg

@serathius serathius merged commit e814f6f into etcd-io:main Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants