Fix/prometheus rules missing #1073
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1073      +/-   ##
==========================================
+ Coverage   56.65%   56.69%   +0.03%
==========================================
  Files         283      283
  Lines       20045    20044       -1
==========================================
+ Hits        11357    11363       +6
+ Misses       6935     6927       -8
- Partials     1753     1754       +1
```
|
I'm a little confused, it seems what you changed has nothing to do with Could you explain more about how it works? |
Sorry for the confusion. It's fixed for both First of all, let's talk about the usage of But in the current implementation, if those fields are set, it only transfers the local files to the remote host, without also copying the provided ones. tiup/pkg/cluster/spec/prometheus.go Lines 315 to 319 in 4ece39a
|
32e801e to a772f6a
This is by design, because the If we implement the dir as additional, the user can only add files, and once a file is added, he can never remove it with TiUP. That's not self-consistent for TiUP, I think. |
If Grafana or Prometheus is deployed independently (without tidb or dm), then the default dashboards or rules seem redundant. But I think 🤔 that if someone uses TiUP, he'd prefer to deploy Prometheus with the full tidb or dm stack. If the user wants to delete the dashboards or rules, maybe he can replace them with an empty file (not tested). |
Let me see if there's a better solution or tradeoff. |
@lucklove Let's put aside the Grafana dashboards, and please look at the Prometheus issue. tiup/templates/config/prometheus.yml.tpl Lines 12 to 16 in d4881ba
rule_dir but without those rules, then the prometheus.yml is actually problematic and the rules specified by the user are not working.
For this bug, I'll try to introduce a new member:
```
rule_files:
{{- if .MonitoredServers}}
  - 'node.rules.yml'
  - 'blacker.rules.yml'
  - 'bypass.rules.yml'
{{- end}}
{{- range .CustomRules}}
  - '{{.}}'
{{- end}}
```
|
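To make the proposed fragment concrete, here is a minimal sketch of how it might render with Go's `text/template`. The `promConfig` struct and the `renderRuleFiles` helper are hypothetical stand-ins for illustration, not the actual TiUP types; only the two fields the fragment references are modeled, and the server address and custom file name are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// rulesTmpl mirrors the rule_files fragment proposed in the comment above.
const rulesTmpl = `rule_files:
{{- if .MonitoredServers}}
  - 'node.rules.yml'
  - 'blacker.rules.yml'
  - 'bypass.rules.yml'
{{- end}}
{{- range .CustomRules}}
  - '{{.}}'
{{- end}}
`

// promConfig is a hypothetical stand-in for the real config struct.
type promConfig struct {
	MonitoredServers []string
	CustomRules      []string
}

// renderRuleFiles executes the fragment against cfg and returns the YAML text.
func renderRuleFiles(cfg promConfig) (string, error) {
	t, err := template.New("rules").Parse(rulesTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, cfg); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderRuleFiles(promConfig{
		MonitoredServers: []string{"10.0.1.1"},     // illustrative address
		CustomRules:      []string{"my.rules.yml"}, // illustrative user file
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With both fields populated, the built-in entries are listed first and each custom rule file follows as its own list item; with an empty `MonitoredServers`, only the custom entries remain.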
In this way, there may be two
|
Duplicated rules will be ignored: the user-provided ones have higher priority |
If duplicated rules don't trigger any error, I think it's OK |
Sorry, there are some conflicts after refactoring the context: #1069 |
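The priority rule discussed above can be sketched as follows. This is only an illustration under the assumption that "duplicated" means two files sharing the same base name; `mergeRuleFiles` is a hypothetical helper, not a function from the TiUP code base.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mergeRuleFiles returns the default rule files followed by the user-provided
// ones. When a default and a user file share a base name, the default entry
// is dropped so the user-provided file takes priority (assumed semantics).
func mergeRuleFiles(defaults, custom []string) []string {
	fromUser := make(map[string]bool, len(custom))
	for _, f := range custom {
		fromUser[filepath.Base(f)] = true
	}
	merged := make([]string, 0, len(defaults)+len(custom))
	for _, f := range defaults {
		if !fromUser[filepath.Base(f)] {
			merged = append(merged, f)
		}
	}
	return append(merged, custom...)
}

func main() {
	defaults := []string{"node.rules.yml", "tikv.rules.yml"}
	custom := []string{"tikv.rules.yml", "my.rules.yml"} // illustrative names
	fmt.Println(mergeRuleFiles(defaults, custom))
}
```

Because the duplicate is resolved before anything is written, no file appears twice in the result, which matches the expectation that duplicates should not trigger an error.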
77e5dd0 to 3e3f7c3
/retest |
Seems the bot can't respond to the |
/test all |
@lucklove PTAL, the current implementation will append the Grafana dashboards or Prometheus rules if |
It seems we haven't solved the deletion problem yet: the user can only append rules/dashboards (we could use an empty file, but it would print error logs and isn't graceful). |
IMO, it's not a good idea to change the current behavior, since it was documented and released long ago: if a user has already deployed a cluster with a local directory and deleted some rule/dashboard from that directory, then after he upgrades tiup-cluster, the removed rule/dashboard will come back. The restored rule will then fire alerts that he doesn't care about at all. This is unexpected. |
I'll put this PR on hold for the moment and reconsider it when there is a suitable opportunity. |
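The upgrade concern can be illustrated with a small sketch. It assumes append semantics (defaults are always copied alongside the local directory) and uses a hypothetical helper, `resurrectedRules`, with illustrative file names; it is not taken from TiUP.

```go
package main

import "fmt"

// resurrectedRules lists the default rule files that are absent from the
// user's local directory. Under the assumed append semantics, these are the
// files that would reappear on the remote host after an upgrade.
func resurrectedRules(defaults, local []string) []string {
	kept := make(map[string]bool, len(local))
	for _, f := range local {
		kept[f] = true
	}
	var back []string
	for _, f := range defaults {
		if !kept[f] {
			back = append(back, f)
		}
	}
	return back
}

func main() {
	defaults := []string{"node.rules.yml", "tikv.rules.yml", "pd.rules.yml"}
	// The user deliberately deleted tikv.rules.yml from the local directory.
	local := []string{"node.rules.yml", "pd.rules.yml"}
	fmt.Println(resurrectedRules(defaults, local))
}
```

Any non-empty result here is a rule the user removed on purpose that would silently return and could start firing alerts again.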
/hold |
@9547: PR needs rebase. |
I'm just going to close this first. |
What problem does this PR solve?
I'm deploying a cluster with rule_dir set in topology.yaml.
After deploying, I found that node.rules.yml, tikv.rules.yml and other rules under /home/tidb/deploy/prometheus/conf/ were missing.
Should be merged after #1075
What is changed and how it works?
Check List
Tests
Code changes
Side effects
Related changes
Release notes: