Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/grafana: refactor for RFC42 #191768

Merged
merged 8 commits into from
Oct 23, 2022
Merged

nixos/grafana: refactor for RFC42 #191768

merged 8 commits into from
Oct 23, 2022

Conversation

KFearsoff
Copy link
Contributor

@KFearsoff KFearsoff commented Sep 18, 2022

This PR aims to refactor Grafana towards the RFC42 style, solving #191464 and #144575, while preserving backwards compatibility.

Description of changes
  • Grafana is refactored
  • Grafana datasource provisioning is refactored
  • Grafana dashboard provisioning is refactored
  • Grafana notifier provisioning is refactored Decided against it, as notifiers will be deprecated in Grafana 10
  • Grafana alerting provisioning is added
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation 8.has: module (update) This PR changes an existing module in `nixos/` labels Sep 18, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Sep 18, 2022
@KFearsoff
Copy link
Contributor Author

KFearsoff commented Sep 26, 2022

Everything major except for Grafana refactoring is done. Some minor things are (not all of which I have it in me to do):

  • Add descriptions to new alerting.* options simialar to new descriptions for dashboards options
  • The warning about storing passwords in Nix store doesn't work with new config style, fix it
  • Maybe tests can be refactored to implement more DRY
  • Probably can add a bit more NixOS options for alerting.* stuff
  • Find a nicer way to assert for alerting.* stuff, if possible. We currently declare settings as nullOr submodule and make default value a null. An empty submodule isn't actually a {}, so we can't check settings against a {}, and Nix doesn't currently have an "empty submodule" struct. This is described in detail here: NixOS: default = {} for submodule not respected #112494
  • Increase stylistic consistency (how array of attrs are written, convert everything into YAML instead of JSON, etc.)
  • Fix markdown docs

@KFearsoff
Copy link
Contributor Author

KFearsoff commented Oct 11, 2022

The Grafana settings part is done, so I consider the PR ready for initial review.

Some things still left to do:

  • [ ] Add services.grafana.path analogous to other path options to provision Grafana config with a INI file (should we make the path thingie an RFC, actually?)
  • The warning about plaintext passwords is removed for the time being while I'm figuring out how to use it with new syntax. It should also be modified to pop up at correct times (namely: when a password isn't specified with file)
  • Fix NixOS doc build

nixos/tests/grafana/default.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Outdated Show resolved Hide resolved
nixos/modules/services/monitoring/grafana.nix Show resolved Hide resolved
@Ma27
Copy link
Member

Ma27 commented Oct 14, 2022

All in all it looks pretty nice, just left a few minor comments :)

Will test it later, but that's a bit more effort because I have to cherry-pick it to my 22.05-based tracking-branch and it has a bunch of conflicts because lib.mdDoc 🙃

@KFearsoff KFearsoff force-pushed the grafana-rfc42 branch 3 times, most recently from ccb9e60 to a2a9a3e Compare October 15, 2022 17:50
@KFearsoff KFearsoff requested a review from Ma27 October 15, 2022 17:55
Copy link
Member

@Ma27 Ma27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just deployed my local Grafana with this. All in all, it looks good. I'd like to give others a chance to review. As soon as the remaining things are resolved, I'd merge then.

Thanks a lot!

This commit refactors `services.grafana.provision.dashboards` towards
the RFC42 style. To preserve backwards compatibility, we have to jump
through a ton of hoops, introducing esoteric type signatures and bizarre
structs. The Grafana module definition should hopefully become a lot
cleaner after a release cycle or two once the old configuration style is
completely deprecated.
@pschyska
Copy link
Contributor

@KFearsoff Great, thank you ;-)
FWIW, I think server.protocol needs to be defined because the modules does something extra for server.protocol = "socket". Maybe just make it types.str?

@KFearsoff
Copy link
Contributor Author

New PR open at #197585
cc @delroth @pschyska @Ma27

@KFearsoff Great, thank you ;-) FWIW, I think server.protocol needs to be defined because the modules does something extra for server.protocol = "socket". Maybe just make it types.str?

It actually isn't used anywhere. But I think using types.enum is more fitting here, and it doesn't cost a lot. I don't think that server.protocol will be updated in anything but major Grafana updates, and those need more care anyway.

jtojnar added a commit to jtojnar/nixfiles that referenced this pull request Oct 26, 2022
Flake lock file updates:

• Updated input 'home-manager':
    'github:nix-community/home-manager/c485669ca529e01c1505429fa9017c9a93f15559' (2022-10-20)
  → 'github:nix-community/home-manager/7dc4e4ebd71280842b4d30975439980baaac9db8' (2022-10-24)
• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/301aada7a64812853f2e2634a530ef5d34505048' (2022-10-21)
  → 'github:nixos/nixpkgs/f994293d1eb8812f032e8919e10a594567cf6ef7' (2022-10-25)

  Includes Grafana module refactoring from <NixOS/nixpkgs#191768>.
  Had to fix the following warnings:

      trace: warning: The option `services.grafana.domain' defined in `hosts/azazel/ogion.cz/monitor' has been renamed to `services.grafana.settings.server.domain'.
      trace: warning: Provisioning Grafana datasources with options has been deprecated.
      Use `services.grafana.provision.datasources.settings` or
      `services.grafana.provision.datasources.path` instead.

      trace: warning: Provisioning Grafana dashboards with options has been deprecated.
      Use `services.grafana.provision.dashboards.settings` or
      `services.grafana.provision.dashboards.path` instead.

      trace: Obsolete option `services.grafana.domain' is used. It was renamed to `services.grafana.settings.server.domain'.
      trace: Obsolete option `services.grafana.port' is used. It was renamed to `services.grafana.settings.server.http_port'.
@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Oct 26, 2022

Going to add the default value for services.grafana.settings.server.socket back.

#197987

default = [];
apply = x: if (builtins.isList x) then map _filter x else x;
type = with types; either (listOf grafanaTypes.dashboardConfig) (submodule {
options.settings = mkOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option does not show up in search.nixos.org which makes migration really hard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Apparently, datasources also don't show up. Same for man configuration.nix. Might be due to the really weird type. I don't even know where to approach the issue, unfortunately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically what we discussed in #191768 (comment).

Don't have a strong opinion whether this should be fixed in the module-system or whether we should just drop the either and force the submodule here. (If I had to guess I'd say that mkRenamedOptionModule doesn't work here because one is a sub-option of another option, but we could probably create a custom submodule-type which throws a more intuitive warning when a list is used for dashboards/datasources).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how specific of an issue it is to either, exactly. Is it possible to use oneOf here instead?
I still don't exactly see a different data type here, unfortunately. Is there an existing issue for the either problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now: #207962

@dhess
Copy link
Contributor

dhess commented Oct 29, 2022

I appreciate the effort and support the idea of moving to RFC42-style configuration across the board, but between dropping whole sections of previously-supported configuration without a migration strategy (e.g., extraOptions) and the bug with nested configuration in INI generation — especially given the auth configuration's reliance on nested config for any OAuth configuration — personally, I don't think this PR should have been merged as-is.

@KFearsoff
Copy link
Contributor Author

but between dropping whole sections of previously-supported configuration without a migration strategy (e.g., extraOptions)

Correct me if I'm wrong, but I think the settings pattern is already a migration strategy? Everything that you could stuff into extraOptions, you can stuff into settings, because extraOptions is just a different way of providing the same settings.
Perhaps an example would help. Personally, I think the bigger problem is the fact that not every setting even worked properly.

and the bug with nested configuration in INI generation

Yeah, that was... bad. I didn't even consider that INI might not support nested attrs, nor do I think that many people thought that it wouldn't. The misleading warning certainly doesn't help either.
Since then, I fixed the warning, so it should be better. The main problem persists, though. I opened an issue to track the progress on that: #197598, and I also tried to fix it myself, but no luck so far.

personally, I don't think this PR should have been merged as-is

I agree. There was quite a number of bugs (and there still are bugs, and pretty nasty ones that can't be fixed easily, like the fact that there's no more documentation, yikes). I've put in effort to fix them ASAP, but the damage is already done. I hope I've done as good of a job of salvaging the situation as I think I did, though.

Frankly, I'm unsure of how to handle the situation better next time. Doing more reviews would certainly help, but the scope of the PR is pretty gigantic, so people aren't too keen on doing that (can you blame them?). The manual and automatic testing that I've done didn't prevent the problems from leaking (once again, due to the gigantic scope). I'm also unsure if doing it in multiple PRs would help: the starting point looked very different from what we ended up with, the fact that I could edit the history freely helped a lot.

@@ -272,6 +272,12 @@ Available as [services.patroni](options.html#opt-services.patroni.enable).

- The `services.matrix-synapse` systemd unit has been hardened.

- The `services.grafana` options were converted to a [RFC 0042](https://github.com/NixOS/rfcs/blob/master/rfcs/0042-config-option.md) configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changelog entry is pretty much useless. The changelog should help users updating to figure out what they need to change. This technical justification is completely irrelevant – it should much rather tell people what they can expect to change and ideally how they need to change what in order to be able to update.

You shouldn't need to read an RFC just to get a vague idea what you need to change in your grafana configuration.

};
};

extraOptions = mkOption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very frustrated at the moment at how this was removed – the error message you get is as unhelpful as possible. There should at least be an error message informing users how to migrate to the new style and ideally a compatibility shim (maybe with a deprecation warning) since adapting seems nontrivial.

@sternenseemann
Copy link
Member

I agree. There was quite a number of bugs (and there still are bugs, and pretty nasty ones that can't be fixed easily, like the fact that there's no more documentation, yikes). I've put in effort to fix them ASAP, but the damage is already done. I hope I've done as good of a job of salvaging the situation as I think I did, though.

Reverting would also have been an option (although it seems people have already started migrating their configuration), this would also have alleviated the pressure on you to get this done in time for the 22.11 release which is coming up relatively quickly.

@KFearsoff
Copy link
Contributor Author

Reverting would also have been an option (although it seems people have already started migrating their configuration), this would also have alleviated the pressure on you to get this done in time for the 22.11 release which is coming up relatively quickly.

Frankly, I was very surprised the merge happened before 22.11 release, I was assuming it will be there later (apparently, it hasn't landed in the stable branch yet, which I hope will continue to be the case). Perhaps it is my own misunderstanding of "Added a release notes entry if the change is significant" line: I considered that to be a hard requirement, not a sign of the readniess to merge.

As for the documentation part, can you please share the pain points you've encountered? I am very bad at writing documentation (and I don't particularly like the process), so I mostly just copy-pasted the release note from some other module. I don't think I've seen RFC42 migrations describing the exact details on how to do them, even though I do agree that this is not a good UX. I will try to make the most out of it.

As a side note: over half of the actual documentation vanished completely due to the bug with how documentation is rendered when types.either is used. I am immensely salty about that and how there's no easy fix. I'm also not too familiar with how release notes are even written: from my understanding of it, there's no space for proper description of the migration path. Perhaps someone knows more about the conventions that are currently in play in NixOS release notes?

As another side note: oh God. With the half of the documentation gone due to a bug, and the release notes not being rendered because they aren't even up, the migration path really is horrific. I might have picked a wrong "good first issue", huh.

@sternenseemann
Copy link
Member

apparently, it hasn't landed in the stable branch yet, which I hope will continue to be the case

It won't land in 22.05, but it will be part of 22.11 when it is branched off.

As for the documentation part, can you please share the pain points you've encountered?

I have never used the module, but happened to upgrade a machine where this module was used. The error I ran into was the removal of extraOptions which just tells you that the option doesn't exist – it seems like it is missing a mkRemovedOptionModule at the very least.

from my understanding of it, there's no space for proper description of the migration path

The release notes are quite big, you can't give a step by step guide of course, but I think taking a few paragraphs to give an outline of what was removed in favor of what should be fine. It is also easier to remove superfluous information during release note editing than adding missing information…

As another side note: oh God. With the half of the documentation gone due to a bug, and the release notes not being rendered because they aren't even up, the migration path really is horrific. I might have picked a wrong "good first issue", huh.

These things happen and we can sort them out! In this instance it seems to be a case of unfortunate timing, sadly.

@Ma27
Copy link
Member

Ma27 commented Oct 30, 2022

Yeah sorry, I underestimated the impact of the documentation problems, apologies for that!
To get this to an end, I'll take care of that (and check which other things are also to be fixed) by tomorrow!

@KFearsoff
Copy link
Contributor Author

It won't land in 22.05, but it will be part of 22.11 when it is branched off.

Oof.

I have never used the module, but happened to upgrade a machine where this module was used. The error I ran into was the removal of extraOptions which just tells you that the option doesn't exist – it seems like it is missing a mkRemovedOptionModule at the very least.

Oh, the mkRemovedOptionModule is my bad for sure, will fix.

The release notes are quite big, you can't give a step by step guide of course, but I think taking a few paragraphs to give an outline of what was removed in favor of what should be fine. It is also easier to remove superfluous information during release note editing than adding missing information…

That's fair. Okay, I'll try to outline how the migration should be done in a separate PR.

@Ma27
Copy link
Member

Ma27 commented Oct 31, 2022

Oof.

Well, it would've landed in a release sooner or later anyways and we still have a month until it's actually released, that should be enough to fix documentation and resolve remaining regressions.

Wait, are you planning to take care of it, not that we duplicate work now.

@KFearsoff
Copy link
Contributor Author

Wait, are you planning to take care of it, not that we duplicate work now.

I was going to start later today, feeling 8 hours from now. I don't mind you starting earlier and me reviewing the PR, though!

@Ma27
Copy link
Member

Ma27 commented Oct 31, 2022

OK, summary of what's problematic:

First of all, I'd like to apologize, I missed a few parts upon reviewing this and I'll do my best to not let this happen again.

I'll start a PR to fix up the documentation stuff now.

@Ma27
Copy link
Member

Ma27 commented Oct 31, 2022

OK, my friends' cat is telling me that I've done enough work today and wants to cuddle now, so expect a PR by tomorrow. I'm currently fixing the missing options in the docs issue.
@KFearsoff if you want to take care of improving the release notes (for a migration guide), feel free otherwise I'll do it tomorrow as well.

@Ma27
Copy link
Member

Ma27 commented Nov 2, 2022

Fix for the missing opts is on the following branch: https://github.com/Ma27/nixpkgs/tree/grafana-fixup

As soon as I've taken care of the releas-enotes I'll file a PR.

@Ma27
Copy link
Member

Ma27 commented Nov 2, 2022

Here we go: #199150.

I assume that this and #198113 need to go into master before branchoff and we're good or am I missing somethign else?

@KFearsoff
Copy link
Contributor Author

I assume that this and #198113 need to go into master before branchoff and we're good or am I missing somethign else?

The #198113 is a little bit in the air for me, but I don't mind it getting merged once the changes I requested I made. Otherwise, I think looks good?

@ju1m
Copy link
Contributor

ju1m commented Nov 4, 2022

I guess settings.analytics.check_for_updates should be set to false by default since updates are NixOS' responsability. But I may be wrong, I don't know grafana well enough.

@KFearsoff
Copy link
Contributor Author

I guess settings.analytics.check_for_updates should be set to false by default since updates are NixOS' responsability. But I may be wrong, I don't know grafana well enough.

As far as I can tell, the option should be pretty harmless since it only polls if there's an update, never tries to update anything, as stated in the docs: https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#check_for_updates .
I guess we might as well turn it off (why poll Github when we can, like, not do that?), but I'm not sure. For the time being, I'd like to play it safe and leave as much settings as possible to match defaults.

Ma27 added a commit that referenced this pull request Nov 20, 2022
nixos/grafana: documentation/warning improvements after #191768
tazjin pushed a commit to tvlfyi/kit that referenced this pull request Mar 28, 2023
* //ops/machines/whitby: Disable grafana, since the grafana module was
  changed upstream in a way that our configuration no longer works.
  Since the OpenSSL security update is relatively pressing, adapting the
  grafana configuration beforehand is not a hard requirement. See
  NixOS/nixpkgs#191768.

* //tools/depotfmt: keep Go at version 1.18 to forgo a reformat of the
  tree.

* //nix/buildGo: keep Go at version 1.18, as 1.19 changed the CLI
  interface (?) in a way that breaks buildGo.

* //3p/overlays/tvl: drop upstreamed tdlib upgrade.

* //3p/overlays/tvl: patch buf to work around breakage due to git 2.38.1

TODO items for Go are tracked in b/215.

Change-Id: Ie08fef49cf3db12e6b5225a8b992a990ddc5b642
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7141
Tested-by: BuildkiteCI
Autosubmit: sterni <sternenseemann@systemli.org>
Reviewed-by: grfn <grfn@gws.fyi>
Reviewed-by: tazjin <tazjin@tvl.su>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.