Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/plausible: Fix shell scripting errors, runtime fixes #130062

Merged
merged 3 commits into from
Jul 21, 2021

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Jul 13, 2021

Motivation for this change

See https://github.com/NixOS/nixpkgs/pull/124055/files#r668271575

Fixes script continuing on fatal errors instead of aborting.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
    • CI will show if they pass.
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • 21.11 Release Notes (or backporting 21.05 Relase notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@nh2 nh2 requested review from happysalada and Ma27 July 13, 2021 02:08
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Jul 13, 2021
@nh2 nh2 mentioned this pull request Jul 13, 2021
9 tasks
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 13, 2021
@happysalada
Copy link
Contributor

Thank you for this!
Should we also add assertions that the necessary variables are set in the config? Just to provide a friendlier error message.

Out of curiosity what was your failure case? Did you not set those conf variables or was it something else?

Copy link
Member

@Ma27 Ma27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofborg test plausible

@nh2
Copy link
Contributor Author

nh2 commented Jul 14, 2021

Should we also add assertions that the necessary variables are set in the config? Just to provide a friendlier error message.

@happysalada I'm not sure what you mean, assertions on nix variables or shell variables?

For the shell variables, this PR adds set -u, so it'll complain e.g. MYVAR: unbound variable if an undefined shell variable is accessed.

Out of curiosity what was your failure case? Did you not set those conf variables or was it something else?

The problem from #124055 (comment): I didn't put my secrets into the nix store, where they are world-readable. As a result, the DynamicUser couldn't read them, but the script continued without giving good error messages.

@happysalada
Copy link
Contributor

I was thinking of a nix assert. But in your case you don't want to put those variables in the nix store right? So no assertions is fine probably.

Out of curiosity what solution did you end up with ? Something like agenix or sops2nix ?

@nh2
Copy link
Contributor Author

nh2 commented Jul 14, 2021

Out of curiosity what solution did you end up with ?

@happysalada I changed the module for now to not use DynamicUser.

@nh2 nh2 changed the title nixos/plausible: Fix shell scripting errors nixos/plausible: Fix shell scripting errors, runtime fixes Jul 14, 2021
@nh2
Copy link
Contributor Author

nh2 commented Jul 14, 2021

@Ma27 @happysalada I pushed 2 more commits, one that fixes that the patch introduced in https://github.com/NixOS/nixpkgs/pull/124055/files#diff-17ac0fd74e68fbbd03f80f2621d691d3f120a776537bae768bc2206de9559063R31 completely breaks Plausible because it's incompatible with the version of ecto_sql that plausible pins in its lockfile.

@ofborg ofborg bot requested a review from Ma27 July 14, 2021 03:45
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 14, 2021
@Ma27
Copy link
Member

Ma27 commented Jul 14, 2021

Please wait with merging, I'd like to take a closer look at this today!

patches = [ ./ecto_sql-fix.patch ];
patches = [
./ecto_sql-fix.patch
./plausible-Bump-clickhouse_ecto-dependency-to-be-compatible-with-ecto-3.6.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding a patch,, how about updating plausible to an unstable version ? (trying to use the current master for example)
There are a couple of things that work differently, but I would be happy to details the changes (the ones I followed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional info regarding this. Upstream provides now a secure way to read secrets, you can now specify a configuration folder and the which file the secret should be read from.
Appart from that, I don't think much has changed. If you are still against an update to an unstable version, no worries.

Note that <literal>/path</literal> components are currently ignored:
<link xlink:href="https://github.com/plausible/analytics/issues/1182">
https://github.com/plausible/analytics/issues/1182
</link>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<xref xlink:href="..." /> should be sufficient here.

@Ma27
Copy link
Member

Ma27 commented Jul 14, 2021

@nh2 I have to admit that I somehow lost track of what your issues are specifically after reading something in multiple threads/conversations. Are the fixes here plus #130101 sufficient?

Also regarding 1656e8e: can you elaborate what exactly went wrong there? I mean, the createdb.sh somehow succeeded still. I mean, I believe you, but I'd really like to fully understand what went wrong there :D

Fixes runtime error during `migrate.sh`:

    ** (UndefinedFunctionError) function ClickhouseEcto.lock_for_migrations/3 is undefined or private

The function `lock_for_migrations` indeed does not exist in the `ClickhouseEcto`
module packaged so far.

Reason:

So far we use the patch `ecto_sql-fix.patch` doing

    -      {:ecto_sql, "~> 3.0"},
    +      {:ecto_sql, "~> 3.6"},

@scvalex found that the commit that makes the `plausible/clickhouse_ecto` fork
compatible with 3.6 is _newer_ than the commit pinned in the `.lock` file,
namely  `1969f14ecef - Update for ecto 3.6.1`:

    plausible/clickhouse_ecto@1969f14ecef

That commit introduces the function that my error shows (`def lock_for_migrations`).

This means that the version that's in the lockfile (and pulled) was
incompatible with the version the nixpkgs patch forces.

This commit fixes it by patching the `.lock` file to have the version of
`plausible/clickhouse_ecto` that makes it compatible with `ecto_sql` 3.6.
@nh2 nh2 force-pushed the plausible-fix-shell-scripting-errors branch from 1656e8e to 6de23b9 Compare July 14, 2021 17:00
@ofborg ofborg bot requested a review from Ma27 July 14, 2021 17:10
@nh2
Copy link
Contributor Author

nh2 commented Jul 14, 2021

I have to admit that I somehow lost track of what your issues are specifically after reading something in multiple threads/conversations. Are the fixes here plus #130101 sufficient?

@Ma27 Yes. Concretely, there were 3 issues I encountered in this order:

  1. Clickhouse is not available on cache.nixos.org (clickhouse does not get built by Hydra #130101).
  2. Our shell scripts not erroring on failure, making it hard to debug why things aren't working (fixed by 8613698 in this PR)
  3. migrate.sh failing to set up the clickhouse schema for plausible. Because the nixpkgs patch bumped the ecto_sql dependency to 3.6, which resulted in a call to function lock_for_migrations. But this function does not exist in the clickhouse_ecto version pinned by plausible-1.3.0. Thus, migrate.sh crashes with ClickhouseEcto.lock_for_migrations/3 is undefined or private. (Fixed by 6de23b9 in this PR)
    lock_for_migrations

I mean, the createdb.sh somehow succeeded still.

I made a mistake in the commit message: It is migrate.sh that does the nonexistent function call, not createdb.sh. I've just force-pushed to fix the commit message.

I believe it should be impossible to deploy a working Plausible service on 6de23b983d2659d1a4782db4ea582bdfb52eb5b5^ (the commit before my fix), or on nixos-unstable, with the Plausible version packaged there.

Because that function that's called there, lock_for_migrations, does not exist, so Elixir crashes.

I have verified that using:

cd /nix/store/a9scafm69xld0hnl3vhvzrqb08zbfhk8-plausible-1.3.0/lib/clickhouse_ecto-0.2.8/ebin/
iex
import Elixir.ClickhouseEcto
exports Elixir.ClickhouseEcto

This prints the functions in the module, and lock_for_migrations/3 is not among them in nixos-unstable.

If you run the NixOS tests on current nixos-unstable, it does indeed show the failing function call:

NIX_PATH=nixpkgs=. nix-build --no-link nixos/tests/plausible.nix

Output:

machine # [   29.129497] dmqx3vgkg844pjdvgw0adx41lckzip01-plausible-setup[1086]: ** (UndefinedFunctionError) function ClickhouseEcto.lock_for_migrations/3 is undefined or private

So the conclusion here is that because of the lack of effective set -e the nixpkgs pre-start scripts, they just continued even though migrate.sh failed, and the checks that the NixOS VM test currently does work anyway, even though the Clickhouse DB is not properly migrated.

I'd really like to fully understand what went wrong there :D

Yes, full understanding is usually the best approach. 👍

Hope the above helps clear it up!

@nh2
Copy link
Contributor Author

nh2 commented Jul 15, 2021

Another related topic you may want to be aware of, also involving bash issues:

#130244

I have found the options to set the plausible listen IPs, and will make a PR for it soon.

Copy link
Member

@Ma27 Ma27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's reasonable to delay this even more. The current changes solve an actual problem, so I'm going to merge this.
If you prefer to go to an unstable version now rather than waiting for an upstream-release, I'd be fine with that.

@Ma27 Ma27 merged commit 65d60ae into NixOS:master Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants