Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: fix crossSystem localSystem comparison #237427

Closed
wants to merge 3 commits into from

Conversation

Artturin
Copy link
Member

this appears to fix the following issue for some reason

nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem.system = "x86_64-linux"; }).bash
«derivation /nix/store/5lz9p8xhf89kb1c1kk6jxrzskaiygnlh-bash-5.2-p15.drv»

nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem = "x86_64-linux"; }).bash
«derivation /nix/store/1whiq03rsfrmch2ds3jhyqjaczfz97lb-bash-5.2-p15-x86_64-unknown-linux-gnu.drv»

fixed

nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem = "x86_64-linux"; }).bash
«derivation /nix/store/5lz9p8xhf89kb1c1kk6jxrzskaiygnlh-bash-5.2-p15.drv»

issue reported by @spikespaz on matrix in the cross compilation channel

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Jun 12, 2023
@Mindavi Mindavi added the 6.topic: cross-compilation Building packages on a different platform than they will be used on label Jun 12, 2023
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jun 12, 2023
@roberth
Copy link
Member

roberth commented Jun 12, 2023

Ohh I did not expect to see this problem in practice, having noticed it only while handling platforms manually and not through the nixpkgs entrypoint, for testing.
I'd discussed a plan to mitigate the problem with @Ericson2314. It's still more of a workaround than a solution, I have to admit.

  1. add a platform equality function to lib.systems
  2. use it in all places where nixpkgs consumers system/platform parameters, so they can be turned into equal platforms, along these lines
# *Platform omitted for brevity
args@{ host, build }:
let inherit (
  # actually this part could be turned into a function
  if systems.equals args.host args.build
  then { inherit (args) host; build = host; }
  else { inherit (args) host build; }
  ) host build;
in
  # use host and build as usual

That way we can keep using regular equality on these types.
What's ugly is that its a bad habit in case you ever have to compare systems between distinct nixpkgs invocations. There's no workaround for that, except to use lib.systems.equals.

I think the real solution is to either
a. make equality break, so that nobody can use it by accident, or
b. remove all functions and put them in lib.systems instead.

But until we decide that, we may as well fix it for 95% of cases by explicitly making the values identical where it matters most.

@spikespaz
Copy link
Contributor

spikespaz commented Jun 13, 2023

The issue happens simply when just importing nixpkgs with localSystem.system = "x86_64-linux" or localSystem = "x86_64-linux". Whether or not I specify crossSystem = localSystem makes no difference; I have tried it both ways, traced it through default.nix and top-level/impure.nix and have determined that this would be the default regardless. I don't know why nobody else can confirm this, as the broken example given in the issue description is not as simple as my use.

Perhaps we could fix 100% of the problem by fixing function comparison in Nix?

  intrepid = inputs.nixpkgs.lib.nixosSystem {
    modules = with tree.hosts.intrepid; [
      bootloader
      filesystems
      plymouth
      configuration
      packages
      powerplan
      touchpad
      greeter
      # gamemode
      gaming
      pia-openvpn
    ];
    pkgs = import inputs.nixpkgs-unstable {
      overlays = [
        # flake packages
        self.overlays.default
        # override packages with an unfree license
        self.overlays.allowUnfree
        # other packages
        inputs.slight.overlays.default
        inputs.ragenix.overlays.default
      ];
      localSystem = "x86_64-linux";
      crossSystem = "x86_64-linux";
      config.allowUnfree = true;
    };
    specialArgs = {
      enableUnstableZfs = false;
    };
  };

I will see the following error:

error:
       … while calling the 'seq' builtin

         at /nix/store/3lv5x5bn3bvy0xmd35ds3dsmvk23591w-source/lib/modules.nix:326:18:

          325|         options = checked options;
          326|         config = checked (removeAttrs config [ "_module" ]);
             |                  ^
          327|         _module = checked (config._module);

       … while evaluating a branch condition

         at /nix/store/3lv5x5bn3bvy0xmd35ds3dsmvk23591w-source/lib/modules.nix:267:9:

          266|       checkUnmatched =
          267|         if config._module.check && config._module.freeformType == null && merged.unmatchedDefns != [] then
             |         ^
          268|           let

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: Neither nixpkgs.hostPlatform nor the legacy option nixpkgs.system has been set.
       You can set nixpkgs.hostPlatform in hardware-configuration.nix by re-running
       a recent version of nixos-generate-config.
       The option nixpkgs.system is still fully supported for NixOS 22.05 interoperability,
       but will be deprecated in the future, so we recommend to set nixpkgs.hostPlatform.

To resolve this temporarily I attempted to add { config.nixpkgs.hostPlatform = hostPlatform; } as a module, but this causes other problems.

@roberth
Copy link
Member

roberth commented Jun 13, 2023

Perhaps we could fix 100% of the problem by fixing function comparison in Nix?

It's fundamentally unfixable because of undecidability. We should make it throw, but that's a big breaking change.
I suppose we could make equality overridable. That would still take a while to be available in Nixpkgs, fwiw.

error: Neither nixpkgs.hostPlatform nor the legacy option nixpkgs.system has been set.

That shouldn't happen. You've specified pkgs manually, so the options in misc/nixpkgs.nix should be ignored. However, it is set through _module.args.pkgs and not through nixpkgs.pkgs so misc/nixpkgs.nix doesn't know that it should disable its assertions.

I've added a safe alternative read-only nixpkgs module as part of this.

Maybe that should be loaded by nixosSystem/eval-config.nix when pkgs is set, or maybe its behavior should be integrated into the standard module. That's a quite a challenge, not to break existing configs while making subtle changes there, but if we're willing to break the pkgs argument behavior anyway...

For now you could fix your example by setting the nixpkgs.pkgs option instead of the nixosSystem pkgs argument.

@@ -34,9 +34,11 @@ let

stagesCustom = import ./custom args;

removeFunctions = a: lib.filterAttrs (_: v: !builtins.isFunction v) a;
Copy link
Member

@roberth roberth Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a similar equality function here https://github.com/NixOS/nixpkgs/pull/231940/files#diff-2165823a8d82c5dd1353601bd290df8bd431f9ee2096750d9ef655cf5d251998

It's a bit more selective about the arguments it ignores, but that might make it unnecessarily fragile. I feel like your solution is probably better.

Though still fragile if someone puts a function in a nested attrset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your solution with the test I previously wrote (almost unchanged)

@roberth roberth mentioned this pull request Jun 13, 2023
12 tasks
@spikespaz
Copy link
Contributor

spikespaz commented Jun 14, 2023

For now you could fix your example by setting the nixpkgs.pkgs option instead of the nixosSystem pkgs argument.

This seems to me missing the binary cache, I am rebuilding too many things. Tried both localSystem.system = "x86_64-linux" and localSystem = "x86_64-linux".

This is not a problem when I pass pkgs to nixosSystem and manually set the option config.nixpkgs.hostPlatform.

this appears to fix the following issue for some reason

```
nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem.system = "x86_64-linux"; }).bash
«derivation /nix/store/5lz9p8xhf89kb1c1kk6jxrzskaiygnlh-bash-5.2-p15.drv»

nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem = "x86_64-linux"; }).bash
«derivation /nix/store/1whiq03rsfrmch2ds3jhyqjaczfz97lb-bash-5.2-p15-x86_64-unknown-linux-gnu.drv»
```

fixed
```
nix-repl> (import ./. { localSystem.system = "x86_64-linux"; crossSystem = "x86_64-linux"; }).bash
«derivation /nix/store/5lz9p8xhf89kb1c1kk6jxrzskaiygnlh-bash-5.2-p15.drv»
```
@Artturin Artturin marked this pull request as ready for review June 15, 2023 21:02
@spikespaz
Copy link
Contributor

spikespaz commented Jun 16, 2023

I have cherry-picked the commits from both pull requests, and add a module {config.nixpkgs.pkgs = ...;} but get infinite recursion. Now when I try to add --show-trace to the command, I see

error: cached failure of attribute 'nixosConfigurations.intrepid.config'

This is happening a lot now and it is very annoying. Sometimes I want to see an error message multiple times. How do I prevent this "cache" error from happening? What does it mean? Why am I seeing it more?

Maybe you know about this well enough to save me the trouble of recovering the full trace (this time).

error:
       … while calling the 'seq' builtin

         at /nix/store/vj42zj0563qkm9ga6473a8k8il55v38h-source/lib/modules.nix:326:18:

          325|         options = checked options;
          326|         config = checked (removeAttrs config [ "_module" ]);
             |                  ^
          327|         _module = checked (config._module);

       … while evaluating a branch condition

         at /nix/store/vj42zj0563qkm9ga6473a8k8il55v38h-source/lib/modules.nix:267:9:

          266|       checkUnmatched =
          267|         if config._module.check && config._module.freeformType == null && merged.unmatchedDefns != [] then
             |         ^
          268|           let

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: infinite recursion encountered

       at /nix/store/vj42zj0563qkm9ga6473a8k8il55v38h-source/lib/modules.nix:512:28:

          511|         builtins.addErrorContext (context name)
          512|           (args.${name} or config._module.args.${name})
             |                            ^
          513|       ) (lib.functionArgs f);

@spikespaz
Copy link
Contributor

spikespaz commented Jun 16, 2023

Is it possible that the following lines in impure.nix are responsible for the original problem?
My guess is that the values localSystem and crossSystem are not the same if both are provided, as some other value is used via the confusing default logic.

This is the top of the attrset bound to args:

{ # We put legacy `system` into `localSystem`, if `localSystem` was not passed.
  # If neither is passed, assume we are building packages on the current
  # (build, in GNU Autotools parlance) platform.
  localSystem ? { system = args.system or builtins.currentSystem; }

# These are needed only because nix's `--arg` command-line logic doesn't work
# with unnamed parameters allowed by ...
, system ? localSystem.system
, crossSystem ? localSystem

Perhaps the issue that you "did not expect to see ... in practice" is caused by this confusing evaluation flow.

My tests show otherwise, but I feel as if I am still missing something.

nix-repl> test = args @ { localSystem ? { system = args.system; }, system ? localSystem.system, crossSystem ? localSystem }: { inherit localSystem crossSystem; }

nix-repl> :p test { system = "x86_64-linux"; }
{ crossSystem = { system = "x86_64-linux"; }; localSystem = «repeated»; }

nix-repl> :p test { localSystem = "x86_64-linux"; }
{ crossSystem = "x86_64-linux"; localSystem = "x86_64-linux"; }

nix-repl> :p test { localSystem.system = "x86_64-linux"; }
{ crossSystem = { system = "x86_64-linux"; }; localSystem = «repeated»; }

# is this the problem?
nix-repl> :p test { localSystem = "x86_64-linux"; crossSystem = "x86_64-linux"; }
{ crossSystem = "x86_64-linux"; localSystem = "x86_64-linux"; }

nix-repl> :p test { localSystem.system = "x86_64-linux"; crossSystem.system = "x86_64-linux"; }
{ crossSystem = { system = "x86_64-linux"; }; localSystem = { system = "x86_64-linux"; }; }

I understand that you both have already looked into this and are aware of this. I guess what I'm still stuck on is this: If they are the same value here, whether string or attrset, default or provided; why does the comparison differ?

The following two attrsets can begotten from either the default crossSystem value or if they are assigned to both be the same when importing nixpkgs.

{ crossSystem = { system = "x86_64-linux"; }; localSystem = «repeated»; }
{ crossSystem = "x86_64-linux"; localSystem = "x86_64-linux"; }

So why does the comparison see them as different? Because of the lack of the usage of lib.systems.equals? Okay, then furthermore, why does it work when the same value is achieved through crossSystem ? localSystem but not if it were manually set?

I'm still trying to wrap my head around this. As far as I can tell, through days of playing with it, the file in which the comparison is performed (before lib.systems.elaborate) should get the same values whether or not crossSystem was manually set or not.

@roberth
Copy link
Member

roberth commented Jun 16, 2023

error: cached failure of attribute 'nixosConfigurations.intrepid.config'

Which nix version, and which command is that?
That error should only ever be visible in commands that deal with large quantities of attributes, so this particular error seems like a Nix bug.
Related:

This is a workaround that lets us use `==` on systems despite the
functions that would make `==` always return `false` if a
workaround like this is not applied.

The real solution is to either use `lib.systems.equals` for all
such comparisons, or to remove all functions from the elaborated
systems.
@spikespaz
Copy link
Contributor

Which nix version, and which command is that?

Both nix build and nixos-rebuild produce the same message if invoked twice, the first time resulting in an error code and trace.

Nix version is 2.15.1.

if crossSystem0 == null || crossSystem0 == args.localSystem
then localSystem
else lib.systems.elaborate crossSystem0;
if crossSystem0 == null || lib.systems.equals crossSystem0 args.localSystem
Copy link
Member Author

@Artturin Artturin Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/GrahamcOfBorg/1e800ef80702f76206b24b79c9f0dc1d

error: value is a string while a set was expected

crossSystem0 and args.localSystem haven't been elaborated so they're possibly just strings

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crossSystem0 and args.localSystem haven't been elaborated so they're possibly just strings

IMHO we should elaborate unconditionally here.

@ghost
Copy link

ghost commented Jun 17, 2023

I think the real solution is to either
a. make equality break, so that nobody can use it by accident, or
b. remove all functions and put them in lib.systems instead.

I think these are the only two reasonable solutions.

Does anybody object strongly to (b)?

Edit: #238331

@Artturin
Copy link
Member Author

Artturin commented Sep 28, 2023

#254763 which seems to make this unnecessary, was merged.

@Artturin Artturin closed this Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: cross-compilation Building packages on a different platform than they will be used on 6.topic: stdenv Standard environment 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants