
nixos/networkd: fails non-deterministically with predictable interface names #39069

Closed
xeji opened this issue Apr 17, 2018 · 6 comments · Fixed by #39340

@xeji
Contributor

xeji commented Apr 17, 2018

Issue description

When using systemd-networkd together with predictable interface names, there seems to be a race condition between networkd activating the interfaces and udevd renaming them, resulting in non-deterministic failure of interface renaming and setup.

For examples, see these Hydra builds: failed and succeeded.

The issue occurs on 18.03 and current master (i.e. with systemd 237 and 238).

It is probably an upstream bug (though upstream systemd has not confirmed this yet). I found two similar reports: systemd/systemd#7293 and coreos/bugs#1767.

Since the corresponding test is part of the tested jobs for the nixos-unstable-small and release-18.03-small channels, the sporadically failing test can delay the channels.

I suggest removing this test from the tested jobs for the time being until this is fixed.

Steps to reproduce

Although the issue seems non-deterministic in nature (see the Hydra builds), I can reproduce it quite reliably by running the test on my local machine:
nix-build '<nixpkgs/nixos/tests/predictable-interface-names.nix>' -A vm-test-run-predictableInterfaceNames-with-networkd
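For reference, a minimal NixOS configuration fragment combining the two features involved might look like the sketch below. It is an assumption about what the test exercises, not a copy of predictable-interface-names.nix; both options are standard NixOS options.

```nix
{ ... }:
{
  # Manage network interfaces with systemd-networkd
  # instead of the scripted networking backend.
  networking.useNetworkd = true;

  # Have udev rename interfaces from ethX to predictable
  # names such as ens3 or enp0s3.
  networking.usePredictableInterfaceNames = true;
}
```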

Technical details

nixos 18.03.132008.ad771371fb2 or master e0c9a25

@Mic92
Member

Mic92 commented Apr 17, 2018

Note that our udev rules for network interfaces are not in sync with what systemd upstream uses,
so bugs in our setup are probably different from what other Linux distributions report.

@xeji
Contributor Author

xeji commented Apr 18, 2018

The only relevant change to the udev rules that I have found so far is that we use a custom version of 80-net-setup-link.rules when predictable interface names are enabled. I'm not sure why the built-in upstream solution doesn't work for us.

xeji added a commit to xeji/nixpkgs that referenced this issue Apr 19, 2018
…test

remove vm-test-run-predictableInterface-names-with-networkd from tested,
as it failed non-deterministically due to a race condition (NixOS#39069),
which kept the "small" channels from updating.

This temporary fix should be reverted after NixOS#39069 is fixed.

@xeji
Contributor Author

xeji commented Apr 21, 2018

After some tests, here's what I think is happening: Stage 1 udev discovers and adds the network interfaces with their original ethX names. Renaming doesn't happen until stage 2. But networkd tries to configure all interfaces it finds, so there's a race condition between stage 2 udev and networkd.

This can be (sort of) "solved" with an ugly hack: adding ExecStartPre=sleep 5 to the networkd service to allow time for renaming. This fixes the test, but it slows down boot and still does not guarantee success.
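A sketch of that workaround as a NixOS module, assuming the override is applied through systemd.services.systemd-networkd.serviceConfig; the 5-second delay is the value mentioned above. It only narrows the race window and is not a fix.

```nix
{ pkgs, ... }:
{
  # Workaround only: wait before starting networkd so that stage 2
  # udev has time to rename the interfaces. Slows down boot and
  # does not guarantee success.
  systemd.services.systemd-networkd.serviceConfig.ExecStartPre =
    "${pkgs.coreutils}/bin/sleep 5";
}
```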

A better solution might be to let stage 1 udev rename the interfaces. I tried but haven't quite figured out how to do this. We might try to go back to the upstream way of predictable interface names controlled by the net.ifnames kernel boot option.
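For comparison, upstream udev's net_setup_link builtin reads the net.ifnames kernel command-line option. A sketch of passing it from NixOS, under the hypothetical assumption that the upstream 80-net-setup-link.rules were in use instead of the custom NixOS version:

```nix
{ ... }:
{
  # net.ifnames=1 requests predictable names (the upstream default);
  # net.ifnames=0 keeps the legacy ethX names.
  # Hypothetical: only effective if udev applies the upstream
  # 80-net-setup-link.rules rather than the custom NixOS rules.
  boot.kernelParams = [ "net.ifnames=1" ];
}
```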

@shlevy maybe you can help? (I found your f756369 from a long time ago, which was reverted after a few days)

@shlevy
Member

shlevy commented Apr 21, 2018

Hmm not sure here, sorry!

@xeji
Contributor Author

xeji commented Apr 22, 2018

Fixed by applying predictable interface names in stage 1 already.

rnhmjoj added a commit that referenced this issue Feb 20, 2021
systemd-udev-settle is a terrible hack[1] and should never[2] ever[3]
be used; seriously, it's very bad. It was used as a stop-gap solution for
issue #39069, but thanks to PR #79532 it can be removed now.

[1]: systemd/systemd#7293 (comment)
[2]: #73095
[3]: #107341
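To illustrate what such a stop-gap looks like, here is a sketch that orders networkd after systemd-udev-settle.service, so that it starts only once udev has processed its queued events, including interface renames. It is a hypothetical fragment, not the code from #39340, and per the commit message above it should not be used anymore.

```nix
{ ... }:
{
  # Hypothetical stop-gap: pull in systemd-udev-settle.service and
  # start networkd only after it, i.e. after udev has processed all
  # queued events (including interface renames).
  systemd.services.systemd-networkd = {
    wants = [ "systemd-udev-settle.service" ];
    after = [ "systemd-udev-settle.service" ];
  };
}
```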

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixosstag.fcio.net/t/predictable-network-interface-names-in-initrd/4055/1

4 participants