Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply: make sure that networkd is restarted when needed #449

Merged
merged 4 commits into from
Jun 26, 2024

Conversation

alfonsosanchezbeato
Copy link
Member

@alfonsosanchezbeato alfonsosanchezbeato commented Mar 26, 2024

This PR fixes these two problems:

  • It restarts networkd instead of reload/reconfigure
    Due to systemd-networkd bugs, reload/reconfigure was not working in
    some cases. Instead, restart the service if the configuration has
    changed (LP#2058976).
  • It compares full configuration to decide to restart or not networkd.
    Copy the current networkd configuration and then call generate so we
    know more accurately if systemd-networkd needs to be restarted.

@slyon slyon changed the title apply: make sure that networkd is started when needed apply: make sure that networkd is started when needed (LP: #2058976) Mar 26, 2024
@slyon
Copy link
Collaborator

slyon commented Mar 26, 2024

alfonsosanchezbeato: to be clear, the networkctl reload/reconfigure commands seem to work, but after 6-8 minutes the state goes back to the one expressed by the old .network files
alfonsosanchezbeato: you can even force the revert to happen sooner with sudo udevadm trigger, with that it tries to get control back too

@daniloegea
Copy link
Collaborator

Do you know how safe a networkd restart is? I mean, I'm trying to determine what kind of change in behavior it would introduce from the point of view of the user. I'm concerned it would cause this kind of problem: https://discuss.linuxcontainers.org/t/systemd-update-networkd-restart-breaks-routed-containers/9979

@slyon
Copy link
Collaborator

slyon commented Mar 28, 2024

This is going backwards wrt. #200

@alfonsosanchezbeato
Copy link
Member Author

This is going backwards wrt. #200

Right, as it looks like networkd is not really respecting the reload/reconfigure commands, at least on 20.04.

@alfonsosanchezbeato
Copy link
Member Author

Do you know how safe a networkd restart is? I mean, I'm trying to determine what kind of change in behavior it would introduce from the point of view of the user. I'm concerned it would cause this kind of problem: https://discuss.linuxcontainers.org/t/systemd-update-networkd-restart-breaks-routed-containers/9979

Restarting it should be in theory fine, this is how it was being done in the past. The issue you shared seemed to happen only on 18.04 I think.

@alfonsosanchezbeato
Copy link
Member Author

I have tested this through different UC versions, and I see:

  • 20.04 systemd-networkd has the bug
  • 22.04 systemd-networkd has the bug
  • 24.04 systemd-networkd has apparently fixed the bug

Therefore, it would make sense to backport b9ef830 to netplan in 20/22, and remove it from this PR, but keep 8dabb54. Are there branches where the commit could be proposed for 20/22?

An alternative could be to find out the patch fixing the issue in systemd-networkd and SRU to 20/22, but being systemd it might take longer than doing an SRU of netplan, I think.

@slyon
Copy link
Collaborator

slyon commented Apr 3, 2024

Thanks a lot for your continued investigation @alfonsosanchezbeato !

An alternative could be to find out the patch fixing the issue in systemd-networkd and SRU to 20/22, but being systemd it might take longer than doing an SRU of netplan, I think.

Do we have a clean reproducer? I could bisect systemd to find the commit fixing the behaviour. I'd prefer fixing it properly in systemd that backporting a workaround.

Therefore, it would make sense to backport b9ef830 to netplan in 20/22, and remove it from this PR, but keep 8dabb54. Are there branches where the commit could be proposed for 20/22?

I very much like what you did with dircmp there, this makes the code so much cleaner! We do not have upstream branches for Ubuntu releases, but ship changeset as distro patches in the packages directly. Alternatively, we could consider picking it into the next point-release, which might be SRUed to Jammy (UC22) eventually, but not Focal (UC20).

@alfonsosanchezbeato
Copy link
Member Author

alfonsosanchezbeato commented Apr 3, 2024

Thanks a lot for your continued investigation @alfonsosanchezbeato !

An alternative could be to find out the patch fixing the issue in systemd-networkd and SRU to 20/22, but being systemd it might take longer than doing an SRU of netplan, I think.

Do we have a clean reproducer? I could bisect systemd to find the commit fixing the behaviour. I'd prefer fixing it properly in systemd that backporting a workaround.

Create a UC22 image, then install network-manager snap, then run sudo udevadm trigger. If the networkctl command shows something like:

IDX LINK TYPE     OPERATIONAL SETUP    
  1 lo   loopback carrier     unmanaged
  2 ens3 ether    routable    failed   

2 links listed.

then you have the bug (if not, the interfaces should be still unmanaged).

@slyon
Copy link
Collaborator

slyon commented Apr 8, 2024

@slyon
Copy link
Collaborator

slyon commented Apr 8, 2024

Create a UC22 image, then install network-manager snap, then run sudo udevadm trigger. If the networkctl command shows something like:

I wonder if there is any reproducer to show this issue on Ubuntu classic?

@alfonsosanchezbeato alfonsosanchezbeato changed the title apply: make sure that networkd is started when needed (LP: #2058976) apply: make sure that networkd is restarted when needed (LP: #2058976) Apr 15, 2024
@daniloegea
Copy link
Collaborator

An alternative (instead of changing the current default behavior of apply) would be to add a new parameter to netplan apply (something like --force-backend-restart) to enable consumers of Netplan to force the restart of the backend in cases where the backend doesn't behave as expected or introduces a bug.

alfonsosanchezbeato added a commit to alfonsosanchezbeato/core-base that referenced this pull request Apr 19, 2024
alfonsosanchezbeato added a commit to alfonsosanchezbeato/core-base that referenced this pull request Apr 19, 2024
@slyon slyon force-pushed the ensure-networkd-restart branch from 8dabb54 to 6874d4a Compare May 7, 2024 13:56
@slyon slyon changed the title apply: make sure that networkd is restarted when needed (LP: #2058976) apply: make sure that networkd is restarted when needed May 7, 2024
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, it would make sense to backport b9ef830 to netplan in 20/22, and remove it from this PR, but keep 8dabb54.

So I've reverted the first commit in this PR (6874d4a), but kept the "apply: restart networkd instead of reload/reconfigure" commit as suggested (ac0d615).

Additionally, I've added a small fix that was failing autopkgtests when we have no networkd configuration at all in /run/systemd/networkd/ (directory doesn't exist) before or after the generator run.

With the workaround moved to canonical/core-base#214 I think this PR (just the cleaner networkd-config comparison logic) should now be good for a (squash) merge!

@daniloegea WDYT? (CC @alfonsosanchezbeato)

@slyon slyon requested a review from daniloegea May 29, 2024 10:35
@slyon slyon added community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments. Canonical by Canonical employees outside the Netplan team labels Jun 5, 2024
alfonsosanchezbeato and others added 4 commits June 26, 2024 12:41
Due to systemd-networkd bugs, reload/reconfigure was not working in
some cases. Instead, restart the service if the configuration has
changed (LP#2058976).
Copy the current networkd configuration and then call generate so we
know more accurately if systemd-networkd needs to be restarted.
@slyon
Copy link
Collaborator

slyon commented Jun 26, 2024

Rebased.

Copy link
Collaborator

@daniloegea daniloegea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks fine.

So, effectively now we are going to continue to reload (not restart) networkd only when we detect changes in the generated configuration. It's sounds like a nice improvement.

The problem is that it doesn't fix the issue for UC and now they need to maintain a patch (PR#214) that will force the restart.

Wouldn't it help them if we introduce a way to force-restart the backends via a CLI parameter?

@alfonsosanchezbeato
Copy link
Member Author

Therefore, it would make sense to backport b9ef830 to netplan in 20/22, and remove it from this PR, but keep 8dabb54.

So I've reverted the first commit in this PR (6874d4a), but kept the "apply: restart networkd instead of reload/reconfigure" commit as suggested (ac0d615).

Additionally, I've added a small fix that was failing autopkgtests when we have no networkd configuration at all in /run/systemd/networkd/ (directory doesn't exist) before or after the generator run.

With the workaround moved to snapcore/core-base#214 I think this PR (just the cleaner networkd-config comparison logic) should now be good for a (squash) merge!

@daniloegea WDYT? (CC @alfonsosanchezbeato)

This is fine by me too, thank you!

@slyon slyon merged commit a527c51 into canonical:main Jun 26, 2024
15 of 16 checks passed
alfonsosanchezbeato added a commit to canonical/core-base that referenced this pull request Aug 8, 2024
@daniloegea daniloegea mentioned this pull request Sep 16, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Canonical by Canonical employees outside the Netplan team community This PR has been proposed by somebody outside of the Netplan team and roadmap commitments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants