wireguard: non-invasive fix for permanent disconnects on unstable network (e.g. laptops) from dyndns endpoints #140890
Thanks for your work. I also have this issue. Say wireguard is set on your laptop,
According to the code, wireguard won't retry if it gets the Adding retry in the systemd unit does fix this issue. I do not know if there is a better way, like only retrying when the network is back online.
Thanks for your further analysis @jian-lin ! I guess the basic periodic retry shouldn't cause significant system load unless there is a really large number of peers with dynamic IPs, but something smarter might be nicer (e.g. exponential backoff, or retrying when network connectivity is restored, as you suggested). We should be sure that such "smartness" doesn't introduce a new edge case where we accidentally fail to retry, though.
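To make the "exponential backoff" idea concrete, here is a minimal sketch of a capped backoff schedule. It is not part of this PR; the function name and its parameters are hypothetical and only illustrate the idea. The cap is what guards against the edge case mentioned above: the wait never grows without bound, so retries keep happening.

```python
def backoff_delays(base_seconds, max_seconds, attempts):
    """Return the wait before each retry: base, 2*base, 4*base, ...

    Every delay is capped at max_seconds, so the schedule never
    stretches out so far that retries effectively stop.
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_seconds))
        delay *= 2  # double the wait after every failed attempt
    return delays


if __name__ == "__main__":
    # Hypothetical values: start at 5 seconds, never wait longer than 60.
    print(backoff_delays(5, 60, 6))
```

With a cap equal to the configured refresh interval, the worst case would be the same as the plain periodic retry.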
I'm having the same problem, exactly on a laptop with no permanent connection. If time allows I should be able to review soon. @jian-lin I think it's okay to keep the same interval; after all, the invariant is "I want this peer to be refreshed every X seconds", and when the connection is brought back up you can expect it to have the right IP after X seconds.
@zarelit I use a better workaround, patching wireguard-tools. By better, I mean wireguard peer services are not restarted. That patch is as follows. Don't know if there is a way to distinguish between no network connection and EAI_NONAME.
Tested on an intermittently connected laptop and it works.
The script that refreshes the endpoints exits with an error when it cannot resolve the peer's name, so the unit dies. This happens every time on a laptop with no automatic connection at boot.
The PR does not change the script; it just guarantees that the refresh operation is attempted every dynamicEndpointRefreshSeconds
even when the script fails, so it's a safe merge unless I'm missing a use case where leaving the peers unrefreshed is somehow expected.
Somewhat related to #165474 i.e. lifecycle of wireguard-related units.
Also thanks @seb314
Also, wrt #63869 (comment): this means that the restart of the unit handles the case where DNS is not available (e.g. because of lack of connection), while other transient failures are still handled by wg itself.
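The behaviour described in this review (try the refresh every interval, whether or not the previous attempt failed) can be sketched as a small loop. This is only an illustration: the PR gets the same effect by having systemd restart the unit, not with code like this. `refresh_forever`, `refresh_once` and `max_rounds` are hypothetical names; `max_rounds` exists only so the sketch can be exercised without running forever.

```python
import time


def refresh_forever(refresh_once, interval_seconds, sleep=time.sleep, max_rounds=None):
    """Call refresh_once() every interval_seconds, even after failures.

    Returns the number of failed attempts once max_rounds is reached
    (with max_rounds=None the loop never returns).
    """
    rounds = 0
    failures = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            refresh_once()
        except Exception:
            # A failed refresh (e.g. the name could not be resolved) must
            # not end the loop; it is simply retried after the interval.
            failures += 1
        sleep(interval_seconds)
        rounds += 1
    return failures
```

The point of the design is that a failure changes nothing about the schedule: the next attempt happens after the same interval either way.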
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
@zarelit thanks for the review! Is there anything that can be done or improved about this PR that would improve the chances of it getting merged?
Both @nh2 and @WilliButz worked on dynamically refreshed peers and systemd units that didn't restart in this module so maybe they can help us here
I changed the title to something more descriptive, in case the previous title was part of the reason why the PR has been stuck. (If there is something wrong with the new title, please let me know).
Force-pushed from cfefff7 to 2821cd9.
Tried out latest version (2022-09-27) with peers with mixed settings and works well
Suggest some small changes.
LGTM
LGTM
Looks sane to me (untested as I do not have any dynamic endpoint of this type).
LGTM, untested but I have this problem and this looks to me like a suitable fix.
Squash the commits.
The PR can be tested with https://nixos.wiki/wiki/Nixpkgs/Reviewing_changes#Modules
Make the dynamic-dns refresh systemd service (controlled via the preexisting option dynamicEndpointRefreshSeconds) robust to e.g. dns failures that happen on intermittent network connections.
Background: When dns resolution fails with a 'permanent' error ("Name or service not known" instead of "Temporary failure in name resolution"), wireguard won't retry despite WG_ENDPOINT_RESOLUTION_RETRIES=infinity.
-> This change should improve reliability/connectivity.
somewhat related thread: NixOS#63869
Force-pushed from f4f9839 to 82c5c3c.
done
Friendly ping, would love to see this land.
🎉 thanks @Artturin !
Successfully created backport PR #204134 for
Thanks @Artturin and thank you all for review, testing & support!
Motivation for this change
When dns resolution fails with a permanent error ("Name or service not
known" instead of "Temporary failure in name resolution"), in the current
setup for dynamic dns with dynamicEndpointRefreshSeconds, wireguard
won't retry despite WG_ENDPOINT_RESOLUTION_RETRIES=infinity.
Ideally, dns would probably never report a permanent error for an
existing name, but unfortunately this does happen (maybe especially
with dynamic dns?) and cannot easily be fixed by the wireguard setup's
admin.
I can't think of a scenario where it is essential to not retry after a
negative dns response (given that the endpoint has been configured, the
dns name quite certainly exists), right? On the other hand, a machine
that drops out of the vpn can be very annoying...
-> This change should improve reliability/connectivity.
somewhat related thread: #63869
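For readers unfamiliar with the two error messages quoted above: they are the standard resolver texts for the getaddrinfo error codes EAI_NONAME ("Name or service not known") and EAI_AGAIN ("Temporary failure in name resolution"). The sketch below, which is not taken from wireguard-tools, shows how the two cases look from a program's point of view; `classify_dns_error` and `resolve_or_classify` are hypothetical helper names.

```python
import socket


def classify_dns_error(err):
    """Map a socket.gaierror to 'temporary', 'permanent' or 'other'."""
    if err.errno == socket.EAI_AGAIN:
        return "temporary"  # "Temporary failure in name resolution"
    if err.errno == socket.EAI_NONAME:
        return "permanent"  # "Name or service not known"
    return "other"


def resolve_or_classify(host, port):
    """Resolve host:port; on failure return the error class instead."""
    try:
        return socket.getaddrinfo(host, port)
    except socket.gaierror as err:
        return classify_dns_error(err)
```

A resolver that reports the "permanent" class for a name that in fact exists is exactly the situation this change works around by retrying anyway.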
Things done
sandbox = true set in nix.conf? (See Nix manual)
nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/