Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting control server URL for Tailscale #7807

Merged
merged 1 commit into from
Jul 7, 2023

Conversation

dennwc
Copy link
Contributor

@dennwc dennwc commented Jun 20, 2023

This change enables the use of Headscale - open source implementation of the Tailscale control server.

Proposed Changes

Pretty simple: Add controlURL parameter to --vpn-auth, which will be propagated to Tailscale as --login-server flag. This flag will only be set if controlURL parameter is preset, so it's backward-compatible.

Types of Changes

New Feature

Verification

You'll need a Headscale instance for this, see docs. Then add controlURL=https://<your-headscale-server> to the usual --vpn-auth parameter.

Verification can be done in a negative way: set controlURL=https://example.com for a valid Tailscale join key, and it should fail to join the network by hitting the wrong domain. See #7352 for a general Tailscale setup.

Testing

1 - Install headscale in a separate VM by following these instructions .

2 - Edit the headscale config and set listen_addr: 0.0.0.0:8080

3 - As explained in the headscale instructions, create a user and a pre authenticated key

4 - Deploy k3s with tailscale passing the extra config vpnServerURL=, where url is http://$IP_HEADSCALE:8080

5 - Verify that tailscale interface gets an IPv4/IPv6 address

6 - In the headscale server, list the routes sudo headscale routes list and enable all of them (e.g. sudo headscale routes enable -r 1`)

7 - Verify you can ping between pods on different nodes

Linked Issues

#7824

User-Facing Change

Support connecting tailscale to a separate server (e.g. headscale)

Further Comments

@dennwc dennwc requested a review from a team as a code owner June 20, 2023 22:37
Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@brandond
Copy link
Member

cc @manuelbuil

@dennwc
Copy link
Contributor Author

dennwc commented Jun 21, 2023

CI failure is unrelated, right?

@dereknola
Copy link
Member

Yeah don't worry about it

@manuelbuil
Copy link
Contributor

Thanks for the contribution. We would like to keep the API as generic as possible in case we integrate with more VPN providers. Therefore, I'd prefer if we name the option as "vpnServerURL" instead of "controlURL"

@dennwc
Copy link
Contributor Author

dennwc commented Jun 26, 2023

Although I think that --vpn-auth=vpnServerURL=... is a bit "stuttering" (and serverURL alone is confusing), I updated the PR to use vpnServerURL.

@manuelbuil
Copy link
Contributor

@dennwc I was trying to set up headscale + tailscale, so that I could prepare a testing document for QA to run. However, I must be doing something wrongly because it is not working. Could you give me a hand? This is what I am doing:

1 - Install headscale by following https://headscale.net/running-headscale-linux/#migrating-from-manual-install. I'm using the latest version v0.22.3 and Ubuntu. I'm on Azure and tailscale client has version 1.44.0
2 - I edit /etc/headscale/config.yaml with just one change ==> listen_addr: 0.0.0.0:8080
3 - headscale service is active and running. I create a user and a key:

headscale users create myfirstuser
headscale --user myfirstuser preauthkeys create --reusable --expiration 24h

4 - In the client, I run:

sudo tailscale up --authkey $PREVIOUS-KEY --reset --login-server https://10.1.1.9:8080

And it blocks forever. Unfortunately, journalctl outputs nothing but I can see tcpdump packets with an HTTP 400 error coming from the headscale server:

09:16:01.819137 IP 10.1.1.7.34376 > 10.1.1.9.8080: Flags [S], seq 4234346916, win 64240, options [mss 1410,sackOK,TS val 2923518753 ecr 0,nop,wscale 7], length 0
09:16:01.819196 IP 10.1.1.9.8080 > 10.1.1.7.34376: Flags [S.], seq 735098295, ack 4234346917, win 65160, options [mss 1460,sackOK,TS val 4098053191 ecr 2923518753,nop,wscale 7], length 0
09:16:01.819699 IP 10.1.1.7.34376 > 10.1.1.9.8080: Flags [.], ack 1, win 502, options [nop,nop,TS val 2923518754 ecr 4098053191], length 0
09:16:01.819867 IP 10.1.1.7.34376 > 10.1.1.9.8080: Flags [P.], seq 1:258, ack 1, win 502, options [nop,nop,TS val 2923518754 ecr 4098053191], length 257: HTTP
09:16:01.819888 IP 10.1.1.9.8080 > 10.1.1.7.34376: Flags [.], ack 258, win 508, options [nop,nop,TS val 4098053192 ecr 2923518754], length 0
09:16:01.819962 IP 10.1.1.9.8080 > 10.1.1.7.34376: Flags [P.], seq 1:104, ack 258, win 508, options [nop,nop,TS val 4098053192 ecr 2923518754], length 103: HTTP: HTTP/1.1 400 Bad Request
09:16:01.819983 IP 10.1.1.9.8080 > 10.1.1.7.34376: Flags [F.], seq 104, ack 258, win 508, options [nop,nop,TS val 4098053192 ecr 2923518754], length 0
09:16:01.820408 IP 10.1.1.7.34376 > 10.1.1.9.8080: Flags [.], ack 104, win 502, options [nop,nop,TS val 2923518754 ecr 4098053192], length 0
09:16:01.820486 IP 10.1.1.7.34376 > 10.1.1.9.8080: Flags [F.], seq 258, ack 105, win 502, options [nop,nop,TS val 2923518754 ecr 4098053192], length 0

I guess there is some misconfiguration in the headscale server but I can't find it. Unfortunately, the headscale docs do not explain much about how to troubleshoot. Any idea what might be happening? Do you know where to look for headscale logs? Is there any way to increase the verbosity of those logs?

@manuelbuil
Copy link
Contributor

Found it! It was tls which was not configured and thus https would not work. By changing to:

sudo tailscale up --authkey $PREVIOUS-KEY --reset --login-server http://10.1.1.9:8080

things worked :). In any case, if you could recommend ways to get better logs or more verbosity, that would be great for the future.

In the meanwhile, could you add the --timeout flag when doing tailscale up? By doing this test (or the test you suggested with https://example.com), I noticed that tailscale client blocks forever if it can't find the server. I'd say a timeout of 30s seems reasonable enough for tailscale/headscale, right?

@manuelbuil
Copy link
Contributor

@dennwc to comply with K3s ways of working, we should edit the PR description and include the testing steps for QA, a linked issue (I created this one #7824) and add a release note. If you don't mind me changing it, I can do it :)

@codecov
Copy link

codecov bot commented Jun 26, 2023

Codecov Report

Patch coverage has no change and project coverage change: +4.29 🎉

Comparison is base (fe9604c) 47.22% compared to head (42bf5a1) 51.51%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7807      +/-   ##
==========================================
+ Coverage   47.22%   51.51%   +4.29%     
==========================================
  Files         143      143              
  Lines       14509    14521      +12     
==========================================
+ Hits         6852     7481     +629     
+ Misses       6571     5856     -715     
- Partials     1086     1184      +98     
Flag Coverage Δ
e2etests 49.31% <0.00%> (?)
inttests 44.47% <0.00%> (-0.12%) ⬇️
unittests 19.85% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/flannel/setup.go 37.90% <ø> (ø)
pkg/cli/cmds/agent.go 100.00% <ø> (ø)
pkg/vpn/vpn.go 0.00% <0.00%> (ø)

... and 43 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dennwc
Copy link
Contributor Author

dennwc commented Jun 26, 2023

@manuelbuil Sure, thank you, please do!

I will add the timeout, then 👍 I assume you mean set it to 30s by default, no new options added.

Regarding logs, sorry, cannot suggest anything useful here. Probably need to open an issue to Tailscale to include verbose logging. Cannot see relevant flags in the CLI help.

@manuelbuil
Copy link
Contributor

@manuelbuil Sure, thank you, please do!

I will add the timeout, then +1 I assume you mean set it to 30s by default, no new options added.

Regarding logs, sorry, cannot suggest anything useful here. Probably need to open an issue to Tailscale to include verbose logging. Cannot see relevant flags in the CLI help.

I would have expected more logs in the headscale server but I guess it is still in its infancy.

One extra comment I have. If you check this line: https://github.com/k3s-io/k3s/blob/master/pkg/agent/flannel/setup.go#L79, we are again using tailscale up to activate the subnet router. We do this at that point because we don't know the route before. I guess that as we are not passing --reset at that point, the --login-server config would remain. Could you verify that this is the case and subnet routing works correctly? Thanks!

@dennwc
Copy link
Contributor Author

dennwc commented Jun 26, 2023

Good catch! I missed that extension section 🤦‍♂️

From CLI docs:

If flags are specified, the flags must be the complete set of desired
settings. An error is returned if any setting would be changed as a
result of an unspecified flag's default value, unless the --reset flag
is also used.

So we need to pass all of the settings there as well.

@dennwc
Copy link
Contributor Author

dennwc commented Jun 26, 2023

@manuelbuil I see no easy way of getting auth options there. I'll need to expose a new field on types.Agent I guess.

I'm considering the following approaches:

  1. Add FlannelExtUpFlags []string, and allow VPN auth set these. It's probably the fastest way and less intrusive way. But I don't like that it's still too specific and these assumptions may break for other VPN providers.
  2. Allow VPN to override whole Flannel extension config. In this case the tailscale up ... will be specified with the right flags during VPN auth setup and Flannel won't need to know about Tailscale specifically.

@manuelbuil
Copy link
Contributor

@manuelbuil I see no easy way of getting auth options there. I'll need to expose a new field on types.Agent I guess.

I'm considering the following approaches:

  1. Add FlannelExtUpFlags []string, and allow VPN auth set these. It's probably the fastest way and less intrusive way. But I don't like that it's still too specific and these assumptions may break for other VPN providers.
  2. Allow VPN to override whole Flannel extension config. In this case the tailscale up ... will be specified with the right flags during VPN auth setup and Flannel won't need to know about Tailscale specifically.

What about changing that call in the flannel extension and instead of using tailscale up we use tailscale set, I think that makes more sense and should work too. That should keep the previous config, right?

@dennwc
Copy link
Contributor Author

dennwc commented Jun 26, 2023

That would work as well, but we also need tailscale up (without arguments) anyway, so that it's symmetric with tailscale down in ShutdownCommand. I assume tailscale set ... && tailscale up will work, right?

I tried implementing (2) in the meantime, it requires one new type for config, plus returning that type from StartVPN. Not as invasive as I assumed. Upside of that approach is that it pushes all tailscale-specific things to vpn package.

But it's up to you. I guess it's easier to start with set and maybe consider (2) as a separate refactoring PR.

pkg/cli/cmds/agent.go Outdated Show resolved Hide resolved
pkg/cli/cmds/agent.go Outdated Show resolved Hide resolved
pkg/vpn/vpn.go Outdated Show resolved Hide resolved
@manuelbuil
Copy link
Contributor

@dennwc I was testing your PR today, trying to understand if the second tailscale up will break things and realized k3s was failing to start. The reason is a bug: #7831. I'll prepare a PR this week to fix it (I'm a bit busy as I'm on a workshop until Thursday)

@manuelbuil
Copy link
Contributor

Fix for the bug is merged, could you please rebase? Then, after all reviews are successful, we can merge this PR! Note that code freeze is tomorrow EOD

@dennwc
Copy link
Contributor Author

dennwc commented Jul 6, 2023

Okay, great, will do it today 👍

@brandond
Copy link
Member

brandond commented Jul 6, 2023

I would love to get this in for the July releases, if we can get the feedback addressed (in addition to rebasing).

This change enables the use of Headscale - open source implementation of the Tailscale control server.

Signed-off-by: Denys Smirnov <dennwc@pm.me>
@dennwc
Copy link
Contributor Author

dennwc commented Jul 7, 2023

Rebased, removed tailscale up from set command, and also renamed everything to controlServerURL.

Let me know if anything else should be done here before the code freeze.

@manuelbuil
Copy link
Contributor

Rebased, removed tailscale up from set command, and also renamed everything to controlServerURL.

Let me know if anything else should be done here before the code freeze.

Perfect! As soon as CI passes, I'll merge and prepare the backports. Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants