Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

certs: update tscert dialer to connect to tsnet server #53

Merged
merged 2 commits into from
Jun 3, 2024

Conversation

willnorris
Copy link
Member

@willnorris willnorris commented May 17, 2024

This provides an alternate implementation for tscert.TailscaledDialer that tries to find a tsnet server for the requested certificate. This allows caddy-tailscale to be used with caddy's auto_https support.

Fixes #19

This is a draft PR that relies on changes in both tscert and caddy, which will need to be merged in order:

@irbekrm
Copy link
Contributor

irbekrm commented May 21, 2024

I had a look at the original issue, but it's not very clear to me- is the goal to issue ts.net certs from a non-Tailscale issuer or to use the Tailscale-issued ts.net cert for a non-tailnet endpoint? Is there an example I can try out to see what this PR achieves?
I tried a caddy config without the auto_https off stanza and it seems like, when connecting to a Tailscale node, it seems to work identically for the current main and this PR (issues a Tailscale cert for the MagicDNS name of the node)

module.go Outdated

// localAPIDialer finds the node that matches the requested certificate in ctx
// and dials that node's local API.
// If no matching node is found, the default dailer is used,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// If no matching node is found, the default dailer is used,
// If no matching node is found, the default dialer is used,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's stil there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done for real this time.

@willnorris
Copy link
Member Author

ah, good call out. It's really not obvious what this is doing.

So, caddy has really good native SSL support. It supports HTTP-01 acme challenges, DNS-01 challenges with a whole host of DNS providers, etc. For Tailscale hostnames, neither of those will work, since the host isn't publicly available online (needed for HTTP-01) and the user can't directly edit DNS records for ts.net (needed for DNS-01). So caddy also has native support for calling out to Tailscale to have it get certificates as needed. You can see the docs here, but there isn't much there since most of the time it just works.

You run caddy on a machine that already has Tailscale running, and include the full ts.net hostname in the config:

https://myhost.tailnet123.ts.net {
  respond "Hello"
}

Caddy will detect that it is a Tailscale hostname, and use the Tailscale cert manager to get the certificate. That cert manager in turns uses tscert, which is a very minimal LocalClient that will call out to the local tailscaled to request the certificate be issued and pass it back to caddy to use. The caveat is that it only works with a tailscaled running on the local machine that it can talk to, so it doesn't work for tsnet at all.

To initially support certs in caddy-tailscale, we basically ignored all of the above built-in functionality, and instead created the tailscale+tls network listener that handles certificate management itself. But this requires completely turning off caddy's automatic HTTPS support entirely, which is not ideal. The main goal of this PR is to allow using caddy's native HTTPS support, and in particular the native Tailscale cert support, with tsnet-powered listeners provided from caddy-tailscale.

We should ideally be able to use a caddyfile like:

https://host.tailnet123.ts.net {
  bind tailscale
  respond "Hello"
}

So for testing, I'm actually using this caddyfile:

{
  debug
  tailscale {
    ephemeral
  }
  servers {
    # Disable h3 until https://github.com/tailscale/tailscale/pull/12184
    protocols h1 h2
  }
}

:443 {
  bind tailscale/caddytest4
  tls {
    # this is only needed if the full ts.net hostname isn't used for the site address
    get_certificate tailscale
  }
  respond `Hello, world`
}

Without this PR, caddy will call out to tscert like normal which will try to talk to the local tailscaled. That will result in the error:

2024/05/31 05:43:10.204 WARN    tls.get_certificate.tailscale   could not get status; will try to get certificate anyway        {"error": "Get \"http://local-tailscaled.sock/localapi/v0/status\": dial unix /var/run/tailscale/tailscaled.sock: connect: no such file or directory"}
2024/05/31 05:43:10.205 ERROR   tls.handshake   external certificate manager    {"remote_ip": "100.111.108.23", "remote_port": "40820", "sni": "caddytest4.tailb575b.ts.net", "cert_manager": "*caddytls.Tailscale", "cert_manager_idx": 0, "error": "Get \"http://local-tailscaled.sock/localapi/v0/cert/caddytest4.tailb575b.ts.net?type=pair\": dial unix /var/run/tailscale/tailscaled.sock: connect: no such file or directory"}

With this PR, we override the dialer that tscert uses to connect to the LocalAPI, and have it connect to the relevant tsnet server. That should result in:

2024/05/31 05:45:33.594 DEBUG   tailscale       cert("caddytest4.tailb575b.ts.net"): registered ACME account.
2024/05/31 05:45:33.818 DEBUG   tailscale       cert("caddytest4.tailb575b.ts.net"): starting SetDNS call...
2024/05/31 05:45:43.228 DEBUG   tailscale       wg: [v2] [/DI02] - Receiving keepalive packet
2024/05/31 05:45:43.676 DEBUG   tailscale       Accept: TCP{100.111.108.23:47022 > 100.127.54.12:443} 52 tcp non-syn

2024/05/31 05:45:44.556 DEBUG   tailscale       cert("caddytest4.tailb575b.ts.net"): did SetDNS
2024/05/31 05:45:45.719 DEBUG   tailscale       cert("caddytest4.tailb575b.ts.net"): requesting cert...
2024/05/31 05:45:47.012 DEBUG   tailscale       cert("caddytest4.tailb575b.ts.net"): got cert
2024/05/31 05:45:47.031 DEBUG   tls.handshake   using externally-managed certificate    {"remote_ip": "100.111.108.23", "remote_port": "47022", "sni": "caddytest4.tailb575b.ts.net", "names": ["caddytest4.tailb575b.ts.net"], "expiration": "2024/08/29 04:45:46.000"}

I guess this change ends up tickling caddy's QUIC support in ways we weren't before, so I also need to wait for tailscale/tailscale#12184 to fix our UDP support in tsnet. I'll also update docs to no longer recommend using the tailscale+tls listener, but will still leave it around for people that are already using it. So, a bit more to do before this could be merged, but that's what it's doing.

@willnorris willnorris force-pushed the will/certs branch 2 times, most recently from bc546c5 to 509c374 Compare June 2, 2024 21:31
@willnorris willnorris marked this pull request as ready for review June 2, 2024 21:32
Signed-off-by: Will Norris <will@tailscale.com>
Copy link
Contributor

@irbekrm irbekrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation, it makes perfect sense- tested a setup with a node without a full ts.net hostname specified and worked as expected.

Should we also change the example here https://github.com/tailscale/caddy-tailscale/blob/main/examples/proxyauth.caddyfile#L33 that still uses the tailscale+tls listener?

At this time, the Tailscale plugin for Caddy doesn't support using Caddy's native HTTPS resolvers.
You will need to use the `tailscale+tls` bind protocol with a configuration like this:
Caddy's automatic HTTPS support can be used with the Tailscale network listener like any other site.
If the site address includes the full `ts.net` hostname, no additional configuration is necessary:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link https://tailscale.com/kb/1153/enabling-https from here? Or add a note that this will issue a LE cert for the tailnet hostname of the node (assuming that there will be some users who aren't that familiar with Tailscale HTTPS).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, good idea. done.

@willnorris
Copy link
Member Author

Should we also change the example here https://github.com/tailscale/caddy-tailscale/blob/main/examples/proxyauth.caddyfile#L33 that still uses the tailscale+tls listener?

oh, definitely! Thanks, I'll fix that.

@willnorris willnorris force-pushed the will/certs branch 2 times, most recently from 7316f15 to 4dae5bf Compare June 3, 2024 16:42
This provides an alternate implementation for tscert.TailscaledDialer
that tries to find a tsnet server for the requested certificate.
This allows caddy-tailscale to be used with caddy's auto_https support.

Fixes #19

Signed-off-by: Will Norris <will@tailscale.com>
@willnorris willnorris merged commit f2562ba into main Jun 3, 2024
3 checks passed
@willnorris willnorris deleted the will/certs branch June 3, 2024 16:52
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Updates #19
Updates #53
Updates #66
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
willnorris added a commit that referenced this pull request Jun 7, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
willnorris added a commit that referenced this pull request Jun 8, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
willnorris added a commit that referenced this pull request Jun 18, 2024
This provides an http.RoundTripper implementation that dynamically
routes requests to the correct tsnet server's LocalAPI based on the
ClientHelloInfo in the context.

Previously, we were just overriding the tscert Dialer. That worked fine
the first time it dialed a LocalAPI, and would correctly dial the right
tsnet server. However, tscert caches the Transport with that Dialer, so
requests that should be routed to different tsnet servers would be
routed incorrectly.

Updates #19
Updates #53
Updates #66

Signed-off-by: Will Norris <will@tailscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for auto_tls
2 participants