Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of ARI #286

Merged
merged 3 commits into from
May 7, 2024
Merged

Initial implementation of ARI #286

merged 3 commits into from
May 7, 2024

Conversation

mholt
Copy link
Member

@mholt mholt commented May 4, 2024

Add ARI (ACME Renewal Information) support.

ARI is currently in draft spec and subject to change: https://datatracker.ietf.org/doc/draft-ietf-acme-ari/03/

For certificates issued with ACME, if the ACME server supports it, CertMagic now considers ARI to trigger renewals.

As usual, the maintenance timer keeps certificates renewed, but now we also refresh ARI if it is time to poll again, and we also use ARI as a determining factor for triggering renewal in addition to the expiration date / lifetime. By default, maintenance routines run every 10 minutes so we will likely not miss an ARI change.

We also store ARI along with the certificate and its other metadata. This means that multiple Caddy instances can share the same ARI data. While we don't strictly synchronize the updating of ARI, it is merely an HTTP request, not a whole transaction with strict CA rate limits, so I don't think we need the complexity there. It is theoretically possible that two instances will update ARI at the same time and both write it to storage, but there's really no harm in that.

We conform to ARI spec recommendations and standards, to the best of my knowledge. Thanks to @aarongable for answering some questions there (though one is still open; not urgent though).

We will ignore ARI if a certificate is dangerously close to the end of its lifetime (last 5% currently) to ensure that buggy ARI implementations on either side do not put the site's availability at risk. Once a certificate is extremely close to expiring, we will renew no matter what ARI says.

I'm still testing this change and finishing it up, but I expect it won't be too much longer before it goes out.

TODO:

  • Log when ARI window changes
  • Better logging to indicate what triggers a renewal (expiration, ARI window, etc)
  • On-demand TLS integration

@mholt mholt mentioned this pull request May 4, 2024
@mholt mholt linked an issue May 4, 2024 that may be closed by this pull request
@mholt
Copy link
Member Author

mholt commented May 5, 2024

@aarongable @beautifulentropy Would you / anyone from Let's Encrypt be interested in doing a quick once-over to maybe verify the various client aspects of ARI that you're most concerned about as a CA?

Copy link
Contributor

@oliverpool oliverpool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts regarding an edge case (but note that I am not experienced with dealing with certificates, just a bit of Go experience)

config.go Outdated Show resolved Hide resolved
@mholt
Copy link
Member Author

mholt commented May 6, 2024

Latest commit ensures CA is the same before setting Replaces on the ACME order.

We also don't set Replaces on the 2nd retry (3rd attempt) of ACME transactions. This is in case the reason for the failed order is a rejected Replaces value. I chose 2nd retry in case the first failure is maybe sporadic, the first retry will fix it. 2nd retry is early on in the process but then if that fails too, then we know it's not related to ARI, so we can just keep retrying normally after that and still gain the benefits of updating the server state when it eventually does succeed.

I've also complexified the logic for deciding whether a certificate needs renewal. Before this PR, we just checked the expiration date, the current time, and the configured lifetime ratio. Now this change considers ARI if it is available, but the logic is nuanced to ensure maximum reliability and flexiblity:

  • If the ARI already has a selected time, use that. We do not also require a window to exist in the metadata. This allows sysadmins to manually schedule renewals independent of ARI which sounds useful. All they have to do is set the value in the JSON accordingly.
  • If there is an ARI window, but no selected time (for some reason?), then select a time. It's ephemeral (we don't store this generated random time) but that's OK. We just need a random time in the window.
  • If the selected time says it's not time to renew, we still check the expiration date to make sure a bug or state inconsistency somewhere doesn't just cause us to sit back and let the cert expire. Always check the expiration date. It has to be in the last 1/20 of the cert lifetime to supercede ARI here.
  • Otherwise, if there's no ARI or selected renewal time, just check the expiration against the current time and the configured renewal window ratio. This is the same as the old behavior.
  • If it still isn't deemed time to renew, do one last check for an emergency renewal, in the last, I dunno, 1/50th the cert's lifetime, OR the last few runs of the maintenance routine before the cert expires. This should help ensure that no matter what (mis)configuration there is, the certificate will be renewed as long as the server is running.

Still need to integrate with on-demand TLS and log when we detect a change to the ARI window.

@mholt
Copy link
Member Author

mholt commented May 6, 2024

The latest commit now checks ARI with on-demand TLS and when loading certs from storage (rather than just checking expiration date).

I've also added logging when a window change is detected, including the explanation URL.

I've also ensured that any updated ARI will trigger a renewal of the certificate if the new ARI window means that the certificate is to be renewed immediately.

After testing several scenarios with this code, I'm confident enough to merge it in, but I want more testing in the field before it gets released in a Caddy 2.8 production release.

@mholt
Copy link
Member Author

mholt commented May 7, 2024

I'm going to merge this so I can get it some practice in the field, but feedback/review is still welcomed if anyone desires to; we can still make fixes in follow-up commits/PRs.

@mholt mholt merged commit 0e88b3e into master May 7, 2024
6 checks passed
@mholt mholt deleted the ari branch May 7, 2024 15:46
DennisRasey pushed a commit to DennisRasey/forgejo that referenced this pull request May 12, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/caddyserver/certmagic](https://github.com/caddyserver/certmagic) | require | minor | `v0.20.0` -> `v0.21.0` |

---

### Release Notes

<details>
<summary>caddyserver/certmagic (github.com/caddyserver/certmagic)</summary>

### [`v0.21.0`](https://github.com/caddyserver/certmagic/releases/tag/v0.21.0)

[Compare Source](caddyserver/certmagic@v0.20.0...v0.21.0)

CertMagic v0.21 introduces some big changes:

-   Draft support for draft-03 of [ACME Renewal Information (ARI)](https://datatracker.ietf.org/doc/draft-ietf-acme-ari/) which assists with deciding when to renew certificates. This augments CertMagic's already-advanced logic using cert lifetime and OCSP/revocation status.
-   New [`ZeroSSLIssuer`](https://pkg.go.dev/github.com/caddyserver/certmagic@v0.21.0#ZeroSSLIssuer) uses the [ZeroSSL API](https://zerossl.com/documentation/api/) to get certificates. ZeroSSL also has an ACME endpoint, which can still be accesed using the existing ACMEIssuer, as always. Their proprietary API is paid, but has extra features like IP certificates, better reliability, and support.
-   DNS challenges should be smoother in some cases as we've improved propagation checking.
-   In the odd case your ACME account disappears from the ACME server, CertMagic will automatically retry with a new account. (This happens in some test/dev environments.)
-   ACME accounts are identified only by their public keys, but CertMagic maps accounts by CA+email for practical/storage reasons. So now you can "pin" an account key to use by specifying your email and the account public key in your config, which is useful if you need to absolutely be sure to use a specific account (like if you get rate limit exemptions from a CA).

Please try it out and report any issues!

Thanks to [@&#8203;Framer](https://github.com/Framer) for their contributions to this release!

#### What's Changed

-   Bump golang.org/x/crypto from 0.14.0 to 0.17.0 by [@&#8203;dependabot](https://github.com/dependabot) in caddyserver/certmagic#264
-   Demote "storage cleaning happened too recently" from WARN to INFO by [@&#8203;francislavoie](https://github.com/francislavoie) in caddyserver/certmagic#270
-   Check DNS propagation at authoritative nameservers only with default resolvers by [@&#8203;pgeh](https://github.com/pgeh) in caddyserver/certmagic#274
-   Retry with new account if account disappeared remotely by [@&#8203;mholt](https://github.com/mholt) in caddyserver/certmagic#269
-   Update readme examples to use TLS-ALPN const from ACMEz by [@&#8203;goksan](https://github.com/goksan) in caddyserver/certmagic#277
-   Initial implementation of ZeroSSL API issuer by [@&#8203;mholt](https://github.com/mholt) in caddyserver/certmagic#279
-   Allow deleting directories via FileStorage by [@&#8203;goksan](https://github.com/goksan) in caddyserver/certmagic#282
-   Use the `email` configuration in the ACME issuer to "pin" an account to a key by [@&#8203;ankon](https://github.com/ankon) in caddyserver/certmagic#283
-   Initial implementation of ARI by [@&#8203;mholt](https://github.com/mholt) in caddyserver/certmagic#286

#### New Contributors

-   [@&#8203;pgeh](https://github.com/pgeh) made their first contribution in caddyserver/certmagic#274
-   [@&#8203;goksan](https://github.com/goksan) made their first contribution in caddyserver/certmagic#277

**Full Changelog**: caddyserver/certmagic@v0.20.0...v0.21.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am" (UTC), Automerge - "before 4am" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zNTEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjM1MS4yIiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6W119-->

Co-authored-by: Earl Warren <contact@earl-warren.org>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/3724
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Co-committed-by: Renovate Bot <forgejo-renovate-action@forgejo.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ARI
2 participants