Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SslInfoContributor and SslHealthIndicator #41205

Closed

Conversation

jonatan-ivanov
Copy link
Member

@jonatan-ivanov jonatan-ivanov commented Jun 21, 2024

Production incidents because of invalid certificates are common issues in the industry. SslInfoContributor and SslHealthIndicator in this PR can help to mitigate them, they:

  • Provide extra information about the used certificates (issuer, subject, validity, etc.)
  • Indicate if a certificate is invalid and the reason of it
  • Indicate if a certificate will be invalid within a configurable threshold (e.g.: in a week)

Example /info and /health outputs:

/info of a VALID cert (click here to expand)
{
  "ssl": {
    "bundles": [
      {
        "name": "ssldemo",
        "certificateChains": [
          {
            "alias": "spring-boot",
            "certificates": [
              {
                "version": "V3",
                "signatureAlgorithmName": "SHA256withRSA",
                "validityStarts": "2024-06-21T21:17:02Z",
                "validityEnds": "2024-07-05T21:17:02Z",
                "validity": {
                  "status": "VALID"
                },
                "serialNumber": "9fbcacfed83af328",
                "issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
                "subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
              }
            ]
          }
        ]
      }
    ]
  }
}
/health of a VALID cert (click here to expand)
{
  "status": "UP",
  "components": {
    "ping": {
      "status": "UP"
    },
    "ssl": {
      "status": "UP"
    }
  }
}
/info of an EXPIRED cert (click here to expand)
{
  "ssl": {
    "bundles": [
      {
        "name": "ssldemo",
        "certificateChains": [
          {
            "alias": "spring-boot-ssl-sample",
            "certificates": [
              {
                "version": "V3",
                "validity": {
                  "status": "EXPIRED",
                  "message": "Not valid after 2014-10-21T19:48:43Z"
                },
                "validityStarts": "2014-07-23T19:48:43Z",
                "validityEnds": "2014-10-21T19:48:43Z",
                "signatureAlgorithmName": "SHA256withRSA",
                "serialNumber": "7207ee6e",
                "issuer": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
                "subject": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
              }
            ]
          }
        ]
      }
    ]
  }
}
/health of an EXPIRED cert (click here to expand)
{
  "status": "OUT_OF_SERVICE",
  "components": {
    "ping": {
      "status": "UP"
    },
    "ssl": {
      "status": "OUT_OF_SERVICE",
      "details": {
        "certificateChains": [
          {
            "alias": "spring-boot-ssl-sample",
            "certificates": [
              {
                "version": "V3",
                "validityEnds": "2014-10-21T19:48:43Z",
                "validityStarts": "2014-07-23T19:48:43Z",
                "signatureAlgorithmName": "SHA256withRSA",
                "validity": {
                  "status": "EXPIRED",
                  "message": "Not valid after 2014-10-21T19:48:43Z"
                },
                "serialNumber": "7207ee6e",
                "issuer": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
                "subject": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
              }
            ]
          }
        ]
      }
    }
  }
}
/info of a cert that WILL_EXPIRE_SOON (click here to expand)
{
  "ssl": {
    "bundles": [
      {
        "name": "ssldemo",
        "certificateChains": [
          {
            "alias": "spring-boot",
            "certificates": [
              {
                "version": "V3",
                "validityEnds": "2024-06-22T21:32:22Z",
                "validityStarts": "2024-06-21T21:32:22Z",
                "signatureAlgorithmName": "SHA256withRSA",
                "validity": {
                  "status": "WILL_EXPIRE_SOON",
                  "message": "Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
                },
                "subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
                "serialNumber": "64d019d1dd94eee0",
                "issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
              }
            ]
          }
        ]
      }
    ]
  }
}
/health of a cert that WILL_EXPIRE_SOON (click here to expand)
{
  "status": "UP",
  "components": {
    "ping": {
      "status": "UP"
    },
    "ssl": {
      "status": "WARNING",
      "details": {
        "certificateChains": [
          {
            "alias": "spring-boot-ssl-sample",
            "certificates": [
              {
                "version": "V3",
                "validityEnds": "2024-06-22T21:32:22Z",
                "validityStarts": "2024-06-21T21:32:22Z",
                "signatureAlgorithmName": "SHA256withRSA",
                "validity": {
                  "status": "WILL_EXPIRE_SOON",
                  "message": "Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
                },
                "subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
                "serialNumber": "64d019d1dd94eee0",
                "issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
              }
            ]
          }
        ]
      }
    }
  }
}

If you want to play with it, start spring-boot-smoke-test-tomcat-ssl, the cert in resources/sample.jks is already EXPIRED, you can generate a VALID one via

keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname "CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US" -validity 14 -alias spring-boot -keyalg RSA -ext "SAN=DNS:localhost,IP:::1,IP:127.0.0.1"

or one that WILL_EXPIRE_SOON via:

keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname "CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US" -validity 1 -alias spring-boot -keyalg RSA -ext "SAN=DNS:localhost,IP:::1,IP:127.0.0.1"

@jonatan-ivanov jonatan-ivanov added the status: waiting-for-triage An issue we've not yet triaged label Jun 21, 2024
@jonatan-ivanov jonatan-ivanov mentioned this pull request Jun 21, 2024
15 tasks
@mhalbritter
Copy link
Contributor

mhalbritter commented Jun 27, 2024

Hey @jonatan-ivanov, that's quite a cool feature, thank you! The implementation looks good, too.

This works for all SSL bundles, and not only for the one used by the webserver, right?

@jonatan-ivanov
Copy link
Member Author

Great to hear! The main focus is the webserver but I think this should work for all SSL bundles (not tested yet) since using an expired cert can cause issues in every place they are used.

I can go ahead and work on the TODO items/docs/tests, in the meantime, can I get some feedback on two important items?

  1. Can/should we introduce a new (common) health status: WARNING (returns 200 but indicates that something is not right)? Or should we keep the current custom status in the PR (WILL_EXPIRE_SOON_STATUS)?
  2. Is calling WebServerSslBundle.get multiple times ok? See in the PR, in AbstractConfigurableWebServerFactory, and in NettyRSocketServerFactory or should there be just one instance (i.e.: a @Bean)?

@scottfrederick
Copy link
Contributor

Is calling WebServerSslBundle.get multiple times ok?

I think this is fine. However, I'm not sure we need to use WebServerSslBundle here.

That class adapts the discrete server.ssl.* properties to SslBundle design. If we ever decide to deprecate and remove the discrete server.ssl.* properties and only support bundles, then that class would go away.

We could just focus the InfoContributor on SSL bundles and document that you would need to convert from server.ssl.* properties to bundle properties to get the benefit of the InfoContributor.

@jonatan-ivanov
Copy link
Member Author

That class adapts the discrete server.ssl.* properties to SslBundle design. If we ever decide to deprecate and remove the discrete server.ssl.* properties and only support bundles, then that class would go away.

👍🏼 That's the exact same use-case I'm using WebServerSslBundle in this PR for: to be backwards compatible with the server.ssl.* properties. If only bundles will be supported and WebServerSslBundle will be removed in the future, the small block of code in SslInfo where WebServerSslBundle is used should also be removed since there won't be any properties to be backward compatible with.

We could just focus the InfoContributor on SSL bundles and document that you would need to convert from server.ssl.* properties to bundle properties to get the benefit of the InfoContributor.

That would simplify the changes in the PR a bit but also complicate the life of the users: simply enabling the feature might not do anything for them if they are not using bundles or it would work just half-way for them if they use bundles just not for the webserver. If it is not a big issue to use WebServerSslBundle, I would prefer supporting the server.ssl.* properties unless you have plans to deprecate them (and I guess eventually remove them).

@scottfrederick
Copy link
Contributor

Along with server.ssl.* (and management.server.ssl.*), we have spring.rsocket.server.ssl.*, spring.rabbitmq.ssl.*, and spring.kafka.*.ssl.* properties where users can currently choose to use discrete keystore and truststore properties or use a bundle. The web servers are the only place where we have an adapter like WebServerSslBundle that can be used to easily implement support for the discrete properties in the InfoContributor. We could choose to treat the discrete web server SSL properties specially as you've done by using the adapter (they are almost certainly the most commonly used SSL properties), or only support bundles so everything is treated equally and document that only bundles are supported. Let's see what others think about this.

In either case, I don't think it's necessary to make WebServerSslBundle a bean.

@jonatan-ivanov
Copy link
Member Author

Treating the discrete web server SSL properties specially somewhat makes sense to me since that's what WebServerSslBundle is also doing but having more "non-bundle" properties (I forgot about rabbitmq, kafka, and rsocket) drives me to the other direction: even if it would be nice to have backward-compatible support for the web server properties, it might make more sense to only support bundles and treat everything equally. :)

@jonatan-ivanov
Copy link
Member Author

I removed support for server.ssl.* properties (WebServerSslBundle) and added configprops support to set the warning threshold. Please let me know if this seems approximately ok and I can go ahed to write tests and docs.

@wilkinsona wilkinsona added for: team-meeting An issue we'd like to discuss as a team to make progress status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team type: enhancement A general enhancement and removed for: team-meeting An issue we'd like to discuss as a team to make progress status: waiting-for-triage An issue we've not yet triaged labels Jul 15, 2024
@jonatan-ivanov jonatan-ivanov force-pushed the cert-info branch 2 times, most recently from 1c6695d to 2cabf11 Compare August 3, 2024 00:17
@scottfrederick scottfrederick added the for: team-meeting An issue we'd like to discuss as a team to make progress label Aug 8, 2024
@mhalbritter
Copy link
Contributor

We need to discuss if we want to have this warning status and if we do, to which http code to map it.

@jonatan-ivanov
Copy link
Member Author

My two cents about having a dedicated WARNING status:

  • I imagine this status as something that means "Right now it's fine and not immediately actionable but if you don't do something, it might turn into an error sometime in the future".
  • Within this context, I think the appropriate HTTP status code is 200 since at the time of the response everything is still successfully working and load balancers should not remove the instance out of rotation, letting traffic flowing is totally fine.
  • Now if you ask me about the reason phrase... 😈 I think there is some wiggle room there since in the HTTP status line, the reason phrase is optional and OK is "only" recommended (not mandated) for 200. So theoretically, 200 Warning can also be returned but 200 OK would describe a situation just fine: things are still OK. (But if you really want to make Spring Boot cool, my personal recommendation is 200 🧨🧐🤔🤨🙊😬😱. 🙃)

Other than the feature introduced in this PR (the certificate will expire soon), I think there could be a few more use-cases where a WARNING status can be useful. For example for the disk space health indicator, when the disk is close to be full (within threshold), it's status can be WARNING. Another example can be updating local cache from an external resource periodically. If the update job fails the local cache might be stale. This could be fine for a while but can cause errors in the future.

@jonatan-ivanov jonatan-ivanov marked this pull request as ready for review August 14, 2024 05:30
@wilkinsona wilkinsona removed the status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team label Aug 14, 2024
@wilkinsona
Copy link
Member

We discussed this today and a couple of things came up:

  1. We're considering merging this without the custom status and then possibly adding that more broadly in a later milestone.
  2. It should be possible to see details of the certificates even when none of them have expired.

@wilkinsona wilkinsona removed the for: team-meeting An issue we'd like to discuss as a team to make progress label Aug 14, 2024
@jonatan-ivanov
Copy link
Member Author

What do you mean by the custom status? Do mean what is in the PR right now (WILL_EXPIRE_SOON ) or the proposed WARNING status? If he former, what should we use instead (UP)?

You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?

@wilkinsona
Copy link
Member

Sorry, I quickly wrote that comment in the meeting and it was obviously too terse.

What do you mean by the custom status? Do mean what is in the PR right now (WILL_EXPIRE_SOON) or the proposed WARNING status? If the former, what should we use instead (UP)?

We're not keen on an indicator-specific status such as WILL_EXPIRE_SOON but quite like that idea of a more general purpose WARNING status. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.

In the meantime, I think UP would be the status to use. Moritz is going to take a look at this.

You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?

We'd like it to show something as additional details (so they'll only appear when details are visible) but we're not yet 100% sure exactly what it should be. Moritz is going to take a look at this too.

@jonatan-ivanov
Copy link
Member Author

jonatan-ivanov commented Aug 14, 2024

[..] but quite like that idea of a more general purpose WARNING status. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.

Great to hear! Btw since this is targeted to a milestone release would it make sense to use new Status("WARNING") temporarily only in the health indicator without adding it to the Status class and see if we get any user feedback? Based on it (if any) we can introduce a general WARNING status in the Status class if it makes sense or change the health indicator to report UP before the RC. /cc @mhalbritter

@mhalbritter fyi: I modified the health indicator a bit so that now it returns the whole chain instead of the separate certs since if a cert is invalid in a chain, the whole chain is invalid. I think this also helps troubleshooting if there is an invalid cert (I updated the examples in the description).

@mhalbritter
Copy link
Contributor

I'd remove the warning bit completely for M2. Then we can take our time and come up with a WARNING status design, and if that's ready, we'll add the warning back to the ssl indicator (and possible to more health indicators). Does that sound good?

@jonatan-ivanov
Copy link
Member Author

jonatan-ivanov commented Aug 16, 2024

Makes sense, I changed it to UP also rebased it on main and squashed the changes.

@mhalbritter mhalbritter added this to the 3.4.x milestone Aug 19, 2024
mhalbritter pushed a commit that referenced this pull request Aug 19, 2024
@mhalbritter
Copy link
Contributor

Thanks @jonatan-ivanov !

The health endpoint now always adds details about invalid and valid chains.

@mhalbritter mhalbritter modified the milestones: 3.4.x, 3.4.0-M2 Aug 19, 2024
mhalbritter added a commit that referenced this pull request Aug 19, 2024
@jonatan-ivanov jonatan-ivanov deleted the cert-info branch August 19, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants