Certificate refresh cycle does not work with laptop sleep #1199
Comments
I'm not sure what I'd expect from Go. In effect, the CPU is asleep when the refresh goroutine should run. So when the CPU awakes, does the refresh goroutine run immediately, or is it dropped? Here's the code in question: https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/main/proxy/proxy/client.go#L238-L242 I'll need to test this out myself.
After playing around with this, I see the problem is that
Is this a bug with our implementation or in Go?
When we say
But perhaps it is a bug in Go with time.After. The question is: does Go make any claims about an elapsed duration when the CPU is asleep for part of it?
From the docs on time:
This might just be a limitation of the hardware Go is running on. If the monotonic clock stops when the computer goes to sleep, I'm not sure what we could do here.
I think this would be a valid workaround. Another solution I was thinking of was having a flag to lower the refresh time for certs, e.g. if a refresh is scheduled for every 20-30 minutes, the time between (laptop waking up)<->(needing to connect to db) is lower.
Probably the issue. Our eng team uses M1 macs, so I'm not surprised it's causing an issue.
Perhaps giving the refresh a
func (c *Client) refreshCertAfter(instance string, timeToRefresh time.Duration, ctx context.Context) {
	select {
	case <-ctx.Done(): // potentially wall clock based?
	case <-time.After(timeToRefresh): // retain monotonic clock functionality
	}
}
Did some digging; this seems to be an accepted proposal for Go: golang/go#36141
Nice find! I'm curious to explore what implementation options we have in the meantime.
Hi! Any update on a possible fix? This is causing inconvenience in our workflows so a fix would be greatly appreciated. Thanks!
@honDhan Do you know what version of the proxy you are using? Looking at head, I suspect this might already be fixed since we check the cert expiration here: https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/49d3003c018afdc0cde54340d5be808f9dcd5c84/proxy/proxy/client.go#L380
I tested the latest version of the proxy, and see that this issue persists. In short, after waking from sleep, the proxy doesn't know to immediately refresh the certificate. I have a possible fix and will send a PR for discussion.
@honDhan I did some testing with my patch and believe my PR fixes the issue. We'll be releasing this on the first Tuesday of next month as part of the usual release cadence, but if you'd like to try a dev build off of main, I'd be curious to hear if you see an improvement.
Hey! Sorry for the lack of response. I am using .32.1 and it's been going well so far. No complaints from team members either. Thank you so much!
Fantastic. Thanks for the update!
In Go, when two time.Time values both have a monotonic clock reading, calls to Time.After use the monotonic readings to compare the two values. However, when the CPU goes to sleep and later wakes, the monotonic clock will stop on some systems. As a result, the time comparison can produce inaccurate results. See the Go documentation on time for a discussion of the details: https://pkg.go.dev/time#hdr-Monotonic_Clocks.

In an earlier commit [1], we added code to ensure the ephemeral certificate was not expired. If it was, we blocked the connection attempt on a certificate refresh. This solved one bug where a laptop went to sleep, later awoke, and was unable to connect because the existing certificate had expired. However, the change in #686 fixed this bug only for clients connecting with built-in authentication.

[1] #686

When the Go Connector is configured to use built-in authentication, it compares the ephemeral certificate's expiration (a time.Time value produced from JSON deserialization and therefore lacking a monotonic clock reading) with time.Now(). Because only one of the two time values has a monotonic reading (i.e. time.Now()), the wall clock time is used to compare the two values and calls to Time.After produce valid results.

When the Go Connector is configured to use auto IAM authentication, it updates the certificate's expiration to be the earlier of the certificate's expiration or the OAuth2 token's expiration. In the majority of cases, the OAuth2 token will expire earlier, and so the certificate's expiration will be set to the token's expiration. Further, the OAuth2 token's expiration is a time.Time value *with* a monotonic clock reading. As a result, calls to Time.After will be using time values that both have a monotonic clock reading, and so are susceptible to producing the wrong result when a machine has been asleep for some time.

For example, with debug logging enabled in the Proxy, we observed the following logs after suspending the VM running the Proxy for more than an hour, resuming the VM, and then attempting to connect:

    Accepted connection from 127.0.0.1:56782
    Now = 2024-03-10T04:34:37Z, Current cert expiration = 2024-03-10T00:38:48Z
    Cert is valid = true
    Dialing <INSTANCE_IP_ADDR>
    connection aborted - error reading from instance: remote error: tls: bad certificate

Even though the cert expiration is clearly before "now," the Go Connector incorrectly concludes the certificate is still valid. This is on account of the erroneous monotonic clock value in time.Now() that is then used in calls to Time.After.

This commit resolves this last bug by stripping the monotonic clock value from both times before doing the comparison. In addition to Time.Round(0), Time.UTC() will also strip the monotonic reading. This commit uses Time.UTC().

Past, related issues:
GoogleCloudPlatform/cloud-sql-proxy#1788
GoogleCloudPlatform/cloud-sql-proxy#1199
#402
* chore: improve debug logging

This commit adds a few extra debug logs to make it clear when a certificate is invalid and being refreshed. In addition, to help with the readability of the debug logs, this commit rounds all refresh durations to the minute (e.g., 56m0s vs 55m59s).

* fix: strip monotonic clock reading in cert check

Fixes #749
Question
What is the best way to set up Cloud SQL Proxy on MacOS for local access to databases? This is so that our engineering team can securely connect to our databases.
I currently have things set up using launchd and launchctl.

Additional Context

As mentioned, I have things set up with launchd on MacOS.

The issue is that sometimes the proxy does not refresh the certificate. This happens when laptops go to sleep and wake up after the refresh time:
Notice how there was a refresh scheduled for 10:14, but it did not go through.
A possible solution is to have the launchd service restart on sleep/wake, but that isn't a feature of launchd and I'd prefer not to do that. Is this an issue with the proxy, Go, or MacOS?