Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix context.DeadlineExceeded error comparison in retry.IsTemporary #1488

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

ethan-lowman-dd
Copy link
Contributor

When using the default retry predicate, context.DeadlineExceeded errors wrapped in a *url.Error are misclassified as temporary and retryable due to an error equality comparison that should be an errors.Is.

This bug results in behavior where timeouts cause the client to continue retrying (and therefore sleeping due to the retry backoff) past the client timeout. This script can be used to reproduce the issue. It simulates 5 seconds of latency on every HTTP request and a 10 second client timeout.

git clone git clone https://gist.github.com/7f7af09a5ca5ee9146e41b16f0d87e7d.git go-containerregistry-timeout-test
cd go-containerregistry-timeout-test
go run main.go

Output:

# --------------------- SNIP --------------------- 

2022/11/09 20:53:52 POST /v2/example/blobs/uploads/ HTTP/1.1
Host: 127.0.0.1:56629
User-Agent: go-containerregistry/v0.12.0
Content-Length: 0
Content-Type: application/json
Accept-Encoding: gzip


2022/11/09 20:53:57 ----- CLIENT TIMEOUT (10s) -----
2022/11/09 20:53:57 <-- context deadline exceeded POST http://127.0.0.1:56629/v2/example/blobs/uploads/ (4.9922905s)
2022/11/09 20:53:57 retrying Post "http://127.0.0.1:56629/v2/example/blobs/uploads/": context deadline exceeded

# --------------------- SNIP --------------------- 

2022/11/09 20:54:01 Failed to dump request HEAD http://127.0.0.1:56629/v2/example/blobs/sha256:41f2c4d9dad071c0003fecaafe115dd1913b05d8945915c591b3b983ab6849cb: context deadline exceeded
2022/11/09 20:54:01 <-- context deadline exceeded HEAD http://127.0.0.1:56629/v2/example/blobs/sha256:41f2c4d9dad071c0003fecaafe115dd1913b05d8945915c591b3b983ab6849cb (3.917µs)
2022/11/09 20:54:01 retrying Head "http://127.0.0.1:56629/v2/example/blobs/sha256:41f2c4d9dad071c0003fecaafe115dd1913b05d8945915c591b3b983ab6849cb": context deadline exceeded
2022/11/09 20:54:01 elapsed: 14.239486292s
panic: Head "http://127.0.0.1:56629/v2/example/blobs/sha256:9283b721d9d35fbb004f574b23254199beb438562a3641f963f28353923f456d": context deadline exceeded

goroutine 1 [running]:
main.main()
        /Users/ethan.lowman/go/src/github.com/ethan-lowman-dd/go-containerregistry-timeout-test/main.go:75 +0x220
exit status 2

We see that the extra retry backoff sleeps result in a little over 4 seconds of latency past the client timeout (14.2 seconds elapsed).

If we use the code in this PR (via a replace github.com/google/go-containerregistry v0.12.0 => github.com/ethan-lowman-dd/go-containerregistry v0.11.1-0.20221110014927-92a137e06282 directive in go.mod), this is the output:


# --------------------- SNIP --------------------- 

2022/11/09 20:58:47 POST /v2/example/blobs/uploads/ HTTP/1.1
Host: 127.0.0.1:56680
User-Agent: go-containerregistry/v0.12.0
Content-Length: 0
Content-Type: application/json
Accept-Encoding: gzip


2022/11/09 20:58:52 ----- CLIENT TIMEOUT (10s) -----
2022/11/09 20:58:52 <-- context deadline exceeded POST http://127.0.0.1:56680/v2/example/blobs/uploads/ (4.987488292s)
2022/11/09 20:58:52 <-- context deadline exceeded POST http://127.0.0.1:56680/v2/example/blobs/uploads/ (4.982909167s)
2022/11/09 20:58:52 <-- context deadline exceeded POST http://127.0.0.1:56680/v2/example/blobs/uploads/ (4.983053458s)
2022/11/09 20:58:52 <-- context deadline exceeded POST http://127.0.0.1:56680/v2/example/blobs/uploads/ (4.985901458s)
2022/11/09 20:58:52 <-- context deadline exceeded POST http://127.0.0.1:56680/v2/example/blobs/uploads/ (4.987369916s)
2022/11/09 20:58:52 elapsed: 10.002162625s
panic: Post "http://127.0.0.1:56680/v2/example/blobs/uploads/": context deadline exceeded

goroutine 1 [running]:
main.main()
        /Users/ethan.lowman/go/src/github.com/ethan-lowman-dd/go-containerregistry-timeout-test/main.go:75 +0x220
exit status 2

We now see that elapsed time is very close to the client timeout of 10 seconds.

@codecov-commenter
Copy link

Codecov Report

Merging #1488 (92a137e) into main (353a117) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1488   +/-   ##
=======================================
  Coverage   73.19%   73.19%           
=======================================
  Files         115      115           
  Lines        8812     8812           
=======================================
  Hits         6450     6450           
  Misses       1712     1712           
  Partials      650      650           
Impacted Files Coverage Δ
internal/retry/retry.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jonjohnsonjr jonjohnsonjr merged commit 76ae819 into google:main Nov 10, 2022
@jonjohnsonjr
Copy link
Collaborator

Thanks!

@ethan-lowman-dd
Copy link
Contributor Author

@jonjohnsonjr Is there a scheduled release coming up? If not, would it be possible to cut a minor or patch release soon to make this fix available?

@jonjohnsonjr
Copy link
Collaborator

We should, if only to do a release with #1468 picked up.

@ethan-lowman-dd
Copy link
Contributor Author

Thank you 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants