
HTTP 206 response when downloading the standback 0.2.10 dependency causes cargo to abort #8647

Closed
Byron opened this issue Aug 25, 2020 · 32 comments
Labels: C-bug Category: bug

@Byron
Member

Byron commented Aug 25, 2020

Problem

After running cargo update, a new version of standback (0.2.10) was pulled in. When cargo tried to fetch it, it received an unexpected HTTP status.

➜  gitoxide git:(main) ✗ cargo check
error: failed to download from `https://crates.io/api/v1/crates/standback/0.2.10/download`

Caused by:
  failed to get 200 response from `https://static.crates.io/crates/standback/standback-0.2.10.crate`, got 206

The issue happened repeatedly, and I was unable to run any cargo command I needed.

The download should succeed.

Steps

cargo new dl-test
cd dl-test
cargo add standback
cargo check

Possible Solution(s)

Luckily I download each and every crate using criner, so I could fix it by copying the downloaded crate into the cargo cache:

cp ~/dev/the-lean-crate/criner/criner.db/assets/st/an/standback/0.2.10-download:1.0.0.crate  cache/github.com-1ecc6299db9ec823/standback-0.2.10.crate

This prevented cargo from trying to download anything.

Notes

Output of cargo version:

cargo 1.45.1 (f242df6ed 2020-07-22)

Running on macOS 11 Beta 7 (20A5354i).

This is the first time I have seen this issue. It does not occur on CI.

@Byron Byron added the C-bug Category: bug label Aug 25, 2020
@ehuss
Contributor

ehuss commented Aug 25, 2020

Can you try running with the environment variables CARGO_HTTP_DEBUG=true and CARGO_LOG=cargo::ops::registry=debug (or maybe trace instead of debug)? That should print some debug information that might illuminate why that is happening. I'm pretty sure there is no resume logic in Cargo's downloads, so I'm a bit surprised that is happening.

@ehuss
Contributor

ehuss commented Aug 25, 2020

Oh, I just noticed that you mentioned you are on macOS 11. Can you post the output of curl --version, too?

@Byron
Member Author

Byron commented Aug 25, 2020

After deleting my manually placed crate file and running cargo check on my test repository, it worked flawlessly on the first try. Thus I would consider this issue no longer reproducible.

The next time this happens, I will provide a new issue along with the debug output.
Thanks for your swift help!

For completeness, here is the output of curl --version:

curl 7.64.1 (x86_64-apple-darwin20.0) libcurl/7.64.1 (SecureTransport) LibreSSL/2.8.3 zlib/1.2.11 nghttp2/1.41.0
Release-Date: 2019-03-27
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS GSS-API HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM NTLM_WB SPNEGO SSL UnixSockets

@Byron Byron closed this as completed Aug 25, 2020
@itsfarseen

I'm having this issue while trying to install suckit (https://github.com/skallwar/suckit).

$ cargo install suckit
    Updating crates.io index
  Downloaded suckit v0.1.0
  Downloaded 1 crate (269.3 KB) in 7.97s
  Installing suckit v0.1.0
  Downloaded thin-slice v0.1.1
  Downloaded phf_generator v0.8.0
  Downloaded structopt v0.3.17
  Downloaded time-macros v0.1.0
  Downloaded publicsuffix v1.5.4
  Downloaded rand_pcg v0.2.1
  Downloaded new_debug_unreachable v1.0.4
  Downloaded time-macros-impl v0.1.1
  Downloaded time v0.2.16
  Downloaded dtoa-short v0.3.2
  Downloaded cssparser v0.27.2
  Downloaded cookie v0.14.2
  Downloaded hyper-tls v0.4.3
  Downloaded difference v2.0.0
  Downloaded phf_macros v0.8.0
  Downloaded reqwest v0.10.8
  Downloaded servo_arc v0.1.1
  Downloaded structopt-derive v0.4.10
  Downloaded cssparser-macros v0.6.0
  Downloaded cookie_store v0.12.0
  Downloaded ipnet v2.3.0
  Downloaded html5ever v0.25.1
  Downloaded time v0.1.44
error: failed to compile `suckit v0.1.0`, intermediate artifacts can be found at `/tmp/cargo-installpCdPgY`

Caused by:
  failed to download from `https://crates.io/api/v1/crates/standback/0.2.10/download`

Caused by:
  failed to get 200 response from `https://static.crates.io/crates/standback/standback-0.2.10.crate`, got 206

rustc_info.json.txt

Cargo version: cargo 1.47.0-nightly (ab32ee8 2020-08-10)

$ curl --version
curl 7.71.1 (x86_64-pc-linux-gnu) libcurl/7.71.1 OpenSSL/1.1.1g zlib/1.2.11 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh2/1.9.0 nghttp2/1.41.0
Release-Date: 2020-07-01
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

OS: Arch Linux

@itsfarseen

itsfarseen commented Aug 27, 2020

Output of CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=trace cargo install suckit 2>debug.log
https://gist.github.com/happycoder97/61095b14883d0ce106aef66686050113

@itsfarseen

As a workaround, taking a hint from what @Byron did, I ran:

cd ~/.cargo/registry/cache/github.com-1ecc6299db9ec823
wget https://static.crates.io/crates/standback/standback-0.2.10.crate

@ehuss ehuss reopened this Aug 27, 2020
@pietroalbini
Member

Seems like bad data was cached on the CloudFront edge node close to you. We invalidated the caches, and the problem seems to have disappeared. If it happens again please comment here, and we'll contact AWS support.

@ehuss
Contributor

ehuss commented Aug 27, 2020

I spent some time investigating, and was able to reproduce with command-line curl (thanks @happycoder97 for the logs!). We cleared the CloudFront cache, and it seems to have resolved the cached entries that were exhibiting the problem. I have no idea why CloudFront was responding that way. Cargo doesn't issue range requests, and AFAIK servers are not supposed to issue a range response without a Range header.

It seems like an issue with CloudFront to me. From the repro I had, running curl --resolve static.crates.io:443:13.227.234.3 https://static.crates.io/crates/standback/standback-0.2.10.crate -O --verbose resulted in:

HTTP log
* Added static.crates.io:443:13.227.234.3 to DNS cache
* Hostname static.crates.io was found in DNS cache
*   Trying 13.227.234.3...
* TCP_NODELAY set
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to static.crates.io (13.227.234.3) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [230 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [100 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4830 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=crates.io
*  start date: Nov 19 00:00:00 2019 GMT
*  expire date: Dec 19 12:00:00 2020 GMT
*  subjectAltName: host "static.crates.io" matched cert's "static.crates.io"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f84e280f600)
> GET /crates/standback/standback-0.2.10.crate HTTP/2
> Host: static.crates.io
> User-Agent: curl/7.64.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0< HTTP/2 206
< content-type: application/x-tar
< content-length: 24154
< date: Mon, 24 Aug 2020 00:32:06 GMT
< last-modified: Sun, 23 Aug 2020 06:18:57 GMT
< etag: "e6faef30928fca76beffc38e502b20a3"
< cache-control: public,max-age=31536000,immutable
< x-amz-version-id: 1Vbb_a_I6D6myOepAVl3p6wOgnbJjXcK
< accept-ranges: bytes
< content-range: bytes 0-24153/24154
< server: AmazonS3
< x-cache: Hit from cloudfront
< via: 1.1 bef39d77a2b440566326f82fc0af7d28.cloudfront.net (CloudFront)
< x-amz-cf-pop: BOM51-C1
< x-amz-cf-id: RfV0GSUEVO1t7As16qt9xfpnmIoJD0zpxNqiI_uKZIlFUTBIs9pq2g==
< age: 315991
<
{ [15924 bytes data]
100 24154  100 24154    0     0  20697      0  0:00:01  0:00:01 --:--:-- 20679
* Connection #0 to host static.crates.io left intact
* Closing connection 0

which clearly shows a 206 response even though the client did not send a Range header.

I tried tricking CloudFront into caching a range response for other crates, but I wasn't able to force it to happen.

It may be possible to update cargo to treat a 206 response with a full range as success, so that is an option if this appears to be a more widespread problem.
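
As a rough sketch of that option, assuming a hypothetical standalone helper (the name, signature, and parsing below are made up for illustration and are not Cargo's actual download code), the idea would be to accept a 206 only when its Content-Range header covers the entire file:

// Hypothetical sketch, not Cargo's real code: treat an HTTP 206 as a successful
// download only if the Content-Range header spans the whole file.
fn download_succeeded(status: u32, content_range: Option<&str>, content_length: u64) -> bool {
    match status {
        200 => true,
        206 => content_range
            // Expect the "bytes 0-24153/24154" form seen in the log above and
            // require it to cover byte 0 through content_length - 1.
            .and_then(|range| range.strip_prefix("bytes 0-"))
            .and_then(|rest| rest.split_once('/'))
            .map(|(end, total)| {
                end.parse::<u64>().ok() == Some(content_length.saturating_sub(1))
                    && total.parse::<u64>().ok() == Some(content_length)
            })
            .unwrap_or(false),
        _ => false,
    }
}

fn main() {
    // Mirrors the headers from the 206 response captured in this issue.
    assert!(download_succeeded(206, Some("bytes 0-24153/24154"), 24154));
    // A genuinely partial response should still be rejected.
    assert!(!download_succeeded(206, Some("bytes 0-999/24154"), 24154));
    assert!(download_succeeded(200, None, 24154));
}

Per RFC 7233 a server shouldn't send a 206 to a request without a Range header in the first place, so this would only make cargo tolerant of the misbehaving edge rather than fix the root cause.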

Just leaving some notes for myself:

@Byron
Member Author

Byron commented Sep 3, 2020

@ehuss It is happening again when updating nu-shell.

Updating nu
    Updating crates.io index
  Installing nu v0.19.0
  Downloaded nu-test-support v0.19.0
  Downloaded nu_plugin_match v0.19.0
  Downloaded nu-value-ext v0.19.0
error: failed to compile `nu v0.19.0`, intermediate artifacts can be found at `/var/folders/03/pf3rx1yx0sn7lnwpcb_nq6c80000gn/T/cargo-installgfbKMc`

Caused by:
  failed to download from `https://crates.io/api/v1/crates/nu_plugin_ps/0.19.0/download`

Caused by:
  failed to get 200 response from `https://static.crates.io/crates/nu_plugin_ps/nu_plugin_ps-0.19.0.crate`, got 206


Updated 0 packages.

@ehuss ehuss reopened this Sep 3, 2020
@alexcrichton
Member

If it's still an issue for you @Byron, can you gist the output of these debug logs?

@Byron
Member Author

Byron commented Sep 3, 2020

I was luckier this time: here is the stderr output of CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=trace cargo install-update -a
cargo-install.log

@ehuss
Contributor

ehuss commented Sep 3, 2020

Thanks for the info @Byron. I'm chatting with @pietroalbini now about contacting Amazon support about this. For reference, the following command is returning a 206 response when I don't think it should:

curl --resolve static.crates.io:443:13.227.234.3 https://static.crates.io/crates/nu_plugin_ps/nu_plugin_ps-0.19.0.crate -O --verbose

@pietroalbini
Member

I contacted AWS support about this; we should get a reply soon.

@pietroalbini pietroalbini self-assigned this Sep 4, 2020
@pietroalbini
Member

AWS confirmed this is an issue specific to the Mumbai PoP, and they raised an internal ticket to get it fixed.

@tarsir

tarsir commented Sep 8, 2020

I ran into this with the inventory crate. Going off what @pietroalbini said about it being regional, I tried connecting to a US VPN, and cargo build made it past the error, in case someone's looking for a workaround in the meantime.

@KarthikeyanRanasthala

@pietroalbini Any update on this from AWS yet? Still facing this issue.

@itsfarseen

itsfarseen commented Sep 8, 2020 via email

@pietroalbini
Member

Sorry for the delay responding to this!

I left another reply on the support ticket; I'll escalate if I don't get a reply soon.

@pietroalbini
Member

Hello everyone!

As an update, we have confirmed with AWS that this was a known problem on their end, and they're rolling back the change that caused this. In the meantime, we disabled all PoPs outside of the US/EU/Canada to avoid you hitting the Mumbai one. We understand you might get worse latencies due to this, but it's the most "granular" setting available to us.

We're now working with AWS to disable just the Mumbai PoP on our account and re-enable the rest, so your latency should hopefully get back to normal in the near future.

@pietroalbini
Member

@tarsir @KarthikeyanRanasthala @happycoder97 did any of you save the debug log of the cargo failure by any chance?

You won't be able to reproduce the error anymore, as we disabled that PoP, but AWS Support might benefit from the Request ID in it if you gathered it previously.

@pietroalbini
Member

Got a reply from AWS support that they disabled the Mumbai PoP for our account, so I re-enabled all the other PoPs outside of EU/US/Canada. If this happens again, please comment on the issue.

Thanks to everyone who reported this, and sorry for the trouble!

@Cogitri

Cogitri commented Sep 10, 2020

Unfortunately, it seems like this is happening again, this time on our CI machines, which are located in the USA (IIRC) @pietroalbini.

Here's the debug log of cargo: https://gist.github.com/Cogitri/a4dc2a38aa7f4b2ec0a828e2acce7e84

@ehuss
Contributor

ehuss commented Sep 10, 2020

@Cogitri Thanks for the info! Does the debug log have the IP address for static.crates.io? It should look something like Connected to static.crates.io (13.227.234.3).

From what I can see in the logs, it is the EWR52-C3 PoP, which is Newark, but I can't find the IP range for that PoP.

HTTP/2 206 
content-type: application/x-tar
content-length: 36673
date: Thu, 10 Sep 2020 02:32:40 GMT
last-modified: Tue, 25 Aug 2020 20:19:08 GMT
etag: "e31f5623b36e4652b84f44b5557eadf0"
cache-control: public,max-age=31536000,immutable
x-amz-version-id: LH6JH634yqfVML5rxj4b23xkpaJBzaPy
accept-ranges: bytes
content-range: bytes 0-36672/36673
server: AmazonS3
x-cache: Hit from cloudfront
via: 1.1 46b177c3d09733bd56d354391fb7400d.cloudfront.net (CloudFront)
x-amz-cf-pop: EWR52-C3
x-amz-cf-id: OkD5s4x4C5uawVLMWlw7WB_sm0TytfQH5L71Y9eITCKGm-IszTAY2A==
age: 69923

@ehuss ehuss reopened this Sep 10, 2020
@pietroalbini
Member

Reopened the support ticket with AWS and asked to bump its severity. For anyone who encounters this, please gather the debug logs with the CARGO_HTTP_DEBUG=true and CARGO_LOG=cargo::ops::registry=debug environment variables and post them here, along with a "this is happening to me too". Thanks, and sorry for the trouble!

@pietroalbini
Member

Tracking this on the status page too: https://status.crates.io/incidents/hbrpxrpbsdc8

@Cogitri

Cogitri commented Sep 11, 2020

Here's the entire log of running CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug cargo build. Sorry for not including it earlier; GitHub gists apparently didn't like such a big file, and I didn't think of attaching the file here :)
Hopefully that contains everything that's required.

raw.txt

Also, it seems some of our CI runners are located in the EU, and the same errors can be seen on those as well.

@pietroalbini
Member

@Cogitri logs from Europe would be beneficial as well; from our understanding this is only affecting some PoPs, and the Request IDs we get from the logs are helping AWS identify which ones.

@Cogitri

Cogitri commented Sep 11, 2020

Oh, it seems the build actually went through on our x86 CI runner this time (which is the only one located in Europe; all others are in the US). Let's see if it fails on subsequent runs.

@pietroalbini
Member

As an update, we have seen no more invalid responses in our CDN logs over the past 12 hours. We're continuing to monitor the situation, and AWS is rolling out a permanent fix.

@pietroalbini
Member

AWS is continuing to roll out a permanent fix, but our logs haven't shown any invalid responses since the last update. Closing the issue.

@pietroalbini
Member

We got confirmation from AWS that the fix finished rolling out!

@Cogitri

Cogitri commented Sep 16, 2020

Great, it seems to work on my CI runners now. Thanks for looking into this!
