"error: component download failed" #2169

Closed · SimonSapin opened this issue Dec 21, 2019 · 8 comments

@SimonSapin (Contributor)

Describe the problem you are trying to solve

In every build job on Servo’s CI, rustup starts by downloading the Nightly toolchain pinned in the repository’s rust-toolchain file. Sometimes this fails:

https://community-tc.services.mozilla.com/tasks/T10RQXfSRCCODoDHP5bkAA/runs/0/logs/https%3A%2F%2Fcommunity-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FT10RQXfSRCCODoDHP5bkAA%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive.log#L395

```
+ rustup component add rustc-dev
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
info: latest update on 2019-12-21, rust version 1.42.0-nightly (01a46509a 2019-12-20)
info: downloading component 'cargo'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: installing component 'cargo'
info: installing component 'rust-std'
 17.7 MiB /  17.7 MiB (100 %)  12.4 MiB/s in  1s ETA:  0s
info: installing component 'rustc'
 58.0 MiB /  58.0 MiB (100 %)   9.9 MiB/s in  5s ETA:  0s
info: downloading component 'rustc-dev'
error: component download failed for rustc-dev-x86_64-unknown-linux-gnu
error: caused by: could not download file from 'https://static.rust-lang.org/dist/2019-12-21/rustc-dev-nightly-x86_64-unknown-linux-gnu.tar.xz' to '/root/.rustup/downloads/5f961624bb639e235648b23601aa71e5237335cd2a0d75031c5d04626911bafa.partial'
error: caused by: failed to make network request
error: caused by: https://static.rust-lang.org/dist/2019-12-21/rustc-dev-nightly-x86_64-unknown-linux-gnu.tar.xz: connection error: Connection reset by peer (os error 104)
error: caused by: Connection reset by peer (os error 104)
error: backtrace:
error: stack backtrace:
   0: error_chain::backtrace::imp::InternalBacktrace::new::h61cc8d59908b943e (0x55e294da8f3f)
   1: download::reqwest_be::download::h264042167b323b11 (0x55e294c4d38e)
   2: rustup::utils::utils::download_file_with_resume::h25c8b384bf6144e9 (0x55e294bf1d2e)
   3: rustup::dist::manifestation::Manifestation::update::ha584df3fa09d6def (0x55e294bd8011)
   4: rustup::toolchain::Toolchain::add_component::h8b8992361331488b (0x55e294c294ac)
   5: rustup_init::rustup_mode::main::h6d7ce19928d9f88a (0x55e294ac8cc7)
   6: rustup_init::run_rustup_inner::hc570d3222a327e8f (0x55e294afb347)
   7: rustup_init::main::hc72017b61fdb958c (0x55e294afa7f5)
   8: std::rt::lang_start::{{closure}}::h1f7d8a651250686e (0x55e294a952f3)
   9: main (0x55e294aff83e)
  10: __libc_start_main (0x7f8946f2ab97)
  11: <unknown> (0x55e294a91029)
```

Describe the solution you'd like

I was going to suggest making rustup automatically retry downloads a few times, but GitHub’s "might be related" issue suggestions led me to #1667, then #2121, then https://github.com/rust-lang/rustup/blob/master/CONTRIBUTING.md#rustup_max_retries

Does the latter mean that the download already failed four times before the visible `error: component download failed` is emitted?

Is there a time delay between these retries? What would you think of adding one, perhaps with exponential back-off?
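
For concreteness, here is a minimal sketch of the kind of delayed retry I have in mind (`try_download` is a hypothetical stand-in for the real network call; none of this is rustup’s actual implementation):

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical stand-in for the real network request; it always fails
// here so the back-off path is exercised.
fn try_download(_url: &str) -> Result<Vec<u8>, std::io::Error> {
    Err(std::io::Error::new(
        std::io::ErrorKind::ConnectionReset,
        "Connection reset by peer (os error 104)",
    ))
}

// Retry up to `max_retries` extra times, doubling the delay between attempts.
fn download_with_backoff(url: &str, max_retries: u32) -> Result<Vec<u8>, std::io::Error> {
    let mut delay = Duration::from_millis(500);
    let mut last_err = None;
    for _ in 0..=max_retries {
        match try_download(url) {
            Ok(bytes) => return Ok(bytes),
            Err(e) => {
                last_err = Some(e);
                sleep(delay);
                delay *= 2; // exponential back-off
            }
        }
    }
    Err(last_err.expect("loop runs at least once"))
}
```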

@kinnison (Contributor)

If it were retrying, it would indicate as much. As such, I can only conclude that a 'connection reset by peer' error is somehow not being communicated to rustup as a retryable error.

The download loop is here: https://github.com/rust-lang/rustup/blob/master/src/dist/manifestation.rs#L177. It currently doesn't insert any delays, on the basis that most failure modes warrant an immediate retry (or likely won't be resolved by retrying anyway, such as a loss of connectivity). My guess is that the `ErrorKind` is something which isn't caught in that match. If you can induce that failure mode, it would be worth setting it up and, in the default branch of the match, outputting the error so you can begin to find a fix.
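
Roughly, the shape of that loop is something like this sketch built on the retry crate’s `OperationResult` (the `fetch` function and the `DownloadError` type are stand-ins, not the real identifiers in manifestation.rs):

```rust
use retry::{delay::NoDelay, retry, OperationResult};

#[derive(Debug)]
enum DownloadError {
    ConnectionReset, // transient: should be retried
    NotFound,        // permanent: retrying won't help
}

// Stand-in for the real network call.
fn fetch(_url: &str) -> Result<Vec<u8>, DownloadError> {
    Err(DownloadError::ConnectionReset)
}

fn download_with_retries(url: &str, max_retries: usize) -> Result<Vec<u8>, String> {
    // NoDelay: retry immediately, up to `max_retries` extra attempts.
    retry(NoDelay.take(max_retries), || match fetch(url) {
        Ok(bytes) => OperationResult::Ok(bytes),
        // Errors recognised as transient map to Retry...
        Err(e @ DownloadError::ConnectionReset) => OperationResult::Retry(e),
        // ...anything that falls through aborts on the first failure.
        Err(e) => OperationResult::Err(e),
    })
    .map_err(|e| format!("download failed: {:?}", e))
}
```

The point is the match: an error kind that isn't explicitly mapped to `Retry` aborts immediately, which is what this issue's log looks like.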

@SimonSapin (Contributor, Author)

Unfortunately I don’t know how to reliably reproduce this error. However, I see that the component download failed message comes from `ComponentDownloadFailed`, which we could add to that match.

@SimonSapin (Contributor, Author)

Oh, never mind: `ComponentDownloadFailed` is only ever created (added to an error chain) as a result of that match. According to the output, the next error in the chain is could not download file, which comes from either `DownloadingFile` or `DownloadNotExists`. The latter is only created in src/utils/utils.rs for some HTTP error codes, but in this case it looks like we don’t get to HTTP at all, so we likely have `DownloadingFile` here. But that one is matched and mapped to `OperationResult::Retry`. So maybe it did fail four times after all?
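
(As an aside, the nested caused by: lines in the log come from walking a chain of causes; conceptually, something like this sketch over std’s `Error` trait, though rustup actually uses error_chain rather than this exact code:)

```rust
use std::error::Error;

// Walk the chain of causes from the top-level error downward, printing
// each one, analogous to how the "error: caused by:" lines read above.
fn print_chain(top: &dyn Error) {
    eprintln!("error: {}", top);
    let mut source = top.source();
    while let Some(err) = source {
        eprintln!("error: caused by: {}", err);
        source = err.source();
    }
}
```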

@kinnison (Contributor)

The `RetryingDownload` message should be obvious (it's at `info:` level), and its absence suggests that it absolutely did not attempt to retry.

@kinnison (Contributor)

Okay, so after some more research, I think this ends up as `DownloadNotExists`, because otherwise it ought to be retrying. The error trace suggests the originating issue is a connection reset by peer, and the "failed to make network request" bit means the error chained out before being converted to an HTTP code. As such, the retry logic won't catch it. I need to look harder at what happens to those error messages.
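
To illustrate that hypothesis with made-up names (this is not rustup's real `ErrorKind`): if the classification only recognises errors that reached the HTTP layer, a connection-level failure never maps to a retryable kind.

```rust
// Illustrative only: not rustup's actual error type.
enum DownloadErrorKind {
    // Converted from a completed HTTP response, e.g. a 5xx status.
    HttpStatus(u16),
    // "failed to make network request": the error chained out before
    // any HTTP status code existed, e.g. connection reset by peer.
    NetworkRequest,
}

fn should_retry(kind: &DownloadErrorKind) -> bool {
    match kind {
        // Only HTTP-level failures are classified as transient here...
        DownloadErrorKind::HttpStatus(status) => *status >= 500,
        // ...so the connection-reset case falls through and is fatal.
        DownloadErrorKind::NetworkRequest => false,
    }
}
```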

kinnison self-assigned this Jan 10, 2020
kinnison added this to the 1.22.0 milestone Jan 10, 2020
@kinnison (Contributor)

I've been back over this and cannot see why it would not have been retrying, unless the rustup version you had was too old. Perhaps add a `rustup --version` to your CI script to be sure it's at least 1.21, since that's where the retry logic was released.

@SimonSapin (Contributor, Author)

> at least 1.21 since that's where retry logic was released.

Ah! Good to know.

The log in the issue description is from CI task https://community-tc.services.mozilla.com/tasks/T10RQXfSRCCODoDHP5bkAA/runs/0, which runs in a Docker image built by task https://community-tc.services.mozilla.com/tasks/H0g0jHrZRGmozGe41fT2zg. That task ran on 2019-12-10, installing the then-current rustup with `curl https://sh.rustup.rs | sh`. Based on https://github.com/rust-lang/rustup/blob/master/CHANGELOG.md, that was likely 1.20.2.

We’ve since upgraded to 1.21, and as far as I know haven’t seen this kind of download error again. I’ll close this as fixed in 1.21 now, and reopen if we ever observe this again.

Thanks for your help!

@kinnison (Contributor)

Thanks Simon.
