Timeout batch downloads, not each download #6285

Merged: 1 commit merged into rust-lang:master from more-timeouts on Nov 9, 2018

Conversation

alexcrichton (Member)

This commit switches the timeout logic implemented in #6130 to time out
an entire batch of downloads instead of each download individually.
Previously, if *any* pending download didn't receive data in 30s we would
time out, or if *any* pending download didn't receive 10 bytes in 30s we
would time out. On very slow network connections this is highly likely
to happen, as a trickle of incoming bytes may not be spread equally
amongst all connections, and not all connections may actually be active
at any one point in time.

The fix is to instead apply the timeout logic to an entire batch of
downloads. We now time out only if no data at all is received within the
timeout window. In other words, if any data for any download is
received, we consider the batch as not having timed out. Similarly, any
progress on any download counts as progress towards our speed limit.

Closes #6284
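
For illustration, here is a minimal sketch of the batch-level policy described above. The `BatchTimeout` type and its structure are hypothetical, not Cargo's actual implementation; the 30s window and 10 bytes/s threshold are the values from the description. Data arriving on *any* download counts, and the batch only times out if a whole window elapses with aggregate throughput below the requirement:

```rust
use std::time::{Duration, Instant};

/// Hypothetical batch-level timeout tracker; a sketch, not Cargo's code.
pub struct BatchTimeout {
    timeout: Duration,     // window with no (sufficient) data, e.g. 30s
    low_speed: u64,        // required aggregate rate, e.g. 10 bytes/s
    window_start: Instant, // when the current window began
    bytes_in_window: u64,  // bytes received across *all* downloads
}

impl BatchTimeout {
    pub fn new(timeout: Duration, low_speed: u64) -> BatchTimeout {
        BatchTimeout {
            timeout,
            low_speed,
            window_start: Instant::now(),
            bytes_in_window: 0,
        }
    }

    /// Record data arriving on *any* in-flight download. Once the aggregate
    /// rate over the window clears the speed requirement, restart the window.
    pub fn data_received(&mut self, bytes: u64) {
        self.bytes_in_window += bytes;
        let secs = self.window_start.elapsed().as_secs().max(1);
        if self.bytes_in_window / secs >= self.low_speed {
            self.window_start = Instant::now();
            self.bytes_in_window = 0;
        }
    }

    /// The batch times out only if an entire window elapsed without enough
    /// aggregate data, regardless of which individual downloads were idle.
    pub fn timed_out(&self) -> bool {
        self.window_start.elapsed() >= self.timeout
    }
}
```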

@alexcrichton (Member, Author)

@ehuss this actually means that the complexity of #6130 I think is required because curl only does per-connection timeout and it's a very good point that we basically don't want that any more with parallel downloads

@alexcrichton (Member, Author)

r? @ehuss

@dwijnand (Member) commented Nov 8, 2018

Look at that turnaround time, from issue to PR 😀

Review thread on src/cargo/core/package.rs (outdated, resolved)
@alexcrichton (Member, Author)

Updated! Ideally with working tests this time too

@ehuss (Contributor) commented Nov 9, 2018

> because curl only does per-connection timeout and it's a very good point that we basically don't want that any more with parallel downloads

I'm not sure I understand this statement, per-connection timeouts would be a good thing, right? The problem wasn't per-connection timeouts, but that #6130 implemented per-package timeouts? Packages could be enqueued into libcurl, but not necessarily started right away, and the timeout code didn't know when they actually started (and if it took more than 30 seconds to get started, all of them would time out at once).

IIUC, with this change, if one connection hangs, it remains hung until all other transfers finish, and then times out, and then retries. That may not be catastrophic, and is hopefully rare, but seems a little odd.

Am I understanding this correctly?

Unfortunately I can't think of any workarounds; it seems like libcurl doesn't give enough internal visibility for Cargo to know what's happening. The only other option I can think of is to fix the original problem of extraction blocking the main thread.

@alexcrichton (Member, Author)

Er sorry, the term "per connection" there wasn't quite right; it's "per Easy", which we currently map to "per package". So given the way we're using libcurl right now, there's no easy way (afaik) to get a per-connection timeout; what this logic basically switches to instead is timing out the whole transfer at once.

Otherwise yeah, you're right: if you have two connections and one hangs, you don't realize it until the other one finishes entirely. I don't think that'll happen much in practice, as it's basically crates.io or static.crates.io, which are likely both soon to be behind CloudFront as the same fronting service.
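
For context, this is roughly what the native per-Easy timeout looks like with the curl crate; the setup below is an illustrative sketch, not Cargo's exact code. Each Easy handle gets its own low-speed window, so a handle that curl has queued but not yet scheduled receives no bytes and trips its own timer even while other transfers make progress:

```rust
use curl::easy::Easy;
use curl::multi::{EasyHandle, Multi};
use std::error::Error;
use std::time::Duration;

// One Easy handle per package, each with its own native low-speed timeout.
fn enqueue(multi: &Multi, url: &str) -> Result<EasyHandle, Box<dyn Error>> {
    let mut easy = Easy::new();
    easy.url(url)?;
    easy.low_speed_limit(10)?;                     // bytes per second...
    easy.low_speed_time(Duration::from_secs(30))?; // ...sustained below that for 30s aborts
    Ok(multi.add(easy)?)
}
```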

@alexcrichton (Member, Author)

Oh also, even with extracting not blocking the main thread, we'll still want this logic. All the timeout logic before this series of parallel downloads was always applied to just one Easy, which corresponds to just one package. With parallel downloads we have a lot of Easy handles active at the same time (a bunch of packages all at once), and internally curl does all the enqueueing and scheduling of what to download when.

If we timed out per-Easy, though, then if 1000 packages started and the first 999 didn't finish downloading in 30s, the last package would time out immediately for having had no network activity. All in all I think we'll unconditionally want this logic of "time out the entire transfer", as I think it's more in line with the purpose of the timeout: making sure that Cargo doesn't hang when nothing is happening on the network for too long.
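
As a sketch of where a batch-wide check hooks into the download loop, reusing the hypothetical `BatchTimeout` from the description above (sharing it mutably with each Easy's write callback, e.g. via `RefCell`, is elided here): curl decides internally which transfers are active at any instant, and the caller only observes aggregate progress between `perform()` calls.

```rust
use curl::multi::Multi;
use std::time::Duration;

// Drive all transfers to completion; curl schedules individual downloads
// internally. Each Easy's write callback is assumed to report bytes via
// BatchTimeout::data_received, so data on any handle counts as progress.
fn drive(multi: &Multi, batch: &BatchTimeout) -> Result<(), Box<dyn std::error::Error>> {
    while multi.perform()? > 0 {
        // Only the batch as a whole can time out, no matter which
        // individual handles have been idle so far.
        if batch.timed_out() {
            return Err("no data received on any download within the timeout window".into());
        }
        multi.wait(&mut [], Duration::from_millis(500))?;
    }
    Ok(())
}
```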

@ehuss (Contributor) commented Nov 9, 2018

> Oh also, even with extracting not blocking the main thread, we'll still want this logic.

I meant rolling back #6130 and doing non-blocking extraction instead, and relying on libcurl to do the right thing with its own internal timeout handling. Custom timeout handling wouldn't be necessary if the wait function was fast, correct?

@ehuss (Contributor) commented Nov 9, 2018

@bors r+

Do you want to backport this to beta?

@bors (Contributor) commented Nov 9, 2018

📌 Commit 4e1e3f7 has been approved by ehuss

@alexcrichton (Member, Author)

With my current understanding of libcurl (which is most certainly not perfect by any measure) I believe that if we rolled back #6130 and then did non-blocking extraction we would still find ourselves requiring this PR (and reapplying #6130 before it). I don't think libcurl has a native way of doing the timeout logic that we desire.

To be clear as well, though, I think the absolutely perfect timeout logic would be to have two limits: a no-activity duration and a speed-requirement duration (if a transfer goes slower than that, we kill it). Those two limits would be applied per TCP connection that libcurl makes. This PR is a bit coarser: it applies those limits to all active TCP connections together instead of to each individual one.

@bors (Contributor) commented Nov 9, 2018

⌛ Testing commit 4e1e3f7 with merge 5f448d5...

bors added a commit that referenced this pull request Nov 9, 2018
Timeout batch downloads, not each download

@bors (Contributor) commented Nov 9, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: ehuss
Pushing 5f448d5 to master...

bors merged commit 4e1e3f7 into rust-lang:master on Nov 9, 2018
alexcrichton deleted the more-timeouts branch on Nov 9, 2018
bors added a commit that referenced this pull request Nov 10, 2018
[beta] Timeout batch downloads, not each download

This is a beta backport of #6285
ehuss modified the milestones: 1.32.0, 1.31.0 on Feb 6, 2022