Crate fetching is slow #45
Comments
This is by design. There is no high latency; the tool is just being cordial to crates.io by spacing out its requests.

@Shnatsel Would you vote against a flag that would allow overriding this limit? I'm cautious and probably wouldn't add it, but it might be 'fine' if we also enforced that one sets a custom user agent in these situations. (Similar to how …)
We would need to clear a higher request rate with the crates.io team. Right now the documentation explicitly mentions one request per second for crawlers, but says nothing about other uses. I believe docs.rs makes requests at a much higher rate. I asked in the crates.io Discord once, but received no reply. Right now you can speed up this process by running …
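For context, here is a minimal sketch of the kind of courtesy throttle being described: at most one request per second while walking the dependency list. The fetch_crate_info function is a hypothetical stand-in, not the tool's actual code.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical per-crate fetch; stands in for the real HTTP call to the crates.io API.
fn fetch_crate_info(name: &str) -> Result<String, Box<dyn std::error::Error>> {
    Ok(format!("metadata for {name}"))
}

fn fetch_all(names: &[&str]) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    // crates.io's crawler policy asks for at most one request per second.
    let min_interval = Duration::from_secs(1);
    let mut results = Vec::with_capacity(names.len());
    let mut last_request: Option<Instant> = None;

    for name in names {
        // Wait out the remainder of the one-second window before the next request.
        if let Some(prev) = last_request {
            let elapsed = prev.elapsed();
            if elapsed < min_interval {
                sleep(min_interval - elapsed);
            }
        }
        last_request = Some(Instant::now());
        results.push(fetch_crate_info(name)?);
    }
    Ok(results)
}
```

With a few dozen dependencies this adds roughly one second per crate, which matches the delays discussed in this thread.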
@Owez could you post the output of …?
My timing output seems normal, and there does feel like a ~2 second delay. Without a bulk API the delay unfortunately seems borderline unusable, especially on a crate with a larger dependency count. Could automatic non-…
Such granular caching would add complexity, and I am not eager to add it, although I fear we will have to go down that road sooner or later, at least because people will likely want to view both … I see two ways we could solve this without going for a crate-granular caching system: …
We could estimate the time for fetching crates one-by-one and the time for downloading the complete dump, and adaptively choose whether to refresh the cache. I'm slightly concerned about how this would affect scripts that want to rely on either of the two behaviors.
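A rough sketch of this adaptive idea, assuming we can estimate both costs up front. The one-second per-crate figure comes from the crawler policy mentioned above; the dump estimate is left as an input and the function name is made up for illustration.

```rust
use std::time::Duration;

/// Assumed per-request cost when walking the API at one request per second.
const PER_CRATE_COST: Duration = Duration::from_secs(1);

/// Decide whether to fetch crates one-by-one or refresh the full data dump.
/// `dump_download_estimate` would come from the dump size and measured bandwidth.
fn should_refresh_dump(crate_count: usize, dump_download_estimate: Duration) -> bool {
    let one_by_one_estimate = PER_CRATE_COST * crate_count as u32;
    // Prefer the dump once sequential fetching would be slower than downloading it.
    one_by_one_estimate > dump_download_estimate
}
```

The concern about script behavior would remain: the same invocation could hit either code path depending on the estimates.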
Could an ETL job work here? If so, you can probably use GitHub Actions for this in a completely automated manner, by periodically running a job that commits to a git repo like this: https://github.com/RustSec/advisory-db/blob/master/.github/workflows/publish-web.yml That could either be fetched via git (where the periodically running job commits deltas from the previous state), or served over HTTP via GitHub Pages.
I don't think we want to keep old files around, as that is just duplicate content. However, the idea does become more attractive after thinking about it for a while. In particular, we could use a GitHub repo, or just a git branch, that has its version history regularly squashed. This would mitigate the growth and give GitHub a chance to remove old files (otherwise we'd already be at a few gigabytes after a year). It would also allow us to use freely available CDNs for GitHub files.
GitHub Pages will give you a CDN for free. I was suggesting that if you regenerated a text file (or multiple text files, ideally), you wouldn't need to leave "old files" around, as the repo would be updated with just the parts that changed. If you don't want/care about that, you could simply force-push every time, with GitHub Pages (or even just https://raw.githubusercontent.com/) serving the content over HTTP(S).
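To make the consumption side of the GitHub Pages / raw.githubusercontent.com idea concrete, here is a hedged sketch of fetching such a published snapshot. It assumes the ureq crate as the HTTP client and a made-up URL; neither reflects an actual decision in this thread.

```rust
// Sketch only: assumes the `ureq` crate (2.x) as the HTTP client and a
// hypothetical URL where a periodically regenerated snapshot would live.
fn fetch_snapshot() -> Result<String, Box<dyn std::error::Error>> {
    let url = "https://raw.githubusercontent.com/example-org/example-repo/main/crates-snapshot.json";
    // Static files published this way are served through a CDN, so there is
    // no per-crate API latency and no crawler rate limit to respect.
    let body = ureq::get(url).call()?.into_string()?;
    Ok(body)
}
```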
I'm running a test using the crates command and, as I'm writing this, it's still fetching crates. It seems to take quite a long time, even with a relatively low-dependency crate such as cargo-supply-chain (only 79 deps listed), as seen here:

It would be useful and more functional if it fetched these in parallel, especially given the high latency of the crates.io API.
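For reference, a minimal sketch of the parallel fetching requested here, using scoped threads from the Rust standard library. fetch_crate_info is a hypothetical stand-in for the real request, and, as the maintainers note above, any real parallelism would first need to be cleared against crates.io's rate-limit policy.

```rust
use std::thread;

/// Hypothetical stand-in for the real per-crate request against the crates.io API.
fn fetch_crate_info(name: &str) -> String {
    format!("metadata for {name}")
}

/// Fetch crate metadata on a handful of worker threads instead of strictly one at a time.
fn fetch_in_parallel(names: &[&str]) -> Vec<String> {
    // Illustrative worker count; an acceptable level of parallelism would need
    // to be agreed with the crates.io team first.
    const WORKERS: usize = 4;
    let chunk_size = ((names.len() + WORKERS - 1) / WORKERS).max(1);

    thread::scope(|scope| {
        // Split the crate list into contiguous chunks, one per worker thread.
        let handles: Vec<_> = names
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|name| fetch_crate_info(name))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Join the workers and flatten their results back into one list.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```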