Switch the backend to something other than Git-based solutions #2780
The quickest example that comes to my mind is to have a Git mirror repo and provide its contents as a tar file with a web server like Nginx, renewing the repo once a day with a cronjob like |
I have also experienced this issue once or twice. When I first encountered it, I was pretty sure that something was wrong, and I even thought of giving up on Stack after an (impatient) series of Ctrl-C and re-runs. Though this isn't an everyday issue, I assume it will be a high barrier for Stack users, especially those who have never met it before. I strongly agree that we need a fix for this as soon as possible. Using an independent server seems like a valid solution to me, but there might be some issues I haven't thought of. Either way, I think this should be treated as a high-priority issue. |
That's indeed a good and proven approach. It is also how Hackage's package index works: the index is versioned, contains sha256 (for TUF) and md5 (useful for mirror tooling) hashes in TUF records, and even allows fast incremental updates (since the index is only appended to, there is always a common prefix we can resume from). The logic for all this (and more) is implemented in |
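To make the "common prefix we can resume from" idea concrete, here is a minimal sketch in Python. It is not the Hackage or hackage-security implementation: `update_index`, `fetch_range`, and the simulated server are hypothetical names, and the trusted SHA-256 is assumed to come from some already-verified metadata. The client keeps its local copy, fetches only the bytes it is missing, and falls back to a full download if the result does not match the trusted hash.

```python
import hashlib
from typing import Callable


def update_index(local: bytes, remote_size: int, remote_sha256: str,
                 fetch_range: Callable[[int, int], bytes]) -> bytes:
    """Bring a local copy of an append-only index up to date.

    fetch_range(start, end) is a hypothetical callback returning the
    remote bytes in [start, end).
    """
    def verified(data: bytes) -> bool:
        return hashlib.sha256(data).hexdigest() == remote_sha256

    # Optimistic path: treat the local copy as a prefix, fetch only the tail.
    if len(local) <= remote_size:
        candidate = local + fetch_range(len(local), remote_size)
        if verified(candidate):
            return candidate

    # The local copy was not a prefix after all: download everything.
    full = fetch_range(0, remote_size)
    if not verified(full):
        raise ValueError("index does not match the trusted hash")
    return full


# Simulated server holding a 40-byte append-only index.
remote = b"".join(b"entry %d\n" % i for i in range(5))
requests = []


def fetch_range(start: int, end: int) -> bytes:
    requests.append((start, end))
    return remote[start:end]


stale = remote[:16]  # the client already has the first two entries
updated = update_index(stale, len(remote),
                       hashlib.sha256(remote).hexdigest(), fetch_range)
```

In this simulation the only request made is for bytes 16 to 40, so the cost of an update grows with the amount of new data and not with the total size of the index.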
I'm not going to end up making any decisions here (I don't handle day-to-day management of Stack anymore). However, I'll throw in a few thoughts:
Herbert: please don't turn this issue into a discussion of the complaints I'm raising. I'm pointing them out here to try and encourage you to engage more respectfully on issues in the future. |
/cc @edsko, who is the main author of the |
For anyone having a go, a good place to start is the example client, which is quite compact (it also demos using http-client as the HTTP implementation): https://github.com/well-typed/hackage-security/tree/master/example-client It may also be useful to look at how cabal-install uses the interface, where it iterates over the index, getting every revision of every .cabal file. You'd probably want something like that, plus converting into the cached formats that Stack uses. In principle the interface supports doing index conversions incrementally, by saving an archive directory index and starting from there (though it has to validate the saved info to know that an incremental conversion is OK). |
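As an illustration of "getting every revision of every .cabal file", the sketch below walks an append-only tar index in which a later entry with the same path is a newer revision. It uses Python's standard `tarfile` module and not the hackage-security interface; the `pkg/version/pkg.cabal` paths, the file contents, and the helper names `add_file` and `cabal_revisions` are assumptions made for the example.

```python
import io
import tarfile
from collections import defaultdict


def add_file(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Append one regular file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def cabal_revisions(index_bytes: bytes) -> dict:
    """Map each .cabal path to the list of its revisions, oldest first."""
    revisions = defaultdict(list)
    with tarfile.open(fileobj=io.BytesIO(index_bytes), mode="r:") as tar:
        for member in tar:  # yields entries in archive order, duplicates included
            if member.isfile() and member.name.endswith(".cabal"):
                revisions[member.name].append(tar.extractfile(member).read())
    return dict(revisions)


# Build a tiny in-memory index; foo.cabal is revised once after publication.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_file(tar, "foo/1.0/foo.cabal", b"version: 1.0\n")
    add_file(tar, "bar/0.1/bar.cabal", b"version: 0.1\n")
    add_file(tar, "foo/1.0/foo.cabal", b"version: 1.0\nx-revision: 1\n")

index = cabal_revisions(buf.getvalue())
latest_foo = index["foo/1.0/foo.cabal"][-1]
```

A converter to a cached format would do its work inside the same loop. An incremental variant would additionally remember how far into the archive it had already read, which is the part that needs validation as described above.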
I don't see anything like that in the comment. Yes, having 10000 commits might be more costly than copying the resulting file. Git has lots of ways of fixing that, such as squashing or shallow checkouts. In recent git versions we can now write |
@alexanderkjeldaas The comment says that a shallow checkout is more costly in the long run.
|
@xtendo-org maybe I'm misunderstanding, but I thought the "subsequent fetches don't use the But to see how It seems that shallow clones, by cloning every tag at depth 1 basically picks up everything when there are 153 releases to shallow clone. If the cloning is done by date, the results could be very different. Git can do "anything", so if it needs to fetch less, then there is likely a way to achieve that. |
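For reference, the difference between a depth-limited and a date-limited shallow clone can be sketched as below. This does not reconstruct the commands elided from the comments above; it only assembles argument lists from documented `git clone` options (`--depth`, `--shallow-since`, `--no-tags`, `--single-branch`). The helper name `shallow_clone_cmd`, the repository URL, and the destination directory are hypothetical.

```python
from typing import List, Optional


def shallow_clone_cmd(url: str, dest: str,
                      depth: Optional[int] = None,
                      since: Optional[str] = None) -> List[str]:
    """Build a `git clone` argument list that limits history by depth or date."""
    if (depth is None) == (since is None):
        raise ValueError("pass exactly one of depth or since")
    # --no-tags keeps release tags from pulling in additional history.
    cmd = ["git", "clone", "--single-branch", "--no-tags"]
    if depth is not None:
        cmd += ["--depth", str(depth)]
    else:
        cmd.append(f"--shallow-since={since}")
    return cmd + [url, dest]


# Hypothetical repository URL and destination directory.
by_depth = shallow_clone_cmd("https://example.invalid/index.git", "index", depth=1)
by_date = shallow_clone_cmd("https://example.invalid/index.git", "index",
                            since="2016-11-01")
```

Either list can be passed to `subprocess.run` to perform the clone. How much data each variant transfers depends on the repository, so it would have to be measured and not assumed.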
While I still think Git is the better way overall, it seems that enough people are having connectivity issues with GitHub that it's worth switching the backend. I have a PR at #2827. This does not address switching to hackage-security for downloads; that codebase still intimidates me, and I'm not sure how I feel about the partial download bit and having to switch to uncompressed streams for it. |
@snoyberg it's worth noting that the partial/incremental downloading works on the compressed stream. It's a range get on the tail of the .tar.gz file. |
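A small Python sketch of why a tail fetch can work on a compressed file: gzip streams may consist of several concatenated members, so bytes appended to the end of a `.tar.gz` can be added to a local copy without recompressing what is already there. Whether the Hackage index is produced exactly this way is an assumption here, not something stated in this thread. The URL is hypothetical and no network request is made; the `Range` header is standard HTTP.

```python
import gzip
import urllib.request


def tail_request(url: str, offset: int) -> urllib.request.Request:
    """Build a GET request for every byte from `offset` to the end of the file."""
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})


# What the client already has: one gzip member.
old_file = gzip.compress(b"entry-1\n")

# The server appends a second gzip member when a new entry is published.
server_file = old_file + gzip.compress(b"entry-2\n")

# The client asks only for the bytes past its current length...
req = tail_request("https://example.invalid/index.tar.gz", len(old_file))
tail = server_file[len(old_file):]  # stands in for the response body

# ...and appends them to its local copy, which still decompresses as a whole.
local_copy = old_file + tail
contents = gzip.decompress(local_copy)
```

The request would be sent with `urllib.request.urlopen(req)`; a server that honours the range replies with status 206 and only the requested bytes.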
I tried but couldn't find an issue that addresses exactly this problem, so I'm creating a new one. Please let me know if one already exists, or if anything I say here is wrong.
According to this comment by a GitHub engineer, it seems using GitHub (or any Git-based solution at all) as a package manager's backend causes severe damage to the tool's performance.
I recently ran a Haskell "boot camp" at the company I work for, and recommended that all participants install Stack. The most frequently raised complaint was that it took ages to install. Some people reported "70 minutes and still not complete." It's 2016, and I think we can agree that if a programming language's tooling takes more than an hour to download and install, something is certainly wrong. As @snoyberg pointed out, the time it takes to actually start using it is important, so we should treat this not as a performance problem but as a blocker for anyone who attempts to get into Haskell.
Although less dramatic, the problem with a Git-based backend is not limited to the initial installation, but pervasive in the whole tooling. For example, suppose I'd like to choose the latest nightly as the project's resolver. A shell script like
takes less than 1.5 seconds to run because the heaviest task here is to download one HTML file. On the other hand, Stack's built-in command for the same task,
stack --resolver nightly solver --update-config
, may take more than 10 seconds because it has to git-fetch a repository that contains more than ten thousand commits regarding more than nine thousand files, which is generally expensive according to the aforementioned comment.
One solution I can think of is to make the Stack command line tool switch to using an independent server (e.g. the Stackage website) as backend and avoid GitHub. If Git or GitHub is necessary for versioning or something, that's fine; we can still rely on it, just make it cached or mirrored somewhere so the command line tool won't directly depend on them.