
Switch the backend to something other than Git-based solutions #2780

Closed

xtendo-org opened this issue Nov 15, 2016 · 11 comments

xtendo-org commented Nov 15, 2016

I've looked but couldn't find an issue that addresses exactly this problem, so I'm creating a new one. Please let me know if one already exists, or if anything below is mistaken.

According to this comment by a GitHub engineer, using GitHub (or any Git-based solution at all) as a package manager's backend can severely hurt the tool's performance.

I recently ran a Haskell "boot camp" at the company I work at and recommended that all participants install Stack. The most frequently raised complaint was that it took ages to install; some people reported "70 minutes and still not complete." It's 2016, and I think we can agree that if a programming language's tooling takes more than an hour to download and install, something is certainly wrong. As @snoyberg pointed out, the time it takes to actually start using a tool matters, so we should consider this not a mere performance problem but a blocker for anyone who attempts to enter Haskell.

Although less dramatic, the problem with a Git-based backend is not limited to the initial installation; it pervades the whole tooling. For example, suppose I'd like to set the latest nightly as the project's resolver. A shell script like

sed -i 's/^resolver: .*/resolver: '$(curl -s https://www.stackage.org/snapshots | grep -o -m 1 "nightly-[0-9]\+-[0-9]\+-[0-9]\+")'/' stack.yaml

takes less than 1.5 seconds to run, because the heaviest task is downloading one HTML file. On the other hand, Stack's built-in command for the same task, stack --resolver nightly solver --update-config, may take more than 10 seconds, because it has to git-fetch a repository containing more than ten thousand commits touching more than nine thousand files, which is generally expensive according to the aforementioned comment.

One solution I can think of is to make the Stack command line tool switch to using an independent server (e.g. the Stackage website) as backend and avoid GitHub. If Git or GitHub is necessary for versioning or something, that's fine; we can still rely on it, just make it cached or mirrored somewhere so the command line tool won't directly depend on them.

xtendo-org (Author) commented Nov 15, 2016

The quickest example that comes to my mind is to have a Git mirror repo and provide its contents as a tar file with a web server like Nginx, renewing the repo once a day with a cronjob like git fetch origin && git rebase origin/master master. This should require little coding/engineering, I'm guessing.
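
For illustration, a minimal sketch of that daily job, where the paths, the repository name, and the assumption that Nginx serves /srv/www as static files are all hypothetical:

    #!/bin/sh
    # Hypothetical cron job: refresh the mirror once a day, then repack the
    # tree as a tarball that Nginx can serve with a single static GET.
    cd /srv/mirrors/package-index || exit 1
    git fetch origin
    git rebase origin/master master
    # git archive packs the current tree without the .git history.
    git archive --format=tar.gz -o /srv/www/index.tar.gz master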

@heejongahn

I have also experienced this issue once or twice. When I first encountered it, I was pretty sure something was wrong, and after an (impatient) series of Ctrl-C presses and re-runs I even considered giving up on stack. Though this isn't an everyday issue, I can see it acting as a high barrier for stack users, especially those hitting it for the first time.

I strongly agree that we need a fix for this as soon as possible. Using an independent server seems like a valid solution to me, though there may be issues I haven't foreseen. Either way, I think this should be treated as a high-priority issue.

hvr commented Nov 15, 2016

@xtendo-org

The quickest example that comes to my mind is to have a Git mirror repo and provide its contents as a tar file

That's indeed a good and proven approach. That's also how Hackage's package index works: it is versioned, contains SHA-256 (for TUF) and MD5 (useful for mirror tooling) hashes in its TUF records, and even allows for fast incremental updates (since the index is append-only, there's always a common prefix we can resume from). The logic for all this (and more) is implemented in hackage-security.
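
The append-only property is what makes the incremental updates cheap: a client only needs the bytes past the length it already holds. A rough sketch with curl, where the index URL is an assumption (and a real client would handle the up-to-date case, in which the server answers 416):

    #!/bin/sh
    # Sketch: incrementally update a local copy of an append-only index.
    INDEX_URL=https://hackage.example.org/01-index.tar   # hypothetical URL
    LOCAL=01-index.tar
    if [ -f "$LOCAL" ]; then
      # Resume from our current length; only newly appended entries arrive.
      offset=$(wc -c < "$LOCAL")
      curl -s -r "$offset-" "$INDEX_URL" >> "$LOCAL"
    else
      curl -s -o "$LOCAL" "$INDEX_URL"
    fi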

@snoyberg (Contributor)

I'm not going to end up making any decisions here (I don't handle day-to-day management of Stack anymore). However, I'll throw in a few thoughts:

  • The surest way to ensure that no one wants to do something is to be lectured to by Herbert (read: why I'm now opposed to PVP instead of simply not following it). Herbert: we get it, you and Duncan won this battle because you control Hackage. I still think you came up with an incredibly backwards solution.
  • I tried using hackage-security, and couldn't make heads or tails of that library. So if someone wants to see hackage-security happen here, perhaps someone who actually understands the library will want to contribute a patch.
  • All that said: given that hackage-security is the new reality foisted upon us, I actually lean towards moving to it in place of Git, even if it's technically inferior. We may as well reduce the additional code paths that various tooling needs to follow.
  • I don't think the issues raised here are in any way insurmountable. We can easily go back to the shallow clones we've used in the past, with the new metadata in snapshots, or simply default to not having Hackage revision detection in place. (Side note: Hackage revisions are another example of a terrible feature.)

Herbert: please don't turn this issue into a discussion of the complaints I'm raising. I'm pointing them out here to try and encourage you to engage more respectfully on issues in the future.

@23Skidoo

/cc @edsko, who is the main author of the hackage-security library.

dcoutts commented Nov 21, 2016

I tried using hackage-security, and couldn't make heads or tails of that library

For anyone having a go, a good place to start is the example client, which is quite compact (it also demonstrates using http-client as the HTTP implementation):

https://github.com/well-typed/hackage-security/tree/master/example-client

It may also be useful to look at the use of the interface in cabal-install, where it iterates over the index, getting every revision of every .cabal file. You'd probably want something like that, plus converting into the cached formats that stack uses. In principle the interface supports doing index conversions incrementally, by saving an archive directory index and starting from there (though it has to validate the saved info to know that an incremental conversion is OK).
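
As a rough picture of what "every revision of every .cabal file" means, the entries can be listed straight from a downloaded index tarball; a revised file simply appears again under the same path (the local filename here is an assumption):

    # Count how many times each .cabal path occurs in the index; a count
    # above 1 means that file has Hackage revisions.
    tar -tzf 01-index.tar.gz | grep '\.cabal$' | sort | uniq -c | sort -rn | head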

@alexanderkjeldaas (Contributor)

According to this comment by a GitHub engineer, using GitHub (or any Git-based solution at all) as a package manager's backend can severely hurt the tool's performance.

I don't see anything like that in the comment.

Yes, having 10000 commits might be more costly than copying the resulting file. Git has lots of ways of fixing that, such as squashing or shallow checkouts.

In recent git versions we can now write git clone --shallow-since=<date>, and give all clients the same set of objects. With a reasonable caching strategy on the server side, it should be possible to reuse the calculated bundles.
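
Concretely, that would look something like the following, where the repository URL is illustrative (--shallow-since needs Git 2.11 or later):

    # Initial clone: only objects reachable from commits after the cutoff.
    git clone --shallow-since=2016-10-01 https://github.com/example/package-index.git
    # Later updates with the same cutoff keep the clone shallow, so every
    # client asks for the same object set and the server can reuse the
    # computed packs.
    git -C package-index fetch --shallow-since=2016-10-01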

@xtendo-org (Author)

@alexanderkjeldaas The comment says that a shallow checkout is more costly in the long run:

... most of the initial clones are shallow, meaning that not the whole history is fetched, but just the top commit. But then subsequent fetches don't use the --depth=1 option. Ironically, this practice can be much more expensive than full fetches/clones, especially over the long term.
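
The pattern being warned about looks like this (repository URL illustrative):

    # Fast: the initial shallow clone fetches only the tip commit.
    git clone --depth=1 https://github.com/example/big-repo.git
    cd big-repo
    # The expensive part: a later fetch without --depth makes the server
    # compute packs against the client's shallow boundary, which the quoted
    # comment says costs more, long term, than serving full clones.
    git fetch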

@alexanderkjeldaas (Contributor)

@xtendo-org maybe I'm misunderstanding, but I thought the "subsequent fetches don't use the --depth=1 option" implied that the repo was converted into a full clone.

But to see how --depth=1 doesn't make sense for that repo, look at
https://github.com/CocoaPods/CocoaPods/releases
and then read
https://blogs.gnome.org/simos/2009/04/18/git-clones-vs-shallow-git-clones/

It seems that a shallow clone, by fetching every tag at depth 1, basically picks up everything when there are 153 releases to clone.

If the cloning is done by date, the results could be very different.

Git can do "anything", so if it needs to fetch less, then there is likely a way to achieve that.

snoyberg (Contributor) commented Dec 4, 2016

While I still think Git is the better way overall, it seems that enough people are having connectivity issues with GitHub that it's worth switching the backend. I have a PR at #2827. This does not address switching to hackage-security for downloads... that codebase still intimidates me, and I'm not sure how I feel about the partial download bit and having to switch to uncompressed streams for it.

dcoutts commented Dec 13, 2016

having to switch to uncompressed streams for it

@snoyberg it's worth noting that the partial/incremental downloading works on the compressed stream. It's a range get on the tail of the .tar.gz file.
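
So a client that already holds the first N bytes appends just the compressed tail, along these lines (the URL is an assumption, and real code would check for a 416 response when already up to date):

    # Ask only for the bytes past what we already have; the response is new
    # gzip data that can be appended directly to the local .tar.gz.
    offset=$(wc -c < 01-index.tar.gz)
    curl -s -r "$offset-" https://hackage.example.org/01-index.tar.gz >> 01-index.tar.gz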
