Rewrite Git history to prune large 'old files'? #38
Comments
Thanks for creating the issue, I agree, it's easier to keep the discussion here.
Exactly! On the other hand, I did a
I think we could prune the repo down to 1 MB or so instead of 11 MB, but the question is if it's worth it, given that it requires a force push. However, should we decide to do this, then having it ready before the v0.3.0 release seems like a perfect time. |
I like the idea in principle, but I think this repository has enough followers that it may cause harm to drop the old content. In practice, the repository is quite small even at 11MiB. If you decided to do it, you could keep a copy of the old repository around at Maybe it's possible to keep the old history around in a separate git ref which doesn't get cloned by default. But in that case I guess the content would be harder to discover. |
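The "separate git ref" idea could look roughly like the sketch below. It is only an illustration in a scratch directory: the ref name `refs/legacy/master`, the `demo` identity and the directory names are all made up for the example, and nothing here was run against this repository. It relies on the fact that a default `git clone` only fetches branches and tags, so a ref in a custom namespace has to be fetched explicitly.

```shell
#!/bin/sh
set -eu
# Demo identity (an assumption), so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "large old content" > old.txt
git add old.txt
git commit -q -m "old history"

# Park the old history under a custom namespace (hypothetical name).
git update-ref refs/legacy/master HEAD

# Simulate the rewrite: a new parentless commit carrying the same tree.
new=$(echo "rewritten history" | git commit-tree "HEAD^{tree}")
git update-ref refs/heads/master "$new"

# A default clone only fetches refs/heads/* and tags.
git clone -q "file://$work/upstream" "$work/clone"
cd "$work/clone"
default_refs=$(git for-each-ref --format='%(refname)')

# The old history is still reachable, but only on explicit request.
git fetch -q origin 'refs/legacy/*:refs/legacy/*'
legacy_refs=$(git for-each-ref --format='%(refname)' refs/legacy)
echo "$legacy_refs"
```

This also shows the discoverability concern: a user has to know the ref name in advance to fetch it.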
Agreed. Had the repo been at 100 MB, then we probably would have done it, but at this size it does not seem worth the potential harm to users. (The idea with shrinking the repo was of course to make the initial download easier for users, especially those who happen to be on a slow Internet connection, as may be common in parts of Asia, etc). So, for now, I'm fine with keeping it as it is, and just being careful when adding large content in the future. Closing this issue for now. We can always refer back and re-open at a later point. |
Also, if Go ever does shallow Git clone, this issue would be resolved I think. (upstream issue golang/go#13078) |
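For reference, this is roughly what a shallow clone buys you. The sketch below builds a throwaway three-commit repository (names and the `demo` identity are assumptions for the example) and clones it with `--depth 1`, which is plain Git behaviour and says nothing about what the Go tooling actually does.

```shell
#!/bin/sh
set -eu
# Demo identity (an assumption), so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/full"
cd "$work/full"
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# The file:// URL forces the normal transport, so --depth is honoured.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full: $full_count, shallow: $shallow_count"
```

The shallow clone only receives the objects needed for the tip commit, so large files that exist only in older history are never downloaded.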
I just learned that Go did this to their repository recently, the discussion in there and how they went about it is pretty interesting: I think it probably doesn't change anything with respect to what we might do to this repository. |
Thanks for the link! It was an interesting read to see how they resolved it.
Most likely not. If we end up doing a pruning, then I'd suggest we use BFG, as suggested on the GitHub link you posted. Also, if we do this, then it should perhaps be in the next few weeks, as the intention is to have v0.3.0 released some time in early December. I'm still a bit on the fence. I don't think we need the rewrite. However, should we ever do one, now is basically the perfect time, as we move from v0.2 to v0.3, since users will have to make manual changes to get the latest release anyway (updating to the latest API, etc). |
Until we decide for sure, I'll re-open the issue. This may also help get input from other users of the repo whom it may affect. I'll also rename the title to include a mention of the Git history rewrite. |
Some large paths:
This graph shows how much space will be saved, assuming you eliminate large file paths: |
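One way to find large paths like these is to ask Git for every blob in history together with its size. The sketch below is a generic approach in a scratch repository, not necessarily how the listing above was produced; the file names and the `demo` identity are made up. Note that the big file still shows up after it has been deleted from HEAD, which is exactly why the repository stays large.

```shell
#!/bin/sh
set -eu
# Demo identity (an assumption), so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
printf 'small\n' > small.txt
head -c 200000 /dev/zero > big.bin
git add small.txt big.bin
git commit -q -m "add files"

# Deleting the file from HEAD does not remove it from history.
git rm -q big.bin
git commit -q -m "remove big file"

# List blobs reachable from any ref, largest first: "<size> <path>".
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" {print $2, $3}' |
  sort -rn | head -n 5)
echo "$largest"
```

The sizes reported are uncompressed object sizes, so they overestimate the on-disk cost of files that compress well.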
@pwaller why matplotlib? |
The current intention is to clone llir/llvm into llir/llvm-legacy, to preserve the complete history. Then, to start clean, we will keep any file currently in HEAD, and its entire history at that path. Since we need to do a force push anyway, this seems to be the time to really get the size of the repo down. If anyone currently using the repo has some input or feedback, feel welcome to contribute your thoughts. |
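Preserving the complete history in a second repository could be done with a mirror clone followed by a mirror push. The sketch below uses local bare repositories as stand-ins for llir/llvm and llir/llvm-legacy; the directory names, the seed commit and the `demo` identity are assumptions for illustration, not the commands that were actually run.

```shell
#!/bin/sh
set -eu
# Demo identity (an assumption), so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Local stand-ins for the two hosted repositories.
git init -q --bare llvm.git
git -C llvm.git symbolic-ref HEAD refs/heads/master
git init -q --bare llvm-legacy.git

# Seed the "original" repository with a branch and a tag.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo a > a.txt
git add a.txt
git commit -q -m "first commit"
git tag v0.2.1
git push -q "$work/llvm.git" master v0.2.1
cd "$work"

# A mirror clone copies every ref; a mirror push reproduces them all.
git clone -q --mirror "$work/llvm.git" mirror.git
git -C mirror.git push -q --mirror "$work/llvm-legacy.git"

src_refs=$(git -C llvm.git for-each-ref --format='%(objectname) %(refname)')
dst_refs=$(git -C llvm-legacy.git for-each-ref --format='%(objectname) %(refname)')
echo "$dst_refs"
```

Comparing the two ref listings afterwards is a cheap check that nothing was left behind before the original is force pushed.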
@mewmew and I propose to run the following:
|
See https://github.com/llir/llvm-clean for the new repository. The intent is to force push the HEAD of that repository into |
https://github.com/reedkotler/scala-llc doesn't seem to contain any go code? |
Oh, the code match was from the BNF https://github.com/reedkotler/scala-llc/blob/ff3578b14171a5332e1c7f972c0c40b32f7a9e4c/ll.bnf#L187
We can remove it from the list. |
I'd like to trim the |
On the 30th of November we pruned the repository using BFG to reduce its initial download size. The following commands were run at the old revision d3f412d.
$ du --apparent-size -sch .git
9.6M .git
9.6M total
# Kill objects at and before v0.2.1
git rev-list --objects 7a17b32c1767cfeb5287d164e92865adb98985c8 | awk '{print $1}' > killset.txt
# Kill unwanted objects - testdata, textmapper and other experimental code.
git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(rest)' | egrep '(/testdata/| l/|\.tm$)' | awk '{print $1}' >> killset.txt
bfg -bi killset.txt
git repack -a && git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ du --apparent-size -sch .git
934K .git
934K total |
Update summary, 23/11/2018: This repository currently requires ~10 MiB of download, which isn't ideal considering the source is only a few hundred kilobytes. @mewmew and I propose to shrink it to ~800 KiB, to give a faster "Go install" experience for anyone using the repository.
The reason for the blowup is that there were some large test cases (including sqlite) which measure in the tens of MiB, and various other bits relating to parsing were also quite large. Those have now moved into other repositories in the llir organization, so you don't need to download them anymore if you just want to import llir.
Original issue text.
I just saw @mewmew's comment in ec48d54 but thought it would be easier to have a separate issue for discussion - the commit itself is very long so if I commented on the commit the discussion would be way down at the bottom!
First, can I clarify the question - are you asking how to remove lots of old large assets from the history of the repository?
If that is the question, the answer is: yes, you can do it, but anyone who cloned the repository needs to know about it, otherwise they might get into a mess, since it requires rewriting history. At least, that's the best I know. See GitHub's guidance on the issue.
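To make the "mess" concrete, here is a sketch of what someone with an existing clone would have to do after a force push. It runs entirely in a scratch directory; the repository names and the `demo` identity are made up, and the `--amend` is only a stand-in for a real history rewrite.

```shell
#!/bin/sh
set -eu
# Demo identity (an assumption), so commits work without global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/master

# The maintainer publishes the original history.
git init -q maintainer
cd maintainer
git symbolic-ref HEAD refs/heads/master
echo v1 > file.txt
git add file.txt
git commit -q -m "original history"
git push -q "$work/upstream.git" master
cd "$work"

# A user clones before the rewrite.
git clone -q "$work/upstream.git" user

# The maintainer rewrites history and force pushes.
cd maintainer
git commit -q --amend -m "rewritten history"
git push -q --force "$work/upstream.git" master
cd "$work"

# The user moves to the new history. WARNING: reset --hard discards
# local commits and uncommitted changes on the current branch.
cd user
git fetch -q origin
git reset -q --hard origin/master
user_head=$(git rev-parse HEAD)
upstream_head=$(git -C "$work/upstream.git" rev-parse master)
echo "$user_head"
```

A plain `git pull` in the same situation would instead try to merge the old and new histories, which is how the old large objects tend to sneak back in.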