Bulky Repo #1

Closed
gavindsouza opened this issue Aug 30, 2021 · 8 comments

@gavindsouza
Contributor

gavindsouza commented Aug 30, 2021

This repository contains all of ERPNext's history, and everything apart from the healthcare module was deleted in a single commit.

I feel we should instead only keep history (or commits) pertaining to the healthcare module. This would bring down the size of the repo considerably for starters. That alone will reduce the overhead to develop and maintain this app. Debugging across histories will be easier since we won't be dealing with so much baggage (909MiB at this point alone) 😅.

cc: @ChillarAnand

@gavindsouza gavindsouza added the enhancement New feature or request label Aug 30, 2021
@ChillarAnand ChillarAnand self-assigned this Aug 30, 2021
@ChillarAnand
Contributor

ChillarAnand commented Sep 2, 2021

| History | Commits | Size |
| --- | --- | --- |
| No rewrite | 38k | ~1.7G |
| Non-healthcare rewrite | 10k | ~0.3G |
| Complete rewrite | 7k | ~30M |
| Fresh repo | - | ~3M |

Since we moved healthcare into a separate app, the directory structure has changed. Because of this, a complete history rewrite makes git prune healthcare-related commits as well. This might cause issues in the future when we need to run git blame, git bisect, etc.

If we rewrite history to strip out only the non-healthcare modules, the repo size comes to ~300MB.
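A rough sketch of why the complete rewrite drops healthcare commits: a path-based filter only keeps commits that touch a kept path, so filtering on the new directory alone misses everything committed under the old layout. The commit ids and both directory prefixes below are hypothetical, made up for illustration, and this is a simplified model rather than what git or any rewrite tool does internally.

```python
from typing import Dict, List


def surviving_commits(commits: Dict[str, List[str]], keep_prefixes: List[str]) -> List[str]:
    """Return ids of commits that touch at least one path under keep_prefixes."""
    return [
        sha
        for sha, paths in commits.items()
        if any(path.startswith(prefix) for path in paths for prefix in keep_prefixes)
    ]


# Hypothetical history: ids and paths are illustrative assumptions only.
history = {
    "a1": ["erpnext/healthcare/doctype/patient/patient.py"],    # old layout
    "b2": ["erpnext/accounts/doctype/account/account.py"],      # unrelated module
    "c3": ["healthcare/healthcare/doctype/patient/patient.py"],  # new layout
}

# Filtering on the new layout only loses the older healthcare commit "a1".
new_only = surviving_commits(history, ["healthcare/"])

# Keeping both the old and the new prefix preserves the healthcare history.
both = surviving_commits(history, ["healthcare/", "erpnext/healthcare/"])

print(new_only)
print(both)
```

Under these assumptions, the fix is to include the pre-split path in the filter (and rename it) instead of filtering on the current layout alone.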

cc: @ankush, @hasnain2808

@ankush
Collaborator

ankush commented Sep 2, 2021

I think just by getting rid of translation and docs you'll save a HUGE amount of space.

Check this list of large objects in the ERPNext repo history: https://gist.github.com/ankush/ac3c49401ee240ea43780acf41fcbf00

90%+ of files >100 KB are just translations or doc images!
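For anyone wanting to reproduce a list like this, here is a minimal sketch. It is not necessarily how the gist was generated; it assumes the usual combination of `git rev-list --objects --all` piped into `git cat-file --batch-check`. The function names and the 100 KB default are my own choices.

```python
import subprocess
from typing import List, Tuple


def parse_large_blobs(batch_output: str, min_bytes: int) -> List[Tuple[int, str]]:
    """Parse '<type> <sha> <size> <path>' lines; return (size, path) for big blobs."""
    found = []
    for line in batch_output.splitlines():
        # maxsplit=3 keeps paths that contain spaces in one piece.
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees, and malformed lines
        size = int(parts[2])
        if size > min_bytes:
            found.append((size, parts[3]))
    return sorted(found, reverse=True)  # largest first


def large_blobs_in_repo(repo: str, min_bytes: int = 100 * 1024) -> List[Tuple[int, str]]:
    """List blobs anywhere in the history of `repo` larger than min_bytes."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_large_blobs(sizes, min_bytes)
```

Calling `large_blobs_in_repo(".")` inside a clone should give the biggest blobs first, which makes it easy to see whether translations and doc images dominate.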

@ankush
Collaborator

ankush commented Sep 2, 2021

Have you tried this: https://github.com/rtyley/bfg-repo-cleaner ?

@ChillarAnand
Contributor

Thanks, @ankush. Apart from the yarn.lock file, none of the files >100KB are required.

Ran $ bfg --strip-blobs-bigger-than 100K and the repo size is down to ~140MB.

@gavindsouza
Contributor Author

gavindsouza commented Sep 2, 2021

> Since we moved healthcare as a separate app, the directory structure has changed. Due to this, while completely re-writing the history, git is pruning commits related to healthcare also. This might cause issues in future when we have to run git blame/bisect etc.

Could we get rid of everything except healthcare from the ERPNext history, then change the paths and cherry-pick all the later commits on top? Or is it too late/pointless for that now?

ChillarAnand pushed a commit that referenced this issue Sep 2, 2021
@ChillarAnand
Contributor

@gavindsouza Cherry-picking the later commits will take additional time. But since some additional healthcare PRs have been merged and code cleanup has been done in erpnext, it makes sense to cherry-pick the later commits and apply them.

With the above approach, repo size is reduced to ~5MB & history is preserved without any mangling.
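As a rough illustration of the approach (keep only the healthcare history, move it to the new paths, then cherry-pick later commits), here is a sketch that only builds the commands and does not run them. It assumes `git filter-repo` with its `--path` and `--path-rename` options; the thread does not say which tool was actually used, and the directory names and commit ids are hypothetical placeholders.

```python
from typing import List


def rewrite_plan(old_dir: str, new_dir: str, later_commits: List[str]) -> List[List[str]]:
    """Build (but do not execute) the commands for a filter, rename, cherry-pick flow."""
    plan = [
        # Keep only history under old_dir and move it to new_dir in one pass.
        ["git", "filter-repo", "--path", old_dir, "--path-rename", f"{old_dir}:{new_dir}"],
    ]
    if later_commits:
        # Replay commits made after the split; they must already be fetched locally.
        plan.append(["git", "cherry-pick", *later_commits])
    return plan


# Hypothetical directories and commit ids, for illustration only.
plan = rewrite_plan("erpnext/healthcare/", "healthcare/healthcare/", ["abc123", "def456"])
for command in plan:
    print(" ".join(command))
```

Printing the plan first makes it easy to review before touching a fresh clone, since a history rewrite is destructive.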

Build is passing now & basic testing is done. Need to do one more round of testing to make sure nothing is broken.

@gavindsouza
Contributor Author

Just cloned the repo again, initial clone size has come down to 3.3MiB 😄

@ChillarAnand
Contributor

Closing this as the repo size has reduced considerably.

@ruchamahabal, @gavindsouza, @ankush Thanks for the help.
