
The arrow-rs repo is very large #5908

Closed
alamb opened this issue Jun 17, 2024 · 3 comments · Fixed by #5982
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted


@alamb
Contributor

alamb commented Jun 17, 2024

Describe the bug
Whenever I run git pull apache to update arrow-rs, it transfers over 1GB.

To Reproduce

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ git clone git@github.com:apache/arrow-rs.git
Cloning into 'arrow-rs'...
remote: Enumerating objects: 1317790, done.
remote: Counting objects: 100% (140124/140124), done.
remote: Compressing objects: 100% (16954/16954), done.
remote: Total 1317790 (delta 127408), reused 135019 (delta 122748), pack-reused 1177666
Receiving objects: 100% (1317790/1317790), 1.02 GiB | 33.01 MiB/s, done.
Resolving deltas: 100% (1172574/1172574), done.

1.02 GiB received for a single clone!!!!

Expected behavior
The repository contains only source code, so a clone should be much smaller.

Additional context
I strongly believe this is related to the workflow run https://github.com/apache/arrow-rs/actions/runs/9552252515, which pushes a preview version of the docs to https://arrow.apache.org/rust/
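One way to test this hypothesis is to count the git objects reachable from each branch with `git rev-list --objects <branch> | wc -l`. The sketch below assumes only that git is installed and builds a throwaway local repository: the `src` branch (one source commit), the `docs` branch (one full docs snapshot committed per build) and the function `count_objects` are hypothetical stand-ins, not the real arrow-rs branches.

```shell
#!/usr/bin/env bash
# Compare how many git objects each branch pulls in.
# Hypothetical branches: "src" (one source commit) and "docs"
# (one full docs snapshot committed per build). Assumes git is installed.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

git checkout -q -b src
echo 'fn main() {}' > main.rs
git add main.rs
git commit -q -m "source"

# Unrelated docs branch: start from an empty index, then commit five doc builds.
git checkout -q --orphan docs
git rm -r -f -q .
for i in 1 2 3 4 5; do
  echo "<p>docs build $i</p>" > index.html
  git add index.html
  git commit -q -m "docs build $i"
done

# Every commit, tree and blob reachable from a branch is printed once per line.
count_objects() { git rev-list --objects "$1" | wc -l | tr -d ' '; }

src_objs=$(count_objects src)
docs_objs=$(count_objects docs)
echo "src: $src_objs objects, docs: $docs_objs objects"   # prints: src: 3 objects, docs: 15 objects
```

On a real clone the same function can be pointed at remote-tracking names such as `origin/asf-site`. It counts objects rather than bytes, so it is only a rough proxy for how much each branch adds to the download.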

@alamb alamb added the bug label Jun 17, 2024
@alamb
Contributor Author

alamb commented Jun 17, 2024

I think I can just remove the history of the asf-site branch and avoid all this history. I will try that.

@alamb
Contributor Author

alamb commented Jun 17, 2024

Here is what I did to fix it for now:

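git fetch apache
# make a new root commit: an orphan branch starting from the current asf-site files
git checkout --orphan new-asf-site apache/asf-site
# commit the current docs snapshot as the branch's only commit
git commit -m "Initial asf-site commit"
# create the local asf-site branch from that single commit
git checkout -b asf-site
# force push it to apache, replacing the old history
git push -f apache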

My reading of this is that each commit to arrow-rs that regenerates the documentation results in about 7MB of docs being pushed to the asf-site branch 🤯 (the force push of a single docs snapshot below wrote 7.83 MiB):

andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ git push -f apache
Enumerating objects: 5871, done.
Counting objects: 100% (5871/5871), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3675/3675), done.
Writing objects: 100% (5871/5871), 7.83 MiB | 2.75 MiB/s, done.
Total 5871 (delta 4562), reused 3103 (delta 2065), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (4562/4562), done.
To github.com:apache/arrow-rs.git
 + 47a8dd03b8b...b6a61fb3a76 asf-site -> asf-site (forced update)
branch 'asf-site' set up to track 'apache/asf-site'.
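To see what the orphan-branch trick does without touching the real remote, here is a minimal local sketch, assuming only that git is installed. The repository, the branch names `docs-history` (standing in for `apache/asf-site`) and `docs-squashed` (standing in for `new-asf-site`), and the file contents are hypothetical; the fetch, rename and push steps are left out. It shows that the new branch keeps exactly the same files (the same tree) but only a single parentless commit.

```shell
#!/usr/bin/env bash
# Local sketch of squashing a branch's history with an orphan branch.
# Hypothetical names: docs-history (stands in for apache/asf-site),
# docs-squashed (stands in for new-asf-site). Assumes git is installed.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Simulate a docs branch that gains one commit per docs build.
git checkout -q -b docs-history
for i in 1 2 3 4 5; do
  echo "docs build $i" > index.html
  git add index.html
  git commit -q -m "docs build $i"
done

# Orphan branch starting from the tip: same files staged, no parent commit.
git checkout -q --orphan docs-squashed docs-history
git commit -q -m "Initial docs commit"

before=$(git rev-list --count docs-history)
after=$(git rev-list --count docs-squashed)
echo "commits before: $before, after: $after"   # prints: commits before: 5, after: 1

# The snapshot (tree) is identical; only the history is gone.
if [ "$(git rev-parse 'docs-history^{tree}')" = "$(git rev-parse 'docs-squashed^{tree}')" ]; then
  echo "same tree"
fi
```

Once the squashed branch replaces the old one on the remote, the old docs commits are no longer reachable from any branch, so a fresh clone does not download them.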

@alamb
Contributor Author

alamb commented Jun 17, 2024

My temporary workaround seems to have improved things:

Before that change:

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ du -s -h arrow-rs/
1.1G	arrow-rs/

After the change:

andrewlamb@Andrews-MacBook-Pro-2:/tmp$ du -s -h arrow-rs/
 47M	arrow-rs/

Maybe we should fix up the CI job to avoid saving any history of the old docs 🤔
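A minimal sketch of that idea, assuming the CI job can build the docs into a directory and force-push to the repository: each run commits the build as a brand-new root commit and force-pushes it over `asf-site`, so the branch never holds more than one snapshot. The function `publish_docs`, the local bare repository standing in for the GitHub remote, and the build contents are all hypothetical; this is not the actual arrow-rs workflow.

```shell
#!/usr/bin/env bash
# Sketch: publish each docs build as a single parentless commit, so the
# docs branch never accumulates history. Hypothetical: publish_docs,
# remote.git (stand-in for the GitHub repo), the build contents.
set -euo pipefail

work=$(mktemp -d)
git init -q --bare "$work/remote.git"

publish_docs() {
  local build_dir=$1 remote=$2
  local tmp
  tmp=$(mktemp -d)
  cp -R "$build_dir"/. "$tmp"
  (
    cd "$tmp"
    git init -q                      # fresh repo => the commit below has no parent
    git config user.email "ci@example.com"
    git config user.name "CI"
    git config commit.gpgsign false
    git add -A
    git commit -q -m "Publish docs"
    # Replace the remote branch outright instead of appending to it.
    git push -q --force "$remote" HEAD:refs/heads/asf-site
  )
  rm -rf "$tmp"
}

# Simulate three CI runs, each producing a new docs build.
for run in 1 2 3; do
  mkdir -p "$work/build"
  echo "docs from run $run" > "$work/build/index.html"
  publish_docs "$work/build" "$work/remote.git"
done

commits=$(git --git-dir="$work/remote.git" rev-list --count asf-site)
echo "asf-site commits after 3 publishes: $commits"   # prints: asf-site commits after 3 publishes: 1
```

The trade-off of this design is that older versions of the published docs can no longer be recovered from git history; only the latest build is kept.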

@tustvold tustvold added good first issue Good for newcomers enhancement Any new improvement worthy of a entry in the changelog help wanted and removed bug labels Jun 20, 2024