Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace 'show-ref' command with 'branch' and 'tag' #14201

Closed
wants to merge 2 commits into from
Closed

Replace 'show-ref' command with 'branch' and 'tag' #14201

wants to merge 2 commits into from

Conversation

skyline75489
Copy link
Contributor

Supports #14180.

I'm no expert at git but I found that git show-ref is much slower than git branch and git tag. With this PR, the loading time of the PyTorch repo homepage drops from ~20s to 5s.

I also added the reverse logic for GetTags which somehow left out in #13673. The reverse logic exists in the gogit variant.

@codecov-io
Copy link

Codecov Report

Merging #14201 (1389ab4) into master (632800e) will increase coverage by 0.02%.
The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #14201      +/-   ##
==========================================
+ Coverage   42.00%   42.02%   +0.02%     
==========================================
  Files         733      733              
  Lines       78713    78736      +23     
==========================================
+ Hits        33061    33092      +31     
+ Misses      40218    40215       -3     
+ Partials     5434     5429       -5     
Impacted Files Coverage Δ
modules/git/repo_tag_nogogit.go 78.78% <76.66%> (-21.22%) ⬇️
modules/git/repo_branch_nogogit.go 78.78% <100.00%> (+0.73%) ⬆️
routers/user/setting/profile.go 38.88% <0.00%> (-0.44%) ⬇️
services/pull/pull.go 42.85% <0.00%> (+0.50%) ⬆️
modules/queue/unique_queue_disk_channel.go 55.38% <0.00%> (+1.53%) ⬆️
modules/log/file.go 75.20% <0.00%> (+1.60%) ⬆️
modules/process/manager.go 75.00% <0.00%> (+2.50%) ⬆️
models/unit.go 49.31% <0.00%> (+2.73%) ⬆️
modules/git/utils.go 80.55% <0.00%> (+2.77%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 632800e...1389ab4. Read the comment docs.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Dec 31, 2020
return branchNames, nil
}
if err != nil {
// This shouldn't happen... but we'll tolerate it for the sake of peace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the old comment is wrong, right? The position is where the return value is. So i moved it here which makes more sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we should just delete this comment. The old position just does not seem right to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should still be the original position. It means err == io.EOF in fact is not an err. At a normal situation, reader shouldn't return that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lunny has the correct interpretation. Please revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I misunderstanding this? The err == io.EOF seems to be the only way to break out the loop normally, but "at a normal situation, reader shouldn't return that"? How should we break out the loop, though

@6543
Copy link
Member

6543 commented Dec 31, 2020

@skyline75489 can you add unit tests?

Benchmark tests would be nice too but I'm ok without them :)

@6543 6543 added type/refactoring Existing code has been cleaned up. There should be no new functionality. performance/bigrepo Performance Issues affecting Big Repositories labels Dec 31, 2020
@6543 6543 added this to the 1.14.0 milestone Dec 31, 2020
@skyline75489
Copy link
Contributor Author

I've not yet tested this enough. On my mac the performance is not stable. I think it's because some kind of disk caching by macOS. Overall show-ref is still slower since it fetches all the hashes that end up being unused. i will test this on Windows later.

My main concern is that i don't really know if the change is completely harmless. Is there any situation that show-ref handles but branch/tag does not.

@skyline75489
Copy link
Contributor Author

So this is what I got from my Windows machine with SSD:

Operation Duration
show-ref --heads 4.06s
show-ref --tag 4.05s
branch --list 0.27s
tag --list 0.05s

This is on the same machine but with a 5400 rpm hard disk:

Operation Duration
show-ref --heads 5.47s
show-ref --tag 5.53s
branch --list 0.54s
tag --list 0.04s

The performance seem to be CPU-bounded rather than IO-bounded. The ~20s results was on my old MBP2015.

@6543
Copy link
Member

6543 commented Dec 31, 2020

... that i don't really know if the change is completely harmless ...

@zeripath ping ;)

@skyline75489
Copy link
Contributor Author

FYI, the gogit variant performs basically the same as ‘show-ref’. We are not using gogit by default, though.

@skyline75489
Copy link
Contributor Author

skyline75489 commented Jan 1, 2021

@6543 So i was trying this with a smaller repo (200 branches) and found that the perf diff is not very obvious. I guess the perf diff will only emerge with a very large amount of branches/objects. I don’t I can easily add a benchmark test until I can find a way to mock a very large repo.

By the way the PyTorch repo really reveals some internal issues in gitea. Using large repos like PyTorch & TensorFlow is a true test for gitea to see if it’s scalable enough.

@lunny
Copy link
Member

lunny commented Jan 1, 2021

How about to keep the function callShowRef but add the two new functions? So that reviews will be easier.

@skyline75489
Copy link
Contributor Author

@lunny That’s was my initial implementation. Then I found that callShowRef Is not being used at all. So I deleted it. You could always use ‘Split’ mode on GitHub (click the gear button than you can choose it) which will make reviewing this diff easier. If that still bothers you, let me know and I’ll add callShowRef back.

@zeripath
Copy link
Contributor

zeripath commented Jan 1, 2021

If we are talking about performance you must tell us what version of git you are using.

Also I wonder if you have a poorly or unpacked repo. That could explain the huge difference.

@zeripath
Copy link
Contributor

zeripath commented Jan 1, 2021

git branch and git show-ref do different things. I suspect that you'll be missing other branches.

@skyline75489
Copy link
Contributor Author

On windows I’m using git 2.29.2. On macOS I’m using the git provided by Apple.

I’m not sure what you mean by ‘poorly’ or ‘unpacked’. Here’s how I get the PyTorch repo:

  1. First git clone —mirror https://github.com/pytorch/pytorch.git
  2. Create the repo in my self hosted gitea instance
  3. Set the remote git remote set-url —push origin git@mygitea/myrepo.git.
  4. Finally git push —mirror

Be warned that the clone & push steps take a considerable amount of time if you wanna try this yourself.

As for the behavior of show-ref vs branch, I also suspect that they have different behaviors . But after testing it with PyTorch repo, the result of branch/tag are identical to show-ref, except for the formatting, of course. Still this is my concern, too. I’m glad you brought it up.

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The referenced issue specifically references v1.13 - please ensure that this is actually needed on master since the no-go-git PR was merged.

@zeripath
Copy link
Contributor

zeripath commented Jan 2, 2021

#14180 (comment) appears to indicate that show-ref is not the problem.

@skyline75489
Copy link
Contributor Author

skyline75489 commented Jan 19, 2021

@zeripath I tested this on a Linux machine and found that the performance difference between git branch & git show-ref --heads are minimum. I don't have a clue why but on macOS & Windows, the performance difference is significant. So I guess this explains why no one noticed this before since most deployment of Gitea would be Linux-based.

EDIT: Seems that # of CPU cores matter a lot. My old MBP has only 2C4T. And git show-ref is fast on both Windows & Linux with a 8+ cores CPU.

@skyline75489
Copy link
Contributor Author

I'll close this for now since I can not provide a definitive conclusion that this works for every repo & machine. Let's focus on whatever hot paths that causes #14180 .

@go-gitea go-gitea locked and limited conversation to collaborators Mar 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. performance/bigrepo Performance Issues affecting Big Repositories type/refactoring Existing code has been cleaned up. There should be no new functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants