Slow to open a repo if there’s many branches in it #14180

Closed
1 of 6 tasks
skyline75489 opened this issue Dec 29, 2020 · 28 comments · Fixed by #25719
Labels
performance/speed performance issues with slow downs topic/ui Change the appearance of the Gitea UI

Comments

@skyline75489
Contributor

skyline75489 commented Dec 29, 2020

  • Gitea version (or commit ref): 1.13.0
  • Git version: git 2.29.2.windows
  • Operating system: Windows 10 1809
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Didn’t really try
    • Yes (provide example URL)
    • No
  • Log gist:

Description

The title may be a bit misleading.

I was building a local mirror of PyTorch manually using git push --mirror, and I found that opening the repo is exceptionally slow (~11 seconds). I also built a clone of Gitea itself, and the time to open that repo is reasonable (~2 seconds).

I think the reason is that PyTorch has over 4000 branches in it.

Screenshots

@lunny lunny added the performance/speed performance issues with slow downs label Dec 29, 2020
@skyline75489 skyline75489 changed the title Slow to open a repo if there’s many submodules in it Slow to open a repo if there’s many branches in it Dec 30, 2020
@skyline75489
Contributor Author

So I did a little debugging with Gitea code and found that the reason is the number of branches.

@skyline75489
Contributor Author

The root cause of this seems to be that git show-ref, which Gitea uses, becomes very slow for a repo like PyTorch, which has 4000+ branches.

@zeripath
Contributor

zeripath commented Jan 1, 2021

Could you use pprof to confirm where the delay is?

@skyline75489
Contributor Author

I’m new to golang. I’ll try pprof later. Based on my debugging using logging, I found most of the time was spent in GetTags & GetBranches in repoAssignment. This is why I filed the show-ref PR.

By the way, because repoAssignment is called every time we need to get the repo context, the entire PyTorch repo feels unresponsive. Everything is slow, be it opening issues or opening commits.

@zeripath
Contributor

zeripath commented Jan 2, 2021

Yes. Before I broke my hand this was precisely the kind of thing I was working on.

Ok. pprof can be enabled by setting ENABLE_PPROF=true in [server] in app.ini

Once you have that running on your server you can run:

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

Then get an SVG in your browser with: web. The top command would give some other data.

@zeripath
Contributor

zeripath commented Jan 2, 2021

I've just reread the issue opening comment - please could you try again on current master. You will likely find it much faster

@skyline75489
Contributor Author

skyline75489 commented Jan 2, 2021 via email

@zeripath
Contributor

zeripath commented Jan 2, 2021

Interestingly I am not able to duplicate the slow down on linux - ah found it

@skyline75489
Contributor Author

On Linux there's buff/cache, which will aggressively cache everything used on the file system, so I think it should be better on Linux. But still, 'show-ref' feels like an unnecessarily slow path to me. All the hashes retrieved end up being ignored. There has to be a better way, right? Even if it's not 'branch/tag'.
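To make the "hashes end up being ignored" point concrete, here is a minimal sketch of extracting branch names from output shaped like git show-ref --heads. It is an illustration only, not Gitea's parser; `branchNamesFromShowRef` is a hypothetical name and the hashes in the sample are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// branchNamesFromShowRef extracts branch names from text shaped like the
// output of `git show-ref --heads`, where each line is
// "<hash> refs/heads/<name>". The hash column is read only to be discarded.
func branchNamesFromShowRef(output string) []string {
	const prefix = "refs/heads/"
	var names []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || !strings.HasPrefix(fields[1], prefix) {
			continue // skip blank lines and refs that are not branches
		}
		names = append(names, strings.TrimPrefix(fields[1], prefix))
	}
	return names
}

func main() {
	sample := "1111111111111111111111111111111111111111 refs/heads/master\n" +
		"2222222222222222222222222222222222222222 refs/heads/feature/x\n"
	fmt.Println(branchNamesFromShowRef(sample)) // prints [master feature/x]
}
```

One alternative that avoids printing hashes at all is git for-each-ref --format='%(refname:short)' refs/heads/, though whether it is measurably faster on a given repository would need profiling.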

@zeripath
Contributor

zeripath commented Jan 2, 2021

Is it this path that is slow: http://localhost/gitea/administrator/pytorch/branches/ ?

@skyline75489
Contributor Author

The one with 4000+ branches on the same page? TBH I never once successfully opened the page until I added pagination myself.

The repo homepage feels considerably slow to me, which is the first thing I noticed.
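The pagination mentioned in this comment can be sketched roughly as follows. This is not the patch that was actually written; `paginate` is a hypothetical helper and the 1-based page numbering is an assumption.

```go
package main

import "fmt"

// paginate returns the slice of names that belongs on the given 1-based
// page. It returns nil for out-of-range pages or invalid arguments.
func paginate(names []string, page, pageSize int) []string {
	if page < 1 || pageSize < 1 {
		return nil
	}
	start := (page - 1) * pageSize
	if start >= len(names) {
		return nil
	}
	end := start + pageSize
	if end > len(names) {
		end = len(names)
	}
	return names[start:end]
}

func main() {
	// Simulate a repository with 4000 branches.
	names := make([]string, 4000)
	for i := range names {
		names[i] = fmt.Sprintf("branch-%d", i)
	}
	firstPage := paginate(names, 1, 20)
	fmt.Println(len(firstPage), firstPage[0])
}
```

Only the branches on the requested page then need their commit, divergence, and pull request data loaded, instead of all 4000+.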

@zeripath
Contributor

zeripath commented Jan 2, 2021

OK I've managed to get pprof results.

(pprof) list repo.Branches    
Total: 12.63s
ROUTINE ======================== code.gitea.io/gitea/routers/repo.Branches in /home/andrew/src/go/gitea/routers/repo/branch.go
         0      4.80s (flat, cum) 38.00% of Total
         .          .     50:	ctx.Data["IsMirror"] = ctx.Repo.Repository.IsMirror
         .          .     51:	ctx.Data["CanPull"] = ctx.Repo.CanWrite(models.UnitTypeCode) || (ctx.IsSigned && ctx.User.HasForkedRepo(ctx.Repo.Repository.ID))
         .          .     52:	ctx.Data["PageIsViewCode"] = true
         .          .     53:	ctx.Data["PageIsBranches"] = true
         .          .     54:
         .      2.22s     55:	ctx.Data["Branches"] = loadBranches(ctx)
         .      2.58s     56:	ctx.HTML(200, tplBranch)
         .          .     57:}
         .          .     58:
         .          .     59:// DeleteBranchPost responses for delete merged branch
         .          .     60:func DeleteBranchPost(ctx *context.Context) {
         .          .     61:	defer redirect(ctx)
(pprof) list repo.loadBranches
Total: 12.63s
ROUTINE ======================== code.gitea.io/gitea/routers/repo.loadBranches in /home/andrew/src/go/gitea/routers/repo/branch.go
         0      2.22s (flat, cum) 17.58% of Total
         .          .    195:	repoIDToGitRepo := map[int64]*git.Repository{}
         .          .    196:	repoIDToGitRepo[ctx.Repo.Repository.ID] = ctx.Repo.GitRepo
         .          .    197:
         .          .    198:	branches := make([]*Branch, len(rawBranches))
         .          .    199:	for i := range rawBranches {
         .      640ms    200:		commit, err := rawBranches[i].GetCommit()
         .          .    201:		if err != nil {
         .          .    202:			ctx.ServerError("GetCommit", err)
         .          .    203:			return nil
         .          .    204:		}
         .          .    205:
         .          .    206:		var isProtected bool
         .          .    207:		branchName := rawBranches[i].Name
         .          .    208:		for _, b := range protectedBranches {
         .          .    209:			if b.BranchName == branchName {
         .          .    210:				isProtected = true
         .          .    211:				break
         .          .    212:			}
         .          .    213:		}
         .          .    214:
         .      920ms    215:		divergence, divergenceError := repofiles.CountDivergingCommits(ctx.Repo.Repository, git.BranchPrefix+branchName)
         .          .    216:		if divergenceError != nil {
         .          .    217:			ctx.ServerError("CountDivergingCommits", divergenceError)
         .          .    218:			return nil
         .          .    219:		}
         .          .    220:
         .      660ms    221:		pr, err := models.GetLatestPullRequestByHeadInfo(ctx.Repo.Repository.ID, branchName)
         .          .    222:		if err != nil {
         .          .    223:			ctx.ServerError("GetLatestPullRequestByHeadInfo", err)
         .          .    224:			return nil
         .          .    225:		}
         .          .    226:		headCommit := commit.ID.String()
(pprof) list git.callShowRef
Total: 12.63s
ROUTINE ======================== code.gitea.io/gitea/modules/git.callShowRef in /home/andrew/src/go/gitea/modules/git/repo_branch_nogogit.go
         0       10ms (flat, cum) 0.079% of Total
         .          .     63:		}
         .          .     64:		if err != nil {
         .          .     65:			return nil, err
         .          .     66:		}
         .          .     67:
         .       10ms     68:		branchName, err := bufReader.ReadString('\n')
         .          .     69:		if err == io.EOF {
         .          .     70:			// This shouldn't happen... but we'll tolerate it for the sake of peace
         .          .     71:			return branchNames, nil
         .          .     72:		}
         .          .     73:		if err != nil {

callShowRef is not the problem.

@skyline75489
Contributor Author

Interesting. That explains why using 'branch/tag' still feels slower compared to smaller repos.

Way to go, PyTorch.

@zeripath
Contributor

zeripath commented Jan 2, 2021

It's Gitea's fault, not PyTorch's, tbh.

@zeripath
Contributor

zeripath commented Jan 2, 2021

OK - this means that your PR will definitely be helpful - but if we're doing optimisations we should be guided by what actually takes time.

I'm not quite sure why:

         .      2.58s     56:	ctx.HTML(200, tplBranch)

takes so long - I'd have to check - but it's likely that anything that reduces the number of branches will improve that.


As for the main repo page being slow, that's likely to do with generating the history for each file. That is unfortunately a slightly difficult problem and is necessarily slow; we need to fetch that info via AJAX instead of delaying the render. However, you would likely benefit from enabling the cache:

https://docs.gitea.io/en-us/config-cheat-sheet/#cache---lastcommitcache-settings-cachelast_commit

@skyline75489
Contributor Author

@zeripath I was joking about PyTorch. I'm glad my findings turned out to be useful to gitea.

@lunny
Member

lunny commented Jan 6, 2021

The last commit cache only applies to the tree path view page, not the branches page, right?

@skyline75489
Contributor Author

So I was trying pprof but got something like this:

(pprof) top10
Showing nodes accounting for 1110ms, 70.70% of 1570ms total
Showing top 10 nodes out of 328
      flat  flat%   sum%        cum   cum%
     360ms 22.93% 22.93%      360ms 22.93%  runtime.memclrNoHeapPointers
     170ms 10.83% 33.76%      170ms 10.83%  runtime.pthread_cond_wait
     160ms 10.19% 43.95%      530ms 33.76%  os/exec.(*Cmd).Start
     140ms  8.92% 52.87%      140ms  8.92%  runtime.kevent
      70ms  4.46% 57.32%       70ms  4.46%  runtime.memmove
      70ms  4.46% 61.78%       70ms  4.46%  runtime.nanotime1
      50ms  3.18% 64.97%       60ms  3.82%  runtime.usleep
      40ms  2.55% 67.52%       40ms  2.55%  internal/poll.(*fdMutex).increfAndClose
      30ms  1.91% 69.43%       30ms  1.91%  syscall.syscall
      20ms  1.27% 70.70%       20ms  1.27%  runtime.(*mspan).refillAllocCache
(pprof) list repo.Branches
Total: 1.57s

I don't know what I did wrong. The steps I took are:

Anything else I need to do to get the method trace?

@michaelbutler

We will be watching this too. Our repo has 1600+ branches and as we evaluate Gitea we are seeing 8-10 second page load times. Git clones don't seem to be affected.

@lunny
Member

lunny commented Feb 4, 2021

The branches list on repo home page should be loaded asynchronously.

@lunny lunny added the topic/ui Change the appearance of the Gitea UI label Feb 4, 2021
@sandsenter

Any progress on this issue?
We compared two repositories: one is a mirror, the other is a normal repository used for development.
Both have hundreds of branches, yet the mirror repository's page opens faster than the other one.

@zeripath
Contributor

zeripath commented Jan 8, 2022

> any progress for this issue? We found that for two repositories, one repo is a mirror repository, the other is a normal repository for developing. Both have hundreds of branches, the mirror repository's page is opened faster than the other one.

When you say progress have you tried main recently? There have been some improvements there.

However, we still need to stop loading all of the branches for every page.

@lunny
Member

lunny commented Jul 13, 2022

There is also another example https://gitea1.dev.blender.org/blender-foundation/blender/branches

@lunny
Member

lunny commented Aug 4, 2022

To accelerate the branches list, maybe we have to sync all branches into database like we did with tags.

@lafriks
Member

lafriks commented Aug 4, 2022

> To accelerate the branches list, maybe we have to sync all branches into database like we did with tags.

I agree. This would also help on the PR creation page, where branches need to be selected, and in other places where we currently show all branches: the dropdown could show only the top branches and be made asynchronously searchable.
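The "top branches only, searchable" dropdown idea can be sketched as a capped, case-insensitive filter. This is an in-memory illustration under the assumption that the names are already available; in practice the filtering would more likely be a database query. `searchBranches` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// searchBranches returns at most limit branch names that contain query,
// compared case-insensitively, preserving the input order.
func searchBranches(names []string, query string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	q := strings.ToLower(query)
	matches := make([]string, 0, limit)
	for _, name := range names {
		if len(matches) >= limit {
			break // stop early once the dropdown is full
		}
		if strings.Contains(strings.ToLower(name), q) {
			matches = append(matches, name)
		}
	}
	return matches
}

func main() {
	names := []string{"main", "Feature/login", "feature/logout", "fix/typo"}
	fmt.Println(searchBranches(names, "feature", 10))
}
```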

@delvh
Member

delvh commented Aug 4, 2022

Yes, I can somewhat understand that proposal, but I do have to say that this will be difficult and error-prone to implement and maintain:
This database entry will need to be updated correctly with every single push. Not to forget all UI-related branch interactions: Creating a branch, deleting a branch, renaming a branch...
If we do that, many potential edge cases are just waiting to invalidate the stored state of branches.
And in the worst case, if we store no longer valid branches there could even be a security issue in case someone names a branch after a security-relevant bug(fix) and deletes the branch later on, but the stored state still displays the branch.

@lunny
Member

lunny commented Aug 5, 2022

> Yes, I can somewhat understand that proposal, but I do have to say that this will be difficult and error-prone to implement/ maintain: This database entry will need to be updated correctly with every single push. Not to forget all UI-related branch interactions: Creating a branch, deleting a branch, renaming a branch... If we do that, many potential edge cases are just waiting to invalidate the stored state of branches. And in the worst case, if we store no longer valid branches there could even be a security issue in case someone names a branch after a security-relevant bug(fix) and deletes the branch later on, but the stored state still displays the branch.

Yes, that's why I hesitated so long to post the comment. Since we already store tags in the database and it works well, what's the difference from storing branch names? And if we have a better method, I would gladly give up the idea.
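The stale-state concern raised above comes down to reconciling two sets of names: what the database has stored and what git actually contains. A minimal sketch of that reconciliation, assuming both lists fit in memory and using the hypothetical name `diffBranches`, might look like this.

```go
package main

import "fmt"

// diffBranches compares the branch names stored in the database with the
// names that actually exist in git. It returns the names missing from the
// database (toAdd) and the stored names that no longer exist (toDelete).
func diffBranches(stored, actual []string) (toAdd, toDelete []string) {
	inGit := make(map[string]struct{}, len(actual))
	for _, name := range actual {
		inGit[name] = struct{}{}
	}
	inDB := make(map[string]struct{}, len(stored))
	for _, name := range stored {
		inDB[name] = struct{}{}
	}
	for _, name := range actual {
		if _, ok := inDB[name]; !ok {
			toAdd = append(toAdd, name)
		}
	}
	for _, name := range stored {
		if _, ok := inGit[name]; !ok {
			toDelete = append(toDelete, name)
		}
	}
	return toAdd, toDelete
}

func main() {
	stored := []string{"main", "security-fix"} // stale: security-fix was deleted in git
	actual := []string{"main", "dev"}
	toAdd, toDelete := diffBranches(stored, actual)
	fmt.Println(toAdd, toDelete)
}
```

Running such a reconciliation on push and on demand is one way to keep a deleted branch from lingering in the stored state.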

lunny added a commit that referenced this issue Jun 29, 2023
Related #14180
Related #25233 
Related #22639
Close #19786
Related #12763 

This PR changes all branch retrieval methods to read from the
database instead of git data, reducing git read operations.

- [x] Sync git branches information into database when push git data
- [x] Create a new table `Branch`, merge some columns of `DeletedBranch`
into `Branch` table and drop the table `DeletedBranch`.
- [x] Read `Branch` table when visit `code` -> `branch` page
- [x] Read `Branch` table when list branch names in `code` page dropdown
- [x] Read `Branch` table when list git ref compare page
- [x] Provide a button in admin page to manually sync all branches.
- [x] Sync branches if repository is not empty but database branches are
empty when visiting pages with branches list
- [x] Use `commit_time desc` as the default FindBranch order by to keep
consistent as before and deleted branches will be always at the end.

---------

Co-authored-by: Jason Song <i@wolfogre.com>
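The default ordering described in the last checklist item (newest commit first, deleted branches always at the end) can be sketched as a two-key sort. The `branch` struct here is hypothetical and is not the schema of the `Branch` table introduced by the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// branch is a hypothetical in-memory record used only for this sketch.
type branch struct {
	Name       string
	CommitTime int64 // Unix seconds of the latest commit
	IsDeleted  bool
}

// sortBranches orders live branches before deleted ones and, within each
// group, newer commits before older ones.
func sortBranches(branches []branch) {
	sort.SliceStable(branches, func(i, j int) bool {
		if branches[i].IsDeleted != branches[j].IsDeleted {
			return !branches[i].IsDeleted
		}
		return branches[i].CommitTime > branches[j].CommitTime
	})
}

func main() {
	branches := []branch{
		{Name: "old", CommitTime: 100},
		{Name: "gone", CommitTime: 300, IsDeleted: true},
		{Name: "new", CommitTime: 200},
	}
	sortBranches(branches)
	for _, b := range branches {
		fmt.Println(b.Name)
	}
}
```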
@lunny
Member

lunny commented Jul 3, 2023

I think this could be closed per #22743. I have tested PyTorch, which has over 8700 branches, and the home page takes about 1200ms on my MacBook Pro.
Future PRs could make the branches loading async.

silverwind added a commit that referenced this issue Jul 21, 2023
- Send request to get branch/tag list, use loading icon when waiting for
response.
- Only fetch when the first time branch/tag list shows.
- For backend, removed assignment to `ctx.Data["Branches"]` and
`ctx.Data["Tags"]` from `context/repo.go` and passed these data wherever
needed.
- Changed some `v-if` to `v-show` and used native `svg` as mentioned in
#25719 (comment) to
improve performance when there are a lot of branches.
- Places that use the dropdown component:

     Repo Home Page
    
<img width="1429" alt="Screen Shot 2023-07-06 at 12 17 51"
src="https://github.com/go-gitea/gitea/assets/17645053/6accc7b6-8d37-4e88-ae1a-bd2b3b927ea0">

    Commits Page

<img width="1431" alt="Screen Shot 2023-07-06 at 12 18 34"
src="https://github.com/go-gitea/gitea/assets/17645053/2d0bf306-d1e2-45a8-a784-bc424879f537">

    Specific commit -> operations -> cherry-pick
    
<img width="758" alt="Screen Shot 2023-07-06 at 12 23 28"
src="https://github.com/go-gitea/gitea/assets/17645053/1e557948-3881-4e45-a625-8ef36d45ae2d">

    Release Page
    
<img width="1433" alt="Screen Shot 2023-07-06 at 12 25 05"
src="https://github.com/go-gitea/gitea/assets/17645053/3ec82af1-15a4-4162-a50b-04a9502161bb">

- Demo


https://github.com/go-gitea/gitea/assets/17645053/d45d266b-3eb0-465a-82f9-57f78dc5f9f3

- Note:

UI of dropdown menu could be improved in another PR as it should apply
to more dropdown menus.

Fix #14180

---------

Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
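The "only fetch the first time the list shows" behaviour was implemented in the frontend, but the idea itself can be sketched in Go with sync.Once. The `lazyBranchList` type and its `fetch` field are hypothetical and only illustrate the fetch-once pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyBranchList defers an expensive fetch until the list is first needed
// and reuses the result afterwards.
type lazyBranchList struct {
	once  sync.Once
	fetch func() []string // hypothetical loader, e.g. a request to the backend
	names []string
}

// Names triggers the fetch on the first call only and is safe to call
// from multiple goroutines.
func (l *lazyBranchList) Names() []string {
	l.once.Do(func() {
		l.names = l.fetch()
	})
	return l.names
}

func main() {
	calls := 0
	list := &lazyBranchList{fetch: func() []string {
		calls++
		return []string{"main", "dev"}
	}}
	list.Names()
	list.Names()
	fmt.Println("fetches:", calls)
}
```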
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2023