after upgrade to 1.13.0+rc2 consumes too much memory #13718

Closed
2 of 6 tasks
kuznetz opened this issue Nov 27, 2020 · 21 comments
Labels
type/question Issue needs no code to be fixed, only a description on how to fix it yourself.

Comments

@kuznetz

kuznetz commented Nov 27, 2020

  • Gitea version (or commit ref): 1.13.0+rc2
  • Git version: 2.26.2
  • Operating system: official docker, hosted on Debian GNU/Linux 10 (buster)
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
  • Log gist:

Description

After upgrading from 1.12 (to fix issue #13605), the new version consumes all available memory (~1.5GB) and brings down the Docker host.
After limiting the container to 512MB in the Docker configuration, Gitea reaches that maximum and stops responding to git and to the web browser.
It allocates memory in roughly 10-50MB steps every 1-3 minutes until it reaches the maximum.
10-15 minutes after the overflow, the server appears to restart and the loop repeats.
The previous version consumed ~200MB with the same repositories.

Screenshots

@kuznetz kuznetz changed the title from "after upgrade 1.13.0+rc2 consumes too much memory" to "after upgrade to 1.13.0+rc2 consumes too much memory" Nov 27, 2020
@zeripath
Contributor

We need logs. We can't even begin to fix an issue like this without some logs.

If I had to guess, and that is all I can do without logs, the issue is going to be the generation of repo stats.

BUT WE NEED LOGS.

Let me just repeat that...

We need logs.

This is so important that we have added information to the issue template:

- Log gist:
<!-- It really is important to provide pertinent logs -->
<!-- Please read https://docs.gitea.io/en-us/logging-configuration/#debugging-problems -->
<!-- In addition, if your problem relates to git commands set `RUN_MODE=dev` at the top of app.ini -->
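For reference, a minimal app.ini sketch along those lines (key names per the linked logging docs; adjust to taste):

; top of app.ini (per the note above, for git-related problems)
RUN_MODE = dev

[log]
MODE = console
LEVEL = debug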

@kuznetz
Author

kuznetz commented Nov 28, 2020

Thank you for looking into my problem. Log and config:
https://gist.github.com/kuznetz/d6b0bcd5ca49ac812dd6d40330206826

@melroy89

This comment has been minimized.

@zeripath
Contributor

@kuznetz OK, those logs don't appear to be very helpful. I think we need some pprof details from the heap.

My suspicion is that this is a go-git related issue. I'm not certain why this has suddenly become a big problem between 1.12 and 1.13; perhaps we migrated some more important code to go-git, or more people are trying repositories with large files in them.

In any case, my suspicion is that if you try master right now you'll find that the situation is much better.

@MorphBonehunter
Contributor

MorphBonehunter commented Dec 29, 2020

My Gitea container was capped at 256MB and worked reliably over the last month, with an average usage of 156MB.
After upgrading to 1.13.0, the container was OOM-killed whenever I tried to log in via the web GUI (the log shows that not even the POST finished).
After raising the cap I was able to use Gitea again, with usage around 294MB.
After updating to 1.13.1 the usage is back at the pre-1.13 level (currently 164MB), so the update fixed it for me.

@lunny
Member

lunny commented Dec 29, 2020

@MorphBonehunter Glad to hear that. But it seems the conversion from go-git to plain git commands has only happened on master so far.

@cthu1hoo

cthu1hoo commented Feb 6, 2021

Mine seems to consistently run out of a 512MB quota:

2309bcb03ff6   gitea-ttrss-docker_app_1           16.85%    511.5MiB / 512MiB     99.90% 

Right after startup it seems to consume about 200MB:

gitea-ttrss-docker_app_1           1.17%     201.2MiB / 512MiB

After pushing one commit in a tiny repository:

2309bcb03ff6   gitea-ttrss-docker_app_1           0.79%     443.4MiB / 512MiB     86.60% 

Can I do something to reduce RAM usage? I'm using 1.13.2. Vanilla Gogs on the same exact repository used (quite literally) ten times less memory.

@techknowlogick
Member

@cthu1hoo just like above, this doesn't help us. Telling us it uses X amount of RAM doesn't give us anything to act on, because we don't know what is going on. We NEED logs and debugging information to be able to help you. Maybe you have large binary files in your git repo, and go-git (the underlying git library that we are transitioning away from) reads the whole file into memory, but again, we do not know what is going on if you don't provide detailed information. Are there git garbage-collect cron jobs that you have enabled to run on binary start? Have you tried the suggestions from above, such as testing whether the changes upcoming in 1.14.0-dev help? There is so much we do not know if you don't tell us.

@cthu1hoo

cthu1hoo commented Feb 7, 2021

I'll be happy to provide more information, just point me at the document or something describing how to get it.

Maybe you have large binary files in your git repo

I have a bunch of fairly old repositories (some started before git was a thing) but there should be nothing particularly large in them.

Are there git garbage collect cron jobs going on that you have enabled on binary start?

I'm using the Docker setup with mostly default settings; I haven't tweaked anything like that.


I've cloned my tt-rss.org production instance (1.13.2) to a :latest docker hub build, and I'm seeing an improvement, I think:

1.13.2:

gitea-ttrss-docker_app_1   1.60%     247.3MiB / 512MiB   48.29%

image

1.14.0+dev-665-g19fccdc45:

gitea-ttrss-docker-dev_app_1   2.63%     152.2MiB / 512MiB   29.72%

image

However, even though I can clone and push a few things into repositories, production is under some load from other people while the test instance isn't, which could explain at least some of the difference in RAM consumption.

I don't really follow Gitea development closely; is the development version stable enough, or is it broken a lot? If I can rely on the :latest Docker Hub image to actually function properly, I could upgrade my production instance to it, which would provide better information. Quite a few people are using tt-rss, so I would prefer not to break things for them.

@zeripath
Contributor

zeripath commented Feb 7, 2021

From your info above it appears that master is at least twice as memory-efficient as 1.13 for you.

It looks like yours is yet another report of the go-git memory problems.


However, as techknowlogick says above:

We need more logs.

In this case, we need pprof data:

[server]
ENABLE_PPROF = true

The way you use pprof is, on the server, run:

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
web

should give you an SVG map of where memory usage is. top would produce a list of top users. list <functionname> should give you a line-by-line analysis if the source is available. help will show you more information and options.

More detailed gitea logs can be generated by looking at: https://docs.gitea.io/en-us/logging-configuration/#debugging-problems

SQL logs could be obtained using:

[database]
LOG_SQL=true
...

Additional git logs using:

RUN_MODE = dev

However the pprof data is what we need. My guess is you will find the mem usage is all in the go-git code.
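Putting those settings together, a rough app.ini sketch (illustrative only, not a complete config; RUN_MODE = dev goes at the top level of the file as noted in the issue template):

[server]
ENABLE_PPROF = true

[database]
LOG_SQL = true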

@cthu1hoo

cthu1hoo commented Feb 7, 2021

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60

This seems to be the CPU profile; am I reading this wrong, or is the heap profile needed instead?

I've noticed one more thing: there seems to be much higher RAM usage when trying to push over HTTP, as opposed to SSH. My Gitea instance is behind Cloudflare so some people do that.

Trying to push a large repository over HTTPS on 1.13.2 reliably OOMs and crashes the instance in a few seconds:

C:tt-rss:$ git push gogs-https-1 master
Username for 'https://git.tt-rss.org': fox
Password for 'https://fox@git.tt-rss.org':
Enumerating objects: 72210, done.
Counting objects: 100% (72210/72210), done.
Delta compression using up to 16 threads
Compressing objects: 100% (26630/26630), done.
Writing objects: 100% (72210/72210), 92.34 MiB | 8.22 MiB/s, done.
Total 72210 (delta 41122), reused 72179 (delta 41120)
error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502
fatal: the remote end hung up unexpectedly
fatal: the remote end hung up unexpectedly
Everything up-to-date

I could crash the 1.13 instance by updating a very small repository over HTTP.

The same exact push finishes successfully on 1.14-dev instance with container RAM usage peaking at ~470MB for a short time before going back to ~170MB. I had to bump container RAM limit to ~700MB before I could finish the push on 1.13 in one go.

Here are some pprof top reports from 1.13:

Fetching profile over HTTP from https://git.tt-rss.org/debug/pprof/profile?seconds=30
Saved profile in /home/fox/pprof/pprof.gitea.samples.cpu.008.pb.gz
File: gitea
Type: cpu
Time: Feb 7, 2021 at 1:07pm (MSK)
Duration: 30.54s, Total samples = 3.50s (11.46%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2810ms, 80.29% of 3500ms total
Dropped 116 nodes (cum <= 17.50ms)
Showing top 10 nodes out of 231
      flat  flat%   sum%        cum   cum%
     810ms 23.14% 23.14%      810ms 23.14%  golang.org/x/crypto/argon2.blamkaSSE4
     380ms 10.86% 34.00%      380ms 10.86%  runtime.futex
     330ms  9.43% 43.43%      330ms  9.43%  golang.org/x/crypto/argon2.mixBlocksSSE2
     280ms  8.00% 51.43%      280ms  8.00%  runtime.epollwait
     270ms  7.71% 59.14%      270ms  7.71%  golang.org/x/crypto/argon2.xorBlocksSSE2
     200ms  5.71% 64.86%      320ms  9.14%  runtime.scanobject
     200ms  5.71% 70.57%      360ms 10.29%  syscall.Syscall
     130ms  3.71% 74.29%      130ms  3.71%  runtime.memclrNoHeapPointers
     120ms  3.43% 77.71%      120ms  3.43%  runtime.duffzero
      90ms  2.57% 80.29%       90ms  2.57%  runtime.usleep
(pprof)
Fetching profile over HTTP from https://git.tt-rss.org/debug/pprof/heap
Saved profile in /home/fox/pprof/pprof.gitea.alloc_objects.alloc_space.inuse_objects.inuse_space.029.pb.gz
File: gitea
Type: inuse_space
Time: Feb 7, 2021 at 1:08pm (MSK)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 159.80MB, 85.65% of 186.57MB total
Dropped 43 nodes (cum <= 0.93MB)
Showing top 10 nodes out of 150
      flat  flat%   sum%        cum   cum%
      64MB 34.30% 34.30%       64MB 34.30%  golang.org/x/crypto/argon2.initBlocks
   28.01MB 15.01% 49.32%    28.51MB 15.28%  github.com/syndtr/goleveldb/leveldb/memdb.New
   20.43MB 10.95% 60.27%    20.43MB 10.95%  code.gitea.io/gitea/modules/public.glob..func1
   18.94MB 10.15% 70.42%    18.94MB 10.15%  bytes.makeSlice
    8.79MB  4.71% 75.13%    10.29MB  5.51%  gopkg.in/ini%2ev1.(*Section).NewKey
    6.96MB  3.73% 78.86%     6.96MB  3.73%  text/template/parse.(*Tree).newText
    4.50MB  2.41% 81.27%     4.50MB  2.41%  text/template/parse.(*Tree).newPipeline
       3MB  1.61% 82.88%        3MB  1.61%  gopkg.in/ini%2ev1.(*parser).readValue
       3MB  1.61% 84.49%        3MB  1.61%  gopkg.in/ini%2ev1.readKeyName
    2.17MB  1.16% 85.65%     2.17MB  1.16%  github.com/denisenkom/go-mssqldb/internal/cp.init

One more heap profile right near the end of it when RAM usage was spiking:

Fetching profile over HTTP from https://git.tt-rss.org/debug/pprof/profile?seconds=5
Saved profile in /home/fox/pprof/pprof.gitea.samples.cpu.006.pb.gz
File: gitea
Type: cpu
Time: Feb 7, 2021 at 1:01pm (MSK)
Duration: 5.13s, Total samples = 1.34s (26.12%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 670ms, 50.00% of 1340ms total
Showing top 10 nodes out of 303
      flat  flat%   sum%        cum   cum%
     290ms 21.64% 21.64%      420ms 31.34%  regexp.(*Regexp).tryBacktrack
       60ms  4.48% 26.12%       60ms  4.48%  regexp.(*bitState).shouldVisit (inline)
       60ms  4.48% 30.60%       60ms  4.48%  runtime.memclrNoHeapPointers
      60ms  4.48% 35.07%       60ms  4.48%  syscall.Syscall
      50ms  3.73% 38.81%       90ms  6.72%  runtime.scanobject
      40ms  2.99% 41.79%       40ms  2.99%  crypto/sha1.blockAMD64
      30ms  2.24% 44.03%       50ms  3.73%  regexp.(*bitState).push
      30ms  2.24% 46.27%       30ms  2.24%  regexp.(*inputString).step
      30ms  2.24% 48.51%      100ms  7.46%  runtime.newobject
      20ms  1.49% 50.00%       20ms  1.49%  aeshashbody

I haven't seen "go-git" anywhere but I don't really know what it should look like. If any of those top reports look useful, I can attach the relevant .pprof files here. I can also dump all pprof files I've collected if that would be of any help.

I haven't enabled pprof on my test (1.14-dev) instance, but in all similar tests it performed much better than 1.13 and never crashed once. So maybe this really is fixed in master and I just need to wait for the next stable version.

@zeripath
Contributor

zeripath commented Feb 7, 2021

Sorry, heap is the correct one. I've had to copy and paste that information multiple times, so I probably copied the wrong one.
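For reference, the heap variant would be something like this (same ENABLE_PPROF setup and default pprof port as above):

go tool pprof http://localhost:6060/debug/pprof/heap
(pprof) top
(pprof) web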

@zeripath
Contributor

zeripath commented Feb 7, 2021

I think top is unfortunately not the most useful thing here. You'd need to actively look into where the memory is going, using web to generate the SVG around the point at which too much memory is being used.

The CPU profile is not helpful here.


One thing that will be using more memory is the default of argon2 for password hashing: change PASSWORD_HASH_ALGO to use a less memory-intensive alternative.
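For example, a minimal app.ini sketch (assuming the [security] section, which is where PASSWORD_HASH_ALGO lives; bcrypt is one of the less memory-hungry options):

[security]
PASSWORD_HASH_ALGO = bcrypt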

@cthu1hoo

cthu1hoo commented Feb 7, 2021

Changing PASSWORD_HASH_ALGO to bcrypt seemingly stopped the RAM spikes entirely on large pushes on 1.13 (over HTTPS; SSH is similar). RAM usage tops out at ~430MB and drops afterwards. I could safely lower the container memory quota back to 512MB without any crashes so far.

Could you explain why this would have such a huge effect while processing received data? HTTP authentication on every request, or something?

Here are two SVG files generated from the heap profile near peak memory usage (after the change to bcrypt): the first while the push was being processed, the second after it finished. web wouldn't work for me (WSL2), but apparently the graphs can be generated directly via svg. 430MB is still somewhat high but manageable.
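A sketch of the non-interactive form, assuming graphviz is installed and the same pprof endpoint as above:

go tool pprof -svg http://localhost:6060/debug/pprof/heap > heap.svg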

svgs.zip

Thanks.

@zeripath
Contributor

zeripath commented Feb 7, 2021

argon2 has an unfortunately high memory requirement. Whilst it's a very secure hash function, it is very much not ideal for low-memory environments or for frequent hash comparisons.

It's extremely secure by default, but not great for people who really want low memory.

That's probably the cause of most of your memory issues. There's also an issue with go-git loading all of the data into memory, which is fixed on 1.14/master.

The heap looks reasonable in the SVGs you've sent. (That heap is only ~130MB, not 400MB.)
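To illustrate the scale, here is a minimal Go sketch of the x/crypto argon2 API (not Gitea's exact code, and the parameters are illustrative rather than confirmed defaults, but a 64 MiB memory parameter lines up with the 64MB argon2.initBlocks allocation in the heap profile above):

package main

import (
	"fmt"

	"golang.org/x/crypto/argon2"
)

func main() {
	// Illustrative parameters only (not necessarily Gitea's defaults):
	// 2 passes, 64*1024 KiB (64 MiB) of working memory, 8 threads, 50-byte key.
	// Every password verification allocates this much memory, so a handful of
	// concurrent HTTP-auth requests multiplies the cost accordingly.
	key := argon2.IDKey([]byte("hunter2"), []byte("example-salt"), 2, 64*1024, 8, 50)
	fmt.Printf("%x\n", key)
}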

@6543
Member

6543 commented Feb 7, 2021

@cthu1hoo issue ref for argon2: #14294

@cthu1hoo

cthu1hoo commented Feb 7, 2021

Yeah, I think this needs a very prominent warning, because a few concurrent git pushes can starve an instance running on much more serious hardware than a low-memory VPS. The way argon2 eats memory doesn't look very scalable to me.

Anyway, thanks to everyone for the help.

@zeripath
Contributor

zeripath commented Feb 7, 2021

@cthu1hoo why don't you propose a change to the documentation for this?

@lunny
Member

lunny commented Feb 8, 2021

I thought we had sent a PR on master to change the default PASSWORD_HASH_ALGO to another one? It seems not.

@lunny lunny added the type/question label Feb 8, 2021
@6543
Member

6543 commented Feb 8, 2021

That's what #14294 is for ... it's still an ongoing discussion ... :/

@lunny
Member

lunny commented Feb 19, 2021

Since #14675 has been merged, I think this can be closed. Feel free to reopen it.

@lunny lunny closed this as completed Feb 19, 2021
@go-gitea go-gitea locked and limited conversation to collaborators May 13, 2021