Gitea 1.20 leaking resources #26442

Closed · lafriks opened this issue Aug 10, 2023 · 19 comments
lafriks (Member) commented Aug 10, 2023

Description

After upgrading from 1.19 to 1.20 we noticed that at some point (about twice a day) Gitea would become unstable, stop accepting git push, and fail with:

.ers/web/repo/http.go:485:serviceRPC() [E] Fail to serve RPC(upload-pack) in /srv/git/repositories/xxx/web.git: exit status 128 - error: git upload-pack: git-pack-objects died with error

We could not pinpoint the exact cause of this; at one point the failure was that it could not create a process, failing at the Linux syscall level.

Our temporary fix to get it stable again was to disable the /user/events endpoint at the reverse proxy level. This does help, but of course at the cost of degraded UX.

At peak times we have about 50 users using it simultaneously, so it's not that high a number.

If anyone has any ideas on where to look to pinpoint the source of the problem, let me know :)

Gitea Version

1.20

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.25.1

Operating System

Ubuntu 20.04

How are you running Gitea?

From binary using systemd

Database

PostgreSQL

silverwind (Member) commented Aug 10, 2023

/user/events websocket

It's not WebSocket, it's still EventSource 😆. But yes, I could see that this has the potential to consume many resources, though I don't recall it having changed lately. The EventSource code should be pretty much unchanged for at least the last 2-3 minor Gitea versions.

lafriks (Member, Author) commented Aug 10, 2023

Yes, it's a wild guess; it's hard to tell what the real source of the problem is, but that is what helped. There were no such problems with 1.19, and nothing else has changed, neither user count nor behavior.

silverwind (Member) commented Aug 10, 2023

Might need to pprof it. Maybe we already expose an endpoint for it? If not, that might be a nice opt-in debug feature; chi has built-in middleware for it.
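
For reference, a minimal sketch of what such an opt-in endpoint could look like with chi's built-in Profiler middleware (illustrative only, not Gitea's actual wiring):

package main

import (
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
)

func main() {
	r := chi.NewRouter()
	// middleware.Profiler() mounts the standard net/http/pprof handlers
	// (heap, goroutine, profile, trace, ...) under the mount path.
	r.Mount("/debug", middleware.Profiler())
	http.ListenAndServe("localhost:6060", r)
}

The profiles would then be reachable under /debug/pprof/ on that listener.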

lunny (Member) commented Aug 23, 2023

I also found a similar problem when shutting down Gitea:

2023/08/23 13:24:09 ...eful/manager_unix.go:195:handleSignals() [W] PID 15167. Received SIGINT. Shutting down...
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55673, 200 OK in 21726747.0ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55703, 200 OK in 21725337.4ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53478, 200 OK in 23888501.5ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53447, 200 OK in 23891926.5ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55843, 200 OK in 21714246.7ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55815, 200 OK in 21716689.5ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55626, 200 OK in 21746951.8ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55596, 200 OK in 21747738.2ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55565, 200 OK in 21748612.8ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55758, 200 OK in 21722494.1ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53545, 200 OK in 23883790.2ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53575, 200 OK in 23882836.3ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55730, 200 OK in 21724003.3ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eful/server_hooks.go:46:doShutdown() [I] PID: 15167 Listener ([::]:3000) closed.
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53514, 200 OK in 23884622.5ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53637, 200 OK in 23881324.9ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:55786, 200 OK in 21720757.0ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 ...eb/routing/logger.go:102:func1() [I] router: completed GET /user/events for 127.0.0.1:53606, 200 OK in 23881950.5ms @ events/events.go:18(events.Events)
2023/08/23 13:24:09 cmd/web.go:355:listen() [I] HTTP Listener: 0.0.0.0:3000 Closed
2023/08/23 13:24:09 .../graceful/manager.go:168:doHammerTime() [W] Setting Hammer condition

lafriks (Member, Author) commented Sep 14, 2023

Even after disabling the /user/events endpoint in the reverse proxy we still experience the problem a few times a week: Gitea will stop accepting git pushes and throw errors, and the only thing that gets it working again is a Gitea restart.

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x297d4d7 m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: g 0: unknown pc 0x297d4d7
stack: frame={sp:0x7ffd921887e0, fp:0x0} stack=[0x7ffd9198a0a0,0x7ffd921890b0)
0x00007ffd921886e0: 0x0000000000000000 0x0000000000000000
0x00007ffd921886f0: 0x0000000000000000 0x0000000000000000
0x00007ffd92188700: 0x0000000000000000 0x0000000000000000
0x00007ffd92188710: 0x0000000000000000 0x0000000000000000
0x00007ffd92188720: 0x0000000000000000 0x0000000000000000
0x00007ffd92188730: 0x0000000000000000 0x0000000000000000
0x00007ffd92188740: 0x0000000000000000 0x0000000000000000
0x00007ffd92188750: 0x0000000000000000 0x0000000000000000
0x00007ffd92188760: 0x00007ffd92188830 0x000000000045d409 <runtime.pcvalue+0x0000000000000209>
0x00007ffd92188770: 0x00000000052270ac 0x0000000000000000
0x00007ffd92188780: 0x0000000000000000 0x0000000000000000
0x00007ffd92188790: 0x0000000000000000 0x0000000000000000
0x00007ffd921887a0: 0x0000000200000000 0x0000000000000000
0x00007ffd921887b0: 0x0000000000440634 <runtime.main+0x00000000000000b4> 0x00000000004407b1 <runtime.main+0x0000000000000231>
0x00007ffd921887c0: 0x00000000006a8d04 <encoding/gob.(*Encoder).encodeSingle.func1+0x0000000000000004> 0x00000000006a8d04 <encoding/gob.(*Encoder).encodeSingle.func1+0x0000000000000004>
0x00007ffd921887d0: 0x0000000000000000 0x0000000000000000
0x00007ffd921887e0: <0x0000000000000000 0x00000000006bfcd0 <github.com/yuin/goldmark/util.init+0x0000000000004090>
0x00007ffd921887f0: 0x0000000000016fc8 0x0000000000000000
0x00007ffd92188800: 0x00000000052270ac 0x0000000000000000
0x00007ffd92188810: 0x0000000000000000 0x00000000052100e0
0x00007ffd92188820: 0x0000000005984118 0x00000000063aebe0
0x00007ffd92188830: 0x00007ffd92188870 0x000000000045ded0 <runtime.pcdatavalue+0x0000000000000050>
0x00007ffd92188840: 0x0000000005984118 0x00000000063aebe0
0x00007ffd92188850: 0x0000000000016fc8 0x0000000000440745 <runtime.main+0x00000000000001c5>
0x00007ffd92188860: 0xfffffffe7fffffff 0xffffffffffffffff
0x00007ffd92188870: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd92188880: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd92188890: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888a0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888b0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888c0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888d0: 0xffffffffffffffff 0xffffffffffffffff
runtime: g 0: unknown pc 0x297d4d7
stack: frame={sp:0x7ffd921887e0, fp:0x0} stack=[0x7ffd9198a0a0,0x7ffd921890b0)
0x00007ffd921886e0: 0x0000000000000000 0x0000000000000000
0x00007ffd921886f0: 0x0000000000000000 0x0000000000000000
0x00007ffd92188700: 0x0000000000000000 0x0000000000000000
0x00007ffd92188710: 0x0000000000000000 0x0000000000000000
0x00007ffd92188720: 0x0000000000000000 0x0000000000000000
0x00007ffd92188730: 0x0000000000000000 0x0000000000000000
0x00007ffd92188740: 0x0000000000000000 0x0000000000000000
0x00007ffd92188750: 0x0000000000000000 0x0000000000000000
0x00007ffd92188760: 0x00007ffd92188830 0x000000000045d409 <runtime.pcvalue+0x0000000000000209>
0x00007ffd92188770: 0x00000000052270ac 0x0000000000000000
0x00007ffd92188780: 0x0000000000000000 0x0000000000000000
0x00007ffd92188790: 0x0000000000000000 0x0000000000000000
0x00007ffd921887a0: 0x0000000200000000 0x0000000000000000
0x00007ffd921887b0: 0x0000000000440634 <runtime.main+0x00000000000000b4> 0x00000000004407b1 <runtime.main+0x0000000000000231>
0x00007ffd921887c0: 0x00000000006a8d04 <encoding/gob.(*Encoder).encodeSingle.func1+0x0000000000000004> 0x00000000006a8d04 <encoding/gob.(*Encoder).encodeSingle.func1+0x0000000000000004>
0x00007ffd921887d0: 0x0000000000000000 0x0000000000000000
0x00007ffd921887e0: <0x0000000000000000 0x00000000006bfcd0 <github.com/yuin/goldmark/util.init+0x0000000000004090>
0x00007ffd921887f0: 0x0000000000016fc8 0x0000000000000000
0x00007ffd92188800: 0x00000000052270ac 0x0000000000000000
0x00007ffd92188810: 0x0000000000000000 0x00000000052100e0
0x00007ffd92188820: 0x0000000005984118 0x00000000063aebe0
0x00007ffd92188830: 0x00007ffd92188870 0x000000000045ded0 <runtime.pcdatavalue+0x0000000000000050>
0x00007ffd92188840: 0x0000000005984118 0x00000000063aebe0
0x00007ffd92188850: 0x0000000000016fc8 0x0000000000440745 <runtime.main+0x00000000000001c5>
0x00007ffd92188860: 0xfffffffe7fffffff 0xffffffffffffffff
0x00007ffd92188870: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd92188880: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd92188890: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888a0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888b0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888c0: 0xffffffffffffffff 0xffffffffffffffff
0x00007ffd921888d0: 0xffffffffffffffff 0xffffffffffffffff

goroutine 1 [runnable, locked to thread]:
runtime.mapassign_faststr(0x2d29820?, 0xc0002221b0?, {0x31be26d, 0x3})
/usr/local/go/src/runtime/map_faststr.go:203 +0x3f4 fp=0xc000214f50 sp=0xc000214f48 pc=0x418c34
github.com/go-enry/go-enry/v2/data.init()
/go/pkg/mod/github.com/go-enry/go-enry/v2@v2.8.4/data/frequencies.go:605 +0x812ae fp=0xc00021f730 sp=0xc000214f50 pc=0xd09ece
runtime.doInit(0x637e720)
/usr/local/go/src/runtime/proc.go:6525 +0x126 fp=0xc00021f860 sp=0xc00021f730 pc=0x44dec6
runtime.doInit(0x638b740)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021f990 sp=0xc00021f860 pc=0x44de11
runtime.doInit(0x637bb60)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021fac0 sp=0xc00021f990 pc=0x44de11
runtime.doInit(0x63a79e0)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021fbf0 sp=0xc00021fac0 pc=0x44de11
runtime.doInit(0x6395d00)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021fd20 sp=0xc00021fbf0 pc=0x44de11
runtime.doInit(0x63b3340)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021fe50 sp=0xc00021fd20 pc=0x44de11
runtime.doInit(0x6390740)
/usr/local/go/src/runtime/proc.go:6502 +0x71 fp=0xc00021ff80 sp=0xc00021fe50 pc=0x44de11
runtime.main()
/usr/local/go/src/runtime/proc.go:233 +0x1c6 fp=0xc00021ffe0 sp=0xc00021ff80 pc=0x440746
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00021ffe8 sp=0xc00021ffe0 pc=0x475381

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000084fb0 sp=0xc000084f90 pc=0x440bb6
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc000084fe0 sp=0xc000084fb0 pc=0x4409f0
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x475381
created by runtime.init.6
/usr/local/go/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000085780 sp=0xc000085760 pc=0x440bb6
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
/usr/local/go/src/runtime/mgcsweep.go:319 +0xde fp=0xc0000857c8 sp=0xc000085780 pc=0x42ac3e
runtime.gcenable.func1()
/usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x41fe86
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x475381
created by runtime.gcenable
/usr/local/go/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc000060150?, 0x4597ab0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000085f70 sp=0xc000085f50 pc=0x440bb6
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x6599820)
/usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc000085fa0 sp=0xc000085f70 pc=0x428af3
runtime.bgscavenge(0x0?)
/usr/local/go/src/runtime/mgcscavenge.go:633 +0x65 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x4290e5
runtime.gcenable.func2()
/usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x41fe26
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x475381
created by runtime.gcenable
/usr/local/go/src/runtime/mgc.go:179 +0xaa

goroutine 5 [finalizer wait]:
runtime.gopark(0x1a0?, 0x659b3a0?, 0x60?, 0x78?, 0xc000084770?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000084628 sp=0xc000084608 pc=0x440bb6
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000847e0 sp=0xc000084628 pc=0x41eec7
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x475381
created by runtime.createfing
/usr/local/go/src/runtime/mfinal.go:163 +0x45

goroutine 6 [select]:
runtime.gopark(0xc000092e60?, 0x2?, 0x88?, 0x9b?, 0xc000092db4?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000092be0 sp=0xc000092bc0 pc=0x440bb6
runtime.selectgo(0xc000092e60, 0xc000092db0, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x7be fp=0xc000092d20 sp=0xc000092be0 pc=0x45125e
code.gitea.io/gitea/modules/log.(*EventWriterBaseImpl).Run(0xc0000ae8a0, {0x45c6a50, 0xc0000a56e0})
/source/modules/log/event_writer_base.go:81 +0x2e9 fp=0xc000092f48 sp=0xc000092d20 pc=0x84eee9
code.gitea.io/gitea/modules/log.(*eventWriterConsole).Run(0x45c6a50?, {0x45c6a50?, 0xc0000a56e0?})
<autogenerated>:1 +0x2f fp=0xc000092f70 sp=0xc000092f48 pc=0x8580cf
code.gitea.io/gitea/modules/log.eventWriterStartGo.func1()
/source/modules/log/event_writer_base.go:157 +0xb1 fp=0xc000092fe0 sp=0xc000092f70 pc=0x84f7b1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000092fe8 sp=0xc000092fe0 pc=0x475381
created by code.gitea.io/gitea/modules/log.eventWriterStartGo
/source/modules/log/event_writer_base.go:153 +0x1c5

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000086f50 sp=0xc000086f30 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc000086fe0 sp=0xc000086f50 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 18 [GC worker (idle)]:
runtime.gopark(0xe061282d97646?, 0xc0000a56e0?, 0xd0?, 0x67?, 0x84f7b1?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000086750 sp=0xc000086730 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000867e0 sp=0xc000086750 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 34 [GC worker (idle)]:
runtime.gopark(0xe061282d9ab99?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000080750 sp=0xc000080730 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000807e0 sp=0xc000080750 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 19 [GC worker (idle)]:
runtime.gopark(0xe061282daeaff?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00011a750 sp=0xc00011a730 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00011a7e0 sp=0xc00011a750 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x65e1c00?, 0x1?, 0xe7?, 0x5c?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000080f50 sp=0xc000080f30 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc000080fe0 sp=0xc000080f50 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 8 [GC mark termination]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:463 fp=0xc0000876f0 sp=0xc0000876e8 pc=0x473160
runtime.gcMarkDone()
/usr/local/go/src/runtime/mgc.go:807 +0xff fp=0xc000087750 sp=0xc0000876f0 pc=0x4209df
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1407 +0x345 fp=0xc0000877e0 sp=0xc000087750 pc=0x421e45
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 20 [GC worker (idle)]:
runtime.gopark(0xe061282d98af4?, 0x1?, 0xbe?, 0x5d?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00011af50 sp=0xc00011af30 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00011afe0 sp=0xc00011af50 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 36 [GC worker (idle)]:
runtime.gopark(0xe061282d04052?, 0x3?, 0xa0?, 0xd7?, 0x0?)
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000081750 sp=0xc000081730 pc=0x440bb6
runtime.gcBgMarkWorker()
/usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000817e0 sp=0xc000081750 pc=0x421bf1
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x475381
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1199 +0x25

rax 0x0
rbx 0x636d2f8
rcx 0x297d4d7
rdx 0x0
rdi 0x2
rsi 0x7ffd921887e0
rbp 0x4c4fd1e
rsp 0x7ffd921887e0
r8 0x0
r9 0x7ffd921887e0
r10 0x8
r11 0x246
r12 0x71421a0
r13 0x20
r14 0x659aa20
r15 0x1
rip 0x297d4d7
rflags 0x246
cs 0x33
fs 0x0
gs 0x0

lunny (Member) commented Sep 14, 2023

The error pthread_create failed: Resource temporarily unavailable normally means that you are trying to create too many threads.

Take a look at how many threads this Linux machine allows:

cat /proc/sys/kernel/threads-max

And change it to a greater number if there is enough memory:

echo 100000 > /proc/sys/kernel/threads-max

You could also set up monitoring to record the thread count.
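
As an illustration, one way to record that over time is a small watcher that polls the Threads: field in /proc/<pid>/status (a standalone sketch, not part of Gitea; point it at the Gitea PID):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// threadCount reads the "Threads:" line from /proc/<pid>/status.
func threadCount(pid int) (string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return "", err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "Threads:") {
			return strings.TrimSpace(strings.TrimPrefix(s.Text(), "Threads:")), nil
		}
	}
	return "", fmt.Errorf("Threads field not found")
}

func main() {
	pid := os.Getpid() // assumption: replace with the PID of the Gitea process
	for range time.Tick(30 * time.Second) {
		if n, err := threadCount(pid); err == nil {
			fmt.Println(time.Now().Format(time.RFC3339), "threads:", n)
		}
	}
}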

lafriks (Member, Author) commented Sep 14, 2023

Already checked that; nothing seems out of the ordinary (but we will check this again when everything goes to hell next time):

$ cat /proc/sys/kernel/threads-max
127248

# Gitea process thread count:
$ ps -eLf | egrep ":[0-9][0-9] /usr/local/bin/gitea" | wc -l | sed -e 's/^ *//'
65

# All processes thread count:
$ ps -eo nlwp | tail -n +2 | awk '{ num_threads += $1 } END { print num_threads }'
351

lunny (Member) commented Sep 14, 2023

Maybe you can use Prometheus & Grafana to track the thread count.
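
For what it's worth, Gitea can expose a Prometheus endpoint itself, and the standard Go collector already includes an OS thread gauge (go_threads). A minimal app.ini sketch (assuming the stock [metrics] section):

;; Enable the Prometheus /metrics endpoint
[metrics]
ENABLED = true

Then scrape /metrics and graph go_threads alongside process memory.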

lafriks (Member, Author) commented Sep 14, 2023

Will try to enable process monitoring in node_exporter.

lafriks (Member, Author) commented Sep 15, 2023

Problem happened again sooner than expected.

Thread/process count is not the problem; memory usage is.

Looks like on a push to a ~600MB repo the Gitea process memory usage spikes by at least +4GB and does not recover from that; only a restart of the Gitea process helps:

[screenshot: memory usage graph]

silverwind (Member) commented:

A pprof snapshot from the admin UI should give some info on the memory allocations.

lafriks (Member, Author) commented Sep 15, 2023

The admin UI pprof allows downloading only a CPU profile, not a memory profile.

silverwind (Member) commented:

Not sure what admin pprof does, but this should work:

;; Application profiling (memory and cpu)
;; For "web" command it listens on localhost:6060
;; For "serve" command it dumps to disk at PPROF_DATA_PATH as (cpuprofile|memprofile)_<username>_<temporary id>
;ENABLE_PPROF = false
;;
;; PPROF_DATA_PATH, use an absolute path when you start gitea as service
;PPROF_DATA_PATH = data/tmp/pprof ; Path is relative to _`AppWorkPath`_
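
Once that is enabled for the web command, a heap snapshot can be pulled from the debug listener and inspected offline; a minimal sketch, assuming the default localhost:6060 listener that net/http/pprof registers:

package main

import (
	"io"
	"net/http"
	"os"
)

// Saves the current heap profile to heap.out; examine it afterwards
// with: go tool pprof heap.out
func main() {
	resp, err := http.Get("http://localhost:6060/debug/pprof/heap")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}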

lafriks (Member, Author) commented Sep 15, 2023

Thing is, this is not easily reproducible, and I don't know what the impact of enabling this on a production system would be, as it could take days or weeks before the problem happens.

silverwind (Member) commented:

Not an expert, but I think the pprof webserver has no real perf impact until you actually take snapshots.

lunny (Member) commented Sep 15, 2023

I couldn't reproduce this on my local machine. I have tested main and release/v1.20, pushing the linux repository via HTTP/SSH. The memory shows almost no change when pushing.

Do you have any git hooks or web hooks on that repository?

lafriks (Member, Author) commented Sep 15, 2023

It has only Woodpecker and Slack/Mattermost webhooks; no special git hooks are used on this instance. Full-text code search is enabled on the server (Elasticsearch), but it has been enabled for at least a year already, and on 1.19 there were no issues with that.

lunny (Member) commented Sep 15, 2023

Maybe you can try to disable the code indexer temporarily, if possible, for some days to see whether it's the problem.
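
For reference, that would be this app.ini switch (assuming the stock [indexer] section; restart Gitea after changing it):

[indexer]
;; Temporarily turn off full-text code indexing
REPO_INDEXER_ENABLED = false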

wxiaoguang (Contributor) commented:

Feel free to provide more clues via #28596 (it has been backported to 1.21).

github-actions bot locked as resolved and limited conversation to collaborators Feb 13, 2024