3.5.7 Server pods are crashing after upgrade #13140
Comments
@vermaxik - the second part with the Link for a full stack trace from argo-server for the invalid memory address panic: https://cloud-native.slack.com/archives/C01QW9QSSSK/p1717081687871169 Can you also add a stack trace for the |
@Joibel thank you. Yes, it's coming from the controller, but it's somehow related to this, since it dropped after the rollback 🤔 The stack trace is in the message (click to expand), but I can also add it here:
it's not so many of |
Ah, sorry, missed the stack trace above. Thank you. It might give a clue as to what's going wrong in both cases, even if it is rarer. |
We were able to reproduce the issue in a QA environment. With parallelism 20, there were 100 workflows in the queue. The problem occurred right after the first workflows were scheduled: the workflows-server pods started restarting, and we were able to catch some of the stack traces:
google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0012cf040, 0x1?)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc0012d7f00 sp=0xc0012d7df0 pc=0xf84325
google.golang.org/grpc.(*Server).serveStreams(0xc000852000, {0x3d1cf40?, 0xc0012cf040})
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc0012d7f80 sp=0xc0012d7f00 pc=0xfd5702
google.golang.org/grpc.(*Server).handleRawConn.func1()
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc0012d7fe0 sp=0xc0012d7f80 pc=0xfd4f65
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0012d7fe8 sp=0xc0012d7fe0 pc=0x4712e1
created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 243
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185
goroutine 236 [select, 1 minutes]:
runtime.gopark(0xc001557f00?, 0x2?, 0x9?, 0x0?, 0xc001557ed4?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00164ad80 sp=0xc00164ad60 pc=0x43e26e
runtime.selectgo(0xc00164af00, 0xc001557ed0, 0xf8b416?, 0x0, 0xc001274000?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc00164aea0 sp=0xc00164ad80 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000650820, 0x1)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc00164af30 sp=0xc00164aea0 pc=0xf6a273
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00062a4d0)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc00164af90 sp=0xc00164af30 pc=0xf6a986
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:451 +0x85 fp=0xc00164afe0 sp=0xc00164af90 pc=0xf73f25
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00164afe8 sp=0xc00164afe0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 192
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:449 +0x2433
goroutine 249 [select, 1 minutes]:
runtime.gopark(0xc000d36770?, 0x4?, 0x60?, 0xaf?, 0xc000d366c0?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000d36528 sp=0xc000d36508 pc=0x43e26e
runtime.selectgo(0xc000d36770, 0xc000d366b8, 0x0?, 0x0, 0x0?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000d36648 sp=0xc000d36528 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*http2Server).keepalive(0xc0012cf520)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:1152 +0x225 fp=0xc000d367c8 sp=0xc000d36648 pc=0xf88485
google.golang.org/grpc/internal/transport.NewServerTransport.func4()
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x25 fp=0xc000d367e0 sp=0xc000d367c8 pc=0xf810c5
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000d367e8 sp=0xc000d367e0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.NewServerTransport in goroutine 247
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:339 +0x1b0e
goroutine 250 [IO wait, 1 minutes]:
runtime.gopark(0x45d964b800?, 0xb?, 0x0?, 0x0?, 0x15?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000e4ea20 sp=0xc000e4ea00 pc=0x43e26e
runtime.netpollblock(0x4c5158?, 0x407de6?, 0x0?)
/usr/local/go/src/runtime/netpoll.go:564 +0xf7 fp=0xc000e4ea58 sp=0xc000e4ea20 pc=0x436cf7
internal/poll.runtime_pollWait(0x7fefd1f83fb0, 0x72)
/usr/local/go/src/runtime/netpoll.go:343 +0x85 fp=0xc000e4ea78 sp=0xc000e4ea58 pc=0x46b905
internal/poll.(*pollDesc).wait(0xc00077e700?, 0xc001310000?, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000e4eaa0 sp=0xc000e4ea78 pc=0x4e2ec7
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00077e700, {0xc001310000, 0x8000, 0x8000})
/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000e4eb38 sp=0xc000e4eaa0 pc=0x4e41ba
net.(*netFD).Read(0xc00077e700, {0xc001310000?, 0x8000?, 0x8000?})
/usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000e4eb80 sp=0xc000e4eb38 pc=0x5ec9a5
net.(*conn).Read(0xc00007a2f0, {0xc001310000?, 0x0?, 0x0?})
/usr/local/go/src/net/net.go:179 +0x45 fp=0xc000e4ebc8 sp=0xc000e4eb80 pc=0x5fe585
net.(*TCPConn).Read(0x0?, {0xc001310000?, 0x10401?, 0x1040100000000?})
<autogenerated>:1 +0x25 fp=0xc000e4ebf8 sp=0xc000e4ebc8 pc=0x60f8c5
github.com/soheilhy/cmux.(*bufferedReader).Read(0xc0005406a0, {0xc001310000, 0x0?, 0x8000})
/go/pkg/mod/github.com/soheilhy/cmux@v0.1.5/buffer.go:53 +0x12f fp=0xc000e4ec48 sp=0xc000e4ebf8 pc=0x1f8812f
github.com/soheilhy/cmux.(*MuxConn).Read(0x1010401?, {0xc001310000?, 0x410665?, 0x1010401?})
/go/pkg/mod/github.com/soheilhy/cmux@v0.1.5/cmux.go:297 +0x1e fp=0xc000e4ec78 sp=0xc000e4ec48 pc=0x1f8965e
bufio.(*Reader).Read(0xc00097a6c0, {0xc000bc0580, 0x9, 0x30?})
/usr/local/go/src/bufio/bufio.go:244 +0x197 fp=0xc000e4ecb0 sp=0xc000e4ec78 pc=0x696c77
io.ReadAtLeast({0x3ce05c0, 0xc00097a6c0}, {0xc000bc0580, 0x9, 0x9}, 0x9)
/usr/local/go/src/io/io.go:335 +0x90 fp=0xc000e4ecf8 sp=0xc000e4ecb0 pc=0x4b9cf0
io.ReadFull(...)
/usr/local/go/src/io/io.go:354
golang.org/x/net/http2.readFrameHeader({0xc000bc0580, 0x9, 0xc001287ec0?}, {0x3ce05c0?, 0xc00097a6c0?})
/go/pkg/mod/golang.org/x/net@v0.23.0/http2/frame.go:237 +0x65 fp=0xc000e4ed48 sp=0xc000e4ecf8 pc=0x779945
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000bc0540)
/go/pkg/mod/golang.org/x/net@v0.23.0/http2/frame.go:498 +0x85 fp=0xc000e4edf0 sp=0xc000e4ed48 pc=0x77a085
google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams(0xc0012cf520, 0x1?)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_server.go:636 +0x145 fp=0xc000e4ef00 sp=0xc000e4edf0 pc=0xf84325
google.golang.org/grpc.(*Server).serveStreams(0xc000852000, {0x3d1cf40?, 0xc0012cf520})
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:979 +0x1c2 fp=0xc000e4ef80 sp=0xc000e4ef00 pc=0xfd5702
google.golang.org/grpc.(*Server).handleRawConn.func1()
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:920 +0x45 fp=0xc000e4efe0 sp=0xc000e4ef80 pc=0xfd4f65
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000e4efe8 sp=0xc000e4efe0 pc=0x4712e1
created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 247
/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:919 +0x185
goroutine 239 [select, 1 minutes]:
runtime.gopark(0xc001587f00?, 0x2?, 0x9?, 0x0?, 0xc001587ed4?)
/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc001683d80 sp=0xc001683d60 pc=0x43e26e
runtime.selectgo(0xc001683f00, 0xc001587ed0, 0xf8b416?, 0x0, 0xc0014a8000?, 0x1)
/usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc001683ea0 sp=0xc001683d80 pc=0x44e6a5
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc000650aa0, 0x1)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:418 +0x113 fp=0xc001683f30 sp=0xc001683ea0 pc=0xf6a273
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00062a620)
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/controlbuf.go:552 +0x86 fp=0xc001683f90 sp=0xc001683f30 pc=0xf6a986
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:451 +0x85 fp=0xc001683fe0 sp=0xc001683f90 pc=0xf73f25
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc001683fe8 sp=0xc001683fe0 pc=0x4712e1
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 226
/go/pkg/mod/google.golang.org/grpc@v1.59.0/internal/transport/http2_client.go:449 +0x2433
Stream closed EOF for argo/argo-workflows-server-9fcd56c9d-5qnkv (argo-server)
and nil pointer error
E0603 13:25:27.980203 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 161 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28f9d80?, 0x54d4f70})
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0014880c0?})
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x28f9d80?, 0x54d4f70?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc0004dcb40, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:73083 +0xca5
modernc.org/sqlite/lib._sqlite3Step(0xc0004dcb40?, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0xc000708500?, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67339 +0xab
zombiezen.com/go/sqlite.(*Stmt).step(0xc0010517a0)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc0010517a0)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc0010517a0, 0x3, 0xc002269708)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0xc00102f000?, {0x2e62f90?, 0x2d49?}, 0x2f?)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:123 +0x45
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).upsertWorkflow(0xc0007d0ff0, 0xc000deed80)
/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:237 +0x5bb
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).Update(0xc0007d0ff0, {0x2d47e20?, 0xc000deed80?})
/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:176 +0x9e
k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc0001bbb20, {0x0?, 0x0?, 0x5519740?}, {0x3cf17d0?, 0xc000ed54c0}, 0xc002269df0, 0xc001051800, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:506 +0x92e
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0001bbb20, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:429 +0x656
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:221 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004cc000?, {0x3ce28a0, 0xc000830d70}, 0x1, 0xc000103380)
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xaf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0001bbb20, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:220 +0x1cd
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).Run(0x3d0fe08?, 0xc000778b90?)
/go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:86 +0x1c
created by github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).Run in goroutine 1
/go/src/github.com/argoproj/argo-workflows/server/apiserver/argoserver.go:269 +0x1409
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x1972f85]
goroutine 161 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0014880c0?})
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x28f9d80?, 0x54d4f70?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
modernc.org/sqlite/lib._sqlite3VdbeExec(0xc0004dcb40, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:73083 +0xca5
modernc.org/sqlite/lib._sqlite3Step(0xc0004dcb40?, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67274 +0x6c
modernc.org/sqlite/lib.Xsqlite3_step(0xc000708500?, 0x7fcfac904028)
/go/pkg/mod/modernc.org/sqlite@v1.29.1/lib/sqlite_linux_amd64.go:67339 +0xab
zombiezen.com/go/sqlite.(*Stmt).step(0xc0010517a0)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:799 +0xac
zombiezen.com/go/sqlite.(*Stmt).Step(0xc0010517a0)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlite.go:785 +0xb6
zombiezen.com/go/sqlite/sqlitex.exec(0xc0010517a0, 0x3, 0xc002269708)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:293 +0x308
zombiezen.com/go/sqlite/sqlitex.Execute(0xc00102f000?, {0x2e62f90?, 0x2d49?}, 0x2f?)
/go/pkg/mod/zombiezen.com/go/sqlite@v1.2.0/sqlitex/exec.go:123 +0x45
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).upsertWorkflow(0xc0007d0ff0, 0xc000deed80)
/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:237 +0x5bb
github.com/argoproj/argo-workflows/v3/server/workflow/store.(*SQLiteStore).Update(0xc0007d0ff0, {0x2d47e20?, 0xc000deed80?})
/go/src/github.com/argoproj/argo-workflows/server/workflow/store/sqlite_store.go:176 +0x9e
k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc0001bbb20, {0x0?, 0x0?, 0x5519740?}, {0x3cf17d0?, 0xc000ed54c0}, 0xc002269df0, 0xc001051800, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:506 +0x92e
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0001bbb20, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:429 +0x656
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:221 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004cc000?, {0x3ce28a0, 0xc000830d70}, 0x1, 0xc000103380)
/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xaf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0001bbb20, 0xc000103380)
/go/pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:220 +0x1cd
github.com/argoproj/argo-workflows/v3/server/workflow.(*workflowServer).Run(0x3d0fe08?, 0xc000778b90?)
/go/src/github.com/argoproj/argo-workflows/server/workflow/workflow_server.go:86 +0x1c
created by github.com/argoproj/argo-workflows/v3/server/apiserver.(*argoServer).Run in goroutine 1
/go/src/github.com/argoproj/argo-workflows/server/apiserver/argoserver.go:269 +0x1409
Stream closed EOF for argo/argo-workflows-server-9fcd56c9d-5qnkv (argo-server) |
As mentioned in Slack, the nil pointer error is due to the SQLite change from #13021 / #12736. As is the index out of bounds error; they both go through SQLite in the stack trace. cc @jiachengxu who wrote the PRs Also as Alan asked on Slack, can you identify what action caused the Server to panic? Were you using the UI, the CLI, or the API and which page, command, or method made it panic?
As I wrote on Slack, it would be due to a different PR as this one only impacts the Server. Also likely a different version, since you upgraded/rolled back across two patches and 3.5.6 is a more likely culprit for Controller errors than 3.5.7. |
It looks like the nil pointer error is from https://github.com/argoproj/argo-workflows/blob/main/server/workflow/store/sqlite_store.go#L105, and the |
Hmm but the |
Indeed. The discussion in Slack has continued from here, but without any real insight into what is going wrong. |
Yea I just saw and read through the Slack thread (first message since I'm awake / in my TZ). I see you said roughly the same thing as me there. 👍
This was just confirmed on Slack that the Controller error rate is from 3.5.6. EDIT: split into #13149 |
From OP:
i.e.
i.e.
Notably these are different API calls. Also, I'm not a Go expert, but I'm thinking the line number is just the start, so it would include the rest of the statement after, which would include some variables other than |
[zombiezen/go-sqlite](https://github.com/zombiezen/go-sqlite/blob/main/doc.go#L32) is not thread safe when used through a single connection. The current code is provably racing: run the server with `-race` while a few workflows are running, then `argo list` via the server a few times, and the race detector will report it.

This change doesn't attempt to move to a multiple connection model. It's a minimal change to stop the server crashing all the time, by mutexing the use of the SQL connection.

Fixes argoproj#13154 and argoproj#13140

Signed-off-by: Alan Clucas <alan@clucas.org>
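To illustrate the idea of mutexing a single connection, here is a minimal sketch. It is not the actual patch: `fakeConn`, `guardedStore`, and `writeConcurrently` are hypothetical names, and a plain Go map (which is also unsafe for concurrent writes) stands in for the SQLite connection. The assumption is only that every use of the shared connection takes the same lock first.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for a single database connection
// that must not be used by more than one goroutine at a time.
// It is NOT the zombiezen.com/go/sqlite API.
type fakeConn struct {
	rows map[string]string
}

// upsert writes to the map without any locking of its own.
func (c *fakeConn) upsert(key, value string) {
	c.rows[key] = value
}

// guardedStore serializes all access to the connection behind one mutex.
type guardedStore struct {
	mu   sync.Mutex
	conn *fakeConn
}

func newGuardedStore() *guardedStore {
	return &guardedStore{conn: &fakeConn{rows: make(map[string]string)}}
}

// Update takes the lock before touching the shared connection.
func (s *guardedStore) Update(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.conn.upsert(key, value)
}

// Len also takes the lock, so reads never overlap with writes.
func (s *guardedStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.conn.rows)
}

// writeConcurrently starts n goroutines that each upsert one distinct key,
// then waits for all of them to finish.
func writeConcurrently(s *guardedStore, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Update(fmt.Sprintf("wf-%d", i), "Running")
		}(i)
	}
	wg.Wait()
}

func main() {
	s := newGuardedStore()
	writeConcurrently(s, 100)
	fmt.Println(s.Len()) // prints 100
}
```

Without the lock in `Update`, concurrent map writes like these can crash the program, and the race detector would flag them. The trade-off is the one described above: all access is serialized through one connection rather than spread across several.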
I had the same issue with 3.5.7. Is this fix available in a released version now? |
In 3.5.8 yes, see the changelog: #13206 |
Pre-requisites
:latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened/what you expected to happen?
After upgrading from 3.5.5 to 3.5.7, the argo server pods crashed with two types of errors:
Recovered from panic: runtime error: invalid memory address or nil pointer dereference
edited by agilgur5: reformatted to make it more readable
Recovered from panic: runtime error: index out of range [70437463654405] with length 64
edited by agilgur5: reformatted to make it more readable
edited by Joibel and agilgur5: The below split into a separate issue - see #13149
and argo controllers are flooded with
Version
3.5.7
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
We have a dynamic load: on average 1-1.5 pods per second, at most 50 running in parallel, and up to 100 in pending status. Workflows are persisted in a Postgres database.
Here is part of our config:
Logs from the workflow controller
Logs from your workflow's wait container