Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

driver getting panicked with index out of range #574

Open
jkoleti-uptycs opened this issue Mar 13, 2024 · 9 comments
Open

driver getting panicked with index out of range #574

jkoleti-uptycs opened this issue Mar 13, 2024 · 9 comments

Comments

@jkoleti-uptycs
Copy link

jkoleti-uptycs commented Mar 13, 2024

panic: runtime error: index out of range [1] with length 0

goroutine 199034 [running]:
encoding/binary.bigEndian.PutUint16(...)
	/usr/local/go/src/encoding/binary/binary.go:144
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80?, 0xc00200c928})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:72 +0x39e
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80, 0xc00200c928})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x55
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send(...)
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close(0xc0032aad00, {0x2b65db0, 0x4866120})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x1a5
created by github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan in goroutine 199032
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x465

Our Database setup

  • Database: memgraph v2.8 in single instance mode
  • Driver version: neo4j-go-driver v5.10.0 (tried with neo4j-go-driver v5.18, still seeing this issue)
  • Operating system: ubuntu 20.04

We are using the neo4j go driver for connecting to our memgraph database with default config.
One of our applications is getting panicked while running cypher queries.
So built our application with the -race flag enabled and below is the stack trace for a panic.

==================
WARNING: DATA RACE
Write at 0x00c004441c50 by goroutine 198988:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:107 +0x2ca
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x54
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134 +0x1a4
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x125
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unregUnlocked.func1.1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:391 +0x61

Previous write at 0x00c004441c50 by goroutine 198992:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).beginMessage()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:47 +0x1ec
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).begin()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:45 +0xd6
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).appendGoodbye()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:244 +0xd5
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).appendGoodbye()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:129 +0x124
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:924 +0x108
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x61

Goroutine 198988 (running) created at:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unregUnlocked.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:391 +0x104
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:477 +0x30
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).unreg()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:385 +0x18f
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).Return()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:460 +0x4e4
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).executeTransactionFunction.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:441 +0x85
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:477 +0x30
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).runRetriable()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:416 +0x5ed
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).ExecuteWrite()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:365 +0x86
  github.com/neo4j/neo4j-go-driver/v5/neo4j.SessionWithContext.ExecuteWrite-fm()
      <autogenerated>:1 +0xa7
  github.com/prahaladd/gograph/neo.(*Neo4jConnection).ExecuteQuery()
      /root/go/pkg/mod/github.com/prahaladd/gograph@v0.2.0/neo/executor.go:139 +0x536
  ...
  ...
  ...

Goroutine 198992 (running) created at:
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x464
  github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*Pool).CleanUp()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/pool.go:146 +0x313
  github.com/neo4j/neo4j-go-driver/v5/neo4j.(*sessionWithContext).Close.func1()
      /root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/session_with_context.go:655 +0x77
==================
panic: runtime error: index out of range [1] with length 0

goroutine 199034 [running]:
encoding/binary.bigEndian.PutUint16(...)
	/usr/local/go/src/encoding/binary/binary.go:144
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*chunker).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80?, 0xc00200c928})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/chunker.go:72 +0x39e
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*outgoing).send(0xc004441c20, {0x2b65db0, 0x4866120}, {0x2b55f80, 0xc00200c928})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/outgoing.go:260 +0x55
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*messageQueue).send(...)
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/message_queue.go:134
github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/bolt.(*bolt4).Close(0xc0032aad00, {0x2b65db0, 0x4866120})
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/bolt/bolt4.go:925 +0x1a5
created by github.com/neo4j/neo4j-go-driver/v5/neo4j/internal/pool.(*server).removeIdleOlderThan in goroutine 199032
	/root/go/pkg/mod/github.com/neo4j/neo4j-go-driver/v5@v5.10.0/neo4j/internal/pool/server.go:187 +0x465

This issue is happening randomly, the occurrence of this issue is not predictable.

Saw a similar issue that was closed here - #525

@kdescoteaux-uptycs
Copy link

Note that this race happens when the ExecuteWrite pool Return invokes unreg() (because connection is old?) and this launches a go routine to Close the connection which cannot even begin to run until after the connection put in the Idle list.
When ExecuteWrite returns the session could then be closed by the caller, which launches a Cleanup, which will close the old connections in the Idle List in more go routines.
It is not safe to leave the Close to run asynchronously from the main flow of ExecuteWrite unless there is a lock in Close to prevent two simultaneous Closes from running.

@StephenCathcart
Copy link
Contributor

Hey there!

Thanks for sharing the stack trace and your observations, it's very helpful. I've bumped up the priority on this one because it looks like it might be tied to the previous issue #525 and we now have a bit more info to work with. I'll keep you in the loop with any updates or breakthroughs.

@aniketkdm
Copy link

Is there any update on this. We are seeing the same / similar behavior.

@StephenCathcart
Copy link
Contributor

Hi @aniketkdm, are you seeing this problem using a Neo4j server, and if so which version? I'm just asking as the original issue was around Memgraph.

@aniketkdm
Copy link

Hello @StephenCathcart, yes we are seeing same / similar problem. We are using Neo4j aura through GCP and when we are running heavy load (reads and writes) we get %!v(PANIC=Format method: runtime error: invalid memory address or nil pointer dereference) randomly. We have tried adding recovery functions in our code but the panics are not getting recovered in our function. We are using github.com/neo4j/neo4j-go-driver/v5 v5.18.0. Even with the recovery functions, we are not getting stacktrace for these panics.

@justinwalz
Copy link

To chime in, we're also seeing this issue

  • server version neo4j:5.19.0-enterprise (three kubernetes statefulsets on EKS)
  • client version github.com/neo4j/neo4j-go-driver/v5 v5.20.0

@StephenCathcart
Copy link
Contributor

Thanks for the extra information @aniketkdm and @justinwalz that narrows things down quite a bit.

@adam-bedrock
Copy link

Hey was this ever resolved?

@StephenCathcart
Copy link
Contributor

StephenCathcart commented Nov 18, 2024

Hey @adam-bedrock not yet, unfortunately. We're able to reproduce the error, but simply locking the close function doesn't fix the issue - it needs a bit of a refactor. I'll keep you posted when we're able to release a fix. Thanks for your patience.

Update: #610 should fix the race condition we're seeing, it'll be in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants