Daemon crash #1770

Closed · r2jitu opened this issue Sep 30, 2015 · 9 comments
Labels: kind/bug (A bug in existing code, including security flaws)
Milestone: IPFS 0.3.9

Comments

r2jitu commented Sep 30, 2015

The daemon repeatedly crashes every couple of minutes while I'm running it on Linux x64 with the prebuilt version 0.3.7. I'm streaming a video from a remote server. I've captured the log of the crash here: https://gist.github.com/r2jitu/997311f4121f4af748a3

jbenet (Member) commented Sep 30, 2015

@r2jitu thanks! Mind trying 0.3.8-dev? You can download a build of master from https://gobuilder.me/github.com/ipfs/go-ipfs/cmd/ipfs

jbenet (Member) commented Sep 30, 2015

@whyrusleeping is this the go-msg thing?

whyrusleeping (Member) commented

This is an interesting error... does your system have much RAM? I believe I have seen similar issues on my 256 MB ramnode VPS.

r2jitu (Author) commented Sep 30, 2015

I tried 0.3.8-dev and it still crashed: https://gist.github.com/r2jitu/28f0e7391a86f2ed5784

I'm running on a shared machine, but I'm pretty sure my resource limits are quite high. What I did notice is that after a minute the number of ipfs threads grows from 11 to 125, virtual memory bloats to 1.8 GB, and resident memory reaches 99 MB. These numbers keep growing until the program crashes. Since the error comes from pthread_create, maybe too many native threads are being created and I'm hitting a limit on that server?
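
For reference, a minimal Go sketch (hypothetical, Linux-only, not part of go-ipfs) that prints the two caps a runaway thread count usually hits: the kernel's per-user RLIMIT_NPROC, which makes pthread_create start failing once the user's total thread count reaches it, and the Go runtime's own default limit of 10000 OS threads:

package main

import (
    "fmt"
    "runtime/debug"

    "golang.org/x/sys/unix"
)

func main() {
    // Per-user cap on processes/threads; once a user's total thread count
    // reaches it, pthread_create starts returning errors.
    var lim unix.Rlimit
    if err := unix.Getrlimit(unix.RLIMIT_NPROC, &lim); err == nil {
        fmt.Printf("RLIMIT_NPROC: cur=%d max=%d\n", lim.Cur, lim.Max)
    }

    // The Go runtime also aborts if it ever needs more OS threads than this;
    // SetMaxThreads returns the previous value (10000 by default).
    prev := debug.SetMaxThreads(10000)
    fmt.Println("Go runtime max threads:", prev)
}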

jbenet (Member) commented Sep 30, 2015

  • Possible, could be dials.
  • Or it could be DHT puts. We need to add rate limiting to DHT handlers: per-peer ID, in total, and in storage size (see the sketch after this list).
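
A minimal, hypothetical sketch of that kind of limiting, shown here for outbound dials (the path the traces posted later in this thread point at); the same semaphore pattern would apply to DHT handlers. The cap and address are placeholders, not go-ipfs values:

package main

import (
    "fmt"
    "net"
    "time"
)

// maxConcurrentDials is a placeholder cap, not a go-ipfs constant.
const maxConcurrentDials = 16

// dialSem is a counting semaphore: each in-flight dial holds one slot.
var dialSem = make(chan struct{}, maxConcurrentDials)

// limitedDial blocks until a slot is free, so dials that hang in connect()
// cannot pile up without bound and exhaust threads or memory.
func limitedDial(addr string) (net.Conn, error) {
    dialSem <- struct{}{}
    defer func() { <-dialSem }()
    return net.DialTimeout("tcp", addr, 10*time.Second)
}

func main() {
    if conn, err := limitedDial("127.0.0.1:4001"); err != nil {
        fmt.Println("dial failed:", err)
    } else {
        conn.Close()
    }
}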

r2jitu (Author) commented Sep 30, 2015

Here's a profile of what all the threads were doing at the time of the crash:

   2 [IO wait, 1 minutes]
   2 [IO wait, 2 minutes]
  35 [IO wait]
   1 [chan receive, 1 minutes]
  15 [chan receive, 2 minutes]
  16 [chan receive]
   2 [chan send]
   1 [idle]
  18 [runnable]
  10 [select, 1 minutes]
   1 [select, 2 minutes, locked to thread]
  15 [select, 2 minutes]
 142 [select]
   1 [semacquire, 2 minutes]
   1 [syscall, 2 minutes, locked to thread]
   1 [syscall, 2 minutes]
 233 [syscall]

There are 233 threads running syscall.EpollWait (https://github.com/jbenet/go-reuseport/blob/master/poll/poll_linux.go#L43). A representative stack:

goroutine 10811 [syscall]:
syscall.Syscall6(0xe8, 0x174, 0xc823f71d54, 0x20, 0x1387, 0x0, 0x0, 0x5f3558, 0x560b9bec, 0x3993a061)
  /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
syscall.EpollWait(0x174, 0xc823f71d54, 0x20, 0x20, 0x1387, 0x13567a0, 0x0, 0x0)
  /usr/local/go/src/syscall/zsyscall_linux_amd64.go:365 +0x89
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport/poll.(*Poller).WaitWrite(0xc823f71d40, 0xecd9d92f1, 0xc839931900, 0x13567a0, 0x0, 0x0)
  /go/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport/poll/poll_linux.go:43 +0x15d
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport.connect(0x173, 0x7f39bdcccd28, 0xc82010c600, 0xecd9d92f1, 0x39931900, 0x13567a0, 0x0, 0x0)
  /go/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport/impl_unix.go:325 +0x17c
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport.dial(0x12a05f200, 0x0, 0x0, 0x0, 0x7f39bdcccc50, 0xc8225440f0, 0x0, 0x0, 0x0, 0xc823639cb0, ...)
  /go/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport/impl_unix.go:140 +0x99f
github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport.(*Dialer).Dial(0xc824045be8, 0xc823639cb0, 0x4, 0xc823639cd0, 0x10, 0x0, 0x0, 0x0, 0x0)
  /go/src/github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/jbenet/go-reuseport/interface.go:98 +0x122
github.com/ipfs/go-ipfs/p2p/net/conn.reuseDial(0x12a05f200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f39c05517e0, ...)
  /go/src/github.com/ipfs/go-ipfs/p2p/net/conn/dial.go:174 +0x1d7
github.com/ipfs/go-ipfs/p2p/net/conn.(*Dialer).rawConnDial(0xc821c73f40, 0x7f39bdcccb68, 0xc8243df700, 0x7f39c05517e0, 0xc8242bef40, 0xc822933b60, 0x22, 0x0, 0x0, 0x0, ...)
  /go/src/github.com/ipfs/go-ipfs/p2p/net/conn/dial.go:128 +0x859
github.com/ipfs/go-ipfs/p2p/net/conn.(*Dialer).Dial.func1(0xc82362f020, 0x7f39bdcccb68, 0xc8243df700, 0xc821c73f40, 0x7f39c05517e0, 0xc8242bef40, 0xc822933b60, 0x22, 0xc8236391c0, 0xc8236391b0)
  /go/src/github.com/ipfs/go-ipfs/p2p/net/conn/dial.go:46 +0xc6
created by github.com/ipfs/go-ipfs/p2p/net/conn.(*Dialer).Dial
  /go/src/github.com/ipfs/go-ipfs/p2p/net/conn/dial.go:77 +0x38a

According to this, each goroutine that is blocked in a syscall gets its own OS thread, so even though GOMAXPROCS is set to 3 in main, an excessive number of system threads end up being created.
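
That matches how the Go runtime behaves: GOMAXPROCS only caps threads running Go code, not threads parked in blocking syscalls. A minimal, standalone Linux sketch (hypothetical, not go-ipfs code) that reproduces the effect:

package main

import (
    "fmt"
    "runtime"
    "runtime/pprof"
    "syscall"
    "time"
)

func main() {
    runtime.GOMAXPROCS(3)
    // Each goroutine blocks in a raw syscall that bypasses the netpoller,
    // so the runtime hands off its OS thread and spawns another to keep
    // running Go code.
    for i := 0; i < 100; i++ {
        go func() {
            ts := syscall.Timespec{Sec: 30}
            syscall.Nanosleep(&ts, nil)
        }()
    }
    time.Sleep(2 * time.Second)
    // Typically reports roughly 100 or more threads despite GOMAXPROCS(3),
    // mirroring the 233 EpollWait threads in the dump above.
    fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())
}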

r2jitu (Author) commented Sep 30, 2015

Wow, I just read through the troubles you had to go through to get SO_REUSEPORT. I hope the Go devs update their net library.

This issue is probably a duplicate of #1425.
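
As an aside for anyone reading this later: Go 1.11 (long after this thread) added net.ListenConfig.Control, which lets a program set SO_REUSEPORT before bind without a custom dialer or poller. A minimal Linux-only sketch, not the go-reuseport implementation:

package main

import (
    "context"
    "fmt"
    "net"
    "syscall"

    "golang.org/x/sys/unix"
)

func main() {
    lc := net.ListenConfig{
        // Control runs on the raw socket after creation but before bind,
        // which is exactly the window in which SO_REUSEPORT must be set.
        Control: func(network, address string, c syscall.RawConn) error {
            var sockErr error
            err := c.Control(func(fd uintptr) {
                sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            })
            if err != nil {
                return err
            }
            return sockErr
        },
    }
    ln, err := lc.Listen(context.Background(), "tcp", "127.0.0.1:4001")
    if err != nil {
        fmt.Println("listen failed:", err)
        return
    }
    defer ln.Close()
    fmt.Println("listening with SO_REUSEPORT on", ln.Addr())
}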

whyrusleeping (Member) commented

Yeah, this is definitely a reuseport issue. I'll prioritize the epoll fix for 0.3.9.
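
One common shape for such a fix (a minimal, hypothetical sketch using golang.org/x/sys/unix, not the actual go-reuseport change): instead of one epoll fd plus one goroutine blocked in EpollWait per in-flight dial, as in the dump above, register every pending non-blocking connect() with a single shared epoll instance and wait on it from one place.

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // One shared epoll instance for every pending non-blocking connect().
    epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
    if err != nil {
        panic(err)
    }
    defer unix.Close(epfd)

    // For each dial, the non-blocking socket would be registered once:
    // unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, fd,
    //     &unix.EpollEvent{Events: unix.EPOLLOUT, Fd: int32(fd)})

    // A single goroutine (and therefore one OS thread at a time) waits for
    // any of them to become writable, i.e. to finish connecting.
    events := make([]unix.EpollEvent, 64)
    n, err := unix.EpollWait(epfd, events, 1000)
    if err != nil {
        fmt.Println("epoll_wait failed:", err)
        return
    }
    fmt.Println("connections ready:", n)
}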

whyrusleeping added this to the IPFS 0.3.9 milestone Oct 11, 2015
em-ly added the kind/bug label Aug 25, 2016
whyrusleeping (Member) commented

We're using our own reuseport lib now. Closing.

(thanks @Kubuxu for going through that hell pit of code)
