
Panic in ipfs node #6802

Open
sidenaio opened this issue Dec 17, 2019 · 29 comments · Fixed by multiformats/go-multistream#50
Labels
kind/bug A bug in existing code (including security flaws)

Comments

sidenaio commented Dec 17, 2019

Version information:
go-ipfs version: b8ec598

Description:
IPFS node crashes.
Stacktrace: (posted as a screenshot image)

sidenaio added the kind/bug label Dec 17, 2019
@Stebalien (Member)

Knowns:

  • Msgio is definitely trying to read 4 bytes (it's reading a length).
  • Read is clearly claiming to have read more than it did (64?).

The underlying connection must be one of:

  • A TCP connection.
  • A websocket connection.
  • A relay connection? We could be relaying over one of our stream multiplexers.

That means either:

  • One of the above connection types is claiming to have read more than it did.
  • There's a race condition somewhere.

We've had the first class of bug before in go-secio but that shouldn't be the case here.
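
For illustration only, a minimal hypothetical reader (not any real transport code) showing the contract violation described above: io.Reader requires 0 <= n <= len(p), and a 4-byte length read goes wrong as soon as that is broken.

package main

import "fmt"

// brokenReader is a hypothetical reader that violates the io.Reader
// contract: Read must return 0 <= n <= len(p), but this one claims to
// have read 64 bytes regardless of the buffer it was given.
type brokenReader struct{}

func (brokenReader) Read(p []byte) (int, error) {
    return 64, nil // contract violation whenever len(p) < 64
}

func main() {
    var lenBuf [4]byte // a length-prefix read uses a 4-byte buffer
    n, _ := brokenReader{}.Read(lenBuf[:])
    if n > len(lenBuf) {
        // Any wrapper that trusts n and slices lenBuf[:n] will panic with
        // "slice bounds out of range", which is what the crash shows.
        fmt.Printf("Read claims %d bytes but the buffer holds only %d\n", n, len(lenBuf))
    }
}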


  • What version of go-ipfs are you actually using and are you using go-ipfs as a binary or including it as a library? The library versions you're using are not lining up with the release.
  • Could you post the output of go list -m all?
  • Are you using relays on your private network?
  • Are any of your nodes listening on websockets?

@ridenaio

  • We are using go-ipfs as a library. github.com/ipfs/go-ipfs v0.4.22-0.20191119151441-b8ec598d5801
  • You can see go list -m all output here
  • Yes, we are using auto relays on our private network
  • No

@Stebalien (Member)

We are using go-ipfs as a library. github.com/ipfs/go-ipfs v0.4.22-0.20191119151441-b8ec598d5801

I see. It looks like you're using a commit on master from November. I recommend updating to the latest master if possible.


Yes, we are using auto relays on our private network

Got it. This could be a bug in any of the stream multiplexers, or maybe secio?

@vyzo this looks like libp2p/go-msgio#16.

@sidenaio (Author)

The bug reproduced on commit c9e8070.

@Stebalien (Member)

@vyzo could you try to reproduce this?

@ridenaio

Any progress here? It failed for me too.

panic: runtime error: slice bounds out of range [:203] with capacity 4

goroutine 13461090 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.1.0/psk_conn.go:43 +0x3e7
io.ReadAtLeast(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x4, 0x0, 0x0, 0x0)
	C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
	C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-msgio.ReadLen(0x7290198, 0xc0034e37c0, 0xc003534dd0, 0x4, 0x4, 0x0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x10e
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0xed
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc003534dc0, 0x0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0xcb
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc000bac0a0, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x163
io.ReadAtLeast(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0xc, 0x0, 0x0, 0x0)
	C:/Go/src/io/io.go:310 +0x194
io.ReadFull(0x6a49090, 0xc0016ae600, 0xc0031a7830, 0xc, 0xc, 0x0, 0x0, 0x0)
	C:/Go/src/io/io.go:329 +0xb1
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc0032328f0, 0x0, 0x0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:507 +0x15a
github.com/libp2p/go-yamux.(*Session).recv(0xc0032328f0)
	D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:486 +0x32
created by github.com/libp2p/go-yamux.newSession
	D:/projects/go/pkg/mod/github.com/libp2p/go-yamux@v1.2.3/session.go:129 +0x528
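
The first frame explains the panic message: the private-network wrapper reads into the caller's 4-byte length buffer and then slices it with the n reported by the underlying connection (203 here). A minimal sketch of the failure mechanics (not the actual psk_conn.go code):

package main

import "fmt"

// lyingConn stands in for an underlying connection that reports having
// read more bytes than fit in the caller's buffer (hypothetical).
type lyingConn struct{}

func (lyingConn) Read(p []byte) (int, error) { return 203, nil }

// xorWrapper mimics the shape of a wrapper that post-processes the bytes
// it just read (the private-network adapter XORs them with a keystream).
type xorWrapper struct{ inner lyingConn }

func (c xorWrapper) Read(out []byte) (int, error) {
    n, err := c.inner.Read(out)
    if n > 0 {
        // Slicing with the reported n is what panics when n exceeds the
        // capacity of the 4-byte length buffer passed in by msgio.
        for i := range out[:n] {
            out[i] ^= 0xff
        }
    }
    return n, err
}

func main() {
    defer func() {
        // Recover so the demonstration prints the panic instead of crashing:
        // "slice bounds out of range [:203] with capacity 4".
        fmt.Println("recovered:", recover())
    }()
    var lenBuf [4]byte
    xorWrapper{}.Read(lenBuf[:])
}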

@Stebalien (Member)

I've checked the read functions in:

  • msgio
  • yamux
  • mplex
  • circuit
  • secio
  • multistream
  • websocket
  • swarm

Given that we're not seeing random memory corruptions everywhere, just here, it still looks like the underlying connection is returning more bytes read than it should. I just don't know how.

@Stebalien (Member)

What transports are you using? QUIC? Websocket? TLS?

sidenaio (Author) commented Jan 29, 2020

Here is our config:

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/5555",
      "/ip6/::/tcp/5555"
    ]
  },
  "Bootstrap": [
    "/ip4/127.0.0.1/tcp/40405/ipfs/QmW6sqH4X46qSAUL6Fy7ovX42kWD5wp2n1VYZ3jnRFJGHg"
  ],
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": true,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "PreferTLS": false,
    "QUIC": false,
    "ShardingEnabled": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoFetch": false,
    "PathPrefixes": [],
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmZpRbcmAEJUFUijNdFENoJDgfbnQ5mVu59cUKhJyU1SEv"
  },
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": "",
    "StrictSignatureVerification": false
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "40s",
      "HighWater": 50,
      "LowWater": 30,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": true,
    "DisableNatPortMap": false,
    "DisableRelay": false,
    "EnableAutoNATService": true,
    "EnableAutoRelay": true,
    "EnableRelayHop": true
  }
}

aschmahmann (Contributor) commented Jan 29, 2020

@sidenaio you exposed your private key in your posted config file (I removed it for you and deleted the comment history). In the future running ipfs config show will strip it out for you, or you can do it manually.

I'd recommend rotating your identity. AFAIK the easiest way to do this is to just create a new repo and copy over the identity components.

@sidenaio (Author)

@aschmahmann It's a test config generated for my previous comment.

Kubuxu (Member) commented Jan 29, 2020

@Stebalien, an idea: add a sanity check to libp2p-pnet and print the type of the Conn if it fails.

@Stebalien (Member)

Double checking: you're not putting these nodes on the public network, right? Running a node with:

  1. AutoRelay enabled.
  2. RelayHop enabled (i.e., acting as a relay).
  3. A 30-40 connection limit.

will not end well if you have a lot of nodes in your network.


Otherwise, @Kubuxu is right.

Try to reproduce this with github.com/libp2p/go-libp2p-pnet@test/ipfs-panic. Instead of panicking, that should print an error along with the connection type.
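
Roughly, the kind of check such a branch would add (a sketch under the assumption that the wrapper validates n before slicing and reports the concrete connection type; not the actual branch code):

package main

import (
    "fmt"
    "net"
)

// checkedRead is a sketch of a defensive read: instead of trusting the
// reported n, validate it against the buffer and report the concrete type
// of the underlying connection when the io.Reader contract is violated.
func checkedRead(conn net.Conn, out []byte) (int, error) {
    n, err := conn.Read(out)
    if n > len(out) {
        return 0, fmt.Errorf(
            "expected to read <= %d in private network adapter, read %d: %T",
            len(out), n, conn)
    }
    return n, err
}

func main() {
    // Demonstration with an in-memory pipe; a well-behaved connection
    // passes the check and the read proceeds normally.
    a, b := net.Pipe()
    go func() { b.Write([]byte{0, 0, 0, 1}); b.Close() }()
    var lenBuf [4]byte
    n, err := checkedRead(a, lenBuf[:])
    fmt.Println(n, err, lenBuf)
}

With a check like this, a misbehaving transport produces a descriptive error instead of a slice-bounds panic.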

@sidenaio (Author)

@Stebalien, yes, this is a private network. We use a swarm key. We will try to reproduce with this branch.
Why are 30-40 connections not enough? What would you recommend for a network with 1k, 5k, 10k nodes?

Stebalien added a commit to multiformats/go-multistream that referenced this issue Feb 4, 2020
This was causing a double-write.

fixes libp2p/go-libp2p-pnet#31
probably fixes ipfs/kubo#6802
fixes ipfs/kubo#6197
sidenaio (Author) commented Feb 4, 2020

@Stebalien We caught the error

ERROR[02-03|23:37:36.344] error while writing to stream            id=QmYXBBqKeh9RtAkEcfYbGce5zRyCFttrsHBKK5TK4dSAyj err="expected to read <= 4 in private network adapter, read 201: *struct { *net.TCPConn; manet.maEndpoints }"
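
For readers puzzling over the type in that message: it is an anonymous wrapper struct embedding *net.TCPConn, so the underlying transport is plain TCP. A hypothetical illustration of how %T renders such a wrapper (the endpoints field here is made up; the real wrapper embeds manet.maEndpoints):

package main

import (
    "fmt"
    "net"
)

// endpoints is a hypothetical stand-in for the multiaddr endpoint mixin
// that gets embedded alongside the raw connection.
type endpoints struct{ local, remote string }

func main() {
    // The wrapped connection is an anonymous struct embedding *net.TCPConn,
    // and %T renders it much like the type in the log line above.
    c := &struct {
        *net.TCPConn
        endpoints
    }{}
    fmt.Printf("%T\n", c) // *struct { *net.TCPConn; main.endpoints }
}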

@Stebalien (Member)

Ok, that's a TCP connection.

@Stebalien (Member)

Ok, I thought this might be related to a concurrent write in go-multistream, but really that seems very unlikely. I'd still try upgrading to the latest master (it includes a fix for that) to see if it helps, but that likely doesn't solve this.

@Stebalien (Member)

At this point, I have to assume that we're sharing the buffer somehow.

@Stebalien (Member)

This is really looking like some weird memory corruption.

  1. We take a lock before reading the next message length.
  2. We use an array embedded in the reader struct (msgio.go). The reader struct is returned by pointer and hidden behind an interface so we shouldn't be overwriting anything, ever.

Therefore, I have no idea how we could be reusing the buffer.
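
A minimal sketch of the pattern just described (illustrative only, not the actual go-msgio code): the length buffer is embedded per reader and guarded by a lock, so nothing else should ever be able to scribble over it.

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
    "sync"
)

// reader sketches the shape described above: a 4-byte length buffer
// embedded in the struct, serialized by a mutex, never shared.
type reader struct {
    r    io.Reader
    lock sync.Mutex
    lbuf [4]byte // per-reader length buffer
}

func (r *reader) NextMsgLen() (int, error) {
    r.lock.Lock()
    defer r.lock.Unlock()
    if _, err := io.ReadFull(r.r, r.lbuf[:]); err != nil {
        return 0, err
    }
    return int(binary.BigEndian.Uint32(r.lbuf[:])), nil
}

func main() {
    // A well-formed frame: 4-byte big-endian length followed by the payload.
    payload := "hello"
    frame := make([]byte, 4+len(payload))
    binary.BigEndian.PutUint32(frame, uint32(len(payload)))
    copy(frame[4:], payload)

    rd := &reader{r: bytes.NewReader(frame)}
    fmt.Println(rd.NextMsgLen()) // 5 <nil>
}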


@sidenaio Could you try updating to the latest version of github.com/libp2p/go-libp2p-pnet@test/ipfs-panic and also updating to github.com/multiformats/go-multistream@v0.1.1? I want to check whether we're seeing this issue when the connection returns an error.

Specifically, golang/go#24727. I see you're both using Windows.

Stebalien added the need/author-input label Feb 7, 2020
ridenaio commented Apr 2, 2020

After upgrading to github.com/multiformats/go-multistream@v0.1.1, the error is no longer reproduced.

ridenaio commented Apr 3, 2020

@Stebalien So we thought the issue was solved and removed the link to github.com/libp2p/go-libp2p-pnet@test/ipfs-panic in the public release. The issue came back again on Windows. The current multistream version is 0.1.1.

panic: runtime error: slice bounds out of range [:72] with capacity 4

goroutine 12706 [running]:
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc002986b00, 0xc002546750, 0x4, 0x4, 0x406983, 0xc0037b7680, 0x4)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-pnet@v0.2.0/psk_conn.go:43 +0x249
io.ReadAtLeast(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x4, 0x1c, 0x0, 0x1c)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-msgio.ReadLen(0x46e73d0, 0xc002986b00, 0xc002546750, 0x4, 0x4, 0x432471, 0xbb24bc, 0xc002303c01)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/num.go:27 +0x87
github.com/libp2p/go-msgio.(*reader).nextMsgLen(0xc002546740, 0xc002303ce0, 0xc0030d0801, 0xc002303dc0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:178 +0x7f
github.com/libp2p/go-msgio.(*reader).NextMsgLen(0xc002546740, 0x0, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-msgio@v0.0.4/msgio.go:173 +0x91
github.com/libp2p/go-libp2p-secio.(*etmReader).Read(0xc00008f9a0, 0xc000d10700, 0xc, 0xc, 0x0, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-secio@v0.2.1/rw.go:163 +0x108
io.ReadAtLeast(0x46e75d0, 0xc001b2be00, 0xc000d10700, 0xc, 0xc, 0xc, 0xc, 0x0, 0x0)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
  C:/Users/travis/.gimme/versions/go1.13.9.windows.amd64/src/io/io.go:329
github.com/libp2p/go-yamux.(*Session).recvLoop(0xc002016620, 0x0, 0x0)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:526 +0xf0
github.com/libp2p/go-yamux.(*Session).recv(0xc002016620)
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:505 +0x32
created by github.com/libp2p/go-yamux.newSession
  C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-yamux@v1.3.5/session.go:125 +0x38d

ralendor pushed commits to ralendor/go-ipfs that referenced this issue on Jun 6 and Jun 8, 2020
@ridenaio

The problem continues to appear. We are still using a custom branch with error handling for go-libp2p-pnet to avoid panics. Any ideas?

goroutine 4149415 [running]:
runtime/debug.Stack(0x86, 0x0, 0x0)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:24 +0xa4
runtime/debug.PrintStack()
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/runtime/debug/stack.go:16 +0x29
github.com/libp2p/go-libp2p-pnet.(*pskConn).Read(0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0xc01f6b5b30, 0x40ba9c, 0x19279e0)
	C:/Users/travis/gopath/pkg/mod/github.com/idena-network/go-libp2p-pnet@v0.2.1-0.20200406075059-75d9ee9b85ed/psk_conn.go:47 +0x2ca
io.ReadAtLeast(0x5770d78, 0xc00f13c400, 0xc007f8e768, 0x2, 0x2, 0x2, 0x1000, 0x0, 0xc01f6b5ba8)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:310 +0x8e
io.ReadFull(...)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/io/io.go:329
github.com/libp2p/go-libp2p-noise.(*secureSession).readNextInsecureMsgLen(0xc007f8e6e0, 0x0, 0xc01f6b5be8, 0x464bfe)
	C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:134 +0x93
github.com/libp2p/go-libp2p-noise.(*secureSession).Read(0xc007f8e6e0, 0xc0217b4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-libp2p-noise@v0.1.1/rw.go:53 +0x1fa
bufio.(*Reader).fill(0xc02179afc0)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:100 +0x10a
bufio.(*Reader).ReadByte(0xc02179afc0, 0xc02179b0e0, 0xc01f6b5de0, 0xce4004)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/bufio/bufio.go:252 +0x40
encoding/binary.ReadUvarint(0x1fdbdc0, 0xc02179afc0, 0xc01f6b5dc4, 0x2, 0x2)
	C:/Users/travis/.gimme/versions/go1.14.12.windows.amd64/src/encoding/binary/varint.go:110 +0x8c
github.com/libp2p/go-mplex.(*Multiplex).readNextHeader(0xc02165f000, 0xc01f6b5dc0, 0x2, 0x0, 0x0)
	C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:548 +0x46
github.com/libp2p/go-mplex.(*Multiplex).handleIncoming(0xc02165f000)
	C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:367 +0xe7
created by github.com/libp2p/go-mplex.NewMultiplex
	C:/Users/travis/gopath/pkg/mod/github.com/libp2p/go-mplex@v0.1.2/multiplex.go:105 +0x204

@Stebalien (Member)

What's the error message printed before the stack trace?

@Stebalien (Member)

That is, the patch now includes the connection's type and error message.

@Stebalien (Member)

However, I have good news. We've ruled out:

  • The stream multiplexer.
  • The security transport.
  • msgio.

because we've swapped all of these out. Left over are:

  1. Some protocol at the top may be reading from multiple threads (see the sketch below).
  2. Multistream.
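
To make point 1 concrete, here is a small simulation of two readers sharing one length-prefixed stream. The interleaving is replayed sequentially for determinism, but in a live node the scheduler decides it; the second reader decodes payload bytes as a length, which is one way multi-threaded reads corrupt framing.

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

func main() {
    // Two frames: "hello" and "world", each with a 4-byte big-endian prefix.
    var stream bytes.Buffer
    for _, msg := range []string{"hello", "world"} {
        var l [4]byte
        binary.BigEndian.PutUint32(l[:], uint32(len(msg)))
        stream.Write(l[:])
        stream.WriteString(msg)
    }

    readLen := func(who string) {
        var l [4]byte
        stream.Read(l[:])
        fmt.Printf("%s decoded frame length %d\n", who, binary.BigEndian.Uint32(l[:]))
    }

    readLen("reader A") // 5, correct
    // Reader B jumps in before A consumes its 5-byte payload, so it reads
    // "hell" and decodes it as the length 1751477356 (garbage).
    readLen("reader B")
}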

@Stebalien (Member)

@ridenaio could you also post go list -m all so I can double-check versions?

@Stebalien (Member)

We've done a bit of digging here and believe it might be a bug in Go's runtime. Specifically, Go's handling of cancellation in Windows async IO may, if Windows does something unexpected, fail to update the "bytes read" length.

But we can't pinpoint the issue or reproduce it reliably enough to debug.

@github-actions

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

Stebalien removed the need/author-input and kind/stale labels Jun 14, 2022
woss commented Apr 2, 2023

I'm also getting this, reported here: #7194
