Panic in ipfs node #6802
Knowns:
The underlying connection must be one of:
That means either:
We've had the first class of bug in go-secio before, but that shouldn't be the case here.
I see. It looks like you're using a commit on master from November. I recommend updating to the latest master if possible.
Got it. This could be a bug in any of the stream multiplexers, or maybe secio? @vyzo this looks like libp2p/go-msgio#16.
The bug reproduced on commit c9e8070.
@vyzo could you try to reproduce this?
Any steps here? It failed for me too.
I've checked the read functions in.
Given that we're not seeing random memory corruption everywhere, just here, it still looks like the underlying connection is returning more bytes read than it should. I just don't know how.
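To make that suspicion concrete, here is a tiny self-contained Go demonstration (not project code; `lyingReader` is made up for illustration) of why a reader that reports more bytes than the buffer holds panics in downstream code that slices the buffer by the reported count, as pnet's cipher does:

```go
package main

import "fmt"

// lyingReader violates the io.Reader contract (0 <= n <= len(p)) by
// claiming one byte more than the buffer can hold.
type lyingReader struct{}

func (lyingReader) Read(p []byte) (int, error) {
	return len(p) + 1, nil
}

func main() {
	buf := make([]byte, 16)
	n, _ := lyingReader{}.Read(buf)
	fmt.Println("reported n =", n) // 17
	// The first downstream buf[:n] blows up exactly like the reported
	// crash: slice bounds out of range [:17] with capacity 16.
	_ = buf[:n]
}
```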
What transports are you using? QUIC? Websocket? TLS?
Here's our config:

```json
{
"API": {
"HTTPHeaders": {}
},
"Addresses": {
"API": "/ip4/127.0.0.1/tcp/5001",
"Announce": [],
"Gateway": "/ip4/127.0.0.1/tcp/8080",
"NoAnnounce": [],
"Swarm": [
"/ip4/0.0.0.0/tcp/5555",
"/ip6/::/tcp/5555"
]
},
"Bootstrap": [
"/ip4/127.0.0.1/tcp/40405/ipfs/QmW6sqH4X46qSAUL6Fy7ovX42kWD5wp2n1VYZ3jnRFJGHg"
],
"Datastore": {
"BloomFilterSize": 0,
"GCPeriod": "1h",
"HashOnRead": false,
"Spec": {
"mounts": [
{
"child": {
"path": "blocks",
"shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
"sync": true,
"type": "flatfs"
},
"mountpoint": "/blocks",
"prefix": "flatfs.datastore",
"type": "measure"
},
{
"child": {
"compression": "none",
"path": "datastore",
"type": "levelds"
},
"mountpoint": "/",
"prefix": "leveldb.datastore",
"type": "measure"
}
],
"type": "mount"
},
"StorageGCWatermark": 90,
"StorageMax": "10GB"
},
"Discovery": {
"MDNS": {
"Enabled": true,
"Interval": 10
}
},
"Experimental": {
"FilestoreEnabled": true,
"Libp2pStreamMounting": false,
"P2pHttpProxy": false,
"PreferTLS": false,
"QUIC": false,
"ShardingEnabled": false,
"StrategicProviding": false,
"UrlstoreEnabled": false
},
"Gateway": {
"APICommands": [],
"HTTPHeaders": {
"Access-Control-Allow-Headers": [
"X-Requested-With",
"Range",
"User-Agent"
],
"Access-Control-Allow-Methods": [
"GET"
],
"Access-Control-Allow-Origin": [
"*"
]
},
"NoFetch": false,
"PathPrefixes": [],
"RootRedirect": "",
"Writable": false
},
"Identity": {
"PeerID": "QmZpRbcmAEJUFUijNdFENoJDgfbnQ5mVu59cUKhJyU1SEv"
},
"Ipns": {
"RecordLifetime": "",
"RepublishPeriod": "",
"ResolveCacheSize": 128
},
"Mounts": {
"FuseAllowOther": false,
"IPFS": "/ipfs",
"IPNS": "/ipns"
},
"Plugins": {
"Plugins": null
},
"Provider": {
"Strategy": ""
},
"Pubsub": {
"DisableSigning": false,
"Router": "",
"StrictSignatureVerification": false
},
"Reprovider": {
"Interval": "12h",
"Strategy": "all"
},
"Routing": {
"Type": "dht"
},
"Swarm": {
"AddrFilters": null,
"ConnMgr": {
"GracePeriod": "40s",
"HighWater": 50,
"LowWater": 30,
"Type": "basic"
},
"DisableBandwidthMetrics": true,
"DisableNatPortMap": false,
"DisableRelay": false,
"EnableAutoNATService": true,
"EnableAutoRelay": true,
"EnableRelayHop": true
}
}
```
@sidenaio you exposed your private key in your posted config file (I removed it for you and deleted the comment history). I'd recommend rotating your identity. AFAIK the easiest way to do this is to just create a new repo and copy over the identity components.
@aschmahmann It's a test config generated for my previous comment. |
@Stebalien here's an idea: add a sanity check to libp2p-pnet, and print the type of the Conn if it fails.
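A minimal sketch of that suggestion (names are illustrative, not the actual go-libp2p-pnet patch): wrap the underlying net.Conn and turn an impossible read count into an error that records the concrete connection type via %T, rather than letting the cipher code panic later:

```go
package pnetguard

import (
	"fmt"
	"net"
)

// guardedConn embeds net.Conn so every method other than Read passes
// through to the underlying connection unchanged.
type guardedConn struct {
	net.Conn
}

// Read enforces the io.Reader contract and reports which concrete
// connection type misbehaved.
func (g *guardedConn) Read(p []byte) (int, error) {
	n, err := g.Conn.Read(p)
	if n < 0 || n > len(p) {
		return 0, fmt.Errorf("pnet: underlying conn %T returned impossible read count %d for a %d-byte buffer",
			g.Conn, n, len(p))
	}
	return n, err
}
```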
Double checking: you're not putting these nodes on the public network, right? Putting a node with:
Will not end well if you have a lot of nodes in your network. Otherwise, @Kubuxu is right. Try to reproduce this with
@Stebalien, yes, this is a private network. We use a swarm key. We will try to reproduce with this branch. |
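For reference, a V1 swarm key is just a three-line text file: a codec header, an encoding line, and 32 hex-encoded random bytes. Here's a hedged stand-alone generator in the spirit of the usual key-gen tools (illustrative; not part of go-ipfs itself):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

func main() {
	// 32 random bytes form the 256-bit pre-shared key.
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	// Standard go-libp2p pnet V1 layout: codec header, encoding, key.
	fmt.Printf("/key/swarm/psk/1.0.0/\n/base16/\n%s\n", hex.EncodeToString(key))
}
```

The output goes in `swarm.key` in the IPFS repo directory on every node in the private network.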
This was causing a double-write. fixes libp2p/go-libp2p-pnet#31 probably fixes ipfs/kubo#6802 fixes ipfs/kubo#6197
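For context, here's a generic reconstruction of the "double-write" bug class that commit refers to (illustrative only; this is not the actual go-libp2p-pnet code). If a stream wrapper sends the same payload twice, the peer sees more bytes than the sender accounted for, which surfaces on the read side exactly as described above:

```go
package doublewrite

import "io"

// framedWriter is a deliberately buggy wrapper used only to show the
// failure mode.
type framedWriter struct {
	w io.Writer
}

func (f *framedWriter) Write(p []byte) (int, error) {
	if _, err := f.w.Write(p); err != nil {
		return 0, err
	}
	// BUG: writing p a second time duplicates the data on the wire; a
	// framed or length-prefixed reader on the other end now reads past
	// the frame it expected.
	return f.w.Write(p)
}
```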
@Stebalien We caught the error
OK, that's a TCP connection.
OK, I thought this might be related to a concurrent write in go-multistream, but really, that seems very unlikely. I'd still try upgrading to the latest master (it includes a fix for that) to see if it helps, but it likely won't solve this.
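For what it's worth, the usual shape of the fix for that concurrent-write class of bug (a sketch; not the actual go-multistream change) is simply to serialize writes to the shared stream so two goroutines can never interleave their bytes:

```go
package safewrite

import (
	"io"
	"sync"
)

// lockedWriter serializes concurrent writes to a shared stream.
type lockedWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func (l *lockedWriter) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.w.Write(p)
}
```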
At this point, I have to assume that we're sharing the buffer somehow. |
This is really looking like some weird memory corruption.
Therefore, I have no idea how we could be reusing the buffer. @sidenaio Could you try updating to the latest version of Go? Specifically, golang/go#24727 looks relevant. I see you're both using Windows.
After upgrading to |
@Stebalien so, we thought that issue was solved and removed a link to
The problem continues to appear. We are still using a custom branch with error handling for go-libp2p-pnet to avoid panics. Any ideas?
What's the error message printed before the stack trace?
That is, the patch now includes the connection's type and error message. |
However, I have good news: we've ruled out:
Because we've swapped all of these out. Left over are:
@ridenaio could you also post |
We've done a bit of digging here and believe it might be a bug in Go's runtime. Specifically, Go's handling of cancellation in Windows async IO may, if Windows does something unexpected, fail to update the "bytes read" length. But we can't pinpoint the issue or reproduce it reliably enough to debug.
Oops, it seems like we need more information for this issue. Please comment with more details, or this issue will be closed in 7 days.
I'm also getting this; reported here: #7194
Version information:
go-ipfs version: b8ec598
Description:
IPFS node crashes.
Stack trace: