Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer discovery regression on Windows #5146

Closed
alanshaw opened this issue Jun 22, 2018 · 21 comments
Closed

Peer discovery regression on Windows #5146

alanshaw opened this issue Jun 22, 2018 · 21 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@alanshaw
Copy link
Member

Version information:

Regression between 0.4.13 and 0.4.14/0.4.15

Type:

Bug

Description:

On Windows, two nodes spawned on the same machine do not discover each other in 0.4.14 or 0.4.15, but they used to in 0.4.13.

The following demonstrates this error:

Install ipfsd-ctl and go-ipfs 0.4.13:

npm i ipfsd-ctl go-ipfs-dep@0.4.13

Create file test-peer-discovery.js and add the following:

const IpfsFactory = require('ipfsd-ctl')
const factory = IpfsFactory.create()

;[1, 2].forEach(id => {
  console.log(`node ${id} spawning...`)
  factory.spawn((err, ipfsd) => {
    if (err) throw err
    waitForPeers(id, ipfsd.api)
  })
})

function waitForPeers (id, node) {
  console.log(`node ${id} checking for peers...`)
  node.swarm.peers((err, peers) => {
    if (err) throw err
    console.log(`node ${id} has ${peers.length} peers`)
    if (!peers.length) return setTimeout(() => waitForPeers(id, node), 1000)
    peers.forEach(p => console.log(p.peer.toB58String()))
    node.stop()
  })
}

Run:

node test-peer-discovery.js

Output:

$ node test-peer-discovery.js
node 1 spawning...
node 2 spawning...
node 1 checking for peers...
node 2 checking for peers...
node 1 has 2 peers
QmSoLSafTMBsPKadTEgaXctDQVcqN88CNLHXMkTNwMKPnu
QmSoLer265NRgSp2LA3dPaeykiS1J6DifTC88f5uVQKNAd
node 2 has 2 peers
QmSoLPppuBtQSGwKDZT2M73ULpjvfd3aZ6ha4oFGL1KrGM
QmSoLer265NRgSp2LA3dPaeykiS1J6DifTC88f5uVQKNAd

Now install go-ipfs 0.4.14 or 0.4.15:

npm i go-ipfs-dep@0.4.15

Re-run, and output is now:

$ node test-peer-discovery.js
node 1 spawning...
node 2 spawning...
node 2 checking for peers...
node 1 checking for peers...
node 2 has 0 peers
node 1 has 0 peers
node 2 checking for peers...
node 1 checking for peers...
node 1 has 0 peers
node 2 has 0 peers
node 1 checking for peers...
node 2 checking for peers...
node 1 has 0 peers
node 2 has 0 peers
node 1 checking for peers...
node 2 checking for peers...
node 1 has 0 peers
node 2 has 0 peers
node 1 checking for peers...
node 2 checking for peers...
node 2 has 0 peers
node 1 has 0 peers
node 2 checking for peers...
node 1 checking for peers...
node 2 has 0 peers
node 1 has 0 peers
node 2 checking for peers...
node 1 checking for peers...
node 2 has 0 peers
node 1 has 0 peers
node 2 checking for peers...
node 1 checking for peers...
node 1 has 0 peers
node 2 has 0 peers
# ...forever

This seems to be the reason why our js-ipfs-api tests requiring connected nodes are failing.

@Stebalien
Copy link
Member

Does this not happen on Linux? On a hunch... try waiting a minute first.

My guess is that this is happening because:

  1. The DHT waits 5m before it actually bootstraps: Bootstrap immediately libp2p/go-libp2p-kad-dht#163
  2. In 0.4.14, I modified the IPNS republisher to wait 1 minute before the initial republish: Fix various IPNS issues #4440

I believe this initial IPNS republish may have acted like a DHT bootstrap, making your node findable in the DHT (usually).

@Stebalien Stebalien added the kind/bug A bug in existing code (including security flaws) label Jun 22, 2018
@Stebalien
Copy link
Member

Stebalien commented Jun 22, 2018

@alanshaw
Copy link
Member Author

alanshaw commented Jun 22, 2018

I haven't tried this exact example on linux, but on macOS it works on 0.4.15. Our js-ipfs-api tests are passing on linux and macOS also.

I have just tried leaving the example for over 10 minutes now and it's still reporting 0 peers.

I have Windows Defender turned off btw.

I'll try the InitialRebroadcastDelay change when I can but I'm not too hopeful about that one now...

@Stebalien
Copy link
Member

Hm. Damn. The usual rule here is: "when in doubt, blame reuseport" but that shouldn't be the issue here. However, just to check, try running with the environment variable IPFS_REUSEPORT set to false. That probably won't fix it, but we should rule it out.

@whyrusleeping
Copy link
Member

@Stebalien this is an mdns issue, not a DHT issue

@whyrusleeping
Copy link
Member

however, it looks like our mdns lib hasnt been updated in over a year.

@whyrusleeping
Copy link
Member

@alanshaw in your example above for 0.4.13, the two nodes your nodes are getting connections to are not eachother, its two bootstrap nodes.

@whyrusleeping
Copy link
Member

Looking into the js-ipfsd-ctl code, it seems that it replaces the config file uses with this one: https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/defaults/config.json

This explicitly does not have mdns enabled, which requires a section like:

"Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },

The weird bit though, is that it also doesnt have any bootstrap nodes. So i'm not sure why the 0.4.13 daemon would be connecting to them...

@Stebalien
Copy link
Member

Does go-ipfs-dep use a precompiled IPFS or does it recompile it? Our MDNS library has some memory safety issues so the version of go it's compiled with may make a difference.

@alanshaw
Copy link
Member Author

alanshaw commented Jun 25, 2018

To address some of the comments above:

in your example above for 0.4.13, the two nodes your nodes are getting connections to are not each other, its two bootstrap nodes.

True! They do sometimes find each other though, but this can take many minutes. The other thing is that with 0.4.13 the nodes do accumulate peers over time (hundreds) after a minute:

screen shot 2018-06-25 at 13 30 35

With 0.4.15 no peers, nada, forever:

screen shot 2018-06-25 at 13 27 12

When I run this on macOS the output from the script is basically this every time:

screen shot 2018-06-25 at 13 25 28

Note: I modified the script a little, see below for updated version

This explicitly does not have mdns enabled, which requires a section like:

Yes, but when inspecting the config using ipfs.config.get() it does have that setting. Perhaps this is just a default that gets gets applied if you don't specify anything? Same for bootstrap nodes presumably...

Does go-ipfs-dep use a precompiled IPFS or does it recompile it?

It'll download a precompiled version if it has one for you system. Most node modules with a native aspect will do that and fallback to compiling on your system if not. I'm not sure if go-ipfs-dep does that. I'd hope so. In this case, a precompiled version IS being downloaded and used.


I modified the script a little for anyone who wants to try this. Adds peer IDs and colorizes the IDs for the two nodes we're spawning in red/blue.

// npm i async chalk ipfsd-ctl go-ipfs-dep
const auto = require('async/auto')
const IPFSFactory = require('ipfsd-ctl')
const chalk = require('chalk')
const factory = IPFSFactory.create()

console.log('spawning 2 nodes...')

auto({
  ipfsd0: (cb) => factory.spawn(cb),
  ipfsd1: (cb) => factory.spawn(cb),
  id0: ['ipfsd0', ({ ipfsd0 }, cb) => ipfsd0.api.id(cb)],
  id1: ['ipfsd1', ({ ipfsd1 }, cb) => ipfsd1.api.id(cb)]
}, (err, res) => {
  if (err) throw err

  res.ipfsd0.peerId = res.id0
  res.ipfsd1.peerId = res.id1

  res.ipfsd0.color = chalk.red
  res.ipfsd1.color = chalk.blue

  console.log(`spawned node with ID ${chalk.red(res.ipfsd0.peerId.id)}`)
  console.log(`spawned node with ID ${chalk.blue(res.ipfsd1.peerId.id)}`)

  waitForPeer(res.ipfsd0, res.ipfsd1)
  waitForPeer(res.ipfsd1, res.ipfsd0)
})

function waitForPeer (ipfsd, otherIpfsd) {
  const coloredId = (id) => {
    if (id === ipfsd.peerId.id) return ipfsd.color(id)
    if (id === otherIpfsd.peerId.id) return otherIpfsd.color(id)
    return id
  }

  ipfsd.api.swarm.peers((err, peers) => {
    if (err) throw err

    console.log(`node ${coloredId(ipfsd.peerId.id)} has ${peers.length} peers:`)
    peers.forEach((p, i) => {
      console.log(`  ${i + 1}. ${coloredId(p.peer.toB58String())}`)
    })

    const found = peers.some(p => p.peer.toB58String() === otherIpfsd.peerId.id)

    if (!found) return setTimeout(() => waitForPeer(ipfsd, otherIpfsd), 1000)
    ipfsd.stop()
  })
}

@alanshaw
Copy link
Member Author

Another data point:

If I run ipfs daemon in Windows (go-ipfs 0.4.15) with all the defaults it's fine - I get a bunch of swarm peers really quickly. The only difference I can tell between doing it like this and using ipfsd-ctl is that daemons are spawned with swarm address /ip4/127.0.0.1/tcp/0. i.e. pick a port. Could that be an issue?

@whyrusleeping
Copy link
Member

@alanshaw port zero shouldnt make a difference. The other difference i see is that ipfsd-ctl doesnt add any bootstrap nodes. No bootstrap nodes means no connections. Unless you are explicitly adding bootstrap nodes, seeing no connections is the expected behaviour. The behavior on 0.4.13 is the weird one to me

@alanshaw
Copy link
Member Author

I've logged out the response from ipfs.config.get('Bootstrap') for both nodes:

0.4.13

screen shot 2018-06-25 at 21 03 19

0.4.15

screen shot 2018-06-25 at 21 01 49

I'm going to explicitly start the nodes with those bootstrap servers just incase ipfs.config.get is giving me back some defaults because nothing is set...will report back.

@alanshaw
Copy link
Member Author

Explicitly adding those bootstrap addrs doesn't change anything.

If I set Bootstrap: [] then I see 0.4.13 behave exactly the same as 0.4.15.

@alanshaw
Copy link
Member Author

Ok I think I have found the issue:

On Windows in 0.4.14 and 0.4.15 if you set Address.API to any port other than 5001, including 0, then the node will never get any peers.

If you set Address.API: '/ip4/127.0.0.1/tcp/5001' then the peer with that API address will acquire peers.

ipfsd-ctl sets this to /ip4/127.0.0.1/tcp/0 (https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/defaults/config.json#L16) which is why nodes that it spawns never get any peers.

@Stebalien
Copy link
Member

Try logging the commands on the IPFS daemon, running on both 5001 and on a different port. I'm guessing you'll see something run a command on the 5001 daemon but not the other. There's probably some test tool using the default port.

@Stebalien
Copy link
Member

That is, try the command ipfs diag cmds.

@negamaxi
Copy link

negamaxi commented Jul 9, 2018

Is there any workaround to get peers with API port different than 5001? Same issue on Linux.

@Stebalien
Copy link
Member

@negamaxi this is probably not your issue.

@alanshaw have you made any progress on this?

@alanshaw
Copy link
Member Author

alanshaw commented Jul 25, 2018

Yes! I believe this is fixed in go-ipfs 0.4.16 🎉

With 0.4.15 you can see the behaviour on windows if you take the following steps (I'm using the git bash so it's unix like):

  1. ipfs init
  2. vi ~/.ipfs/config and change Addresses config to:
     "Addresses": {
       "Swarm": [
         "/ip4/127.0.0.1/tcp/0"
       ],
       "Announce": [],
       "NoAnnounce": [],
       "API": "/ip4/127.0.0.1/tcp/0",
       "Gateway": "/ip4/127.0.0.1/tcp/0"
     },
    
  3. ipfs daemon &
  4. ipfs swarm peers - no peers - ever!

By default ipfs init will give you a swarm address in your config like /ip4/0.0.0.0/tcp/4001 - if you change the config in the steps above to use 0.0.0.0 instead of 127.0.0.1 then it works.

If you do the exact same steps using go-ipfs 0.4.16 it works, so I believe this is fixed.

@Stebalien
Copy link
Member

Ah. That makes more sense. It has nothing to do with the API port, right? Looks like https://github.com/libp2p/go-libp2p/blob/master/NEWS.md#dialing-and-source-addresses.

Closing as that issue has been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

No branches or pull requests

4 participants