Internal IP address is used on EC2 hosts instead of Elastic IP Address #230

Closed
threejeez opened this issue Aug 3, 2018 · 4 comments
Labels: kind/support (A question or request for support), status/deferred (Conscious decision to pause or backlog)

Comments

@threejeez

Type: Question/Enhancement

Severity: High

Description:

I'm running a bootstrap node (B) and a node (C) that connects to it and provides some content. Both nodes run on EC2 hosts, and each instance has a NIC with a private IP address as well as an associated public Elastic IP address. I then launch a node on my local PC (P) using the bootstrap node and call `findProviders` to find the content on node C. While the content can be found through node B, my local node cannot connect to C because the IP address in the PeerInfo object for C is the internal one instead of the public Elastic IP.

To track this down I turned on debug logging and saw the following lines:

```
libp2p:dht:query:<unprintable key bytes>:Qmf4BRdg queue:work +0ms
libp2p:dht:net:Qmf4BRdg sending to: QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN +206ms
libp2p:switch:dial dialing QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN +205ms
libp2p:switch:dial dialing transport TCP +0ms
libp2p:switch:transport dialing TCP [ '/ip4/10.0.0.246/tcp/3000/ipfs/QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN' ] +18s
libp2p:swarm:dialer dialMany:start +18s
libp2p:swarm:dialer dialSingle: QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN:/ip4/10.0.0.246/tcp/3000/ipfs/QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN +1ms
```

While the content does reside on node `QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN`, it will obviously fail to dial the node because `10.0.0.246` is the private IP Address for the machine.
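To confirm from a log line like the ones above that a dialed address is not publicly routable, the IP component can be checked against the private ranges. The following is only a minimal sketch using Go's standard library (`net.IP.IsPrivate` requires Go 1.17 or later). `ipFromMultiaddr` and `isPrivateMultiaddr` are hypothetical helpers, and the string splitting is a stand-in for a real multiaddr parser.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ipFromMultiaddr extracts the IP from a textual multiaddr such as
// "/ip4/10.0.0.246/tcp/3000/ipfs/<peer id>". Hypothetical helper: it uses
// plain string splitting and is not a real multiaddr parser.
func ipFromMultiaddr(addr string) (net.IP, error) {
	parts := strings.Split(addr, "/")
	// parts[0] is empty because the address starts with "/".
	if len(parts) < 3 || (parts[1] != "ip4" && parts[1] != "ip6") {
		return nil, fmt.Errorf("no leading ip4/ip6 component in %q", addr)
	}
	ip := net.ParseIP(parts[2])
	if ip == nil {
		return nil, fmt.Errorf("invalid IP %q", parts[2])
	}
	return ip, nil
}

// isPrivateMultiaddr reports whether the address points at a private or
// loopback IP, which a peer outside that network cannot dial.
func isPrivateMultiaddr(addr string) (bool, error) {
	ip, err := ipFromMultiaddr(addr)
	if err != nil {
		return false, err
	}
	return ip.IsPrivate() || ip.IsLoopback(), nil
}

func main() {
	addr := "/ip4/10.0.0.246/tcp/3000/ipfs/QmYqacKLM4ST232aSRuTCVzvYxz1rA1mMhN3fAuRBBzzAN"
	private, err := isPrivateMultiaddr(addr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s private: %v\n", addr, private)
}
```

A check like this could run on the dialing side to skip addresses that cannot be reached from outside the VPC, although it does not by itself make the providing node advertise a reachable address.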

We obviously can't bind libp2p to the public Elastic IP since it's not an IP that's associated with an actual NIC. I've looked into `multiaddr`'s `encapsulate` but haven't had any success with that.

So, the question is: what can be done to get the IP address that's broadcast to the network to be the public Elastic IP address instead of the private one? Does libp2p support this functionality or does it need to be enhanced?

#### Steps to reproduce the error:

1. Run a bootstrap node on an EC2 host
2. Run a node that provides content on another EC2 host and use the bootstrap node from step 1
3. Start another node outside of AWS (local PC, for example) and connect to the network via the bootstrap node from step 1
4. Try to `findProviders` from the node in step 3
@threejeez
Author

Correct me if I'm wrong, but I think this has to do with NAT traversal and is related to issue #104

@jacobheun
Contributor

This is something that we should be able to address with NAT traversal. Ideally the NAT Manager will allow node C, via the relay through node B, to have node P dial it on its public address so that a direct connection can be made.

@daviddias added the kind/support and status/deferred labels on Aug 15, 2018
@jeshocarmel

Hi @threejeez, did you find a solution? I'm having the same issue as well.

@threejeez
Author

threejeez commented Jun 20, 2019

Yes. The machine you're running on has to have a public-facing IP, and you need to pass that IP into your application on the command line and tell libp2p to use it. Here's the Go code we wrote to accomplish that:

    multiaddrString := fmt.Sprintf("/ip4/%s/tcp/%d", bindIP, libp2pPort)

    // A bindIP of 0.0.0.0 listens on every interface.
    sourceMultiAddr, err := maddr.NewMultiaddr(multiaddrString)
    if err != nil {
        a8util.Log.Errorf("Error creating listen multiaddress: %v\n", err)
    }

    var extMultiAddr maddr.Multiaddr
    if broadcastIP == "" {
        a8util.Log.Warning("External IP not defined, Peers might not be able to resolve this node if behind NAT\n")
    } else {
        // Build the multiaddr that other peers should use to connect to this node.
        extMultiAddr, err = maddr.NewMultiaddr(fmt.Sprintf("/ip4/%s/tcp/%d", broadcastIP, libp2pPort))
        if err != nil {
            a8util.Log.Errorf("Error creating multiaddress: %v\n", err)
            // return nil, err
        }
    }

    // The address factory decides which addresses the host advertises.
    addressFactory := func(addrs []maddr.Multiaddr) []maddr.Multiaddr {
        if extMultiAddr != nil {
            // Append the external multiaddr created above so it is
            // broadcast when this node connects to a bootstrap node.
            addrs = append(addrs, extMultiAddr)
        }
        return addrs
    }

    host, err := libp2p.New(
        ctx,
        libp2p.ListenAddrs(sourceMultiAddr),
        libp2p.Identity(prvKey),
        libp2p.AddrsFactory(addressFactory),
    )

`bindIP` is basically always 0.0.0.0.
`broadcastIP` is the public IP of the EC2 host, passed in from the CLI. This is the address the node advertises to peers so they know how to connect to it. This is the bit you need.
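For anyone wiring this up, the command-line side could look roughly like the sketch below. It uses only Go's standard library and produces the two multiaddr strings that the snippet above passes to `maddr.NewMultiaddr`. The flag names (`-bind-ip`, `-broadcast-ip`, `-port`), the default port of 3000, and the helpers `parseConfig`, `listenAddr` and `advertisedAddr` are assumptions for illustration; they are not part of libp2p or of the code above.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
)

// nodeConfig holds the values the snippet above calls bindIP, broadcastIP
// and libp2pPort.
type nodeConfig struct {
	bindIP      string
	broadcastIP string
	libp2pPort  int
}

// parseConfig reads the hypothetical -bind-ip, -broadcast-ip and -port flags.
func parseConfig(args []string) (nodeConfig, error) {
	var cfg nodeConfig
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.StringVar(&cfg.bindIP, "bind-ip", "0.0.0.0", "local IP to listen on")
	fs.StringVar(&cfg.broadcastIP, "broadcast-ip", "", "public IP to advertise to peers")
	fs.IntVar(&cfg.libp2pPort, "port", 3000, "TCP port to listen on")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	// An empty broadcast IP is allowed; a malformed one is rejected early.
	if cfg.broadcastIP != "" && net.ParseIP(cfg.broadcastIP).To4() == nil {
		return cfg, fmt.Errorf("broadcast-ip %q is not a valid IPv4 address", cfg.broadcastIP)
	}
	return cfg, nil
}

// listenAddr returns the multiaddr string the node binds to.
func listenAddr(cfg nodeConfig) string {
	return fmt.Sprintf("/ip4/%s/tcp/%d", cfg.bindIP, cfg.libp2pPort)
}

// advertisedAddr returns the multiaddr string peers should dial, and false
// when no broadcast IP was supplied.
func advertisedAddr(cfg nodeConfig) (string, bool) {
	if cfg.broadcastIP == "" {
		return "", false
	}
	return fmt.Sprintf("/ip4/%s/tcp/%d", cfg.broadcastIP, cfg.libp2pPort), true
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Println("listen on:", listenAddr(cfg))
	if addr, ok := advertisedAddr(cfg); ok {
		fmt.Println("advertise:", addr)
	} else {
		fmt.Println("no broadcast IP given; peers may not be able to reach this node behind NAT")
	}
}
```

On an EC2 host this might be started with the instance's Elastic IP as `-broadcast-ip`, leaving `-bind-ip` at its default so the node still listens on the private interface.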
