fix: revert to serialized pubsub operations #319
Conversation
During the refactor (#290) I took the opportunity to parallelise some pubsub operations that didn't explicitly depend on each other. This worked perfectly while testing locally, but on CI it is a different story. I found that tests for js-ipfs-api (which are run against go-ipfs) failed for seemingly random reasons. After much investigation I finally tried re-serialising the operations I had refactored to be parallel and the tests started to pass. It seems that the pubsub implementation in go-ipfs has some issues with concurrency.

I also found two intermittent issues with `swarm.connect` in go-ipfs (seen significantly more often on CI):

1. Issuing two calls to this function from the same node might end up with the second not actually creating a connection, with no error reported to the user.
2. Even after the response to the user, it takes a few milliseconds for the connection to actually be established.

I intend to open issues on go-ipfs and write examples demonstrating these problems.

I created a utility function `connect` to temporarily mitigate these issues in one place. The utility serialises calls from a single node to another and pauses after each to allow the connection to properly establish (see the sketch below).

Merging this PR will bring us significantly closer to merging ipfs-inactive/js-ipfs-http-client#785

License: MIT
Signed-off-by: Alan Shaw <alan@tableflip.io>
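A minimal sketch of what that helper might look like (the `connect` name comes from the description; the body, the promise-based API usage, and the 300 ms delay are assumptions, not the actual implementation):

```js
// Temporary workaround for the two go-ipfs issues above: issue connect
// calls one at a time and pause briefly after each so the connection
// has time to actually establish.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function connect (fromNode, toAddrs) {
  // Accept a single multiaddr or an array of them
  const addrs = Array.isArray(toAddrs) ? toAddrs : [toAddrs]

  for (const addr of addrs) {
    // Serialise: never issue two concurrent connect calls from the
    // same node (issue 1 above)
    await fromNode.swarm.connect(addr)
    // Pause so the connection is actually usable before the caller
    // issues further commands (issue 2 above)
    await delay(300)
  }
}

module.exports = connect
```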
@richardschneider hit this many times in the past and it got documented somewhere. Can you make sure that there is an issue on go-ipfs? //cc @Stebalien
I will triple check, but all signs seem to indicate that this is the case. I've seen these tests pass on js-ipfs on CI with the concurrency, i.e. without this PR. Like I said in the PR description:
😉
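For context, the parallel-versus-serial difference at issue is roughly the following (a schematic sketch: the node instances, topic name, and handler are assumptions, not the actual test code):

```js
// Assumed: nodeA and nodeB are js-ipfs-api instances; the topic name
// and handler are made up for illustration.
const handler = (msg) => console.log('received:', msg.data.toString())

// Parallel (the refactor): passes locally, flaky against go-ipfs on CI
function subscribeParallel (nodeA, nodeB) {
  return Promise.all([
    nodeA.pubsub.subscribe('test-topic', handler),
    nodeB.pubsub.subscribe('test-topic', handler)
  ])
}

// Serialised (what this PR reverts to): one pubsub operation at a time
async function subscribeSerial (nodeA, nodeB) {
  await nodeA.pubsub.subscribe('test-topic', handler)
  await nodeB.pubsub.subscribe('test-topic', handler)
}
```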
Which version of go-ipfs were you using?
If you call `swarm.connect` from both sides, one side will generally learn about the connection before the other. Is that what's happening here?
Sorry, I should have been more explicit - two calls to connect, but with different addresses.
Hmm, that's an interesting one: does the response to connect come before both nodes have been connected together? I'd have to double check the code, but I think in general when we need node A to know about B we'll connect A to B rather than B to A before issuing commands.
No, it's just that one node learns that the connection is ready before the other. We could delay until the other learns but that'd cost us a round trip. For example, take TCP. In a TCP connection, the client always learns that the connection is ready before the server and can start sending data immediately after sending its ACK (but the server needs to wait for the ACK before sending data).
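To make that concrete, here is a small Node.js illustration of the handshake asymmetry (not from the thread, just a demonstration):

```js
const net = require('net')

const server = net.createServer((socket) => {
  // The server only sees the connection once the client's ACK arrives;
  // by then the client may already have data in flight.
  socket.on('data', (data) => {
    console.log('server received:', data.toString())
    socket.end()
    server.close()
  })
})

server.listen(0, () => {
  const client = net.connect(server.address().port, () => {
    // The client's 'connect' event fires as soon as its ACK is sent,
    // so it can start writing immediately, before the server has
    // necessarily accepted the connection.
    client.write('hello')
  })
  client.on('end', () => client.end())
})
```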
That's the right way to do this.
For two different nodes? That's definitely not supposed to happen. I'm trying to reproduce but I can't (connecting and disconnecting from two nodes in a loop). Is the failing connect call returning no response or just no error? That is, is it returning a success response (the peer ID of the target peer)?