This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Stable transports in the browser #1088

Closed
Beanow opened this issue Nov 18, 2017 · 24 comments

Comments

@Beanow

Beanow commented Nov 18, 2017

  • Version: 0.26.0
  • Platform: Firefox Quantum / Chromium
  • Subsystem: swarm, pubsub, transports

Type: Bug / Enhancement

Severity: High

Description:

There are a number of pitfalls with the js-ipfs transports in the browser today.
Bundled with 0.26.0 are:

  • libp2p-websockets
  • libp2p-webrtc-star
  • libp2p-railing

libp2p-webrtc-star

As in #950, today libp2p-webrtc-star is crashing the browser.

[screenshots of the browser crash]

Possible solutions:

@daviddias
Member

Thank you for opening this issue, @Beanow :)

I believe we can have an interim solution with libp2p-websocket-star libp2p/js-libp2p#122 (comment)

One other note I want to add to this thread is that PubSub itself does relay, so in practice you can have a hosted node that subscribes to the same PubSub channel and that node will relay messages for you.
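The relay behaviour described here can be sketched with a toy in-memory model. This is illustrative only, with hypothetical names; real js-ipfs pubsub runs over libp2p streams, not direct method calls. The idea: a hosted node subscribed to the same topic forwards each message to its own peers, bridging two browsers that have no direct connection.

```javascript
// Toy floodsub-style relay sketch (hypothetical API, not js-ipfs).
class ToyPubsubNode {
  constructor (name) {
    this.name = name
    this.peers = new Set()   // directly connected nodes
    this.handlers = []       // local subscribers
    this.seen = new Set()    // message ids, to stop forwarding loops
  }
  connect (other) { this.peers.add(other); other.peers.add(this) }
  subscribe (handler) { this.handlers.push(handler) }
  publish (msgId, data, from = null) {
    if (this.seen.has(msgId)) return
    this.seen.add(msgId)
    this.handlers.forEach(h => h(data))
    for (const peer of this.peers) {
      if (peer !== from) peer.publish(msgId, data, this) // relay onward
    }
  }
}

// Two browsers that can only reach a hosted node still exchange messages:
const browserA = new ToyPubsubNode('browserA')
const hosted = new ToyPubsubNode('hosted-relay')
const browserB = new ToyPubsubNode('browserB')
browserA.connect(hosted)
browserB.connect(hosted) // note: no direct browserA <-> browserB link
browserB.subscribe(data => console.log('browserB got:', data))
browserA.publish('msg-1', 'hello')
```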

@alvestrand

  1. How many PeerConnections did you create?
  2. Did you get anything useful done with them, or did you just create and park them?

I'm currently debugging an issue that seems to be number-of-threads related.

@mitra42

mitra42 commented Nov 28, 2017

@alvestrand, I'm not sure who the question was addressed to, but we see this without explicitly creating any connections (just creating and starting IPFS); all the decisions about how many connections to open and how many threads to run happen in lower layers.

@diasdavid - could you post HERE what is needed for the workaround? libp2p/js-libp2p#122 (comment) seems to contain only configurations that people are reporting problems with. In particular, we need to know what interim configuration to use and anything in package.json (e.g. non-default versions). I'll be happy to test our apps against it.

@nils-ohlmeier

@alvestrand when opening the demo URL from issue #950 I see Firefox opening lots and lots of PeerConnections with DataChannels in them. It also appears to close PCs, but the number of open PCs keeps going up. I'm tempted to put an upper bound on PeerConnections per domain to prevent pages from crashing Firefox by exhausting memory and/or threads.

@mitra42 what I don't quite understand is that this is the IPFS project, right? But here and in issue #950 you state that you don't open connections. Which JS library in your stack opens the PeerConnections? Whoever maintains that library needs to understand that PeerConnections are not as cheap to create as, for example, TCP connections, so whoever manages the PeerConnections needs to keep a balance.

@mitra42

mitra42 commented Nov 29, 2017

Hi Nils, there are lots of links in #950, so I'm not sure which "demo URL" you refer to. I don't know about Orbit, if that is what you mean, but I'm referring to our demos (on https://dweb.me/examples).

I said we don't EXPLICITLY open connections. We, and I believe the many others reporting bugs like this, are loading IPFS using the recommended config for pubsub, which is required by Yjs, which is required to implement append-only logs. Call ipfs.start(), wait ~5-15 mins, and it crashes.

The management of connections is done by the underlying IPFS libraries. I understand that @diasdavid thinks he knows how to fix it, and that is happening (I'm currently not sure if that is a fix to webrtc-star, or the implementation of websocket-star).

{
  repo: ...,
  config: {
    Addresses: {
      Swarm: ['/dns4/star-signal.cloud.ipfs.team/wss/p2p-webrtc-star']
    }
  },
  EXPERIMENTAL: { pubsub: true }
}

@nils-ohlmeier

So I just opened https://dweb.me/examples/example_list.html in Firefox 59 and let it sit for several minutes without doing anything. After maybe 15min or so Firefox had opened 407 PeerConnections, of which 109 got closed.

I noticed that quite a few of the not closed PeerConnections never got connected to anything. From Firefox perspective these connections never received an SDP answer for the offers Firefox created. I think that is something which should get optimized in the IPFS stack to more aggressively close PeerConnections which failed to connect to anything.
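The aggressive-close suggestion could look like the following watchdog sketch. This is not actual IPFS code; it assumes an object shaped like a browser RTCPeerConnection (a `connectionState` property, a `close()` method, and an `onconnectionstatechange` hook), and the function name is hypothetical.

```javascript
// Sketch: close PeerConnections that never reach 'connected' within a
// deadline, so failed dials do not pile up in the browser.
function closeIfNeverConnected (pc, timeoutMs) {
  const timer = setTimeout(() => {
    // still not connected after the deadline: give up and free resources
    if (pc.connectionState !== 'connected') pc.close()
  }, timeoutMs)
  const prev = pc.onconnectionstatechange
  pc.onconnectionstatechange = () => {
    // connection succeeded in time: cancel the watchdog
    if (pc.connectionState === 'connected') clearTimeout(timer)
    if (prev) prev()
  }
}
```

A real implementation would likely hook this into wherever the transport creates its connections, so every outgoing dial carries a deadline.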

I also noticed that each of these PeerConnections starts two threads in Firefox, which seems unnecessary. I'll file a bug on the Firefox side for that. Maybe we can optimize this a little bit more.

One other observation from my side: the tab in Firefox never crashed for me. But I did this test on a MacBook Pro with 16GB of RAM; it's quite possible that on a less powerful machine the 500+ threads Firefox was running at the time would result in actual problems.

@Beanow
Author

Beanow commented Nov 29, 2017

@nils-ohlmeier wow nice findings!
Could you share a quick primer on how you analysed this, so we can reproduce it and test for improvements?

Same as @mitra42, no connections were declared by me. These connections grow as part of the peer discovery process and the pubsub stack.

From what I know there is a dialer machine in the swarm component that tries to connect to every address that comes up in peer discovery.
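A discovery-driven dialer like that could be throttled along these lines. This is a sketch with hypothetical names, not the actual swarm code: every discovered address is queued, but only a handful of dials run concurrently.

```javascript
// Sketch: limit how many dials a discovery-driven dialer runs at once.
class DialQueue {
  constructor (dialFn, maxConcurrent = 4) {
    this.dialFn = dialFn            // async function that dials one address
    this.maxConcurrent = maxConcurrent
    this.queue = []
    this.active = 0
  }
  push (addr) {
    this.queue.push(addr)           // discovery events just enqueue
    this._drain()
  }
  _drain () {
    while (this.active < this.maxConcurrent && this.queue.length > 0) {
      const addr = this.queue.shift()
      this.active++
      Promise.resolve()
        .then(() => this.dialFn(addr))
        .catch(() => {})            // a failed dial must not stall the queue
        .then(() => { this.active--; this._drain() })
    }
  }
}
```

With something like this, a burst of discovered addresses turns into a steady trickle of dials instead of hundreds of simultaneous PeerConnection attempts.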

@Beanow
Author

Beanow commented Nov 29, 2017

As for crashes: I ran on a 32GB RAM machine and don't think any OOM was triggered by the OS. Perhaps there is some sandboxing hard limit? In FF Quantum and Chromium it's only the tab that crashes.

@mitra42

mitra42 commented Nov 29, 2017

Great - I see different behavior on Firefox and on Chrome:

  • on Firefox (and it varies from version to version) the typical behavior was growth in threads until Firefox itself crashes.
  • on Chrome it slows down a bit and then Chrome closes the tab; occasionally Chrome itself needs restarting.

@nils-ohlmeier

@Beanow well I work on WebRTC in Firefox for a living, so I should know how to do this ;-)

The easiest thing is to open "about:webrtc" in another tab. That will show you all the WebRTC PeerConnections Firefox currently has open, plus old closed ones. I then saved that page and ran your favorite unix command line tools to count the connections on that page.

As for the threads I created a bug in the Firefox bug tracker with a patch which should improve the thread problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1421819

But the general problem remains that lots of PeerConnections get created, which eventually get the browser into trouble.

@daviddias
Member

@nils-ohlmeier thank you so much for joining this thread and providing such valuable analysis! This is rad :D

We are working on a ConnManager that gauges the usage and number of connections open and tries to preemptively close some before the browser (or Node.js) starts panicking. Is there any API that we can use that can give us a better understanding of the current load rather than using simple heuristics?
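One simple heuristic such a ConnManager could fall back on, sketched here with hypothetical names rather than the real libp2p implementation: cap the total number of open connections and close the least recently used ones whenever the cap is exceeded.

```javascript
// Sketch: LRU-style connection cap for when no platform load API exists.
class ConnManager {
  constructor (maxConns) {
    this.maxConns = maxConns
    this.conns = new Map() // id -> { conn, lastUsed }
  }
  add (id, conn) {
    this.conns.set(id, { conn, lastUsed: Date.now() })
    this._trim()
  }
  touch (id) {
    // call on any activity so busy connections survive trimming
    const entry = this.conns.get(id)
    if (entry) entry.lastUsed = Date.now()
  }
  _trim () {
    while (this.conns.size > this.maxConns) {
      // find the least recently used connection and close it
      let lruId = null, lruTime = Infinity
      for (const [id, e] of this.conns) {
        if (e.lastUsed < lruTime) { lruTime = e.lastUsed; lruId = id }
      }
      this.conns.get(lruId).conn.close()
      this.conns.delete(lruId)
    }
  }
}
```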

@nils-ohlmeier

@diasdavid you are not the only one asking for load information. But the normal use case so far has been pages/services which do video calling, so we are contemplating exposing information about the speed of video encoding. That would not help in your case, as you only do data channels, with no video. So unfortunately the answer AFAIK is no.

But as a first step, instead of reacting to load, I would recommend looking into closing the PeerConnections which never connected anywhere more quickly. Something appears to close PeerConnections already; maybe you "just" need to change a timeout value in one of the libs providing you WebRTC?

@Beanow
Author

Beanow commented Dec 22, 2017

As a short update: libp2p-websocket-star, tested in 0.27.5 using the FAQ entry's config, holds up well stability-wise. My stress tests were not able to crash the browser and files are coming through.

That isn't to say performance is where it needs to be, but it will at least stay running.
Some observations that stood out to me:

  • The dialer seems to be going ham with a mere 15 peers. Large numbers of dialing attempts are constantly going out.
  • The socket.io client seems to slow down upload speeds a lot (~45% of time spent) due to its packet encoding scripts.
  • Most of the time seems to be spent on hmacs and secio.

@daviddias daviddias added status/ready Ready to be worked status/deferred Conscious decision to pause or backlog and removed status/ready Ready to be worked labels Jan 25, 2018
@daviddias
Member

After seeing the magic @ya7ya did with WebRTC on paratii, I now know that there is a way to pipe tons of data through WebRTC without having it consume tons of memory both in Firefox and Chrome.

@ya7ya could you outline the patches/changes you did or help lead the way by submitting the PRs needed? This is super high priority and of high importance :)

@ya7ya
Contributor

ya7ya commented Jul 1, 2018

Hey @diasdavid, sorry for the late reply 😄

I'm not so sure my fixes to the paratii fork solved the underlying problem. It's important to mention that paratii would crash if we ran 5 or 6 embedded players in 1 page, but it has web3 + ipfs + clappr bundled, and it was mainly web3's fault if I remember correctly.

But the main change is limiting MAX_MESSAGE_SIZE in js-ipfs-bitswap to around 32kb instead of 512kb. 32kb isn't an exact-science value: some browsers (Firefox) won't work with higher values like 64kb, but Chrome does fine.

This however was the wrong place to make the edit according to this comment. I did subsequently try to do the block-stream limit in js-libp2p-webrtc-star, but it broke, and when I fixed it, it still didn't fix the main issue.

As far as I can tell, limiting the max message size resulted in a more stable connection, meaning the dialer isn't going crazy attempting to connect to peers that keep disconnecting.
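The message-size limiting described here amounts to chunking payloads before they reach the data channel. A minimal sketch, where the function name is hypothetical and the 32 KiB constant is the empirical value mentioned above, not a spec limit:

```javascript
// Sketch: split outgoing payloads into chunks no larger than
// MAX_MESSAGE_SIZE before handing them to the data channel.
const MAX_MESSAGE_SIZE = 32 * 1024 // empirical browser-safe value

function chunkify (buf, size = MAX_MESSAGE_SIZE) {
  const chunks = []
  for (let i = 0; i < buf.length; i += size) {
    chunks.push(buf.slice(i, i + size)) // last chunk may be shorter
  }
  return chunks
}
```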

What i would recommend is the following:

  • A better limiter needs to be added somewhere in the libp2p stack, ideally in js-libp2p-webrtc-star, but last time I tried that I broke the repo 😄 🤣 (sorry about that)

  • We probably should drop socket.io for something thinner and lighter, maybe https://www.npmjs.com/package/uws

  • Both webtorrent and ipfs have the same throttling issue, and both use simple-peer, so it might be worth checking to make sure the underlying problem is in simple-peer and not the browser WebRTC layer. But this is a guess and probably very wrong.

@mkg20001
Contributor

mkg20001 commented Jul 1, 2018

we probably should drop socket.io for something thinner and lighter. maybe npmjs.com/package/uws

Or just drop the extra websocket server entirely: libp2p/js-libp2p-webrtc-star#148

@nils-ohlmeier

FYI https://lgrahl.de/articles/demystifying-webrtc-dc-size-limit.html explains the old limits on the maximum message size which could be sent over data channels. Newer versions of Firefox now support much bigger messages: https://blog.mozilla.org/webrtc/large-data-channel-messages/

That doesn't mean that sending large amounts of data over data channels won't cause stability problems. I just thought I'd point this out in case you weren't aware of this already.

@lgrahl

lgrahl commented Jul 10, 2018

Do you guys make use of the (quirky) flow control there is for data channels (namely bufferedAmountLowThreshold, onbufferedamountlow and bufferedAmount)? If not, it may be that you're buffering too much data at once. I wrote an example a while ago showing how this can be used.
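That flow control can be sketched as follows. This assumes a `dc` object shaped like an RTCDataChannel (`bufferedAmount`, `send`, `onbufferedamountlow`), and the 1 MiB high-water mark is an arbitrary illustrative choice: send while the buffer is below the mark, then resume from the `bufferedamountlow` event.

```javascript
// Sketch: data-channel backpressure via bufferedAmountLowThreshold.
const HIGH_WATER_MARK = 1024 * 1024 // pause when ~1 MiB is queued (tunable)

function sendWithBackpressure (dc, chunks) {
  dc.bufferedAmountLowThreshold = HIGH_WATER_MARK / 2
  let i = 0
  const pump = () => {
    // send until either we run out of chunks or the buffer fills up
    while (i < chunks.length && dc.bufferedAmount < HIGH_WATER_MARK) {
      dc.send(chunks[i++])
    }
    // if chunks remain, resume when the browser drains the buffer
    if (i < chunks.length) dc.onbufferedamountlow = pump
  }
  pump()
}
```

Without something like this, a fast sender can queue an unbounded amount of data in the channel's internal buffer, which is one plausible source of the memory blow-ups seen in this thread.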

I haven't looked at your code yet but feel free to ping me for general data channel questions in Mozilla's IRC #media or on freenode #webrtc.

@daviddias
Member

No more "already piped" with 0.31 - #1458

@interfect

I just checked on libp2p-webrtc-star today, with js-ipfs 6d960f3.

Putting '/dns4/wrtc-star.discovery.libp2p.io/tcp/443/wss/p2p-webrtc-star' in as a Swarm (alongside '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star') led to quick file transfers between computers in my LAN, but also led to Firefox (after a few minutes) no longer being able to connect to web sites. I would open a new tab and have it sit at "Connecting" forever, and Gmail would complain that it was disconnected from chat and be unable to save messages being edited. Either a tab crash (which would happen after trying to open a few pages on Linux) or killing and re-launching the browser (on Mac) seemed to bring things back to normal.

Also, on Linux, running a node with webrtc would also sometimes eventually make the browser's window stop drawing its contents, including the tab bar.

The webrtc transport is still not stable, and moreover somehow seems able to achieve by accident the sort of browser-ruining behavior that malicious hackers everywhere struggle to produce.

@lidel
Member

lidel commented Sep 5, 2018

@interfect did you try running it along with libp2p connection manager?
Limiting the number of connections may improve stability of webrtc.

A sample config:
{
  "config": {
    "Addresses": {
      "Swarm": ["/dns4/star-signal.cloud.ipfs.team/tcp/443/wss/p2p-webrtc-star","/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star"],
      "Bootstrap": []
    }
  },
  "connectionManager": {
    "maxPeers": 20
  }
}

@interfect

interfect commented Sep 6, 2018

@lidel I've given connectionManager.maxPeers a shot (although I didn't include "Bootstrap": [] in my config), and the browser break/crash issues persist. It doesn't really seem to have much of an effect; I think I may get my node reporting fewer peers, but I'm still seeing loads and loads of attempted WebRTC connections in about:webrtc, and loads and loads of UDP connections at the router:

[graph: UDP connections at the router over time]

(Where it spikes up is where I open the tab with js-ipfs in it.)

A lot of these connections are apparently to the same endpoints:

[screenshot: list of connection endpoints]

I'm not sure what's at pf-in-f127.1e100.net or that EC2 address; maybe IPFS bootstrap nodes or some kind of WebRTC relay thing? (EDIT: The Google one looks like it might be Google's STUN server.)

I think the connection manager might only be counting and limiting fully-open connections, as @Stebalien mentions go-ipfs does in ipfs/kubo#5248 (comment). It seems perfectly happy to try to open 300 connections in 30 seconds, even if it is maintaining no more than 20 properly connected peers.

@kevinsimper

As per @lidel from IRC, I am posting about a demo app I made with PubSub and websocket-star.

I made this PubSub game using websocket-star with IPFS. Yesterday I tried a demo at a meetup, but the demo failed when more than 20 people tried to use it.
How can I scale up my application? The problem was that websocket connections kept dropping, so I was only connected to at most 12 peers (some could reach 21-22), but we were 70 in total.

A demo is here: https://p2p-ipfs-presentation.surge.sh/game/ you can view-source and it is also on github here https://github.com/kevinsimper/p2p-ipfs-presentation/blob/master/game/index.html

I tried webrtc a couple of months ago, but it would often crash my browser after 5 minutes.

@daviddias daviddias added status/ready Ready to be worked and removed status/deferred Conscious decision to pause or backlog labels Dec 9, 2018
MicrowaveDev pushed a commit to galtproject/js-ipfs that referenced this issue May 22, 2020
Also converts all tests to async/await
@SgtPooki SgtPooki self-assigned this May 17, 2023
@SgtPooki SgtPooki moved this to 🥞 Todo in js-ipfs deprecation May 17, 2023
@SgtPooki
Member

js-ipfs is being deprecated in favor of Helia. You can see #4336 and read the migration guide.

Please feel free to reopen with any comments by 2023-06-02. We will do a final pass on reopened issues afterward (see #4336).

This issue is most likely resolved in Helia (and the latest libp2p), please try it out!

Followers/subscribers of this issue should check out the libp2p team's (and many other contributors') universal-connectivity app (see https://github.com/libp2p/universal-connectivity), where a lot of these problems are shown to have been ironed out.

@kevinsimper it would be really cool to migrate your game to Helia and add to ipfs-examples/helia-examples see ipfs/helia#43 for some of the examples we're already planning on porting.
