Implement Packet buffer for sending #20

Open
disarticulate opened this issue Nov 26, 2020 · 13 comments · May be fixed by #25

Comments

@disarticulate
Contributor

Most browsers currently have a limit for message size:

https://stackoverflow.com/questions/15435121/what-is-the-maximum-size-of-webrtc-data-channel-messages

My testing on Chrome produces the following error:
Attempting to send message of size 988606 which is larger than limit 262144

Although browsers are expected to implement the spec, these arbitrary size limits produce no error message that I can see in the console. The error above only showed up when running a debug build of Chrome.
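
Since the failure is silent, one option is to check the size before sending and raise a visible error. The following is a minimal sketch, not part of y-webrtc or simple-peer: `sendWithSizeCheck` is a hypothetical helper, the default limit is simply the 262144 bytes from the Chrome error above, and the channel is a stand-in object. In a browser, the negotiated limit may be available from `RTCPeerConnection.sctp.maxMessageSize` where that property is exposed.

```javascript
// Hypothetical helper: refuse to send oversized messages instead of failing silently.
// `channel` is anything with a send() method (e.g. an RTCDataChannel).
// The default limit is the 262144-byte value from the Chrome error above.
function sendWithSizeCheck (channel, data, maxMessageSize = 262144) {
  if (data.byteLength > maxMessageSize) {
    throw new RangeError(
      `Message of ${data.byteLength} bytes exceeds limit of ${maxMessageSize} bytes`
    )
  }
  channel.send(data)
}

// Usage with a stand-in channel that just records what was sent.
const sent = []
const fakeChannel = { send: (data) => sent.push(data) }

sendWithSizeCheck(fakeChannel, new Uint8Array(1000)) // small enough, goes through

let errorMessage = null
try {
  sendWithSizeCheck(fakeChannel, new Uint8Array(988606)) // size from the report above
} catch (err) {
  errorMessage = err.message
}
```

This only makes the problem visible; it does not deliver the oversized message.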

When I tried to create my own 'sync' system before switching to try y-webrtc, I used protocol buffers to wrap updates and hashed the data to keep them ordered/organized. I don't have any real knowledge of best practices, however.

@dmonad
Member

dmonad commented Nov 28, 2020

Hi @disarticulate ,

this is indeed a problem. Data channels in WebRTC really feel like an afterthought in many places.

One solution would be to write a wrapper around the WebRTC package "simple-peer" that handles splitting up messages. Larger messages simply need to be split up if they exceed a certain size (I wonder if it is possible to get or override the message-size limit).

I wanted to write a wrapper around simple-peer anyway because different browsers often have trouble communicating with each other. Sometimes messages get lost even though we use a reliable WebRTC connection. Our wrapper around simple-peer should handle splitting up messages and make sure that no messages get lost (using retry logic).

I imagine that we simply assign an increasing number to each message. Messages that are split have an additional increasing number that defines the part of the message.

This is how I would define the protocol. Internally, I'd probably simply encode this to Uint8Arrays using lib0/encoding. Protocol buffers are great, but they add quite a bit of overhead that I try to avoid (bundle size & mental complexity).

# Example of a "normal" message that is not split up
[normalMessageType, messageClock, ...message]

# Example of a split message
[splitMessageType, messageClock, numberOfMessageParts, partNumber, ...messagePart]
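
A minimal sketch of this framing, under some assumptions: it uses fixed-width fields written with `DataView` rather than lib0/encoding, and the type tags, the `frame` and `encodeMessage` names, and the `maxPayload` parameter are all hypothetical.

```javascript
const NORMAL_MESSAGE = 0 // hypothetical type tags
const SPLIT_MESSAGE = 1

// Build [type, ...uint32 fields, ...payload] as one Uint8Array.
function frame (type, fields, payload) {
  const headerSize = 1 + 4 * fields.length
  const out = new Uint8Array(headerSize + payload.byteLength)
  const view = new DataView(out.buffer)
  view.setUint8(0, type)
  fields.forEach((value, i) => view.setUint32(1 + 4 * i, value))
  out.set(payload, headerSize)
  return out
}

// Encode one message as one or more frames, none carrying more than maxPayload bytes.
function encodeMessage (messageClock, message, maxPayload) {
  if (message.byteLength <= maxPayload) {
    return [frame(NORMAL_MESSAGE, [messageClock], message)]
  }
  const numberOfMessageParts = Math.ceil(message.byteLength / maxPayload)
  const frames = []
  for (let partNumber = 0; partNumber < numberOfMessageParts; partNumber++) {
    const messagePart = message.subarray(partNumber * maxPayload, (partNumber + 1) * maxPayload)
    frames.push(frame(SPLIT_MESSAGE, [messageClock, numberOfMessageParts, partNumber], messagePart))
  }
  return frames
}

// A 3-byte message fits in one frame; a 10-byte message is split into 3 parts of at most 4 bytes.
const small = encodeMessage(1, new Uint8Array([1, 2, 3]), 4)
const large = encodeMessage(2, new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), 4)
```

Because every part carries its own messageClock, numberOfMessageParts and partNumber, the receiver does not depend on arrival order.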

The peers would need to maintain a list of messages that they have not received yet. And of course, they would need to merge message parts when all parts have been received. For Yjs it is not necessary to apply messages in a certain order. Any order is fine. Messages just should not get lost.
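
The merging step could be sketched as follows. This is an illustration only: the `Reassembler` class is hypothetical, it assumes the header fields have already been decoded, and it covers merging and duplicate parts but not the retry logic for lost messages.

```javascript
// Collects parts of split messages, keyed by messageClock, in any arrival order.
class Reassembler {
  constructor () {
    this.pending = new Map() // messageClock -> { parts: Array, received: number }
  }

  // Returns the merged message once every part has arrived, otherwise null.
  addPart (messageClock, numberOfMessageParts, partNumber, messagePart) {
    let entry = this.pending.get(messageClock)
    if (entry === undefined) {
      entry = { parts: new Array(numberOfMessageParts).fill(null), received: 0 }
      this.pending.set(messageClock, entry)
    }
    if (entry.parts[partNumber] !== null) return null // duplicate part (e.g. a retry)
    entry.parts[partNumber] = messagePart
    entry.received++
    if (entry.received < numberOfMessageParts) return null

    // All parts are present: concatenate them in part order.
    this.pending.delete(messageClock)
    const total = entry.parts.reduce((sum, part) => sum + part.byteLength, 0)
    const merged = new Uint8Array(total)
    let offset = 0
    for (const part of entry.parts) {
      merged.set(part, offset)
      offset += part.byteLength
    }
    return merged
  }
}

// Parts arrive out of order, with one duplicate.
const reassembler = new Reassembler()
const first = reassembler.addPart(7, 3, 2, new Uint8Array([5, 6]))
const second = reassembler.addPart(7, 3, 0, new Uint8Array([1, 2]))
const duplicate = reassembler.addPart(7, 3, 0, new Uint8Array([1, 2]))
const complete = reassembler.addPart(7, 3, 1, new Uint8Array([3, 4]))
```

Only the final call returns data; the earlier calls return null while parts are still missing.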

When I tried to create my own 'sync' system before switching to try y-webrtc, I used protocol buffers to wrap updates and hashed the data to keep them ordered/organized. I don't have any real knowledge of best practices, however.

One advantage of using Yjs/CRDTs is that you don't have to care about the order of messages. These messages simply have to arrive somehow at the other peers.

@disarticulate
Contributor Author

disarticulate commented Nov 28, 2020

I looked around for some prior art, and this appears to be the only wrapper around simple-peer that overcomes the issue:

https://github.com/disarticulate/simple-peer-files

simple-peer-files/src/Meta.ts implements a protocol similar to the one you describe.

I forked it to see how small it could be bundled, making simple-peer a peerDependency and leaving out @feross/buffer. It came out to ~38 KB (compressed, I believe) and ~110 KB uncompressed. It looks like they're using some heavy streaming libraries, so I'm not sure how to interpret 'bundle size', but I'd guess a lot of that duplicates what you've done with lib0.

The other thought I had: with Yjs/CRDTs, is there any way to 'naturally' emit smaller, chunked updates with some kind of flag? This would probably ruin the advantage of out-of-order updates, to the extent that you'd need to mutex-lock updates until a split message has finished sending.

For now, I'm downsizing my documents, moving the media/large segmented parts into hashes, and seeing whether simple-peer-files works well enough to do the heavy lifting and recombine everything on the other side.

@dmonad
Member

dmonad commented Dec 3, 2020

I'm not sure how to interpret 'bundle size', but I'd guess a lot of that duplicates what you've done with lib0.

Yjs uses lib0/encoding anyway, so I'd like to avoid other encoding libraries if possible. It seems a lot of people are focused on protobuf ^^ I explained my reasons for not using protobuf in Yjs in yjs/yjs#262.

It seems that WebRTC doesn't always guarantee in-order delivery, so the new protocol should account for that. Simply marking the end of a message only works when the protocol guarantees in-order delivery.

The other thought I had: with Yjs/CRDTs, is there any way to 'naturally' emit smaller, chunked updates with some kind of flag? This would probably ruin the advantage of out-of-order updates, to the extent that you'd need to mutex-lock updates until a split message has finished sending.

There is. You can basically split up Yjs documents into smaller update messages. But when you insert one huge JSON/binary blob in Yjs, the smallest update unit might still be too large for WebRTC. I don't think we can get around splitting messages.

@disarticulate
Contributor Author

disarticulate commented Dec 4, 2020

My WebRTC buffer protocol was to:

  1. hash the data;
  2. chunk the data, then calculate the hashes;
  3. wrap each chunk in a protobuf with the packet number and metadata, particularly the final hash;
  4. receive packets in whatever order, then reassemble until the hash matches. No 'technical' order was necessary; not knowing the packet numbers would make reassembly expensive, but not impossible.

Hashing was used because I semi-expect to have an insecure network and wanted my packets not to be modified, but right now it's just syncing device documents.
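
The four steps above can be sketched roughly as follows. Several things here are assumptions made for illustration: plain objects stand in for the protobuf wrapper, the packet field names are hypothetical, per-chunk hashes are one reading of step 2, and the FNV-1a checksum is a non-cryptographic stand-in that would not protect against deliberate modification (a real hash such as SHA-256 would be needed for that).

```javascript
// FNV-1a 32-bit checksum: a NON-cryptographic stand-in for a real hash such as SHA-256.
function checksum (bytes) {
  let hash = 0x811c9dc5
  for (const byte of bytes) {
    hash ^= byte
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Steps 1-3: hash the whole payload, chunk it, and tag each packet with metadata.
function toPackets (data, chunkSize) {
  const finalHash = checksum(data)
  const count = Math.ceil(data.byteLength / chunkSize)
  const packets = []
  for (let index = 0; index < count; index++) {
    const chunk = data.subarray(index * chunkSize, (index + 1) * chunkSize)
    packets.push({ index, count, finalHash, chunkHash: checksum(chunk), chunk })
  }
  return packets
}

// Step 4: accept packets in any order; return the data only if every hash matches.
function fromPackets (packets) {
  const ordered = [...packets].sort((a, b) => a.index - b.index)
  if (ordered.length === 0 || ordered.length !== ordered[0].count) return null
  if (ordered.some((p) => checksum(p.chunk) !== p.chunkHash)) return null
  const data = new Uint8Array(ordered.reduce((n, p) => n + p.chunk.byteLength, 0))
  let offset = 0
  for (const p of ordered) {
    data.set(p.chunk, offset)
    offset += p.chunk.byteLength
  }
  return checksum(data) === ordered[0].finalHash ? data : null
}

// Packets arrive shuffled and still reassemble; a modified chunk is rejected.
const original = Uint8Array.from({ length: 10 }, (_, i) => i * 7)
const packets = toPackets(original, 4)
const restored = fromPackets([packets[2], packets[0], packets[1]])
const tampered = packets.map((p) => ({ ...p, chunk: p.chunk.slice() }))
tampered[1].chunk[0] ^= 0xff
const rejected = fromPackets(tampered)
```

With packet numbers included, reassembly is a simple sort; without them, the receiver would have to try orderings until the final hash matched, which is the expensive case mentioned in step 4.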

I think the problem is definitely WebRTC, but I could imagine a benefit to standard 'update sizes' via an intelligent chunking function within the core; abstractly, that seems to be what you're doing when you move updates left or right. A buffer is just a bunch of updates to the right. It just loses the advantage while it's trying to do that update.

Anyway, I'm deep into my application layer and cannot provide much other than presenting things I've found along the way.

@holtwick

holtwick commented Jan 5, 2021

Hi, I would like to join the discussion with a question:
If I'd like to send a bigger file like an image, I guess it doesn't make sense to wrap that in a Y.Doc?

If that's true, what would be the best way to share such a file among peers? Usually I would send a request to a peer asking it to send me the binary data over a DataChannel. Is that correct?

Can we extend y-webrtc to support exchanging additional data formats? Would it be possible to use the same encryption?

@dmonad
Member

dmonad commented Jan 6, 2021

In the current state, y-webrtc apparently can't handle large files (depending on the browser being used).

Managing this manually would be pretty hard because you need to coordinate where to get the file from. y-webrtc supports partially connected networks (not every client is connected to every other client).

Therefore, it might make sense to put the image in a subdocument. Then Yjs can handle syncing the image asynchronously. There should be close to no performance overhead if you store the image as a Uint8Array somewhere in Yjs.

@dmonad
Member

dmonad commented Jan 6, 2021

Another nice alternative is to use webtorrent (for large files).

@disarticulate
Contributor Author

disarticulate commented Jan 6, 2021 via email

@holtwick

holtwick commented Jan 6, 2021

Thanks, @dmonad and @disarticulate for the valuable feedback. I will test the solutions you mentioned once I get to the implementation of that feature in my project. I'll give feedback on the outcomes.

To summarize the solutions you proposed:

  • Yjs sub-document with data in Uint8Array format
  • Using webtorrent (smart load distribution)
  • Direct transmission of chunked data, about 16 KB per chunk. Example code

I would add another solution, for my special use case, involving a stupid web server to upload the data once and clients fetching from there.

@disarticulate
Contributor Author

disarticulate commented Mar 21, 2021

I created a monkey patch (a hack) for SimplePeer here:

https://github.com/disarticulate/y-webrtc/

I did the following:

  1. Extended SimplePeer's class as SimplePeerExtended.js
  2. Overwrote the import in y-webrtc.js to use the extended version
  3. Created two Y.Docs, for transmission (txDoc) and receiving (rxDoc)
  4. Created an initial setup and sync transmissions for peers
    a. client1 syncs: txDoc -> rxDoc (one way)
    b. client2 syncs: txDoc -> rxDoc (one way)
  5. send(chunk) -> queues data, creates more chunks with packets, and sends each packet into an array in the txDoc
  6. txDoc.on('update') -> sends msg to sync
  7. rxDoc is updated with msg
  8. Upon receipt of all packets, this.push is triggered

It reuses Yjs and no outside packages. It may be a design guide to something more economical. Also, I believe the WebRTC spec doesn't guarantee order of transmission, so the CRDT algorithm does some work here. Otherwise we're just using the nicely encoded dataset given by 'update'.

@martinpengellyphillips

I just encountered this, and it took a while to determine the issue. In my case, syncing in Firefox worked, but syncing the same data in Chrome suddenly started failing (having worked previously). I eventually narrowed it down to a size issue where a particularly large update was silently breaking y-webrtc for Chrome.

A few questions:

  • Is the related PR here still the best approach to work around this?
  • Is there anything I can do to help get a fix included in y-webrtc itself?
  • Can there be a more visible y-webrtc warning / error when this occurs?

Thanks!

@disarticulate
Contributor Author

@martinpengellyphillips here's the #25 pull request. I think some of the feedback is about better integration with @dmonad's approach and comments.

As far as I know, this is just how WebRTC is going to handle things. Another solution would be to figure out how to ensure that every update sent over WebRTC is already within the max size before it enters the pipe.

@andre-dietrich

I created a monkey patch (a hack) for SimplePeer here:

https://github.com/disarticulate/y-webrtc/

I did the following:

  1. Extended SimplePeer's class as SimplePeerExtended.js
  2. Overwrote the import in y-webrtc.js to use the extended version
  3. Created two Y.Docs, for transmission (txDoc) and receiving (rxDoc)
  4. Created an initial setup and sync transmissions for peers
    a. client1 syncs: txDoc -> rxDoc (one way)
    b. client2 syncs: txDoc -> rxDoc (one way)
  5. send(chunk) -> queues data, creates more chunks with packets, and sends each packet into an array in the txDoc
  6. txDoc.on('update') -> sends msg to sync
  7. rxDoc is updated with msg
  8. Upon receipt of all packets, this.push is triggered

It reuses Yjs and no outside packages. It may be a design guide to something more economical. Also, I believe the WebRTC spec doesn't guarantee order of transmission, so the CRDT algorithm does some work here. Otherwise we're just using the nicely encoded dataset given by 'update'.

@disarticulate ... Thanks for your efforts. I used your fix as an alternative WebRTC provider and it works like a charm. I tested it on different browsers, with images and even video files ...

5 participants