
pubsub: Message is unsized #118

Closed
jamesray1 opened this issue Dec 28, 2018 · 16 comments

Comments

@jamesray1
Contributor

jamesray1 commented Dec 28, 2018

One problem with messages being unsized is that an attacker could spam the network with messages carrying very large data fields, which would be prohibitively expensive to transmit. While enforcing an arbitrary size limit may be unreasonable for a library, a workaround could be to make the maximum size of the data field configurable by users of the library. Otherwise, leave it to users to manage the problem themselves, e.g. by imposing a gas limit (as in Ethereum) or otherwise pricing storage, computation, bandwidth and I/O.

cc @whyrusleeping, @vyzo

@Stebalien
Member

go-libp2p-pubsub currently enforces a max size of 1MiB. We should definitely make room for this in the spec and consider making it configurable.
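
A sketch of what a configurable cap could look like from the caller's side, assuming a WithMaxMessageSize-style option (later go-libp2p-pubsub releases expose one; exact API details vary by version):

```go
package main

import (
	"context"
	"log"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	// Create a libp2p host with default settings.
	h, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()

	// Sketch: raise the per-message cap from the 1 MiB default to 2 MiB.
	// WithMaxMessageSize was not available when this issue was opened;
	// older releases hard-code the limit in comm.go.
	ps, err := pubsub.NewGossipSub(ctx, h, pubsub.WithMaxMessageSize(2<<20))
	if err != nil {
		log.Fatal(err)
	}
	_ = ps
}
```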

@jamesray1
Contributor Author

jamesray1 commented Dec 28, 2018

Similarly, the topicIDs field should also be prevented from growing without bound, e.g. via pricing storage in an application. It seems like you wouldn't be able to compress a topicID or topicIDs without risking collisions.

@jamesray1
Contributor Author

#119

@Stebalien
Member

The entire RPC (topic IDs, authors, signatures, etc) is currently limited to 1MiB. That means we only have to read 1MiB before we can validate the message. While we don't currently take any action (that I know of) other than to drop invalid messages or messages on topics to which we are not subscribed, we can (in the future) disconnect from peers that send us such messages.
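
Roughly, the cap is enforced by reading each RPC through a length-delimited reader with a maximum size, so an oversized message errors out before it is buffered. A sketch of that pattern (illustrative, not the library's exact code):

```go
package example

import (
	"io"

	ggio "github.com/gogo/protobuf/io"
	pb "github.com/libp2p/go-libp2p-pubsub/pb"
)

// maxMessageSize mirrors the 1 MiB cap discussed here.
const maxMessageSize = 1 << 20

// readRPC reads a single length-delimited RPC from a stream, refusing
// anything whose length prefix exceeds maxMessageSize.
func readRPC(s io.Reader) (*pb.RPC, error) {
	r := ggio.NewDelimitedReader(s, maxMessageSize)
	rpc := new(pb.RPC)
	if err := r.ReadMsg(rpc); err != nil {
		// An oversized RPC fails here, before the payload is fully
		// buffered or validated, so at most ~1 MiB is read per message.
		return nil, err
	}
	return rpc, nil
}
```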

@lacabra

lacabra commented Jun 6, 2019

@Stebalien could you please point out where the 1MB limit is set in the code? We are using libp2p to transmit messages larger than 1MB, and we observe that they are being dropped silently, so we need to adjust these limits accordingly. Thank you.

@jacobheun
Contributor

Came across this while looking at https://discuss.libp2p.io/t/pub-sub-messages-larger-than-1mb/173.

Pubsub sets the limit via the delimited io reader: https://github.com/libp2p/go-libp2p-pubsub/blob/49274b0e8aecdf6cad59d768e5702ff00aa48488/comm.go#L33. One thing to note: in addition to pubsub, the Mplex spec also sets a limit of 1MB. I haven't looked at the other go muxers, but those may also need to be adjusted if you need to increase these limits for an isolated network.

@Stebalien
Member

One thing to note, in addition to pubsub the Mplex spec also sets a limit of 1MB.

That just means that each mplex "frame" needs to be less than or equal to a megabyte. Internally, go-mplex breaks large writes into small writes so you should never see this. However, on the read side, we kill the entire connection if the other end sends us a single frame larger than 1 megabyte.


@lacabra I highly recommend that you don't. Pubsub should be used for events, not large messages, as every peer is likely to receive the same message multiple times. Pubsub is optimized for latency, not throughput. If you need to distribute a large amount of data, I recommend:

  1. Using pubsub to announce this data (e.g., by CID), as sketched below.
  2. Using something like bitswap to actually fetch the data. You could also use a custom protocol.
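
For example, a minimal Go sketch of step 1 (the function and topic wiring here are illustrative; only Topic.Publish is an actual go-libp2p-pubsub API):

```go
package example

import (
	"context"

	"github.com/ipfs/go-cid"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// announceBlob publishes only the CID of a large blob; peers that see the
// announcement fetch the content out of band (e.g. via Bitswap or a custom
// request/response protocol).
func announceBlob(ctx context.Context, topic *pubsub.Topic, c cid.Cid) error {
	// The published message is just the CID bytes (tens of bytes),
	// well under the 1 MiB limit no matter how large the blob is.
	return topic.Publish(ctx, c.Bytes())
}
```

Receivers then resolve the CID over Bitswap or a direct stream, so the large payload never rides the pubsub mesh.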

If you really need to change this, it's hard-coded here: https://github.com/libp2p/go-libp2p-pubsub/blob/master/comm.go#L33. But be warned, other implementations will likely drop your messages as well.

Also, the UX here really sucks. I've filed a PR to avoid silently dropping messages (in go): libp2p/go-libp2p-pubsub#189

@dirkmc

dirkmc commented Jun 7, 2019

That just means that each mplex "frame" needs to be less than or equal to a megabyte. Internally, go-mplex breaks large writes into small writes so you should never see this. However, on the read side, we kill the entire connection if the other end sends us a single frame larger than 1 megabyte.

Is it worth adding this to the mplex spec?

@jacobheun
Contributor

Is it worth adding this to the mplex spec?

This would probably be a good thing to add at least a recommendation around, to get more consistency across implementations. js mplex does not currently break up the writes (it just closes the connection), although it probably should.
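
For reference, "breaking up the writes" just means chunking on the write path; a rough Go sketch of the idea (the constant and function names are illustrative, not code from either mplex implementation):

```go
package example

import "io"

// maxFrameSize mirrors the mplex frame limit discussed above.
const maxFrameSize = 1 << 20 // 1 MiB

// writeChunked splits one logical write into pieces no larger than
// maxFrameSize, which is roughly what go-mplex does internally.
func writeChunked(w io.Writer, data []byte) error {
	for len(data) > 0 {
		n := len(data)
		if n > maxFrameSize {
			n = maxFrameSize
		}
		if _, err := w.Write(data[:n]); err != nil {
			return err
		}
		data = data[n:]
	}
	return nil
}
```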

@vishalchangraniaxiom-old

vishalchangraniaxiom-old commented Dec 2, 2019

go-libp2p-pubsub currently enforces a max size of 1MiB. We should definitely make room for this in the spec and consider making it configurable.

Not having the size limit configurable is kind of a deal-breaker for us using the pub-sub model of communication offered by libp2p. Our application will have messages slightly over 1MB, and we would like this field to be configurable. Could this issue please be prioritized and the field made configurable?
@vyzo @raulk

@raulk
Member

raulk commented Dec 3, 2019

@vishalchangraniaxiom could you describe your use case? Transmitting large messages over pubsub isn't a recommended pattern. These messages circulate through the network, and reach is amplified at every step. Sending large blobs can easily saturate links and incur extra redundancy overhead. The best way to do this is to store the blob somewhere and send a lightweight event indicating its availability and locator (e.g. a CID if you use IPFS).

@vishalchangraniaxiom-old

@raulk thanks for the reply. Our use case is transmitting blockchain blocks, which may be a little over 1MB. I understand the concern around large messages, but if the field were configurable rather than hard-coded, we could fine-tune it to our requirements.

@aschmahmann
Contributor

@vishalchangraniaxiom I see in the issue you linked that you are also looking for some state (i.e. the blockchain) to be persisted via pubsub and for peers to be able to "catch-up" upon joining the network.

As this "catch up" behavior is not supported by pubsub (pubsub is a fire and forget protocol), you will likely have to use some other protocol to perform the catch-up. For example https://github.com/libp2p/go-libp2p-pubsub-router, a persistent Key-Value store built on pubsub, uses a separate Fetch protocol when peers need to catch up to the latest state.

Since you will need some protocol to do catch-up anyway, it should be pretty straightforward to reuse that same protocol to turn a notification about a new block into a retrieval of that block.
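
A rough sketch of the serving side of such a fetch protocol (the protocol ID, function names, and import paths are illustrative and depend on your go-libp2p version):

```go
package example

import (
	"io"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// fetchProtocol is a hypothetical protocol ID; pick whatever fits your app.
const fetchProtocol = protocol.ID("/myapp/fetch-block/1.0.0")

// maxRequestSize caps the inbound request (a block hash), guarding against
// the same unbounded-read issue discussed above for pubsub.
const maxRequestSize = 128

// registerFetchHandler sketches the serving side of a catch-up protocol:
// lookup maps an announced block hash to the full (possibly >1MB) block.
func registerFetchHandler(h host.Host, lookup func(hash []byte) ([]byte, error)) {
	h.SetStreamHandler(fetchProtocol, func(s network.Stream) {
		defer s.Close()

		// Read the requested hash; the requester closes its write side
		// after sending, so this returns once the request is complete.
		hash, err := io.ReadAll(io.LimitReader(s, maxRequestSize))
		if err != nil {
			s.Reset()
			return
		}

		block, err := lookup(hash)
		if err != nil {
			s.Reset()
			return
		}

		// The block travels over a direct stream, not pubsub, so the
		// 1 MiB pubsub message limit does not apply here.
		if _, err := s.Write(block); err != nil {
			s.Reset()
		}
	})
}
```

The announcement over pubsub stays tiny (just a hash), while the bulk transfer happens peer-to-peer over this stream.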

@vishalchangraniaxiom-old

@aschmahmann thanks. I will definitely look into the pubsub-router. Do you know if that has any size restrictions?

@aschmahmann
Contributor

@vishalchangraniaxiom yes, it inherits the size restrictions of pubsub messages, and the Fetch protocol has its own size restrictions (both protect against the DoS concerns mentioned above). However, the point is that building a custom protocol on top of pubsub (or reusing something like Bitswap) lets you keep the pubsub messages small (i.e. under 1MB) while allowing your custom protocol to carry arbitrarily large messages.

@mxinden
Member

mxinden commented Apr 5, 2021

I am closing this issue as the discussion has diverged from the original point. The message limit has been documented in https://github.com/libp2p/specs/blob/master/pubsub/README.md#the-message.

Feel free to comment here in case you would like to continue the concrete discussion on message size.

@mxinden mxinden closed this as completed Apr 5, 2021