pubsub: Message is unsized #118
Comments
go-libp2p-pubsub currently enforces a max size of 1MiB. We should definitely make room for this in the spec and consider making it configurable.
Similarly, the
The entire RPC (topic IDs, authors, signatures, etc.) is currently limited to 1MiB. That means we only have to read 1MiB before we can validate the message. While we don't currently take any action (that I know of) other than dropping invalid messages or messages on topics to which we are not subscribed, we can (in the future) disconnect from peers that send us such messages.
@Stebalien could you please point out where the 1MB limit is set in the code? We are using libp2p to transmit messages larger than 1MB, and we observe that they are being dropped silently, so we need to adjust these limits accordingly. Thank you.
Came across this while looking at https://discuss.libp2p.io/t/pub-sub-messages-larger-than-1mb/173. Pubsub sets the limit via the delimited IO reader: https://github.com/libp2p/go-libp2p-pubsub/blob/49274b0e8aecdf6cad59d768e5702ff00aa48488/comm.go#L33. One thing to note: in addition to pubsub, the Mplex spec also sets a limit of 1MB. I haven't looked at the other Go muxers, but those may also need to be adjusted if you need to increase these limits for an isolated network.
That just means that each mplex "frame" needs to be less than or equal to a megabyte. Internally, go-mplex breaks large writes into small writes so you should never see this. However, on the read side, we kill the entire connection if the other end sends us a single frame larger than 1 megabyte. @lacabra I highly recommend that you don't. Pubsub should be used for events, not large messages, as every peer is likely to receive the same message multiple times. Pubsub is optimized for latency, not throughput. If you need to distribute a large amount of data, I recommend:
If you really need to change this, it's hard-coded here: https://github.com/libp2p/go-libp2p-pubsub/blob/master/comm.go#L33. But be warned, other implementations will likely drop your messages as well. Also, the UX here really sucks. I've filed a PR to avoid silently dropping messages (in Go): libp2p/go-libp2p-pubsub#189
Is it worth adding this to the mplex spec?
It would probably be good to at least add a recommendation about this, to get more consistency across implementations. js mplex does not currently break up writes, although it probably should; it just closes the connection.
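The write-splitting behavior described for go-mplex above can be sketched roughly as follows. This is not the go-mplex implementation; `chunkWriter` and `countingWriter` are hypothetical types written for this example, and the only assumption carried over from the thread is that no single frame may exceed 1 MiB.

```go
package main

import (
	"fmt"
	"io"
)

// maxFrameSize is the per-frame limit discussed in this thread.
const maxFrameSize = 1 << 20

// chunkWriter is a hypothetical wrapper that splits one large write into
// several underlying writes of at most max bytes each.
type chunkWriter struct {
	w   io.Writer
	max int
}

func (c *chunkWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > c.max {
			n = c.max
		}
		m, err := c.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}

// countingWriter records the size of every write it receives, standing in
// for the framing layer so the chunk sizes can be inspected.
type countingWriter struct {
	sizes []int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.sizes = append(c.sizes, len(p))
	return len(p), nil
}

func main() {
	rec := &countingWriter{}
	cw := &chunkWriter{w: rec, max: maxFrameSize}
	n, err := cw.Write(make([]byte, 2*maxFrameSize+512))
	fmt.Println("bytes written:", n, "error:", err)
	fmt.Println("frame sizes:", rec.sizes)
}
```

With a wrapper like this on the write side, a peer that enforces the frame limit on the read side never sees an oversized frame from a single large application write.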
Not having the size limit configurable is kind of a deal-breaker for us in using the pubsub model of communication offered by libp2p. Our application will have messages slightly over 1MB, and we would like this limit to be configurable. Could this issue be prioritized and the limit made configurable?
@vishalchangraniaxiom could you describe your use case? Transmitting large messages over pubsub isn’t a recommended pattern. These messages circulate through the network, and their reach is amplified at every step. Sending large blobs can easily saturate links and incur extra redundancy overhead. The best way to do this is to store the blob somewhere and send a lightweight event indicating its availability and locator (e.g. a CID if you use IPFS).
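As a rough sketch of the "lightweight event plus locator" pattern suggested above: the publisher announces only a content hash and size, and receivers fetch the blob over some other protocol and check it against the announcement. Everything here is an assumption for illustration. The `blobAnnouncement` type, the JSON encoding, and the hex-encoded SHA-256 locator are stand-ins, not a real CID and not part of any libp2p API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// blobAnnouncement is a hypothetical pubsub payload: it identifies a blob
// without carrying it.
type blobAnnouncement struct {
	Locator string `json:"locator"` // hex SHA-256 of the blob (stand-in for a CID)
	Size    int    `json:"size"`    // blob length in bytes
}

// announce builds the small event that would be published in place of the blob.
func announce(blob []byte) ([]byte, error) {
	sum := sha256.Sum256(blob)
	return json.Marshal(blobAnnouncement{
		Locator: hex.EncodeToString(sum[:]),
		Size:    len(blob),
	})
}

// verify checks that a blob fetched out of band matches the announcement.
func verify(event []byte, blob []byte) (bool, error) {
	var a blobAnnouncement
	if err := json.Unmarshal(event, &a); err != nil {
		return false, err
	}
	sum := sha256.Sum256(blob)
	return a.Size == len(blob) && a.Locator == hex.EncodeToString(sum[:]), nil
}

func main() {
	blob := make([]byte, 2<<20) // a 2 MiB blob, over the pubsub limit
	event, err := announce(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("blob: %d bytes, event: %d bytes\n", len(blob), len(event))
}
```

The event stays far below the 1 MiB limit regardless of blob size, so only peers that actually want the blob pay the cost of transferring it.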
@raulk thanks for the reply. Our use case is transmitting blockchain blocks, which may be a little over 1MB. I understand the concern around large messages, but if the limit were configurable rather than hard-coded we could fine-tune it to our requirements.
@vishalchangraniaxiom I see in the issue you linked that you are also looking for some state (i.e. the blockchain) to be persisted via pubsub and for peers to be able to "catch up" upon joining the network. As this "catch up" behavior is not supported by pubsub (pubsub is a fire-and-forget protocol), you will likely have to use some other protocol to perform the catch-up. For example, https://github.com/libp2p/go-libp2p-pubsub-router, a persistent key-value store built on pubsub, uses a separate Fetch protocol when peers need to catch up to the latest state. Since you will need some protocol to do catch-up anyway, it should be pretty straightforward to reuse that same protocol to turn a notification about a new block into a retrieval of that block.
@aschmahmann thanks. I will definitely look into the pubsub-router. Do you know if that has any size restrictions?
@vishalchangraniaxiom yes, it has the size restrictions of pubsub messages, and the Fetch protocol has its own size restrictions as well (these protect against the DoS mentioned above). However, the point is that creating a custom protocol on top of pubsub (or reusing something like Bitswap) would allow you to keep the pubsub message size small (i.e. less than 1MB) while letting your custom protocol carry messages of arbitrary size.
I am closing this issue, as the discussion has diverged from the original point. The message limit has been documented in https://github.com/libp2p/specs/blob/master/pubsub/README.md#the-message. Feel free to comment here if you would like to continue the concrete discussion on message size.
One problem with messages being unsized is that an attacker could spam the network with very large data fields in messages, which would be prohibitively expensive to transmit. While enforcing an arbitrary size limit may be unreasonable for a library, perhaps a workaround is to optionally make the data field limited to some value, which can be chosen by users of the library. Otherwise, leave it to users to manage this problem, such as by having a gas limit in Ethereum, and otherwise pricing storage, computation, bandwidth and I/O.
cc @whyrusleeping, @vyzo