Protobuf seems like a lot of overhead for this use case? #269
Comments
@kentonv Thank you for pointing this out! To be honest, I was under the impression that Protobuf did operate on the underlying ArrayBuffer. Are there any recent benchmarks comparing PB to Cap'n Proto in JS? I will investigate Cap'n Proto. While you're here, do you have any opinion on FIDL?
Maybe a dumb question here, but shouldn't SSL handle compression sufficiently in a modern world? The extra cycles of encoding, decoding, uglifying, etc. may not be worth the exercise, since SSL and HTTP would just try to compress something that's already compressed. Doing it twice seems not worth it, or even counterproductive.
@MasterJames This is regarding internal communication between V8 and privileged controlling code - there's no SSL.
@ry A digression: are you sure you want to develop Deno in C++?
@ry With Protobuf you generally construct an object tree representing a message first, then you serialize it into an ArrayBuffer all at once, then you parse the buffer on the other side into another object tree, and then you consume that object tree. With Cap'n Proto, you directly construct the message in the ArrayBuffer: when you call a setter, the value is written straight into the backing buffer.

More importantly, though, the encoding does not try to do any variable-width encoding like Protobuf does. For example, Protobuf encodes integers in a variable-width format so that small integers encode to one byte. This is great when sending messages over a network but is a waste when communicating within the same address space.

Regarding benchmarks -- I've spent a lot of time benchmarking serialization systems, and unfortunately my conclusion is that benchmark results are almost always meaningless. Different serialization schemes will tend to win on different kinds of input data due to their differing design trade-offs. Also, caching effects and working-set size tend to be a huge factor in performance, which is near-impossible to capture in micro-benchmarks. To have a really meaningful benchmark, someone needs to write two versions of a real-world application using two different serializations and compare them... but that's an enormous amount of work that almost no one ever does. Cap'n Proto, for its part, tends to show up in microbenchmarks as "infinity times faster" than Protobuf because serialization benchmarks tend to measure specifically the parse/serialize step, which Cap'n Proto doesn't have -- but of course that's not really a meaningful result either.

@whtiehack Cap'n Proto does not require C++.
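To illustrate the variable-width integer encoding mentioned above, here is a minimal sketch of Protobuf-style base-128 varints (this is the general wire-format idea, not code from Deno or the protobuf library):

```typescript
// Minimal sketch of Protobuf's base-128 varint encoding: each byte
// carries 7 bits of the value, and the high bit marks "more bytes follow".
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  do {
    let byte = value & 0x7f;
    value = Math.floor(value / 128); // shift right 7 bits without 32-bit truncation
    if (value > 0) byte |= 0x80;     // set the continuation bit
    bytes.push(byte);
  } while (value > 0);
  return bytes;
}

// Small integers fit in one byte; larger ones grow.
console.log(encodeVarint(5));   // [5]
console.log(encodeVarint(300)); // [172, 2]  (0xac, 0x02)
```

This size saving is exactly what makes the format attractive on the wire, and exactly what costs extra decode work for intra-process messages where a fixed-width read would do.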
Right, of course - I somehow knew that. Sorry, just tuning in; I'll try to keep quiet.
I've started to learn about Cap'n Proto after reading these comments, very interesting!
@0xac5d44cb6d8d6291;
struct CapMsg {
channel @0 :Channel;
command @1 :Command;
# START
startCmdCwd @2 :Text;
startCmdArgv @3 :List(Text);
startCmdDebugFlag @4 :Bool;
startCmdMainJs @5 :Text;
startCmdMainMap @6 :Text;
# CODE_FETCH
codeFetchModuleSpecifier @7 :Text;
codeFetchContainingFile @8 :Text;
# CODE_FETCH_RES
codeFetchResModuleName @9 :Text;
codeFetchResFilename @10 :Text;
codeFetchResSourceCode @11 :Text;
codeFetchResOutputCode @12 :Text;
# READ_FILE_SYNC
readFileSyncFilename @13 :Text;
readFileSyncData @14 :Data;
enum Channel {
start @0;
os @1;
}
enum Command {
startCmd @0;
codeFetch @1;
codeFetchRes @2;
readFileSync @3;
readFileSyncRes @4;
}
}
@matiasinsaurralde I recommend using a union here:
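For context, here is one hypothetical way the schema above could be reshaped with an unnamed union, as suggested. The group and field names are illustrative only, not from the actual Deno code:

```capnp
@0xac5d44cb6d8d6291;

struct CapMsg {
  channel @0 :Channel;
  union {
    startCmd :group {
      cwd @1 :Text;
      argv @2 :List(Text);
      debugFlag @3 :Bool;
      mainJs @4 :Text;
      mainMap @5 :Text;
    }
    codeFetch :group {
      moduleSpecifier @6 :Text;
      containingFile @7 :Text;
    }
    codeFetchRes :group {
      moduleName @8 :Text;
      filename @9 :Text;
      sourceCode @10 :Text;
      outputCode @11 :Text;
    }
    readFileSync :group {
      filename @12 :Text;
    }
    readFileSyncRes :group {
      data @13 :Data;
    }
  }

  enum Channel {
    start @0;
    os @1;
  }
}
```

With a union, the separate Command enum becomes redundant: the union's discriminant already records which variant is set, so readers just switch on the active member.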
Hey, at my previous job I switched a project from Protobuf to Cap'n Proto and saw a big performance boost - that said, that was a very serialisation-heavy project where the extra copy was proving to be the bottleneck. I was under the impression that protobuf was used in deno to explicitly create a copy - but if that is not required, I definitely think Cap'n Proto is worth considering. Other alternatives I looked at at the time (like FlatBuffers) are also worth a look.
If a copy is needed for security, adding in a memcpy() will still be dramatically faster than serializing/parsing a protobuf. A copy might be needed to avoid TOCTOU bugs if JavaScript could be manipulating the buffer concurrently in another thread. If JS is blocked while the message is processed, or if it can be processed carefully to avoid TOCTOU, then you can avoid the copy.
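The defensive-copy idea above can be sketched as follows; this is a generic illustration of taking a snapshot of a shared buffer before acting on it, not Deno's actual code:

```typescript
// Sketch: take a defensive copy of a shared buffer before validating
// and acting on it, so a concurrent writer cannot change the bytes
// between the check and the use (time-of-check/time-of-use, TOCTOU).
function snapshot(shared: ArrayBuffer): Uint8Array {
  const copy = new Uint8Array(shared.byteLength);
  copy.set(new Uint8Array(shared)); // one memcpy-like pass, no parsing
  return copy;
}

const shared = new ArrayBuffer(4);
new Uint8Array(shared).set([1, 2, 3, 4]);
const safe = snapshot(shared);
new Uint8Array(shared)[0] = 99; // later mutation of the shared buffer...
console.log(safe[0]);           // ...does not affect the snapshot: prints 1
```

The point being made is that this single linear copy is far cheaper than a Protobuf serialize/parse round trip, which walks an object tree and re-encodes every field.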
Would love to see benchmarks on that @kentonv :) |
@kentonv In the initial days of deno, we were using a different struct for each message, but it was adding ~20kb per message to the final typescript declaration file (see d765300). What happens here if we try to use a union?
@benjamingr You don't need a benchmark to prove that memcpy() is faster than any copying parser/serializer. @qti3e I'm not sure exactly why Protobuf
Yeah, but you would need one to show the impact of it on a similar app.
Cap'n Proto is awesome. Great to see @kentonv involved here, recommending the switch himself!
Cap'n Proto was exciting when it came out but has not seen any updates for years, the dev has shifted his focus on his products instead. I would recommend Cereal instead: https://uscilab.github.io/cereal/index.html
@patlecat Hi. I'm the dev of Cap'n Proto. Cap'n Proto is actively developed. Cereal is C++-only, which makes it not very useful here.
@kentonv I'm preparing a simplified project that will help you reproduce the issue I'm getting.
@kentonv I've been preparing a test project that uses Rust.
I've been trying Capnp on my fork, these are some numbers (building in release mode) doing 5000 Using Capnp:
Using PB:
@matiasinsaurralde It looks like you're using "packed" encoding. You shouldn't -- "packing" is basically a lightweight compression scheme tailored for Cap'n Proto, intended to make messages smaller on the wire at the cost of some CPU time, much like how Protobuf uses variable-width integer encoding. For intra-process communication, this wastes time (especially on the TS side, since JavaScript is pretty bad at byte twiddling). Try switching to regular encoding (remove "packed").

Note that for the very best case, you'd want to go a little further. A Cap'n Proto message is normally composed of multiple "segments", each of which has a backing ArrayBuffer.

Let me know how it goes. I'm travelling for the next week but will try to peek at my e-mail from time to time.
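The "regular encoding" being recommended is the fixed-width, C-struct-like layout described earlier, where a getter is just a fixed-offset read. A toy sketch of that idea (the offsets and field sizes here are invented for illustration; real Cap'n Proto layout is determined by the schema compiler):

```typescript
// Hypothetical fixed layout: two little-endian u16 fields at fixed offsets.
const OFFSET_CHANNEL = 0;
const OFFSET_COMMAND = 2;

function makeMessage(channel: number, command: number): ArrayBuffer {
  const buf = new ArrayBuffer(8);
  const view = new DataView(buf);
  // "Setters" write directly into the buffer -- there is no separate
  // serialize step and no variable-width encoding to decode later.
  view.setUint16(OFFSET_CHANNEL, channel, true);
  view.setUint16(OFFSET_COMMAND, command, true);
  return buf;
}

function getCommand(buf: ArrayBuffer): number {
  // A "getter" is a single fixed-offset read -- no parse step.
  return new DataView(buf).getUint16(OFFSET_COMMAND, true);
}

console.log(getCommand(makeMessage(1, 3))); // 3
```

Packing (like varints) trades CPU for wire size, which is the wrong trade when both sides share an address space.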
@kentonv Thanks, I'm currently trying out the non-packed encoding (it works faster), and also reading more about how the segmentation works; it seems that the Rust library offers some options for this.
I evaluated a couple options which have similar goals: cap-n-proto, FIDL, and flatbuffers. Our requirements were that it supports C++, TypeScript, Rust, and builds with GN.
The initial work of replacing Protobufs with FlatBuffers is complete. We consider this choice tentative. @piscisaureus and I have an uneasy feeling about some of FlatBuffers' design decisions. Particularly the vtable (I would prefer a fixed-size table, as cap-n-proto does) and how FlatBuffers starts encoding at the end of the buffer (@piscisaureus described this to me, hopefully I'm relaying it correctly).
@ry FlatBuffers writes back-to-front so that reads can be front-to-back (children are written before their parents so that parents can be read before children). |
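A toy sketch of that back-to-front idea, with an invented layout purely for illustration (this is not the actual FlatBuffers wire format):

```typescript
// FlatBuffers-style construction: write toward the front from the end
// of the buffer, children first, so each parent already knows its
// children's offsets when it is written. Readers then start at the
// front and follow offsets forward.
const buf = new Uint8Array(32);
let cursor = buf.length; // write cursor moves backward

function writeBytes(bytes: number[]): number {
  cursor -= bytes.length;
  buf.set(bytes, cursor);
  return cursor; // offset where this value now lives
}

// Child written first, near the end of the buffer...
const childOffset = writeBytes([104, 105]); // the string "hi"
// ...then the parent, which records the child's (already known) offset.
const parentOffset = writeBytes([childOffset]);

console.log(buf[parentOffset] === childOffset); // true: parent points at child
```

This is why the write order is reversed: offsets to children must be known before the parent is emitted, while read order stays parent-first.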
Hello. I'm the guy who wrote Protobuf v2 and open sourced it at Google. (I also, just in the last year, built Cloudflare Workers, a non-Node JavaScript server runtime embedding V8, with a focus on secure sandboxing and high multi-tenancy.)
I was surprised by the choice of Protobuf for intra-process communications within Deno. Protobuf's backwards compatibility guarantees and compact wire representation offer no benefit here, while the serialize/parse round-trip on every I/O seems like it would be pretty expensive.
I suppose the main motivation for using Protobuf here is to get convenient code for transmitting data that is compatible across languages?
Have you considered Cap'n Proto for this? I created Cap'n Proto shortly after leaving Google. It works a lot like Protobuf, except that the generated getters and setters operate directly on a backing byte array (e.g. ArrayBuffer in JavaScript), using a C-struct-like layout. Since you can directly share that byte array between code in different languages, you can construct a message in one language and then consume it in another without even making a copy in between, much less a serialize/parse round trip. Communication between a sandbox and a supervisor was in fact a motivating use case for Cap'n Proto.
Cap'n Proto is well-supported in C++, Rust, and Go. There is also a mostly-complete TypeScript implementation. Admittedly the TypeScript implementation has not received a whole lot of real-world usage, but if there's interest, I may be willing to directly adopt and support it for you. (I expect to need it in my own work soon anyway.)