Implemented protobuf equivalent of network_protocol.rs #6672
Conversation
This looks really, really great! I finally feel like we are fixing stuff, rather than piling on work-arounds, major kudos here!
One thing I am really unsure about here is the error handling strategy. I don't like how anyhow::Result makes it impossible to understand what actually goes wrong. Like, did we critically mess up our serialization logic, or is it just a transient socket error? At the same time, the network is close enough to the edge of the system, where "just log and move on" is the right error handling strategy, so I also can't say that I am 100% sure that anyhow is the wrong choice here.
I wonder what @nagisa thinks here, they do have opinions on error handling: https://kazlauskas.me/entries/errors :-)
Note: I didn't do line by line review for correctness here, as I am relatively unfamiliar with the net (hence no approval). Can do one if needed!
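The trade-off being debated above could be sketched like this. This is a hypothetical, std-only illustration (the names `ParseError`, `Malformed`, `Transient`, and `parse` are invented for the example; the actual PR uses anyhow): a concrete error enum lets callers distinguish "our serialization logic is broken" from "transient error", which anyhow's type-erased chain cannot express.

```rust
use std::fmt;

// Hypothetical concrete error type for a decoding layer: callers can branch
// on the variant instead of inspecting an opaque anyhow chain.
#[derive(Debug)]
enum ParseError {
    // Our own serialization logic produced or consumed something invalid.
    Malformed(String),
    // A transient problem (e.g. socket error) - "log and move on" territory.
    #[allow(dead_code)]
    Transient(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Malformed(msg) => write!(f, "malformed message: {}", msg),
            ParseError::Transient(msg) => write!(f, "transient error: {}", msg),
        }
    }
}

impl std::error::Error for ParseError {}

// Toy stand-in for a deserialization routine.
fn parse(bytes: &[u8]) -> Result<u32, ParseError> {
    let arr: [u8; 4] = bytes
        .try_into()
        .map_err(|_| ParseError::Malformed(format!("expected 4 bytes, got {}", bytes.len())))?;
    Ok(u32::from_le_bytes(arr))
}

fn main() {
    assert_eq!(parse(&[1, 0, 0, 0]).unwrap(), 1);
    // The caller can match on the variant - the information anyhow erases.
    assert!(matches!(parse(&[1, 2]), Err(ParseError::Malformed(_))));
}
```

As the comment notes, the cost is a dedicated type per conversion; the sketch only shows what that buys if callers ever need to introspect failures.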
chain/network/build.rs
Outdated
@@ -0,0 +1,3 @@
fn main() -> std::io::Result<()> {
    prost_build::compile_protos(&["src/network_protocol/network.proto"], &["src/"])
Is this a pure Rust solution, or would it depend on protoc and friends being in PATH? If it is the latter, I would suggest maybe just committing the generated protobuf code, to make sure that git clone && cargo build works without extra steps?
No super strong opinion here.
It is not pure, I somehow missed that fact, sorry. We have the following options available:
- commit the generated code - it is not that bad, because the generated files are small (they use "derive" a lot - yay, 2-staged code generation :P)
- enable the "vendored" feature of the "prost-build" package - it removes the dependency on "protoc" in PATH, but introduces a dependency on "cmake" and increases the build time
- [INFEASIBLE - dynamically linked] add a pre-packaged protoc binary as a dependency
- get rid of the prost package and use the protobuf package instead. It contains a Rust implementation of protoc.
I really dislike committing generated code to the repo. Having a dependency on cmake also sounds bad to me. I'd go with the pre-packaged protoc, as a well-tested proto compiler, but maybe such a binary blob is not welcome in our open source blockchain technology? WDYT?
tokio-rs/prost#182 is a nasty failure mode I had encountered in the past with prost, but it seems that they no longer execute arbitrary code downloaded off the internet at build time by default. I'd still petition for vendoring. build.rs is a terrible, horrible thing.
@nagisa could you clarify what you mean by "vendoring"? "Vendoring" the source of protoc and compiling that (prost-build+vendored), or "vendoring" a compiled protoc binary (the "protobuf" package)?
FYI protoc-bin-vendored doesn't make much sense, because it is dynamically linked. We would have to create our own package with statically linked protoc for that.
Err, I meant checking in the generated sources, my bad.
To unblock this, let's maybe leave this thing as is? "Do what the docs suggest, even if you think you know better" is a strong heuristic. We can revisit this later.
If we do revisit:
- I'd be against adding binaries: I'd rather have protoc in PATH than in the repo :-)
- I do like checking in generated sources, provided that there's a test verifying that the generated sources are fresh. But that's a contrarian opinion which is a bit exhausting to argue :-)
I've implemented checksum protection for both the proto file and the checked-in generated sources (will push it tomorrow with the rest of the stuff). In the meantime, PTAL at: https://github.com/pompon0/protoc-static . I've implemented a GitHub workflow statically compiling and releasing a protoc binary. It currently supports x64 Linux and macOS; with some bash tuning Windows can be supported as well, and with some more work I could get it to cross-compile to other architectures too (with zig-cc). It is a 6MB binary which could be downloaded and checksummed from build.rs (in a bazel-like fashion). WDYT?
aaand, it turns out that the thing is worse than I expected.
prost-build (the package needed for code generation) actually checks for the presence of protoc in the system during the build of the package itself, and panics if it doesn't find it. I had to make the dependency on protoc optional and use conditional compilation to make it work :/
My feedback on the error stuff ^^
I definitely think a concrete error type would be preferable, but I can also see there not necessarily being much value in defining one if we don't anticipate callers actually introspecting its contents. Were we to work on adding error structs here, I would also prefer having a dedicated type for each conversion method, which is a ton of work, so if there isn't an immediate use-case for it, it is probably not worth the effort.
hash: (|| CryptoHash::try_from(p.hash.as_ref().context("missing")?))()
    .context("hash")?,
In the context of how this context could be presented to the user, I think these messages should be expanded to be more descriptive than a single word. Perhaps something like:
- the hash field in the GenesisId value is missing
- could not convert CryptoHash / could not parse CryptoHash [1]
in this specific case. More descriptive messages would also obviate the need for the (||)() pattern.
Footnotes
[1] Note that the error context should be describing the operation that occurs in the fallible expression, not the surrounding code. That is, in hash: foo.context(...)? the context should be describing what kind of operation foo is, and not hash: foo.context(...)? or even a broader context. In practice, not maintaining a specific polarity when adding context can lead to situations where you end up with both missing context and duplicated context. While not a problem here, contextualizing strictly the failing expression also helps to avoid guessing the callers' intent.
Re [1]: that doesn't make sense to me. If I have a function:
fn bar() -> Result {
...
foo().context(...)? <- X
...
}
Then, while passing the error from line X to the caller of bar() (via the "?" operator), I need to add some information about why foo() was called at X in the first place - otherwise, if I have 2 similar calls done for different reasons from bar(), the error message won't help me determine the precise root cause. I agree that a consistent polarity should be established, but IMO the whole context that the error is returned from (in this case, the call to bar()) should be included.
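The two polarities being argued here can be contrasted in a small std-only sketch (all names are hypothetical, and String errors with map_err stand in for anyhow's .context, which this sketch does not depend on):

```rust
// Toy fallible operation, standing in for a conversion that can fail.
fn foo(ok: bool) -> Result<u32, String> {
    if ok { Ok(42) } else { Err("parse failed".to_string()) }
}

// Polarity A: the context describes the failing operation itself
// ("what kind of operation foo is").
fn bar_op_context() -> Result<u32, String> {
    foo(false).map_err(|e| format!("decoding the hash field: {}", e))
}

// Polarity B: the caller also records *why* foo was called at this site,
// so two similar calls inside the same function stay distinguishable.
fn bar_call_context() -> Result<u32, String> {
    foo(false).map_err(|e| format!("while reading handshake (first field): {}", e))
}

fn main() {
    assert_eq!(
        bar_op_context().unwrap_err(),
        "decoding the hash field: parse failed"
    );
    assert_eq!(
        bar_call_context().unwrap_err(),
        "while reading handshake (first field): parse failed"
    );
}
```

Either convention works; the footnote's point is simply to pick one polarity and apply it consistently so context is neither missing nor duplicated.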
impl TryFrom<&proto::PublicKey> for PeerId {
    type Error = anyhow::Error;
    fn try_from(p: &proto::PublicKey) -> anyhow::Result<Self> {
        Ok(Self::try_from_slice(&p.borsh)?)
Missing context?
This call is so trivial that IMO there is no extra context to add that could be useful (i.e. it would rather obscure the information in the cause chain). The proto message types that I introduced solely for wrapping borsh should rather be treated just like one-field tuple types in rust / newtype in Haskell - extra information to the type system, but not really for the user.
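The newtype analogy can be made concrete with a std-only sketch (`BorshBytes` and `peer_id_from_proto` are invented for illustration; the real code wraps borsh bytes in a proto message and uses Self::try_from_slice):

```rust
// Hypothetical newtype mirroring a one-field proto wrapper around borsh bytes:
// it informs the type system, but carries no extra context worth surfacing.
struct BorshBytes(Vec<u8>);

// A trivial stand-in for PeerId::try_from_slice(&p.borsh). Any context added
// here ("converting PublicKey") would only echo what the signature already
// says, lengthening the cause chain without adding information.
fn peer_id_from_proto(p: &BorshBytes) -> Result<u64, String> {
    let arr: [u8; 8] = p
        .0
        .as_slice()
        .try_into()
        .map_err(|_| "invalid length".to_string())?;
    Ok(u64::from_le_bytes(arr))
}

fn main() {
    assert_eq!(peer_id_from_proto(&BorshBytes(vec![1, 0, 0, 0, 0, 0, 0, 0])), Ok(1));
    assert!(peer_id_from_proto(&BorshBytes(vec![1, 2])).is_err());
}
```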
Force-pushed from 9b15674 to 1fc80d7
@@ -105,6 +103,11 @@ pub(crate) struct PeerActor {
    routed_message_cache: LruCache<(PeerId, PeerIdOrHash, Signature), Instant>,
    /// A helper data structure for limiting reading
    throttle_controller: ThrottleController,
    /// Whether we detected support for protocol buffers during handshake.
    protocol_buffers_supported: bool,
in theory this should be Optional, right?
(as there are 3 states: we don't know, supported, not supported)
the full name of this field should rather be: protocol_buffers_support_detected (true/false)
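The three states mentioned above map naturally onto Option<bool>. A std-only sketch (the struct and field names here are hypothetical, loosely following the rename suggested in the review):

```rust
// Option<bool> distinguishes "not yet detected" from a definite answer,
// which a bare bool cannot do.
#[derive(Default)]
struct PeerActorSketch {
    // None: handshake hasn't told us yet; Some(x): detection finished.
    protocol_buffers_support_detected: Option<bool>,
}

fn describe(p: &PeerActorSketch) -> &'static str {
    match p.protocol_buffers_support_detected {
        None => "unknown",
        Some(true) => "supported",
        Some(false) => "not supported",
    }
}

fn main() {
    let mut peer = PeerActorSketch::default();
    assert_eq!(describe(&peer), "unknown");
    peer.protocol_buffers_support_detected = Some(true);
    assert_eq!(describe(&peer), "supported");
    peer.protocol_buffers_support_detected = Some(false);
    assert_eq!(describe(&peer), "not supported");
}
```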
EpochSyncFinalizationRequest(EpochId),
}

struct FakeActor {
You added a couple of cool test classes (FakeActor, ActixSystem, etc.) - let's document them (either within each .rs file or create a README.md within this directory), with info on what these classes do and a couple of examples of how they can be used.
FakeActor is private and I'm not sure whether it should be reusable in a way different than through the public interface which is currently exposed. I would like to make ActixSystem reusable, but I'd have to move it outside of the network package probably - let's do that in a separate PR.
chain/network/build.rs
Outdated
let up_to_date = (|| {
    let generated = std::fs::read_to_string(generated_path).ok()?;
    let generated = generated.strip_prefix(&want_prefix)?;
    let (prefix, content) = generated.split_once('\n')?;
    if format!("{prefix}\n") != content_prefix(content) {
        return None;
    }
    Some(())
})()
.is_some();
Is there a reason you didn't use starts_with?
Suggested change:
let up_to_date = std::fs::read_to_string(generated_path)
    .map(|s| s.starts_with(&want_prefix))
    .unwrap_or(false);
It is more tricky than that. In addition to the hash of the proto file, I'm prepending the hash of the generated code to the file with the generated code ITSELF, so that if someone modifies the generated code by mistake, the compiler will shout at them. To verify this hash, I need to first strip it from the file content, then compute the hash of the content remainder, and only then compare hashes.
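The strip-then-rehash scheme can be illustrated with a std-only sketch. Assumptions: DefaultHasher stands in for the cryptographic hash the real build.rs uses, and `fingerprint`, `write_file`, and `is_up_to_date` are invented names; only the shape of the check mirrors the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in fingerprint (the real build.rs uses a proper cryptographic hash).
fn fingerprint(s: &str) -> String {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    format!("// hash: {:016x}\n", h.finish())
}

// The generated file carries its own hash on the first line, so freshness is
// written as: fingerprint of the body, then the body itself.
fn write_file(content: &str) -> String {
    format!("{}{}", fingerprint(content), content)
}

// The check must strip the embedded hash line, re-hash the remainder, and
// compare - a plain starts_with on the whole file cannot express this,
// because the expected prefix depends on the rest of the file.
fn is_up_to_date(file: &str) -> bool {
    (|| {
        let (first_line, content) = file.split_once('\n')?;
        if format!("{first_line}\n") != fingerprint(content) {
            return None;
        }
        Some(())
    })()
    .is_some()
}

fn main() {
    let file = write_file("fn main() {}\n");
    assert!(is_up_to_date(&file));
    // Any edit to the body invalidates the embedded hash.
    assert!(!is_up_to_date(&format!("{}// tampered\n", file)));
}
```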
Overall LGTM from my end.
Left a couple of non-blocking comments, but overall it looks good to me, and, given its size, we might want to merge it sooner and do fixes, if any, in follow-ups.
chain/network/build.rs
Outdated
@@ -0,0 +1,49 @@
use sha2::Digest;

fn content_prefix(content: &str) -> String {
There's a weak consensus for top-down code: placing entry-points and public API at the top, and helpers and impl details at the bottom:
pub fn make_account_id<R: Rng>(rng: &mut R) -> AccountId {
    format!("account{}", rng.gen::<u32>()).parse().unwrap()
}

pub fn make_signer<R: Rng>(rng: &mut R) -> InMemorySigner {
    let account_id = make_account_id(rng);
    InMemorySigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_validator_signer<R: Rng>(rng: &mut R) -> InMemoryValidatorSigner {
    let account_id = make_account_id(rng);
    InMemoryValidatorSigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_addr<R: Rng>(rng: &mut R) -> SocketAddr {
    SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::LOCALHOST, rng.gen()))
}

pub fn make_peer_info<R: Rng>(rng: &mut R) -> PeerInfo {
    let signer = make_signer(rng);
    PeerInfo {
        id: PeerId::new(signer.public_key),
        addr: Some(make_addr(rng)),
        account_id: Some(signer.account_id),
    }
}

pub fn make_announce_account<R: Rng>(rng: &mut R) -> AnnounceAccount {
    let peer_id = PeerId::new(make_signer(rng).public_key);
    let validator_signer = make_validator_signer(rng);
    let signature = validator_signer.sign_account_announce(
        validator_signer.validator_id(),
        &peer_id,
        &EpochId::default(),
    );
    AnnounceAccount {
        account_id: validator_signer.validator_id().clone(),
        peer_id,
        epoch_id: EpochId::default(),
        signature,
    }
}

pub fn make_partial_edge<R: Rng>(rng: &mut R) -> PartialEdgeInfo {
    let a = make_signer(rng);
    let b = make_signer(rng);
    PartialEdgeInfo::new(
        &PeerId::new(a.public_key),
        &PeerId::new(b.public_key),
        rng.gen(),
        &a.secret_key,
    )
}
Consider using https://docs.rs/arbitrary/latest/arbitrary/ for generating random data.
If we want to stop at a single randomized test here, using rand is fine. If we want to grow this infra into a long-term randomized testing infrastructure, I would feel rather strongly that we should base the thing on arbitrary.
arbitrary is like rand, but:
- it is a finite PRNG, so it's easy to vary the length of the input to vary the length of the test
- it can be easily plugged into coverage-guided fuzzing
- it provides derive for simple cases
- it's a better crate than rand overall (simpler API, no generics (compile time), no dependencies)
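The "finite PRNG" idea behind arbitrary can be sketched with a std-only toy. Assumptions: this local `Unstructured` type only mimics the shape of arbitrary's real `Unstructured` (a consumable byte slice), and the `byte`/`string` helpers are invented for illustration.

```rust
// Std-only sketch of the core idea of arbitrary::Unstructured: generation is
// driven by a *finite* byte slice, so a shorter input yields a smaller test
// case, and a coverage-guided fuzzer can mutate the bytes directly.
struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    // Consume one byte of entropy; once it runs out, bottom out at 0
    // (so generation always terminates).
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }

    // A derive-like helper: build a short lowercase string from the
    // remaining entropy.
    fn string(&mut self, max_len: usize) -> String {
        let len = (self.byte() as usize) % (max_len + 1);
        (0..len).map(|_| (b'a' + self.byte() % 26) as char).collect()
    }
}

fn main() {
    let mut u = Unstructured::new(&[3, 0, 1, 2]);
    assert_eq!(u.string(5), "abc");
    // Exhausted entropy degrades gracefully to the smallest value.
    let mut empty = Unstructured::new(&[]);
    assert_eq!(empty.string(5), "");
}
```

The real crate adds error handling (arbitrary::Result), a derive macro, and many typed helpers on top of this pattern.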
What if I need more parameters than just an entropy source to construct test input? Like consistent clock timestamps for example?
AFAICT arbitrary assumes that data is generated independently for each branch, which is a feeble assumption IMO.
In the sense that, depending on the property being fuzzed, a different distribution of input data might be desirable. Otherwise you just test whether your code is able to reject obviously invalid data gracefully.
a different distribution of input data might be desirable.
Indeed, and that's feasible with arbitrary: you don't have to implement the Arbitrary trait; the main value comes from the Unstructured type:
fn make_strings(u: &mut arbitrary::Unstructured<'_>, alphabet: &str) -> arbitrary::Result<Vec<String>>
https://docs.rs/wasm-smith/latest/wasm_smith/ would be a cool example of using highly-skewed distributions, esp SwarmConfig: https://docs.rs/wasm-smith/latest/wasm_smith/struct.SwarmConfig.html
// Encodings should never be compatible.
for (from, to) in [(Encoding::Proto, Encoding::Borsh), (Encoding::Borsh, Encoding::Proto)] {
    for m in &msgs {
        let bytes = &m.serialize(from);
        match PeerMessage::deserialize(to, bytes) {
If you'd like to fuzz this property, ask @Ekleog
chain/network/Cargo.toml
Outdated
@@ -7,9 +7,20 @@ edition = "2021"
rust-version = "1.60.0"
publish = false

[build-dependencies]
prost-build = { version = "0.10" }
sha2 = "0.9"
Could use near_stable_hasher here given that we already have it.
sha256 is very stable as well :)
It's not about stability, it's about not having logical duplicates. We already use near_stable_hasher for near-network, so there's no need to add sha2 on top of that. For this use-case, stable-hasher and sha2 play equivalent roles -- they provide a hash you can use for fingerprinting.
Force-pushed from 645709e to 406677a
Force-pushed from 15e9d1d to 85e82ae
Abstracted away the network protocol format for smooth borsh -> protobuf transition. Implemented tests of cross-version communication.
Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
hash-protected proto code generation
… (hidden behind a simple if)
Force-pushed from 4375ce5 to 432c20c
See the design: https://docs.google.com/document/d/1gCWmt9O-h_-5JDXIqbKxAaSS3Q9pryB1f9DDY1mMav4/edit
Abstracted away the network protocol format for smooth borsh -> protobuf transition.
Implemented tests of cross-version communication.