Implemented protobuf equivalent of network_protocol.rs #6672
Conversation
This looks really, really great! I finally feel like we are fixing stuff, rather than piling on work-arounds, major kudos here!
One thing I am really unsure about here is the error handling strategy. I don't like how anyhow::Result makes it impossible to understand what actually goes wrong. Like, did we critically mess up our serialization logic, or is it just a transient socket error? At the same time, the network is close enough to the edge of the system, where "just log and move on" is the right error handling strategy, so I also can't say that I am 100% sure that anyhow is the wrong choice here.
I wonder what @nagisa thinks here, they do have opinions on error handling: https://kazlauskas.me/entries/errors :-)
Note: I didn't do line by line review for correctness here, as I am relatively unfamiliar with the net (hence no approval). Can do one if needed!
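The trade-off being debated above could be sketched like this. This is a hypothetical, std-only illustration (the names `ParseError`, `Malformed`, `Transient`, and `parse` are invented for the example; the actual PR uses anyhow): a concrete error enum lets callers distinguish "our serialization logic is broken" from "transient error", which anyhow's type-erased chain cannot express.

```rust
use std::fmt;

// Hypothetical concrete error type for a decoding layer: callers can branch
// on the variant instead of inspecting an opaque anyhow chain.
#[derive(Debug)]
enum ParseError {
    // Our own serialization logic produced or consumed something invalid.
    Malformed(String),
    // A transient problem (e.g. socket error) - "log and move on" territory.
    #[allow(dead_code)]
    Transient(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Malformed(msg) => write!(f, "malformed message: {}", msg),
            ParseError::Transient(msg) => write!(f, "transient error: {}", msg),
        }
    }
}

impl std::error::Error for ParseError {}

// Toy stand-in for a deserialization routine.
fn parse(bytes: &[u8]) -> Result<u32, ParseError> {
    let arr: [u8; 4] = bytes
        .try_into()
        .map_err(|_| ParseError::Malformed(format!("expected 4 bytes, got {}", bytes.len())))?;
    Ok(u32::from_le_bytes(arr))
}

fn main() {
    assert_eq!(parse(&[1, 0, 0, 0]).unwrap(), 1);
    // The caller can match on the variant - the information anyhow erases.
    assert!(matches!(parse(&[1, 2]), Err(ParseError::Malformed(_))));
}
```

As the comment notes, the cost is a dedicated type per conversion; the sketch only shows what that buys if callers ever need to introspect failures.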
chain/network/build.rs
Outdated
@@ -0,0 +1,3 @@
fn main() -> std::io::Result<()> {
    prost_build::compile_protos(&["src/network_protocol/network.proto"], &["src/"])
Is this a pure Rust solution, or would it depend on protoc and friends being in PATH? If it is the latter, I would suggest maybe just committing the generated protobuf code, to make sure that git clone && cargo build works without extra steps?
No super strong opinion here.
It is not pure, I somehow missed that fact, sorry. We have the following options available:
- commit the generated code - it is not that bad, because the generated files are small (they use "derive" a lot - yay, 2-staged code generation :P)
- enable the "vendored" feature of the "prost-build" package - it removes the dependency on "protoc" in PATH, but introduces a dependency on "cmake" and increases the build time
- [INFEASIBLE - dynamically linked] add a pre-packaged protoc binary as a dependency
- get rid of the prost package and use the protobuf package instead. It contains a Rust implementation of protoc.
I really dislike committing generated code to the repo. Having a dependency on cmake also sounds bad to me. I'd go with the pre-packaged protoc, as a well-tested proto compiler, but maybe such a binary blob is not welcome in our open source blockchain technology? WDYT?
tokio-rs/prost#182 is a nasty failure mode I had encountered in the past with prost, but it seems that they no longer execute arbitrary code downloaded off the internet at build time by default. I'd still petition for vendoring. build.rs is a terrible, horrible thing.
@nagisa could you clarify what you mean by "vendoring"? "Vendoring" the source of protoc and compiling that (prost-build+vendored), or "vendoring" a compiled protoc binary (the "protobuf" package)?
FYI protoc-bin-vendored doesn't make much sense, because it is dynamically linked. We would have to create our own package with statically linked protoc for that.
Err, I meant checking in the generated sources, my bad.
To unblock this, let's maybe leave this thing as is? "Do what the docs suggest, even if you think you know better" is a strong heuristic. We can revisit this later.
If we do revisit:
- I'd be against adding binaries: I'd rather have protoc in PATH than in the repo :-)
- I do like checking in generated sources, provided that there's a test verifying that the generated sources are fresh. But that's a contrarian opinion which is a bit exhausting to argue :-)
I've implemented checksum protection for both the proto file and the checked-in generated sources (will push it tomorrow with the rest of the stuff). In the meantime, PTAL at: https://github.com/pompon0/protoc-static . I've implemented a GitHub workflow statically compiling and releasing a protoc binary. It currently supports x64 Linux and macOS; with some bash tuning Windows can be supported as well, and with some more work I could get it to cross-compile to other architectures too (with zig-cc). It is a 6MB binary which could be downloaded and checksummed from build.rs (in a bazel-like fashion). WDYT?
aaand, it turns out that the thing is worse than I expected.
prost-build (the package needed for code generation) actually checks for the presence of protoc in the system during the build of the package itself, and panics if it doesn't find it. I had to make the dependency on protoc optional and use conditional compilation to make it work :/
My feedback on the error stuff ^^
I definitely think a concrete error type would be preferable, but I can also see there not necessarily being much value in defining one if we don't anticipate callers actually introspecting its contents. Were we to work on adding error structs here, I would also prefer having a dedicated type for each conversion method, which is a ton of work, so if there isn't an immediate use-case for it, it is probably not worth the effort.
hash: (|| CryptoHash::try_from(p.hash.as_ref().context("missing")?))()
    .context("hash")?,
In the context of how this context could be presented to the user, I think these messages should be expanded to be more descriptive than a single word. Perhaps something like:
- the hash field in the GenesisId value is missing
- could not convert CryptoHash / could not parse CryptoHash [1]
in this specific case. More descriptive messages would also obviate the need for the (||)() pattern.
Footnotes
[1] Note that the error context should be describing the operation that occurs in the fallible expression, not the surrounding code. That is, in hash: foo.context(...)? the context should be describing what kind of operation foo is, and not hash: foo.context(...)? or even a broader context. In practice, not maintaining a specific polarity when adding context can lead to situations where you end up with both missing context and duplicated context. While not a problem here, contextualizing strictly the failing expression also helps to avoid guessing the callers' intent.
Re [1]: that doesn't make sense to me. If I have a function:
fn bar() -> Result {
...
foo().context(...)? <- X
...
}
Then, while passing the error from line X to the caller of bar() (via the "?" operator), I need to add some information about why foo() was called at X in the first place - otherwise, if I have 2 similar calls done for different reasons from bar(), the error message won't help me determine the precise root cause. I agree that a consistent polarity should be established, but IMO the whole context that the error is returned from (in this case, the call to bar()) should be included.
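The two polarities being argued here can be contrasted in a small std-only sketch (all names are hypothetical, and String errors with map_err stand in for anyhow's .context, which this sketch does not depend on):

```rust
// Toy fallible operation, standing in for a conversion that can fail.
fn foo(ok: bool) -> Result<u32, String> {
    if ok { Ok(42) } else { Err("parse failed".to_string()) }
}

// Polarity A: the context describes the failing operation itself
// ("what kind of operation foo is").
fn bar_op_context() -> Result<u32, String> {
    foo(false).map_err(|e| format!("decoding the hash field: {}", e))
}

// Polarity B: the caller also records *why* foo was called at this site,
// so two similar calls inside the same function stay distinguishable.
fn bar_call_context() -> Result<u32, String> {
    foo(false).map_err(|e| format!("while reading handshake (first field): {}", e))
}

fn main() {
    assert_eq!(
        bar_op_context().unwrap_err(),
        "decoding the hash field: parse failed"
    );
    assert_eq!(
        bar_call_context().unwrap_err(),
        "while reading handshake (first field): parse failed"
    );
}
```

Either convention works; the footnote's point is simply to pick one polarity and apply it consistently so context is neither missing nor duplicated.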
impl TryFrom<&proto::PublicKey> for PeerId {
    type Error = anyhow::Error;
    fn try_from(p: &proto::PublicKey) -> anyhow::Result<Self> {
        Ok(Self::try_from_slice(&p.borsh)?)
Missing context?
This call is so trivial that IMO there is no extra context to add that could be useful (i.e. it would rather obscure the information in the cause chain). The proto message types that I introduced solely for wrapping borsh should rather be treated just like one-field tuple types in rust / newtype in Haskell - extra information to the type system, but not really for the user.
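The newtype analogy can be made concrete with a std-only sketch (`BorshBytes` and `peer_id_from_proto` are invented for illustration; the real code wraps borsh bytes in a proto message and uses Self::try_from_slice):

```rust
// Hypothetical newtype mirroring a one-field proto wrapper around borsh bytes:
// it informs the type system, but carries no extra context worth surfacing.
struct BorshBytes(Vec<u8>);

// A trivial stand-in for PeerId::try_from_slice(&p.borsh). Any context added
// here ("converting PublicKey") would only echo what the signature already
// says, lengthening the cause chain without adding information.
fn peer_id_from_proto(p: &BorshBytes) -> Result<u64, String> {
    let arr: [u8; 8] = p
        .0
        .as_slice()
        .try_into()
        .map_err(|_| "invalid length".to_string())?;
    Ok(u64::from_le_bytes(arr))
}

fn main() {
    assert_eq!(peer_id_from_proto(&BorshBytes(vec![1, 0, 0, 0, 0, 0, 0, 0])), Ok(1));
    assert!(peer_id_from_proto(&BorshBytes(vec![1, 2])).is_err());
}
```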
Force-pushed from 9b15674 to 1fc80d7
@@ -105,6 +103,11 @@ pub(crate) struct PeerActor {
    routed_message_cache: LruCache<(PeerId, PeerIdOrHash, Signature), Instant>,
    /// A helper data structure for limiting reading
    throttle_controller: ThrottleController,
    /// Whether we detected support for protocol buffers during handshake.
    protocol_buffers_supported: bool,
in theory this should be Optional, right?
(as there are 3 states: we don't know, supported, not supported)
the full name of this field should rather be: protocol_buffers_support_detected (true/false)
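The three states mentioned above map naturally onto Option<bool>. A std-only sketch (the struct and field names here are hypothetical, loosely following the rename suggested in the review):

```rust
// Option<bool> distinguishes "not yet detected" from a definite answer,
// which a bare bool cannot do.
#[derive(Default)]
struct PeerActorSketch {
    // None: handshake hasn't told us yet; Some(x): detection finished.
    protocol_buffers_support_detected: Option<bool>,
}

fn describe(p: &PeerActorSketch) -> &'static str {
    match p.protocol_buffers_support_detected {
        None => "unknown",
        Some(true) => "supported",
        Some(false) => "not supported",
    }
}

fn main() {
    let mut peer = PeerActorSketch::default();
    assert_eq!(describe(&peer), "unknown");
    peer.protocol_buffers_support_detected = Some(true);
    assert_eq!(describe(&peer), "supported");
    peer.protocol_buffers_support_detected = Some(false);
    assert_eq!(describe(&peer), "not supported");
}
```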
EpochSyncFinalizationRequest(EpochId),
}

struct FakeActor {
You added a couple of cool test classes (FakeActor, ActixSystem, etc.) - let's document them (either within each .rs file or create a README.md within this directory), with info on what these classes do and a couple of examples of how they can be used.
FakeActor is private and I'm not sure whether it should be reusable in a way different than through the public interface which is currently exposed. I would like to make ActixSystem reusable, but I'd have to move it outside of the network package probably - let's do that in a separate PR.
chain/network/build.rs
Outdated
let up_to_date = (|| {
    let generated = std::fs::read_to_string(generated_path).ok()?;
    let generated = generated.strip_prefix(&want_prefix)?;
    let (prefix, content) = generated.split_once('\n')?;
    if format!("{prefix}\n") != content_prefix(content) {
        return None;
    }
    Some(())
})()
.is_some();
Is there a reason you didn't use starts_with?
Suggested change:
let up_to_date = std::fs::read_to_string(generated_path)
    .map(|s| s.starts_with(&want_prefix))
    .unwrap_or(false);
It is more tricky than that. In addition to the hash of the proto file, I'm prepending the hash of the generated code to the file with the generated code ITSELF, so that if someone modifies the generated code by mistake, the compiler will shout at them. To verify this hash, I need to first strip it from the file content, then compute the hash of the content remainder, and only then compare hashes.
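The strip-then-rehash scheme can be illustrated with a std-only sketch. Assumptions: DefaultHasher stands in for the cryptographic hash the real build.rs uses, and `fingerprint`, `write_file`, and `is_up_to_date` are invented names; only the shape of the check mirrors the description above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in fingerprint (the real build.rs uses a proper cryptographic hash).
fn fingerprint(s: &str) -> String {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    format!("// hash: {:016x}\n", h.finish())
}

// The generated file carries its own hash on the first line, so freshness is
// written as: fingerprint of the body, then the body itself.
fn write_file(content: &str) -> String {
    format!("{}{}", fingerprint(content), content)
}

// The check must strip the embedded hash line, re-hash the remainder, and
// compare - a plain starts_with on the whole file cannot express this,
// because the expected prefix depends on the rest of the file.
fn is_up_to_date(file: &str) -> bool {
    (|| {
        let (first_line, content) = file.split_once('\n')?;
        if format!("{first_line}\n") != fingerprint(content) {
            return None;
        }
        Some(())
    })()
    .is_some()
}

fn main() {
    let file = write_file("fn main() {}\n");
    assert!(is_up_to_date(&file));
    // Any edit to the body invalidates the embedded hash.
    assert!(!is_up_to_date(&format!("{}// tampered\n", file)));
}
```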
Overall LGTM from my end.
Left a couple of non-blocking comments, but overall it looks good to me, and, given its size, we might want to merge it sooner and do fixes, if any, in follow-ups.
chain/network/build.rs
Outdated
@@ -0,0 +1,49 @@
use sha2::Digest;

fn content_prefix(content: &str) -> String {
There's a weak consensus for top-down code: placing entry-points and public API at the top, and helpers and impl details at the bottom:
pub fn make_account_id<R: Rng>(rng: &mut R) -> AccountId {
    format!("account{}", rng.gen::<u32>()).parse().unwrap()
}

pub fn make_signer<R: Rng>(rng: &mut R) -> InMemorySigner {
    let account_id = make_account_id(rng);
    InMemorySigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_validator_signer<R: Rng>(rng: &mut R) -> InMemoryValidatorSigner {
    let account_id = make_account_id(rng);
    InMemoryValidatorSigner::from_seed(account_id.clone(), KeyType::ED25519, &account_id)
}

pub fn make_addr<R: Rng>(rng: &mut R) -> SocketAddr {
    SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::LOCALHOST, rng.gen()))
}

pub fn make_peer_info<R: Rng>(rng: &mut R) -> PeerInfo {
    let signer = make_signer(rng);
    PeerInfo {
        id: PeerId::new(signer.public_key),
        addr: Some(make_addr(rng)),
        account_id: Some(signer.account_id),
    }
}

pub fn make_announce_account<R: Rng>(rng: &mut R) -> AnnounceAccount {
    let peer_id = PeerId::new(make_signer(rng).public_key);
    let validator_signer = make_validator_signer(rng);
    let signature = validator_signer.sign_account_announce(
        validator_signer.validator_id(),
        &peer_id,
        &EpochId::default(),
    );
    AnnounceAccount {
        account_id: validator_signer.validator_id().clone(),
        peer_id,
        epoch_id: EpochId::default(),
        signature,
    }
}

pub fn make_partial_edge<R: Rng>(rng: &mut R) -> PartialEdgeInfo {
    let a = make_signer(rng);
    let b = make_signer(rng);
    PartialEdgeInfo::new(
        &PeerId::new(a.public_key),
        &PeerId::new(b.public_key),
        rng.gen(),
        &a.secret_key,
    )
}
Consider using https://docs.rs/arbitrary/latest/arbitrary/ for generating random data.
If we want to stop at a single randomized test here, using rand is fine. If we want to grow this infra into a long-term randomized testing infrastructure, I would feel rather strongly that we should base the thing on arbitrary.
arbitrary is like rand, but:
- it is a finite PRNG, so it's easy to vary the length of the input to vary the length of the test
- it can be easily plugged into coverage-guided fuzzing
- it provides derive for simple cases
- it's a better crate than rand overall (simpler API, no generics (compile time), no dependencies)
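The "finite PRNG" idea behind arbitrary can be sketched with a std-only toy. Assumptions: this local `Unstructured` type only mimics the shape of arbitrary's real `Unstructured` (a consumable byte slice), and the `byte`/`string` helpers are invented for illustration.

```rust
// Std-only sketch of the core idea of arbitrary::Unstructured: generation is
// driven by a *finite* byte slice, so a shorter input yields a smaller test
// case, and a coverage-guided fuzzer can mutate the bytes directly.
struct Unstructured<'a> {
    data: &'a [u8],
}

impl<'a> Unstructured<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    // Consume one byte of entropy; once it runs out, bottom out at 0
    // (so generation always terminates).
    fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((b, rest)) => {
                self.data = rest;
                *b
            }
            None => 0,
        }
    }

    // A derive-like helper: build a short lowercase string from the
    // remaining entropy.
    fn string(&mut self, max_len: usize) -> String {
        let len = (self.byte() as usize) % (max_len + 1);
        (0..len).map(|_| (b'a' + self.byte() % 26) as char).collect()
    }
}

fn main() {
    let mut u = Unstructured::new(&[3, 0, 1, 2]);
    assert_eq!(u.string(5), "abc");
    // Exhausted entropy degrades gracefully to the smallest value.
    let mut empty = Unstructured::new(&[]);
    assert_eq!(empty.string(5), "");
}
```

The real crate adds error handling (arbitrary::Result), a derive macro, and many typed helpers on top of this pattern.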
What if I need more parameters than just an entropy source to construct test input? Like consistent clock timestamps for example?
AFAICT arbitrary assumes that data is generated independently for each branch, which is a feeble assumption IMO.
In the sense that, depending on the property being fuzzed, a different distribution of input data might be desirable. Otherwise you just test whether your code is able to reject obviously invalid data gracefully.
a different distribution of input data might be desirable.
Indeed, and that's feasible with arbitrary: you don't have to implement the Arbitrary trait; the main value comes from the Unstructured type:
fn make_strings(u: &mut arbitrary::Unstructured<'_>, alphabet: &str) -> arbitrary::Result<Vec<String>>
https://docs.rs/wasm-smith/latest/wasm_smith/ would be a cool example of using highly-skewed distributions, esp SwarmConfig: https://docs.rs/wasm-smith/latest/wasm_smith/struct.SwarmConfig.html
// Encodings should never be compatible.
for (from, to) in [(Encoding::Proto, Encoding::Borsh), (Encoding::Borsh, Encoding::Proto)] {
    for m in &msgs {
        let bytes = &m.serialize(from);
        match PeerMessage::deserialize(to, bytes) {
If you'd like to fuzz this property, ask @Ekleog
chain/network/Cargo.toml
Outdated
@@ -7,9 +7,20 @@ edition = "2021"
rust-version = "1.60.0"
publish = false

[build-dependencies]
prost-build = { version = "0.10" }
sha2 = "0.9"
Could use near_stable_hasher here given that we already have it.
sha256 is very stable as well :)
It's not about stability, it's about not having logical duplicates. We already use near_stable_hasher for near-network, so there's no need to add sha2 on top of that. For this use-case, stable-hasher and sha2 play equivalent roles -- they provide a hash you can use for fingerprinting.
Force-pushed from 645709e to 406677a
Force-pushed from 15e9d1d to 85e82ae
Abstracted away the network protocol format for smooth borsh -> protobuf transition. Implemented tests of cross-version communication.
Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
hash-protected proto code generation
… (hidden behind a simple if)
Force-pushed from 4375ce5 to 432c20c
See the design: https://docs.google.com/document/d/1gCWmt9O-h_-5JDXIqbKxAaSS3Q9pryB1f9DDY1mMav4/edit
Abstracted away the network protocol format for smooth borsh -> protobuf transition.
Implemented tests of cross-version communication.