-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue: Rust buffer verification #4916
Comments
cc @ry |
It be nice to run a test to see what speedup Rust gets from not using bounds checks. Besides that, another advantage of a Verifier (as opposed to catching an out of bound error) is that it is all in one place (catching bounds checks is hard to make modular). Also may be easier to debug. |
As an interim mitigation, are there any examples of creating and using a Rust FFI wrapper for the C++ buffer verifier? |
@pwrdwnsys Not that I know of, but @aardappel would be able to chime in. I suspect it would be straightforward to do that. |
@jean-airoldie and @tymcauley: You both have been making excellent contributions to the Rust port. Would either or both of you be interested in working on the Rust Verifier functionality? I would've emailed you personally, but I don't know how to get in touch with either of you :-) If you want to email me, link is in my profile. |
@rw I might be interested in taking a look at that at some point (work's busy right now, so it might not be for a week or two). I'll let you know if I've got any questions about the implementation you had in mind. |
@rw I'm afraid I'm quite busy at the moment and I can't really commit to anything too time consuming. |
@rw Alright I got some time now. Email me at I was thinking of maybe creating a |
Just in-case no one is aware of this, someone has written their own verifier: |
@goodsauce @doitian "Canonical FlatBuffers".. sounds interesting, is this a new encoding? Some more details would be welcome. |
No, it's just a strict FlatBuffers builder. And our verifier implementation has nothing to do with CFB, its for the reader implemented here. We use FlatBuffers to parse message received from the P2P network. Without a verifier, any accessor may panic. So we build a generator via Python and Jinja as a temporary solution. The code has been moved to branch legacy, and here is an example of the generated verifier code: https://github.com/nervosnetwork/cfb/blob/legacy/tests/common/scalars_with_different_size_generated_verifier.rs |
Any info if it's still planned? |
@krojew Still planned, but I don't have to time to work on it for a while. |
2020 - any news? I've been using CFB and it seems to work fine, so maybe it's possible to their implementation? |
I have to agree with the previous discussion - if there's no way to verify the validity of a buffer, the idiomatic way to handle such situation in Rust is to return a Result. At the moment, the generated code panics on error which results in termination of current thread. In other words - without a verification like in CFB, it's very dangerous to use flatbuffers in Rust, unless someone places absolute trust in incoming data. |
Any news? |
I'd like to clarify that panicking is very inconvenient but it is safe. I believe that there is no risk of a vulnerability due to EDIT: contributors have found problems with our usage of unsafe; I was incorrect. |
@rw in this case, it's not simply inconvenient - it makes FB in Rust effectively unusable when dealing with untrusted message sources. Not all programs can afford terminating an entire thread with panic, when somebody sends invalid data. This effectively means a malicious sender can kill a process simply by sending garbage bytes. That's not something one would want in production environment. The idiomatic thing to do is to use a Result instead of a full thread panic. But for it to work, we need verification. I don't see why we wouldn't want to integrate cfb into Rust FB directly, hence solving the problem. |
I would like to second @krojew 's sentiment. I'm writing a high performance streaming library using flatbuffers in Rust and I won't have the luxury of deserializing in a separate thread. The ability for a malicious actor to crash the thread (and by extension the whole process, since I only have one thread) is "safe" in terms of memory safety, but still a vulnerability. A (P.S. thank you so much for the rust implementation - verifier or no, I'm really glad I have this tool to use in Rust :) ) |
Again, the solution to not having panics is to verify a buffer beforehand, so we know access will not panic. Besides the ergonomic issues of each field access being a We don't have any implementation for any language that verifies the buffer during an access, they all verify ahead of time or rely on some form of out of bounds exception if they don't. Rust would be best served by the former, since panics apparently are not always meant to be caught? |
@aardappel we can keep current behavior for access. The issue is with having a mechanism to verify the buffer beforehand, as you know. Currently, this is only provided by cfb, which has been having problems with generated FB code lately. We should have an official one. |
@aardappel your justification makes sense to me now, thanks for the clarification. A verifier + panic on invalid access would be the most performant while still offering the protections necessary for production use |
@aardappel So two things with this - it wouldn't be a slowdown, as the version right now already does bounds checks. Secondly, you would not need to constantly The main issue here is that a verifier doesn't exist (and won't for quite a while), and even if it did, people shouldn't be forced to use one just to avoid panics. There are two reasons someone might want safe, non-panicking code, that doesn't use a verifier:
For this reason, there is a need for the option to use a Verifier-free api that does not panic. I can't see an argument for making people choose between a verifier and code that can panic - the most common use case is wanting code that both does not panic, and is agnostic to or even opposed to a verifier, probably for performance reasons. This is in addition to the fact that For the case where someone does want a verifier, or wants non-bounds checked code, that is where the separate concept of a |
If we do this, I'd like to add an integration test that uses something like https://github.com/dtolnay/no-panic to prove that accessor code won't panic. |
@rw I think we should just focus on a verifier. As I've tried to explain, @TheButlah And yes, it will be slower. Besides doing manual offset checks (which will likely not get as well optimized as the compiler inserted ones, and can't be elides similarly) you need to propagate these errors, stick em in a Yes a verifier is more expensive than a single access, but only needs to happen once per buffer.. and skips any string data or scalars, so is rather fast for most kinds of buffers. And is entirely optional if data comes from a trusted source, or
I find that a weird thing to say. Of course you do. You really don't want to be reading random binary as a FlatBuffer, all sorts of interesting data could come out.
I do not see a strong argument for that.
No, it isn't. What is idiomatic entirely depends on what you're doing. Current Rust APIs are full of potential panics, and for good reason: they encode unexpected failures. |
In my use case I have only three things that I care about:
I might be wrong about point 2, but I really do think there are a number of common use cases where you want those three things, and a verifier is too slow. For example, when you read only a small part of the overall data, you would rather eat the performance cost of the bounds check + |
Idea: we could let the user choose which type of overhead they want to incur. I'm looking at the impl for The read-only methods they have there are We could let users configure their flatbuffer reader in three ways:
This choice can be orthogonal to verification. Maybe careful users want verification AND checked array accesses. Maybe risk-embracing users want no verification combined with unsafe access. Thoughts? Edit: maybe we should use the |
@rw I'm all for letting the user choose what type of overhead they want to incur, although I do think that once verified the buffer should no longer have bounds checked access since there is simply no point to continuing to do so once verified. I can't provide much feedback on the rest of your comment, as I'm new to rust. Why do 1 and 2 return results if the error types are impossible? It seems like they should just be If that is the intent, the |
I agree with @aardappel - a verifier is a better approach than a result. I would very much like to have a verifier returning a Also, for a |
Yes, just to have a unified external API.
Could you say more about this use case? I'm imagining the worst-case scenario, where a user wants to read one integer out of a buffer that is 2GB in size.
|
@rw my buffers are usually a lot smaller than that, but let's take streaming video with some metadata, like timestamps, from some camera to some client as an example. First concern is robustness. I don't want the client to crash because all of this is supposed to be robust enough to run on a remote system, but can't afford the resources/don't want the complexity of managing a thread for deserialization - so code that can panic doesn't work here. Therefore, either a verification step or bounds checking + a Result would be needed. Second concern is performance. In this case, I'm just logging the timestamp from the sensor without reading any of the video data, and then I will pass off the flatbuffer to some callback the user provided. Does it make sense for the client to have to scan the whole flatbuffer, almost all of which is RGB data, just to get out a timestamp? The user will typically not care about verification either - if they cannot access the RGB frame due to an invalid buffer, thats fine - they will just wait for the next valid frame. Under the current system, panics and the associated crash would occur if the user connected the client to a port that generated flatbuffers for a different sensor type. In this case, the client crashes, when I want a guarantee that the client won't crash. I don't actually mind if they connect to a sensor of the wrong type and they get back bogus data due to a misinterpretation of the flatbuffer - that's on them to verify the buffer if they feel a need to (and they generally won't, because they would rather see the bogus video frames and realize they connected to a wrong sensor and reconnect at that point, than incur the cost of verification) This of course is all much less important if the verifier can skip a vector's data (such as if it's able to just verify the start and size of a vector but skip the vector contents), but so far I haven't heard that to be the case - hoping I'm wrong though because if so I would definitely just verify the buffer |
Perhaps an additional question is in order - under what circumstances can the verifier skip the contents of vectors in the flatbuffer by checking just the start index and the number that describes the size of the vector? I would expect a vector of ints could be checked by verifying the initial offset and total size without scanning through the actual contents. What about a vector of tables? I'm assuming in that case, nothing could really be skipped since each table has to be checked for it's structure. What about a vector of fixed size arrays, or a vector of structs? I'd assume this is equivalent to the vector of ints case If the verifier can skip vectors of primitives, it pretty much solves my use case above - I can verify the vector knowing it's basically O(1) rather than O(num pixels) |
|
@rw Yes, it would still have a performance overhead that would scale with the size of the input some cases, like the one you have mentioned and in the case of vectors of non-primitives, I assume. I was just trying to find a middle ground that would justify a verifier for my use case since this proved to be a lot more controversial than expected :P
|
The verifier should skip primitives - how would you verify e.g. an int? If it's there, it's valid and that's it. A vector of ints follows the same rule - if the size is valid, the vector is valid. I don't see why a verifier would be linear in terms of buffer size in this case. Of course we cannot forget enums, which should be verified. |
@TheButlah Remember, you only run a verifier once, as data arrives. You are then free to access it efficiently afterwards. If you constantly are receiving brand new buffers, and then only touch a fraction of it, then I don't understand your use case, or at least I don't think we need to cater for it. @rw Also, I know using |
So from what I'm gathering from this thread so far, a flatbuffer type could have two ways of construction:
Assuming that we had a working rust verifier the obvious approach would be: impl Foo {
pub fn new(bytes: &[u8]) -> Result<Self, Error> {
// This is the hypothetical verifier fn.
verify::<Self>(bytes)?;
Ok(get_root::<Self>(bytes))
}
pub unsafe fn new_unchecked(bytes: &[u8]) -> Self {
// Currently this is not actually unsafe because we do bounds checking in the accessors,
// but this would allow us to make accessing unchecked (performance) without it being a breaking change.
unsafe {
get_root::<Self>(bytes)
}
}
} So in that sense the |
Moreover I would argue that if you want some data in a flatbuffer that should not be validated by the verifier for performance reasons, this data should be expressed a an array of bytes. This way you can do the type checking only when you access said data. |
@jean-airoldie exactly, thats what I was talking about earlier in the thread with a @aardappel OK, if the performance concerns I'm listing have not been an actual issue in reality, then thats that - I'll defer to your experience here. Especially now that I know that the verifier can skip the primitives. Based on this it sounds like |
Just so we're clear, I'm talking about bounds check that would not be required if the type is known, as opposed to bounds check related to things that are not part of the type contract (i.e. the length of a vector). Take for instance |
@jean-airoldie I'm not sure I understand - That's why I like having a |
Not sure which Rust thread this belong to, but https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-35864 |
That's not good. |
@aardappel Argh. That kind of thing is one of the reasons I want to have a "safe and panic-ful" mode of access, even when Verifiers are used more wildly, since we can always make unsafe coding mistakes. I also think that adding miri is still a good idea, although I don't know if it would have caught that issue: #6203 |
I have a draft PR that fixes the miri problems by backing the vector with |
As I work on this PR, I think maybe what should be done in Rust is to base everything on The way byte_order does possibly-unaligned reads is via calls to Thoughts? |
I agree that having people use u8 should be a priority |
@CasperN Thanks for the fix! Good to know that things are once again safe. However, did that PR actually implement buffer verification? If not this issue should probably remain open |
Verification was implemented in #6161 |
This is a tracking issue to document design and implementation of a verifier system for Rust Flatbuffers. I'm thinking that we can clone the logic from the C++ codebase.
The benefit is twofold:
unsafe
pointer access in Rust, thereby removing bounds checking.Anyone have thoughts on this? @aardappel
The text was updated successfully, but these errors were encountered: