Implement Flatbuffer Verifier for Rust #6161
cc @aardappel |
Linking cfb verifier for reference: https://github.com/nervosnetwork/cfb |
Okay, here's how I see it. First goes the trait:
/// A trait to access untrusted data in the flatbuffer.
///
/// # Safety
///
/// * results should be stable (pure):
///
/// * for all calls with the same `buf` and `loc`
/// `follow` *must* return the same results.
///
/// * the trait implementation should guarantee that if `follow` returned `Some`
/// then call to `follow_unchecked` with the same `buf` and `loc`
/// would not invoke UB.
pub unsafe trait FollowChecked<'a>
where
Self: Follow<'a>,
{
/// Checked getter for untrusted data.
///
/// # Note to implementors
///
/// Please, try to avoid panics.
fn follow(buf: &'a [u8], loc: usize) -> Option<Self::Inner>;
}
The name of the function is a bikeshed question. Let's look at how it would be implemented for
unsafe impl FollowChecked<'_> for u64 {
#[inline]
fn follow(buf: &[u8], loc: usize) -> Option<Self::Inner> {
buf.get(loc..loc + size_of::<Self>()).and_then(|s| {
from_raw_bytes(s).map(|x| x.to_le())
})
}
}
/// Transmutation function.
#[inline]
pub fn from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&T> {
assert!(size_of::<T>() != 0);
let ptr = s.as_ptr();
if ptr as usize % align_of::<T>() == 0 && s.len() >= size_of::<T>() {
// SAFETY:
// there is at least `sizeof(T)` bytes,
// and the reference is aligned;
// transmute safety should be guaranteed by trait implementor.
Some(unsafe { &*(ptr as *const T) })
} else {
None
}
}
/// Marker trait for types that could be safely read from raw aligned bytes
/// in a generic way.
///
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///
/// * it *must* be safe and sound to transmute
///
/// * `[u8; size_of::<Self>()]` into `Self`,
///
/// * `[u8; N * size_of::<Self>()]` into `[Self; N]`,
///
/// # Note
///
/// It is not the same as [`AsRawBytes`]! For example, \
/// `#[repr(C)] struct Q(u8, u16);` \
/// can implement `FromRawBytes`, but can't implement `AsRawBytes`,
/// because implicit padding is treated as uninitialized memory in Rust!
pub unsafe trait FromRawBytes: Copy {}
// SAFETY:
// all scalars could be safely transmuted from and into raw bytes.
unsafe impl FromRawBytes for u64 {}
There are things like enums and bitflag-enums, for which we may or may not want to check values.
Let's start with tables:
#[derive(Debug)]
pub struct Table<'a> {
buf_start: NonNull<u8>,
loc: usize,
_lifetime: PhantomData<&'a [u8]>,
}
impl<'a> Table<'a> {
#[doc(hidden)]
#[inline]
pub const fn new(buf_start: NonNull<u8>, loc: usize) -> Self {
Self {
buf_start,
loc,
_lifetime: PhantomData,
}
}
/// # Safety
///
/// Absolutely none. It should not be ever called anywhere but autogenerated code.
#[inline]
unsafe fn vtable(&self) -> &'a [VOffset] {
let buf_start = self.buf_start.as_ptr();
let offset = SOffset::follow_unchecked(buf_start, self.loc);
let vtable_loc = (self.loc as isize - offset as isize) as usize;
let vtable_start = buf_start.add(vtable_loc);
let vtable_len = VOffset::follow_unchecked(vtable_start, 0);
slice::from_raw_parts(vtable_start as *const VOffset, vtable_len as usize / size_of::<VOffset>())
}
/// # Safety
///
/// Absolutely none. It should not be ever called anywhere but autogenerated code.
#[inline]
pub unsafe fn get_optional<T>(&self, vtable_slot: usize) -> Option<T::Inner>
where
T: ?Sized + Follow<'a>,
{
unsafe { self.vtable() }
.get(vtable_slot)
.and_then(|&table_offset| {
if table_offset == 0 {
None
} else {
// SAFETY:
// should be guaranteed to be safe by the caller.
Some(unsafe { T::follow_unchecked(self.buf_start.as_ptr(), self.loc + table_offset as usize) })
}
})
}
/// # Safety
///
/// Absolutely none. It should not be ever called anywhere but autogenerated code.
#[inline]
pub unsafe fn get_required<T>(&self, vtable_slot: VOffset) -> T::Inner
where
T: ?Sized + Follow<'a>,
{
debug_assert!(self.vtable().len() > vtable_slot as usize);
// SAFETY:
// should be guaranteed to be safe by the caller.
unsafe {
T::follow_unchecked(
self.buf_start.as_ptr(),
self.loc + *self.vtable().get_unchecked(vtable_slot as usize) as usize,
)
}
}
}
Yeah, tables are unsafe as hell. How do we implement
/// Given a `buf`, `loc` and a list of fields
///
/// * for unions:
/// `{ (UnionType, is_required, union_vtable_slot, associated_ptr_vtable_slot)
/// match { [union_value => AssociatedType,] }`
///
/// * for any other type:
/// `(Type, is_required, vtable_slot)`
///
/// returns `Some(Table)` if all fields given could be followed,
/// `None` otherwise.
///
/// Note: `*_slot` must be an offset in `VOffset`s from the start of vtable,
/// not in bytes! It also must be of type `usize`.
///
/// Another note: types must be explicitly required!
/// Unlike the schema, this macro allows structs and scalars to be optional.
#[macro_export]
macro_rules! follow_table {
($buf:expr, $loc:expr, [ $( $field:tt, )* ]) => {{
#[inline]
fn table_follow<'a>(buf: &'a [u8], loc: usize) -> Option<$crate::Table<'a>> {
use $crate::{
core::{mem::{size_of, align_of}, slice, ptr::NonNull}, SOffset, VOffset, FollowChecked,
};
match SOffset::follow(buf, loc) {
None => None,
Some(offset) => {
// `loc` could only have value in `0..isize::MAX` range
// due to slice restrictions (already successfully indexed the slice with `loc`).
// For any possible `loc`, whenever it overflows, it would stay in
// negative `isize` range, thus would not be a valid index into `buf`;
// whenever it underflows, it also would stay in negative `isize` range;
// thus this would never return `Some` on underflow or overflow,
// thus it is indeed stable, and if this returned `Some`,
// it would not be UB to call `follow_unchecked` with the same `buf` and `loc`
// (if all table types are checked, of course).
let vtable_loc = (loc as isize - offset as isize) as usize;
let vtable = buf.get(vtable_loc..)?;
let vtable: &'a [VOffset] = {
// This `follow` checks that `vtable` is aligned for `VOffset`.
let vtable_len = VOffset::follow(vtable, 0)? as usize;
debug_assert_eq!(vtable.as_ptr() as usize % align_of::<VOffset>(), 0);
let v = buf.get(vtable_loc..vtable_loc + vtable_len)?;
// SAFETY:
// it is aligned for `VOffset`,
// and `VOffset` could be safely transmuted from bytes.
unsafe { slice::from_raw_parts(v.as_ptr() as *const VOffset, v.len() / size_of::<VOffset>()) }
};
$(
$crate::follow_table! { vtable, buf, loc, $field }
)*
Some($crate::Table::new(NonNull::from(buf.first().unwrap()), loc))
}
}
}
table_follow($buf, $loc)
}};
// Union branch.
($vtable:expr, $buf:expr, $loc:expr,
{
($union_type:ty, $req:expr, $vtable_union_slot:expr, $vtable_assoc_type_slot:expr)
match {$(
$union_value:expr => $associated_type:ty,
)*}
}
) => {{
let (vtable, buf, loc) = ($vtable, $buf, $loc);
if $req {
let union_table_offset = *vtable.get($vtable_union_slot)?;
if union_table_offset == 0 {
// Required can't be default.
return None;
} else {
let assoc_loc = loc + *vtable.get($vtable_assoc_type_slot)? as usize;
match <$union_type>::follow(buf, loc + union_table_offset as usize)? {
$(
$union_value => { <$associated_type>::follow(buf, assoc_loc)?; },
)*
// If union is `None` or unknown value, it is always valid, yet ignored.
_ => (),
}
}
} else {
// If the slot is not required and is out of vtable range,
// then it is always valid to get, but default.
if let Some(&union_table_offset) = vtable.get($vtable_union_slot) {
// If it is `0`, then it is default, then it is valid.
if union_table_offset != 0 {
let assoc_loc = loc + *vtable.get($vtable_assoc_type_slot)? as usize;
match <$union_type>::follow(buf, loc + union_table_offset as usize)? {
$(
$union_value => { <$associated_type>::follow(buf, assoc_loc)?; },
)*
// If union is `None` or unknown value, it is always valid, yet ignored.
_ => (),
}
}
}
}
}};
// Any other field branch.
($vtable:expr, $buf:expr, $loc:expr, ($t:ty, $req:expr, $vtable_slot:expr)) => {{
if $req {
let table_offset = *$vtable.get($vtable_slot)?;
if table_offset == 0 {
// Required can't be default.
return None;
} else {
<$t>::follow($buf, $loc + table_offset as usize)?;
}
} else {
// If the slot is not required and is out of vtable range,
// then it is always valid to get, but default.
if let Some(&table_offset) = $vtable.get($vtable_slot) {
// If it is `0`, then it is default, then it is valid.
if table_offset != 0 {
<$t>::follow($buf, $loc + table_offset as usize)?;
}
}
}
}};
}
Because we know all three parts of the triplet this macro needs at schema compilation time, this macro is easy to use in the generated code. Though, this macro could generate a lot of Rust code, and to speed up unchecked compilations we may want to put
The macro is used like this:
#[cfg(test)]
mod tests {
/// Welp, yeah, tests that this thing compiles at all.
#[test]
fn it_works() {
assert!(follow_table! {
&[],
0,
[
// Union field syntax.
{ (u64, true, 4, 5)
match {
// Actual types would be `ForwardsUOffset<str/[u8]>`;
// anyway, this is enough for a compilation test.
1 => str,
2 => [u8],
}
},
// Any other type field syntax.
(u32, true, 2),
(u16, true, 3),
]
}
.is_none());
}
}
Slices would look like tables: a pointer to the buffer start, a location, and a lifetime phantom marker.
// Option<Self::Inner>
fn follow(buf: &'a [u8], loc: usize) -> Option<Slice<'a, T>> {
let len = UOffset::follow(buf, loc)? as usize * size_of::<T>();
let start = loc + size_of::<UOffset>();
// Bounds-check the whole payload once, then follow every item.
buf.get(start..start + len)?;
let mut ptr = start;
while ptr < start + len {
T::follow(buf, ptr)?;
ptr += size_of::<T>();
}
Some(Slice { buf_start: NonNull::from(buf.first()?), loc, _lifetime: PhantomData })
}
About the
/// A trait to access trusted data in the flatbuffer.
///
/// # Safety
///
/// Absolutely none. Implementing this trait outside this lib or autogenerated code is UB.
///
/// Consider this trait an implementation detail,
/// that we cannot hide due to autogenerated code.
///
/// ## But I'm writing for the lib!
///
/// Welp, then
///
/// * if `Self` is sized, then `sizeof(Self)` must be equal to the actual number of bytes your implementation would read,
/// and `Self` must be a non-zero-sized type;
///
/// * results should be stable (pure):
///
/// * for all calls with the same `buf` and `loc`
/// `follow_unchecked` *must* return the same results
/// if the call with the given `buf` and `loc` is not UB.
pub unsafe trait Follow<'a> {
/// The target type to acquire.
type Inner;
/// Unchecked getter for already checked or trusted data.
///
/// # Safety
///
/// Absolutely none. Using this function outside of the lib is UB.
///
/// ## But I'm writing for the lib!
///
/// Read the non-exhaustive list of cases in which `FollowChecked::follow` returns `None`.
/// Calling this function when `buf.wrapping_add(loc)` is unaligned
/// or when `follow` returns `None` is UB.
unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> Self::Inner;
}
The list:
/// # When it returns `None`
///
/// First: it always returns `None` whenever `&buf[loc]` is unaligned.
/// When is it unaligned? Well… It depends on the root type mostly,
/// but `Follow` is used to follow any type, not just root,
/// thus you can't actually check whether the given `buf` and `loc`
/// would produce aligned access or unaligned.
///
/// For undocumented implementations' behaviour, read the implementation description.
///
/// ## Primitives
///
/// For `u8, i8, u16, i16, u32, i32, u64, i64, f32, f64`
/// it returns `None` if `buf.len() < loc + size_of::<T>()` where `T` is one of those types.
///
/// ## Booleans
///
/// For [`bool`] and [`Bool`] it returns `None`
/// whenever `u8::follow(buf, loc)` returns `None`.
///
/// ## Native Rust slices
///
/// For `[T]` it returns `None` when
///
/// * `UOffset::follow(buf, loc)` returns `None`;
///
/// * `buf.len()
/// < loc
/// + size_of::<UOffset>()
/// + UOffset::follow(buf, loc).unwrap() as usize * size_of::<T>()`
///
/// Just to clarify things: [`UOffset`] is a type alias for one of primitives.
///
/// ## Flatbuffer slices
///
/// Same as native slices, plus it returns `None` if following any of the items in the slice returned `None`.
///
/// For more info read [`Slice::new`] documentation.
///
/// ## Strings
///
/// For string slices ([`str`]) it returns `None` when
///
/// * `<[u8]>::follow(buf, loc)` returns `None`;
///
/// * result of `<[u8]>::follow(buf, loc).unwrap()` does not contain valid UTF-8 sequence.
///
/// ## Offsets
///
/// For [`ForwardsUOffset<T>`] it returns `None` when
///
/// * `UOffset::follow(buf, loc)` returns `None`;
///
/// * `T::follow(buf, loc + UOffset::follow(buf, loc).unwrap() as usize)`
/// returns `None`.
///
/// [`bool`]: https://doc.rust-lang.org/std/primitive.bool.html
/// [`Bool`]: struct.Bool.html
/// [`UOffset`]: type.UOffset.html
/// [`Slice::new`]: slice/struct.Slice.html#method.new_unchecked
/// [`str`]: https://doc.rust-lang.org/std/primitive.str.html
/// [`ForwardsUOffset<T>`]: struct.ForwardsUOffset.html
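The string rule in the list above (length prefix, bounds check, then UTF-8 validation) can be tried out in safe Rust. This is only a sketch under stated assumptions: `read_str_checked` is a hypothetical name that is not part of the proposal, `UOffset` is assumed to be a little-endian `u32`, and alignment is not checked because the bytes are copied rather than reinterpreted.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical helper (not part of the proposal): reads a length-prefixed
/// string at `loc`, returning `None` instead of panicking on bad input.
/// Assumes the length prefix is a little-endian `u32`.
pub fn read_str_checked(buf: &[u8], loc: usize) -> Option<&str> {
    // Read the length prefix with overflow-safe arithmetic.
    let len_end = loc.checked_add(4)?;
    let len_bytes: [u8; 4] = buf.get(loc..len_end)?.try_into().ok()?;
    let len = usize::try_from(u32::from_le_bytes(len_bytes)).ok()?;

    // Bounds-check the payload, then validate it as UTF-8.
    let end = len_end.checked_add(len)?;
    let payload = buf.get(len_end..end)?;
    std::str::from_utf8(payload).ok()
}
```

Every failure mode named in the list maps to one `?` or `.ok()` here, so hostile input yields `None` rather than a panic.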
The special flatbuffers
/// A special flatbuffers bool.
///
/// Flatbuffers booleans are simple: any `u8` value is valid for it;
/// if the inner `u8` equal zero, then `Bool` maps to `false`,
/// any other value maps to `true`.
#[derive(Copy, Clone)]
#[repr(transparent)]
pub struct Bool(u8);
// SAFETY:
// `repr(transparent)` over `u8`;
// could be safely cast to bytes.
unsafe impl AsRawBytes for Bool {}
// `Bool` is one byte wide, always ordered.
impl ByteOrdered for Bool {}
impl Bool {
/// Constructor function.
#[inline]
pub const fn new(u: u8) -> Self {
Self(u)
}
/// Casts this custom `Bool` into the real Rust `bool`.
#[inline]
pub const fn as_bool(self) -> bool {
self.0 != 0
}
/// Casts a `&u8` to `&Bool`.
#[inline]
pub fn from_ref(u: &u8) -> &Self {
// SAFETY: `repr(transparent)`, thus it's safe
unsafe { &*(u as *const u8 as *const Bool) }
}
/// Casts a slice of bytes to slice of `Bool`s.
#[inline]
pub fn from_slice(s: &[u8]) -> &[Bool] {
// SAFETY:
// `repr(transparent)` over `u8` guarantee safe transmutation,
// while other invariants are guaranteed by `s` being a valid slice
// of items with the same align and size as `Bool`.
unsafe { slice::from_raw_parts(s.as_ptr() as *const Bool, s.len()) }
}
}
Marker traits and transmutation functions that would replace unsound
use core::{
mem::{align_of, size_of},
slice,
};
/// Transmutation function.
#[inline]
pub fn as_raw_bytes<T: AsRawBytes>(t: &T) -> &[u8] {
assert!(size_of::<T>() != 0);
// SAFETY:
// transmute safety should be guaranteed by trait implementor.
unsafe { slice::from_raw_parts(t as *const T as *const u8, size_of::<T>()) }
}
/// Slice transmutation function.
#[inline]
pub fn slice_as_raw_bytes<T: AsRawBytes>(s: &[T]) -> &[u8] {
assert!(size_of::<T>() != 0);
// SAFETY:
// the pointer and length are obtained from valid slice, thus pointer is valid,
// and slice length *in bytes* is never bigger than `isize::MAX`;
// transmute safety should be guaranteed by trait implementor.
unsafe { slice::from_raw_parts(s.as_ptr() as *const u8, s.len() * size_of::<T>()) }
}
/// Transmutation function.
#[inline]
pub fn from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&T> {
assert!(size_of::<T>() != 0);
let ptr = s.as_ptr();
if ptr as usize % align_of::<T>() == 0 && s.len() >= size_of::<T>() {
// SAFETY:
// there is at least `sizeof(T)` bytes,
// and the reference is aligned;
// transmute safety should be guaranteed by trait implementor.
Some(unsafe { &*(ptr as *const T) })
} else {
None
}
}
/// Slice transmutation function.
#[inline]
pub fn slice_from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&[T]> {
assert!(size_of::<T>() != 0);
let ptr = s.as_ptr();
if ptr as usize % align_of::<T>() == 0 {
// SAFETY:
// the pointer and length are obtained from valid slice,
// and pointer is checked to be aligned for `T`;
// transmute safety should be guaranteed by trait implementor.
Some(unsafe { slice::from_raw_parts(ptr as *const T, s.len() / size_of::<T>()) })
} else {
None
}
}
/// Slice transmutation function that does not check alignment.
///
/// # Safety
///
/// `s` must be aligned for `T`.
#[inline]
pub unsafe fn slice_from_raw_bytes_unchecked<T: FromRawBytes>(s: &[u8]) -> &[T] {
assert!(size_of::<T>() != 0);
// SAFETY:
// the pointer and length are obtained from a valid slice,
// and the caller guarantees that the pointer is aligned for `T`;
// transmute safety should be guaranteed by trait implementor.
unsafe { slice::from_raw_parts(s.as_ptr() as *const T, s.len() / size_of::<T>()) }
}
/// Marker trait for types that are represented in little endian byte order.
pub trait ByteOrdered {}
/// Marker trait for types that could be safely represented as raw bytes
/// in a generic way.
///
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///
/// * it *must* be safe and sound to transmute
///
/// * `Self` into `[u8; size_of::<Self>()]`;
///
/// * `[Self; N]` into `[u8; N * size_of::<Self>()]`,
///
/// # Note
///
/// It is not the same as [`FromRawBytes`]! For example, `NonZeroI32`
/// can implement `AsRawBytes`, but can't implement `FromRawBytes`,
/// because raw bytes could contain invalid value for `NonZeroI32`!
pub unsafe trait AsRawBytes: Copy {}
/// Marker trait for types that could be safely read from raw aligned bytes
/// in a generic way.
///
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///
/// * it *must* be safe and sound to transmute
///
/// * `[u8; size_of::<Self>()]` into `Self`,
///
/// * `[u8; N * size_of::<Self>()]` into `[Self; N]`,
///
/// # Note
///
/// It is not the same as [`AsRawBytes`]! For example, \
/// `#[repr(C)] struct Q(u8, u16);` \
/// can implement `FromRawBytes`, but can't implement `AsRawBytes`,
/// because implicit padding is treated as uninitialized memory in Rust!
pub unsafe trait FromRawBytes: Copy {}
The name Though, Implementing codegen for this would be easier with #6098, because recursive struct checks are a little bit tricky. Any questions other than bikeshedding
Edit: forgot to tell about unbounded lifetime in the
pub unsafe fn get_root_unchecked<'a, T>(buf: &'a [u8]) -> T::Inner
where
T: Follow<'a> + Root
{
debug_assert_eq!(buf.as_ptr() as usize % align_of::<T::Aligned>(), 0);
<ForwardsUOffset<T>>::follow_unchecked(buf.as_ptr(), 0)
}
Edit 2: okay, forgot about
/// # Safety
///
/// Absolutely none. Implementing it anywhere but generated code is UB.
///
/// ## But I'm writing the codegen!
///
/// Then `Aligned` must be a generated type that looks like
/// `#[repr(C, align(N))] pub struct BytesFor[RootIdent]([u8; N]);`
/// where `N` is the maximum possible alignment for the root
/// (which is the maximum of all possibly-contained structs, including nested flatbuffers),
/// and `[RootIdent]` is the name of the root.
pub unsafe trait Root {
type Aligned: FromRawBytes + AsRawBytes;
}
This trait would also allow providing some safety guarantees for some of the builder buffers, because Without this trait creating builder would always be unsafe. Though, it's not yet clear how builder would look to use the |
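For readers who want to experiment with the scalar part of the `FollowChecked` contract from the comment above, here is a minimal sketch in safe Rust. `read_u64_checked` is a hypothetical name, not something from the proposal or from the `flatbuffers` crate. Unlike `from_raw_bytes`, it copies the bytes instead of casting a reference, so it neither needs nor checks alignment.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Hypothetical helper (not part of the proposal): a checked little-endian
/// `u64` read that returns `None` for any out-of-range `loc`.
pub fn read_u64_checked(buf: &[u8], loc: usize) -> Option<u64> {
    // `checked_add` avoids an overflow panic for a hostile `loc`.
    let end = loc.checked_add(size_of::<u64>())?;
    // `get` performs the bounds check; the slice is then copied into an array.
    let bytes: [u8; 8] = buf.get(loc..end)?.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}
```

The same call with the same `buf` and `loc` always gives the same answer, which is the "stable (pure)" requirement stated in the trait documentation.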
The biggest problem with this idea (besides a ton of unsafe) is that it continues to work on raw u8 slices, which doesn't give us anything as far as validation is concerned. The verifier should return a type representing a validated view into a given buffer (this was discussed some time ago). Therefore we would get essentially two ways of accessing data:
Our use cases can be broken down as:
Number 1 can be achieved with unchecked access to a slice and is inherently unsafe. The user takes full responsibility for trusting the buffer. |
Yes. This And the problem is that we don't have any of this right now. The getters are not the fastest, nor are they sound and totally safe. We cannot actually parse a flatbuffer into a valid struct without unsafe inside, because that would involve mutation of the buffer (which cannot be done for a number of reasons) and/or new allocations. However, we can totally validate it and guarantee that if the user only uses safe functions from the generated code and no raw constructors, it is safe; and that's what
Getters in tables would look like
fn get_color(&self) -> Color {
unsafe { self.tab.get_required(Self::COLOR_SLOT) }
}
But for
Edit: "ton of unsafe" is not a problem, it is reality. All those things that are unsafe are unsafe, and no less. |
2 is useful when interested only in a subset of data. So that FollowChecked would be implemented for every field or just for the table? It looks like for every field, which defeats the purpose of one-time validation. Also, what's the definition of follow_unchecked? |
Can C/C++ verifier check only subset of data in the schema? Anyway, you can do this (even if not with how flatc works right now) by shrinking the schema down to fields you are interested in. Flatc won't allow you to write Edit: and if we'd use newtype enums instead of native, we won't ever need to |
Irrelevant for the use case.
Also - irrelevant. This is a use case which we can support with low effort. What's relevant are my two previous questions. Can you answer them, so we can see if this is a viable way? |
You wanna answers? Nice.
I wrote. Re-read my first comment, it is there.
No, it can't. But can you implement something like table with all getters both safe and as fast as possible? For example, if you worked with trusted buffers, but then started to work with untrusted (for example, when it was local first, and then also network), with When all getters are either unsafe or safe but fallible, you need to rewrite all the code that works with it. It is against DRY, it is against "parse, don't validate", and finally, it is uncommon use case, which we can easily make working with a little change in schema parser. |
Found a problem in the macro for tables: can't check unions. There actually should be a match for a union, and only then we'd know types. Shouldn't be too hard to fix, though. |
I searched for the definition of follow_unchecked and there seems to be none, hence my question. Also from your second answer I can't tell what "no" refers to. You mean it will be implemented for each field or the whole table once? |
Only the whole table. If you check the table partially, but generate getters for all fields, it would be unsound. Unless, of course, you do not generate double getter "unsafe / safe-fallible" for unchecked fields. I think you should read the flatCC verification doc. I mostly followed it, with a bunch of exceptions that could be closed later, if needed: tl;dr: Edit: |
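To make the first step of whole-table verification concrete, here is a hedged sketch of locating a table's vtable with overflow-safe arithmetic. The function names are hypothetical. The layout assumptions are the usual FlatBuffers ones: a table starts with a signed 32-bit offset that is subtracted from the table location to find the vtable, and the vtable starts with its own size in bytes as a `u16`. Alignment checks, which the proposal does perform, are left out here because the bytes are copied.

```rust
use std::convert::{TryFrom, TryInto};

fn read_u16(buf: &[u8], loc: usize) -> Option<u16> {
    let end = loc.checked_add(2)?;
    let bytes: [u8; 2] = buf.get(loc..end)?.try_into().ok()?;
    Some(u16::from_le_bytes(bytes))
}

fn read_i32(buf: &[u8], loc: usize) -> Option<i32> {
    let end = loc.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(loc..end)?.try_into().ok()?;
    Some(i32::from_le_bytes(bytes))
}

/// Hypothetical helper: returns `(vtable_loc, vtable_len_in_bytes)` if the
/// vtable of the table at `table_loc` lies entirely inside `buf`.
pub fn locate_vtable(buf: &[u8], table_loc: usize) -> Option<(usize, usize)> {
    let soffset = i64::from(read_i32(buf, table_loc)?);
    // Do the subtraction in a wider signed type so it can neither wrap
    // nor silently turn a negative result into a huge index.
    let vtable_loc = i64::try_from(table_loc).ok()?.checked_sub(soffset)?;
    let vtable_loc = usize::try_from(vtable_loc).ok()?;

    let vtable_len = usize::from(read_u16(buf, vtable_loc)?);
    // A vtable holds at least two `u16` fields and is made of `u16` entries.
    if vtable_len < 4 || vtable_len % 2 != 0 {
        return None;
    }
    let end = vtable_loc.checked_add(vtable_len)?;
    buf.get(vtable_loc..end)?;
    Some((vtable_loc, vtable_len))
}
```

A full verifier would go on to check every field slot against the table bounds; this sketch stops once the vtable itself is known to be in range.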
You provided a description, not definition. Please provide a definition, so we can see the whole proposed process of retrieving data. |
That's how it looks for
unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> u64 {
// SAFETY: this is guaranteed to be safe by the caller.
unsafe { (*(buf_start.add(loc) as *const u64)).to_le() }
}
For tables it simply stores
unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> &'a Struct {
// SAFETY: this is guaranteed to be safe by the caller.
unsafe { &*(buf_start.add(loc) as *const Struct) }
}
If enums are just a
For native enums it would be a transmute after following the integer. But we can't transmute, lol, because it could be an invalid value and UB; invalid values are not forbidden by the flatbuffer schema evolution strategy. Thus only newtype enums could be followed, and native enums could only be stored as integers. Are there any other types I forgot about? |
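The newtype-enum point above can be illustrated with a small sketch. `Color` and its variants are hypothetical and only stand in for a generated enum. Because the wrapper accepts every `u8`, a value written by a newer schema is representable without undefined behaviour, whereas transmuting it into a native Rust `enum` would not be.

```rust
/// Hypothetical newtype enum: any `u8` is a valid `Color`,
/// so reading one from a buffer can never produce an invalid value.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Color(pub u8);

impl Color {
    pub const RED: Color = Color(0);
    pub const GREEN: Color = Color(1);

    /// Name of a known variant, or `None` for a value this schema
    /// version does not know about.
    pub fn variant_name(self) -> Option<&'static str> {
        match self {
            Color::RED => Some("Red"),
            Color::GREEN => Some("Green"),
            _ => None,
        }
    }
}
```

Matching on the associated constants keeps most of the ergonomics of a native enum, at the cost of a mandatory wildcard arm.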
This gives us only the unsafe path for a trusted buffer - we're still missing a verification step. |
Lolwut? Read about And how do you want to access a buffer without unsafe and without checks? Even the buffer you just built could be invalid, if you won't make all pushing functions in builder unsafe. |
First of all, let's try to keep to constructive arguments and drop "lol", "lolwut", "mkay" and similar teenage language. Second - FollowChecked is implemented for every field, as I see in your example. The whole point of having a verifier is to not do that, but have a single step of verification, after which every access can skip all checks. The follow_unchecked approach accomplishes the latter, but we still need the former. Or, if I'm seeing it wrong, please provide a full example, from start to finish. |
|
Basing on:
it rather seems FollowChecked is being used by getters to get the actual data. It doesn't verify the validity - it simply reads data. |
It only reads valid data. See table macro, and Edit: welp, yes, it doesn't have to return |
We already have fallible getters, so there's no need to make such changes - we only need to get rid of all those unwraps. The problem with returning a bool, like c++ does, is that it doesn't enforce anything. That said - the same buffer put into such verification can be used by the unchecked api and it will compile just fine, even if the buffer is invalid. If unchecked getters for such "raw" buffer are unsafe, then ok - it's the user's decision. But, if the getters are to be safe, we are breaking the contract of not causing UB via a safe api. Therefore safe+unchecked api should be restricted to this verified buffer view, I talked about earlier. That way there's no possibility to misuse the api and the contract is upheld. |
Updated the table macro and its example in my first comment. |
Okay, there is a lot to respond to and I haven't read all of it yet so sorry if I missed something.
Imo, we only want APIs 1 and 3. API 2 is kind of like lazy validation. I get it, but it's a niche use-case, and less important after you have 1 and 3. Also, we should implement the verifier such that you could choose to only validate a whole "subtree", rather than each node of the tree. That will sufficiently cover the high performance / lazy use case. E.g.
// Imagine we're some high performance networking service that inspects metadata but not
// the rest of the message.
let m: Message = unsafe { flatbuffers::get_root_unchecked(buf) };
let metadata = m.metadata().verify(verifier)?;
Also, I'm assuming today's
pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier: Verifier) -> Result<T::Inner, VerifierError> {
let tab: T = unsafe { get_root_unchecked(data) };
return tab.verify(verifier);
}
That sounds like what is usually called
I think I agree. The principle is that we must not panic or cause UB unless the user typed "unsafe" somewhere and that we must wrap verified buffers in |
Yeah, it's niche and partial validation would cover that. The question is about added complexity to such solution. Right now we can implement 2 simply by removing unwraps from current accessors and passing on the result.
That would be ergonomic but the resulting type of both |
I'm not concerned about implementation complexity. I agree it's small. I think it's not worth the API complexity.
Yes
pub unsafe fn get_root_unchecked<'a, T: Follow<'a> + 'a>(data: &'a [u8]) -> T::Inner;
pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier: Verifier) -> Result<T::Inner, VerifierError>; |
That's not enough. Let's assume
But, since we return the same thing from
The return types of both |
Welp, However, I think that such partial usage should be expressed in a schema, not in code: probably with
Nope, see previous part of this comment. No unsafe in this function, only safe Edit: also, the bound is |
We're discussing potential api, not your exact proposal. |
I understand people not wanting unsafe - I'm one of them. That's why the only unsafe api I find justified is for people implicitly trusting buffers and wanting the fastest access possible. All other use cases should have a safe api, either provided by the verifier or by fallible accessors. Those are not mutually exclusive and fit both our lists of use cases. |
Agree. I propose a short term state where regardless of verification, we do some checks and panics so we at least don't hit UB (RW 1). In the long term, assuming we trust the verifier, we should deprecate, or flag-gate, the checks in normal accessors so the default API does not do any checking after the verification step (rw 2, krojew 3). Users may skip the verification via
I'll grant that this kind of lazy check-as-you-go verification is probably the most "rust-y" thing to do since it allows for the finest-grained error handling, and that it'll be more performant for very large messages where only a few fields are read, but I claim that these aren't the most common cases. The normal Flatbuffers verify-once-then-trust model addresses the common cases more simply. Unless users raise issues claiming they need this, and have supporting benchmarks, I don't think we should do it. |
I'm not going to get into y'alls API debates, but please keep things somewhat ergonomic :) Also, when implementing, please check the C++ verifier.. a fair bit of work has gone into fixing corner cases, and you'll waste a lot of time if not at least the implementation code derives from this. Your only hope getting this code somewhat solid is by fuzzing, so I'd plan for that. |
I think we need to push this forward. Let's summarize our proposals @rw @CasperN, so we can start discussing the details. My proposal is to generate two versions of tables:
What are your ideas? |
I think we should do 2. We can hold off on removing checks until later. Details:
[1] flatbuffers/include/flatbuffers/flatbuffers.h Line 2202 in 5be777e
|
I still think returning another type instead of |
I don't think I understand the question, or what the |
I'm not sure what you mean,
Functions associated with types (methods) can refer to their associated type as
impl Foo {
fn new() -> Self; // Self means Foo in this `impl` scope.
}
I was being confusing.
Somehow, I think Rust users would prefer more detailed errors, but whatever, this can be discussed when there's an implementation. |
I mean we should only have access to safe and unchecked getters after verification. Which implies |
There are a few ways to get Flatbuffers. Via Those factory functions should have Mildly related: We really should make a
trait flatbuffers::GeneratedTable {
type Args;
type Builder;
type Object; // as in Object API.
/// Callers must verify the table before using this function because accessors may be unchecked
/// causing panics or UB. The `verify` method will do this, or the caller may trust that the table
/// was generated correctly.
unsafe fn init_from_table(table: &Table) -> Self;
// Makes sure all offsets are within the buffer and whatever else we want to promise so we can
// skip checks in accessors.
// verify could, equivalently take `self` or `Table` as arguments.
fn verify(buf: &[u8], loc: usize) -> Option<InvalidFlatbuffer>;
fn get_root(buf: &[u8]) -> Result<Self, InvalidFlatbuffer>;
// Builder stuff
fn new(builder: &mut FlatBufferBuilder) -> Self::Builder;
fn create(builder: &mut FlatBufferBuilder, args: &Self::Args) -> WipOffset<Self>;
// Object API stuff
fn unpack(&self) -> Self::Object;
} |
That's the issue I have with this idea - we place a requirement on the user, which can only be verified at runtime. In contrast, using a separate verified type allows us to enforce this at compile time. This is not only safer by design, but very Rust-like. I would personally prefer the compiler telling me I didn't do something, than a crash message. |
That's why |
Even with this, any error will get caught at runtime. We're still relying on the user to call something before using the table, without any enforcement, and it's irrelevant if this something is marked as unsafe or not. Let's give a visual example. Having a piece of code like this for a table Foo:
Can we tell if this code is valid? We can't, because we don't know what was called before. This code is safe, yet it may crash, and we cannot reason about it at all. Now let's consider another example:
Assuming VerifiedFoo comes from a verification step, by looking at this code, we can already tell it will not crash, because the underlying buffer must have been validated. We don't care what was written before the function call. The compiler will not allow this call from an unverified buffer, hence we don't allow crashes by simple mistakes. |
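The original snippets for these two examples did not survive in this copy of the thread, so the following is only a sketch of the separate-verified-type idea, not the author's code. `verified`, `verify`, `InvalidBuffer`, `value` and `process` are hypothetical names, and the "schema" is reduced to a single little-endian `u32` at offset 0.

```rust
mod verified {
    #[derive(Debug, PartialEq)]
    pub struct InvalidBuffer;

    /// The field is private, so code outside this module can only obtain
    /// a `VerifiedFoo` by going through `verify`.
    pub struct VerifiedFoo<'a> {
        buf: &'a [u8],
    }

    /// The single verification step: everything the accessors rely on
    /// is checked here, once.
    pub fn verify(buf: &[u8]) -> Result<VerifiedFoo<'_>, InvalidBuffer> {
        if buf.len() >= 4 {
            Ok(VerifiedFoo { buf })
        } else {
            Err(InvalidBuffer)
        }
    }

    impl<'a> VerifiedFoo<'a> {
        /// Infallible accessor: `verify` already proved that 4 bytes exist.
        pub fn value(&self) -> u32 {
            u32::from_le_bytes([self.buf[0], self.buf[1], self.buf[2], self.buf[3]])
        }
    }
}

/// Accepts only verified data, so it does not need to know its call history.
fn process(foo: &verified::VerifiedFoo<'_>) -> u32 {
    foo.value()
}
```

Passing an unverified `&[u8]` to `process` is a type error, which is the compile-time enforcement argued for above.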
I disagree. We all can invoke |
2 types pros:
2 types cons:
What are pros and cons for your solution? |
Pros:
Cons:
I think we agree on the pros and cons of the two approaches. The "what" of the problem, if you will. Imo: Simplicity and maintainability are more important than the marginal safety benefits for an advanced and niche use case. Most Rust users will not use the unsafe API. We should optimize for the common/simple/95% use cases. We should optimize for maintainability since we don't have a lot of maintainers. Those that use unsafe to skip verification, presumably for performance reasons, will also be among our most sophisticated users and can handle the slightly more difficult contract. This decision, to do less, is also an easily reversible decision. We can always add the |
That's a good argument in general. I think it's hard to tell how much more work it would take to maintain two types; my guess is - not much, but it's just an assumption. |
@krojew @ImmemorConsultrixContrarie would one of you like to be assigned this issue? I'd say this should be our highest priority right now. This or the broken namespaces thing, which is causing user pain. |
My time is extremely limited for the foreseeable future, so I'm not the best candidate for hot tasks. |
I kinda already did this, read my first comment in this issue. If you have any questions I'd try to answer them. If you don't like the design, welp, can't help it, won't rewrite into something I don't like. |
Just noticed current master is no longer compatible with cfb verifier. This effectively means we shouldn't release a new version without the verifier, otherwise there's no way to use untrusted buffers. |
Tracking issue for the title.
@rw @krojew @ImmemorConsultrixContrarie