Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Flatbuffer Verifier for Rust #6161

Closed
CasperN opened this issue Oct 5, 2020 · 61 comments
Closed

Implement Flatbuffer Verifier for Rust #6161

CasperN opened this issue Oct 5, 2020 · 61 comments
Labels

Comments

@CasperN
Copy link
Collaborator

CasperN commented Oct 5, 2020

Tracking issue for the title.

@rw @krojew @ImmemorConsultrixContrarie

@rw
Copy link
Collaborator

rw commented Oct 6, 2020

cc @aardappel

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

Linking cfb verifier for reference: https://github.com/nervosnetwork/cfb

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

Okay, there's how I see it.

First goes the trait:

/// A trait to access untrusted data in the flatbuffer.
///
/// # Safety
///
/// * results should be stable (pure):
///
///     * for all calls with the same `buf` and `loc`
///         `follow` *must* return the same results.
///
/// * the trait implementation should guarantee that if `follow` returned `Some`
///     then call to `follow_unchecked` with the same `buf` and `loc`
///     would not invoke UB.
pub unsafe trait FollowChecked<'a>
where
    Self: Follow<'a>,
{
    /// Checked getter for untrusted data.  
    ///
    /// # Note to implementors
    ///
    /// Please, try to avoid panics. 
    fn follow(buf: &'a [u8], loc: usize) -> Option<Self::Inner>;
}

The name of the function is a bikeshed question.

Let's look at how it would be implemented for u64:

unsafe impl FollowChecked<'_> for u64 {
    #[inline]
    fn follow(buf: &[u8], loc: usize) -> Option<Self::Inner> {
        buf.get(loc..loc + size_of::<Self>()).and_then(|s| {
            from_raw_bytes(s).map(|x| x.to_le())
        })
    }
}

/// Transmutation function.
#[inline]
pub fn from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&T> {
    assert!(size_of::<T>() != 0);

    let ptr = s.as_ptr();
    if ptr as usize % align_of::<T>() == 0 && s.len() >= size_of::<T> {
        // SAFETY:
        // there is at least `sizeof(T)` bytes,
        // and the reference is aligned;
        // transmute safety should be guaranteed by trait implementor.
        Some(unsafe { &*(ptr as *const T) })
    } else {
        None
    }
}

/// Marker trait for types that could be safely read from raw aligned bytes
/// in a generic way.
///          
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///                                                                       
/// * it *must* be safe and sound to transmute
///
///     * `[u8; size_of::<Self>()]` into `Self`,
///
///     * `[u8; N * size_of::<Self>()]` into `[Self; N]`,
///
/// # Note
///
/// It is not the same as [`AsRawBytes`]! For example, \
/// `#[repr(C)] struct Q(u8, u16);` \
/// can implement `FromRawBytes`, but can't implement `AsRawBytes`,
/// because implicit padding is treated as uninitialized memory in Rust!   
pub unsafe trait FromRawBytes: Copy {}

// SAFETY:
// all scalars could be safely transmuted from and into raw bytes.
unsafe impl FromRawBytes for u64 {}

There are things like enums and bitflag-enums, for which we may want to check values or may not want.
If we go #6098 way and do not check enum and bitflag values while parsing the data (without adding #[repr(transparent)] to those newtype enums and bitflag structs it is UB to read them from bytes, pls fix, @CasperN !), then we don't check anything in structs besides alignment and size, because any aligned slice of initialized bytes would be valid for the struct if it has length big enough.
If we go with native enums like #6158 , we'd need to check enums and structs that contain enums or contain structs with enums (recursion limit reached).

Follow trait, tables and slices (in flatbuffer you call it "vectors", but in Rust vectors are resizable, and those slices are not resizable, nor are they constant-sized to be called "array"), would need rework to be sound and to use safety guarantees we can provide with FollowChecked.

Let's start with tables:

#[derive(Debug)]
pub struct Table<'a> {
    buf_start: NonNull<u8>,
    loc: usize,
    _lifetime: PhantomData<&'a [u8]>,
}

impl<'a> Table<'a> {
    #[doc(hidden)]
    #[inline]
    pub const fn new(buf_start: NonNull<u8>, loc: usize) -> Self {
        Self {
            buf_start,
            loc,
            _lifetime: PhantomData,
        }
    }

    /// # Safety
    ///
    /// Absolutely none. It should not be ever called anywhere but autogenerated code.
    #[inline]
    unsafe fn vtable(&self) -> &'a [VOffset] {
        let buf_start = self.buf_start.get();
        let offset = SOffset::follow_unchecked(buf_start, self.loc);
        let vtable_loc = (self.loc as isize - offset as isize) as usize;
        let vtable_start = buf_start.add(vtable_loc);
        let vtable_len = VOffset::follow_unchecked(vtable_start, 0);
        slice::from_raw_parts(vtable_start as *const VOffset, vtable_len as usize / size_of::<VOffset>())
    }

    /// # Safety
    ///
    /// Absolutely none. It should not be ever called anywhere but autogenerated code.
    #[inline]
    pub unsafe fn get_optional<T>(&self, vtable_slot: usize) -> Option<T::Inner>
    where
        T: ?Sized + Follow<'a>,
    {
        unsafe { self.vtable() }
            .get(vtable_slot)
            .and_then(|&table_offset| {
                if table_offset == 0 {
                    None
                } else {
                    // SAFETY:
                    // should be guaranteed to be safe by the caller.
                    Some(unsafe { T::follow_unchecked(self.buf_start.get(), self.loc + table_offset as usize) })
                }
            })
    }

    /// # Safety
    ///
    /// Absolutely none. It should not be ever called anywhere but autogenerated code.
    #[inline]
    pub unsafe fn get_required<T>(&self, vtable_slot: VOffset) -> T::Inner
    where
        T: ?Sized + Follow<'a>,
    {
        debug_assert!(self.vtable().len() > vtable_slot as usize);
        // SAFETY:
        // should be guaranteed to be safe by the caller.
        unsafe {
            T::follow_unchecked(
                self.buf_start.get(),
                self.loc + *self.vtable().get_unchecked(vtable_slot as usize) as usize,
            )
        }
    }
}

Yeah, tables are unsafe as hell. How do we implement FollowChecked for tables then? We don't. Instead, we use this macro to implement FollowChecked for each wrapper struct around table:

/// Given a `buf`, `loc` and a list of fields
///
/// * for unions:
///     `{ (UnionType, is_required, union_vtable_slot, associated_ptr_vtable_slot)
///         match { [union_value => AssociatedType,] }`
///
/// * for any other type:
///     `(Type, is_required, vtable_slot)`
///
/// returns `Some(Table)` if all fields given could be followed,
/// `None` otherwise.
///
/// Note: `*_slot` must be an offset in `VOffset`s from the start of vtable,
/// not in bytes! It also must be of type `usize`.
///
/// Another note: types must be explicitly required!
/// Unlike the schema, this macro allows structs and scalars to be optional.
#[macro_export]
macro_rules! follow_table {
    ($buf:expr, $loc:expr, [ $( $field:tt, )* ]) => {{
        #[inline]
        fn table_follow<'a>(buf: &'a [u8], loc: usize) -> Option<$crate::Table<'a>> {
            use $crate::{
                core::{mem::{size_of, align_of}, slice, ptr::NonNull}, SOffset, VOffset, FollowChecked,
            };

            match SOffset::follow(buf, loc) {
                None => None,
                Some(offset) => {
                    // `loc` could only have value in `0..isize::MAX` range
                    // due to slice restrictions (already successfully indexed the slice with `loc`).
                    // For any possible `loc`, whenever it overflows, it would stay in
                    // negative `isize` range, thus would not be a valid index into `buf`;
                    // whenever it underflows, it also would stay in negative `isize` range;
                    // thus this would never return `Some` on underflow or overflow,
                    // thus it is indeed stable, and if this returned `Some`,
                    // it would not be UB to call `follow_unchecked` with the same `buf` and `loc`
                    // (if all table types are checked, of course).
                    let vtable_loc = (loc as isize - offset as isize) as usize;
                    let vtable = buf.get(vtable_loc..)?;

                    let vtable: &'a [VOffset] = {
                        // This `follow` checks that `vtable` is aligned for `VOffset`.
                        let vtable_len = VOffset::follow(vtable, 0)? as usize;
                        debug_assert_eq!(vtable.as_ptr() as usize % align_of::<VOffset>(), 0);

                        let v = buf.get(vtable_loc..vtable_loc + vtable_len)?;
                        // SAFETY:
                        // it is aligned for `VOffset`,
                        // and `VOffset` could be safely transmuted from bytes.
                        unsafe { slice::from_raw_parts(v.as_ptr() as *const VOffset, v.len() / size_of::<VOffset>()) }
                    };

                    $(
                        $crate::follow_table! { vtable, buf, loc, $field }
                    )*

                    Some($crate::Table::new(NonNull::from(buf.first().unwrap()), loc))
                }
            }
        }

        table_follow($buf, $loc)
    }};
    // Union branch.
    ($vtable:expr, $buf:expr, $loc:expr,
        {
            ($union_type:ty, $req:expr, $vtable_union_slot:expr, $vtable_assoc_type_slot:expr)
            match {$(
                $union_value:expr => $associated_type:ty,
            )*}
        }
    ) => {{
        let (vtable, buf, loc) = ($vtable, $buf, $loc);
        if $req {
            let union_table_offset = *vtable.get($vtable_union_slot)?;
            if union_table_offset == 0 {
                // Required can't be default.
                return None;
            } else {
                let assoc_loc = loc + *vtable.get($vtable_assoc_type_slot)? as usize;
                match <$union_type>::follow(buf, loc + union_table_offset as usize)? {
                    $(
                        $union_value => { <$associated_type>::follow(buf, assoc_loc)?; },
                    )*
                    // If union is `None` or unknown value, it is always valid, yet ignored.
                    _ => (),
                }
            }
        } else {
            // If the slot is not required and is out of vtable range,
            // then it is always valid to get, but default.
            if let Some(&union_table_offset) = vtable.get($vtable_union_slot) {
                // If it is `0`, then it is default, then it is valid.
                if union_table_offset != 0 {
                    let assoc_loc = loc + *vtable.get($vtable_assoc_type_slot)? as usize;
                    match <$union_type>::follow(buf, loc + union_table_offset as usize)? {
                        $(
                            $union_value => { <$associated_type>::follow(buf, assoc_loc)?; },
                        )*
                        // If union is `None` or unknown value, it is always valid, yet ignored.
                        _ => (),
                    }
                }
            }
        }
    }};
    // Any other field branch.
    ($vtable:expr, $buf:expr, $loc:expr, ($t:ty, $req:expr, $vtable_slot:expr)) => {{
        if $req {
            let table_offset = *$vtable.get($vtable_slot)?;
            if table_offset == 0 {
                // Required can't be default.
                return None;
            } else {
                <$t>::follow($buf, $loc + table_offset as usize)?;
            }
        } else {
            // If the slot is not required and is out of vtable range,
            // then it is always valid to get, but default.
            if let Some(&table_offset) = $vtable.get($vtable_slot) {
                // If it is `0`, then it is default, then it is valid.
                if table_offset != 0 {
                    <$t>::follow($buf, $loc + table_offset as usize)?;
                }
            }
        }
    }};
}

Because we know all three parts of the triplet this macro needs at schema compilation time, this macro is easy to use in the generated code. Though, this macro could generate a lot of Rust code, and to speed up unchecked compilations we may want to put FollowChecked implementations into another file, so it would be linked only when it is really needed.

The macro is used like this:

#[cfg(test)]
mod tests {
    /// Welp, yeah, tests that this thing compiles at all.
    #[test]
    fn it_works() {
        assert!(follow_table! {
            &[],
            0,
            [
                // Union field syntax.
                { (u64, true, 4, 5)
                    match {
                        // Actual types would be `ForwardsUOffset<str/[u8]>`;
                        // anyway, this is enough for a compilation test.
                        1 => str,
                        2 => [u8],
                    }
                },
                // Any other type field syntax.
                (u32, true, 2),
                (u16, true, 3),
            ]
        }
        .is_none());
    }
}

Slices would look like table: pointer to the buffer start, location and lifetime phantom marker.
FollowChecked should be implemented like this:

// Option<Self::Inner>
fn follow(buf: &'a [u8], loc: usize) -> Option<Slice<'a, T>> {
    UOffset::follow(buf, loc).and_then(|len| {
        let len = len as usize * size_of::<T>();
        let start = loc + size_of::<UOffset>();
        buf.get(start..start + len).and_then(|_| {
            let mut ptr = start;
            while ptr < start + len {
                T::follow(buf, ptr)?;
                ptr += size_of::<T>();
            }
        })
        Some(unsafe { Slice { buf_start: NonNull::from(buf.first().unwrap()), loc: loc, _lifetime: PhantomData })
    })
}

About the Follow trait: it doesn't actually need a [u8] for trusted access, only *const u8 that points to the start of the buffer.
And to allow tables and slices be only two usize wide, it should look like this:

/// A trait to access trusted data in the flatbuffer.
///
/// # Safety
///
/// Absolutely none. Implementing this trait outside this lib or autogenerated code is UB.
///
/// Consider this trait an implementation detail,
/// that we cannot hide due to autogenerated code.
///
/// ## But I'm writing for the lib!
///
/// Welp, then
///
/// * if `Self` is sized, then `sizeof(Self)` must equal to the actual number of bytes your implementation would read,
///     and `Self` must be non-zero-sized type;
///
/// * results should be stable (pure):
///
///     * for all calls with the same `buf` and `loc
///         `follow_unchecked` *must* return the same results
///         if the call with the given `buf` and `loc` is not UB.
pub unsafe trait Follow<'a> {
    /// The target type to aquire.
    type Inner;

    /// Unchecked getter for already checked or trusted data.
    ///
    /// # Safety
    ///
    /// Absolutely none. Using this function outside of the lib is UB.
    ///
    /// ## But I'm writing for the lib!
    ///
    /// Read inexhaustive list of when `FollowChecked::follow` returns `None`.
    /// Calling this function when `buf.wrapping_add(loc)` is unaligned
    /// or when `follow` returns `None` is UB.
    unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> Self::Inner;
}

The list:

/// # When it returns `None`
///
/// First: if it always returns `None` whenever `&buf[loc]` is unaligned.
/// When is it unaligned? Well… It depends on the root type mostly,
/// but `Follow` is used to follow any type, not just root,
/// thus you can't actually check whether the given `buf` and `loc`
/// would produce aligned access or unaligned.
///
/// For undocumented implementations' behaviour, read the implementation description.
///
/// ## Primitives
///
/// For `u8, i8, u16, i16, u32, i32, u64, i64, f32, f64`
/// it returns `None` if `buf.len() < loc + size_of::<T>()` where `T` is one of those types.
///
/// ## Booleans
///
/// For [`bool`] and [`Bool`] it returns `None`
/// whenever `u8::follow(buf, loc)` returns `None`.
///
/// ## Native Rust slices
///
/// For `[T]` it returns `None` when
///
/// * `UOffset::follow(buf, loc)` returns `None`;
///
/// * `buf.len()
///     < loc
///         + size_of::<UOffset>()
///         + UOffset::follow(buf, loc).unwrap() as usize * size_of::<T>()`
///
/// Just to clarify things: [`UOffset`] is a type alias for one of primitives.
///
/// ## Flatbuffer slices
///
/// Same as native slices, plus it returns `None` if following any of the items in the slice returned `None`.
///
/// For more info read [`Slice::new`] documentation.
///
/// ## Strings
///
/// For string slices ([`str`]) it returns `None` when
///
/// * `<[u8]>::follow(buf, loc)` returns `None`;
///
/// * result of `<[u8]>::follow(buf, loc).unwrap()` does not contain valid UTF-8 sequence.
///
/// ## Offsets
///
/// For [`ForwardsUOffset<T>`] it returns `None` when
///
/// * `UOffset::follow(buf, loc)` returns `None`;
///
/// * `T::follow(buf, loc + UOffset::follow(buf, loc).unwrap() as usize)`
///     returns `None`.
///
/// [`bool`]: https://doc.rust-lang.org/std/primitive.bool.html
/// [`Bool`]: struct.Bool.html
/// [`UOffset`]: type.UOffset.html
/// [`Slice::new`]: slice/struct.Slice.html#method.new_unchecked
/// [`str`]: https://doc.rust-lang.org/std/primitive.str.html
/// [`ForwardsUOffset<T>`]: struct.ForwardsUOffset.html

The special flatbuffers Bool that allows to get a slice of bools from flatbuffer slice:

/// A special flatbuffers bool.
///
/// Flatbuffers booleans are simple: any `u8` value is valid for it;
/// if the inner `u8` equal zero, then `Bool` maps to `false`,
/// any other value maps to `true`.
#[derive(Copy, Clone)]
#[repr(transparent)]
pub struct Bool(u8);

// SAFETY:
// `repr(transparent)` over `u8`;
// could be safely cast to bytes.
unsafe impl AsRawBytes for Bool {}

// `Bool` is one byte wide, always ordered.
impl ByteOrdered for Bool {}

impl Bool {
    /// Constructor function.
    #[inline]
    pub const fn new(u: u8) -> Self {
        Self(u)
    }

    /// Casts this custom `Bool` into the real Rust `bool`.
    #[inline]
    pub const fn as_bool(self) -> bool {
        self.0 != 0
    }

    /// Casts a `&u8` to `&Bool`.
    #[inline]
    pub fn from_ref(u: &u8) -> &Self {
        // SAFETY: `repr(transparent)`, thus it's safe
        unsafe { &*(u as *const u8 as *const Bool) }
    }

    /// Casts a slice of bytes to slice of `Bool`s.
    #[inline]
    pub fn from_slice(s: &[u8]) -> &[Bool] {
        // SAFETY:
        // `repr(transparent)` over `u8` guarantee safe transmutation,
        // while other invariants are guaranteed by `s` being a valid slice
        // of items with the same align and size as `Bool`.
        unsafe { slice::from_raw_parts(s.as_ptr() as *const Bool, s.len()) }
    }
}

Marker traits and transmutation functions that would replace unsound EndianScalar, emplace_scalar and read_scalar:

use core::{
    mem::{align_of, size_of},
    slice,
};

/// Transmutation function.
#[inline]
pub fn as_raw_bytes<T: AsRawBytes>(t: &T) -> &[u8] {
    assert!(size_of::<T>() != 0);

    // SAFETY:
    // transmute safety should be guaranteed by trait implementor.
    unsafe { slice::from_raw_parts(t as *const T as *const u8, size_of::<T>()) }
}

/// Slice transmutation function.
#[inline]
pub fn slice_as_raw_bytes<T: AsRawBytes>(s: &[T]) -> &[u8] {
    assert!(size_of::<T>() != 0);

    // SAFETY:
    // the pointer and length are obtained from valid slice, thus pointer is valid,
    // and slice length *in bytes* is never bigger than `isize::MAX`;
    // transmute safety should be guaranteed by trait implementor.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const u8, s.len() * size_of::<T>()) }
}

/// Transmutation function.
#[inline]
pub fn from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&T> {
    assert!(size_of::<T>() != 0);

    let ptr = s.as_ptr();
    if ptr as usize % align_of::<T>() == 0 && s.len() >= size_of::<T> {
        // SAFETY:
        // there is at least `sizeof(T)` bytes,
        // and the reference is aligned;
        // transmute safety should be guaranteed by trait implementor.
        Some(unsafe { &*(ptr as *const T) })
    } else {
        None
    }
}

/// Slice transmutation function.
#[inline]
pub fn slice_from_raw_bytes<T: FromRawBytes>(s: &[u8]) -> Option<&[T]> {
    assert!(size_of::<T>() != 0);

    let ptr = s.as_ptr();
    if ptr as usize % align_of::<T>() == 0 {
        // SAFETY:
        // the pointer and length are obtained from valid slice,
        // and pointer is checked to be aligned for `T`;
        // transmute safety should be guaranteed by trait implementor.
        Some(unsafe { slice::from_raw_parts(ptr as *const T, s.len() / size_of::<T>()) })
    } else {
        None
    }
}

/// Slice transmutation function that does not check alignment.
///
/// # Safety
///
/// `s` must be aligned for `T`.
#[inline]
pub unsafe fn slice_from_raw_bytes_unchecked<T: FromRawBytes>(s: &[u8]) -> &[T] {
    assert!(size_of::<T>() != 0);

    // SAFETY:
    // the pointer and length are obtained from valid slice,
    // and pointer is checked to be aligned for `T`;
    // transmute safety should be guaranteed by trait implementor.
    unsafe { slice::from_raw_parts(s.as_ptr() as *const T, s.len() / size_of::<T>()) }
}

/// Marker trait for types that are represented in little endian byte order.
pub trait ByteOrdered {}

/// Marker trait for types that could be safely represented as raw bytes
/// in a generic way.
///
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///
/// * it *must* be safe and sound to transmute
///
///     * `Self` into `[u8; size_of::<Self>()]`;
///
///     * `[Self; N]` into `[u8; N * size_of::<Self>()]`,
///
/// # Note
///
/// It is not the same as [`FromRawBytes`]! For example, `NonZeroI32`
/// can implement `AsRawBytes`, but can't implement `FromRawBytes`,
/// because raw bytes could contain invalid value for `NonZeroI32`!
pub unsafe trait AsRawBytes: Copy {}

/// Marker trait for types that could be safely read from raw aligned bytes
/// in a generic way.
///
/// # Safety
///
/// * implementor type *must* be non-zero sized;
///
/// * it *must* be safe and sound to transmute
///
///     * `[u8; size_of::<Self>()]` into `Self`,
///
///     * `[u8; N * size_of::<Self>()]` into `[Self; N]`,
///
/// # Note
///
/// It is not the same as [`AsRawBytes`]! For example, \
/// `#[repr(C)] struct Q(u8, u16);` \
/// can implement `FromRawBytes`, but can't implement `AsRawBytes`,
/// because implicit padding is treated as uninitialized memory in Rust!
pub unsafe trait FromRawBytes: Copy {}

The name SafeSliceAccess or ByteOrdered is a bikeshed, but I don't SafeSliceAccess, because it has nothing to do with safetyness, only markers endianness and could be safely implemented unlike FromRawBytes and AsRawBytes.

Though, AsRawBytes, its transmutation functions (which we don't really need, because builder should be aligned), and changes to make Push trait sound should be a part of another issue or PR.

Implementing codegen for this would be easier with #6098 , because recursive struct checks are a little bit tricky.

Any questions other than bikeshedding follow[_checked] and follow[_unchecked]?

Edit: forgot to tell about unbounded lifetime in the follow_unchecked — it is not a problem, because we bound it in the get_root_unchecked:

pub unsafe fn get_root_unchecked<'a, T>(buf: &'a [u8]) -> T
where
    T: Follow<'a> + Root
{
    debug_assert_eq!(buf.as_ptr() as usize % align_of::<T::Aligned>(), 0);
    ForwardsOffset<T>::follow_unchecked(buf.as_ptr(), 0)
}

Edit 2: okay, forgot about Root trait:

/// # Safety
///
/// Absolutely none. Implementing it anywhere but generated code is UB.
///
/// ## But I'm writing the codegen!
///
/// Then `Aligned` must be a generated type that looks like
/// `#[repr(C, align(N))] pub struct BytesFor[RootIdent]([u8; N]);`
/// where `N` is the maximum possible alignment for the root
/// (which is the maximujm of all possibly-contained structs, including nested flatbuffers),
/// and `[RootIdent]` is the name of the root.
pub unsafe trait Root {
    type Aligned: FromRawBytes + AsRawBytes;
}

This trait would also allow provide some safety guarantees for some of the builder buffers, because
assert!(align_of::<T>() <= align_of::<Root::Aligned>());
would be evaluated at constant time: it either never panics or always panics for T we push, and compiler can optimize this to no assert for things that never panic.

Without this trait creating builder would always be unsafe. Though, it's not yet clear how builder would look to use the Root information in a safe way, and it is not about verifier anyway.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

The biggest problem with this idea (besides a ton of unsafe) is that it continues to work on raw u8 slices, which doesn't give us anything as far as validation is concerned. The verifier should return a type representing a validated view into a given buffer (this was discussed some time ago). Therefore we would get essentially two ways of accessing data:

  • One with untrusted access, which would work similarly to now. Checked access with Results/Options/panics. Unchecked can be provided where applicable, when the user trusts the buffer implicitly.
  • One with trusted access on validated view. This one can skip all checks and retrieve the data the fastest way possible, including wrapping unsafe transmutations in safe functions, if needed.

Our use cases can be broken down as:

  1. Implicitly trusted buffer with fast access.
  2. Untrusted buffer with checked access.
  3. Validated buffer with fast access.

Number 1 can be achieved with unchecked access to a slice and is inherently unsafe. The user takes full responsibility for trusting the buffer.
2 is, more or less, what we have now. The difference being, we should return Results/Options where applicable rather than a hard panic. Lack of this makes current Rust API unusable on production with untrusted buffers, unless someone uses cfb verification.
Number 3 can be achieved by a trusted view into the data, as explain above. Am I missing something @rw @CasperN @aardappel ?

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

The biggest problem with this idea (besides a ton of unsafe) is that it continues to work on raw u8 slices, which doesn't give us anything as far as validation is concerned. The verifier should return a type representing a validated view into a given buffer (this was discussed some time ago). Therefore we would get essentially two ways of accessing data:

* One with untrusted access, which would work similarly to now. Checked access with Results/Options/panics. Unchecked can be provided where applicable, when the user trusts the buffer implicitly.

* One with trusted access on validated view. This one can skip all checks and retrieve the data the fastest way possible, including wrapping unsafe transmutations in safe functions, if needed.

Our use cases can be broken down as:

1. Implicitly trusted buffer with fast access.

2. Untrusted buffer with checked access.

3. Validated buffer with fast access.

Number 1 can be achieved with unchecked access to a slice and is inherently unsafe. The user takes full responsibility for trusting the buffer.
2 is, more or less, what we have now. The difference being, we should return Results/Options where applicable rather than a hard panic. Lack of this makes current Rust API unusable on production with untrusted buffers, unless someone uses cfb verification.
Number 3 can be achieved by a trusted view into the data, as explain above. Am I missing something @rw @CasperN @aardappel ?

Yes. This FollowChecked::follow would return either a validated buffer with safe and fast access or None; all getters in structs are safe, even though they use a lot of unsafety inside. The generated files should start from a comment about unsafe tainting, yeah, forgot that one. Your number one is what get_root_unchecked would return. Number two is something that would not be used often: the buffer is either valid/trusted, or it is invalid, isn't it?

And the problem is that we don't have anything of this right now. The getters are not the fastest, nor are they sound and totally safe. We cannot actually parse flatbuffer into valid struct without unsafe inside, because that would involve mutation of the buffer (which cannot be done for a number of reasons) and/or new allocations. However, we can totally validate it and guarantee that if user only uses safe functions from the generated code and no raw constructors, it is safe; and that's what FollowChecked does.

Getters in tables would look like

fn get_color(&self) -> Color {
    unsafe { self.tab.get_required(Self::COLOR_SLOT) }
}

But for get_root_unchecked we lift that unsafe into get_root_unchecked itself, and for safe get_root that could return None we validate that this is safe when we follow the table.

Edit: "ton of unsafe" is not a problem, it is reality. All those things that are unsafe are unsafe, and no less.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

2 is useful when interested only in a subset of data. So that FollowChecked would be implemented for every field or just for the table? It looks like for every field, which defeats the purpose of one-time validation. Also, what's the definition of follow_unchecked?

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

Can C/C++ verifier check only subset of data in the schema?

Anyway, you can do this (even if not with how flatc works right now) by shrinking the schema down to fields you are interested in. Flatc won't allow you to write table Human { name: string (id: 8); }, but if it would, it'd be easy to generate getters only for a required subset.

Edit: and if we'd use newtype enums instead of native, we won't ever need to follow struct fields, simplifying struct following to a one from_raw_bytes call.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

Can C/C++ verifier check only subset of data in the schema?

Irrelevant for the use case.

Anyway, you can do this (even if not with how flatc works right now) by shrinking the schema down to fields you are interested in. Flatc won't allow you to write table Human { name: string (id: 8); }, but if it would, it'd be easy to generate getters only for a required subset.

Also - irrelevant. This is a use case which we can support with low effort. What's relevant are my two previous questions. Can you answer them, so we can see if this is a viable way?

@ImmemorConsultrixContrarie
Copy link
Contributor

You wanna answers? Nice.

Also, what's the definition of follow_unchecked?

I wrote. Re-read my first comment, it is there.

So that FollowChecked would be implemented for every field or just for the table

No, it can't. But can you implement something like table with all getters both safe and as fast as possible? FollowChecked is a hit at two ways that flatbuffers are used most: totally trusted or totally untrusted accesses.

For example, if you worked with trusted buffers, but then started to work with untrusted (for example, when it was local first, and then also network), with FollowChecked the only thing you have to rewrite is one place where you get the root.

When all getters are either unsafe or safe but fallible, you need to rewrite all the code that works with it. It is against DRY, it is against "parse, don't validate", and finally, it is uncommon use case, which we can easily make working with a little change in schema parser.

@ImmemorConsultrixContrarie
Copy link
Contributor

Found a problem in the macro for tables: can't check unions. There actually should be a match for a union, and only then we'd know types. Shouldn't be too hard to fix, though.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

I searched for the definition of follow_unchecked and there seems to be none, hence my question. Also from your second answer I can't tell what "no" refers to. You mean it will be implemented for each field or the whole table once?

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

I searched for the definition of follow_unchecked and there seems to be none, hence my question. Also from your second answer I can't tell what "no" refers to. You mean it will be implemented for each field or the whole table once?

Only the whole table. If you check the table partially, but generate getters for all fields, it would be unsound. Unless, of course, you do not generate double getter "unsafe / safe-fallible" for unchecked fields.

I think, you should read flatCC verification doc.

I mostly followed this, with a bunch of exceptions, that could be closed lately, if needed:
Rust str is always UTF-8 and not zero-terminated;
table macro does not check table length, only that all slots are accessible;
vtable could be of any size, even zero, it only has to be aligned (because we don't check table length) — if table has any required field, then for such vtable we would return None, and if no required fields, then all fields are default;
UOffset does not need to have value at least sizeof(UOffset) — the pointee only has to be accessible and generated pointer has to be aligned;
FollowChecked has no recursion limit. This could be a problem, actually, but changing function to follow(buf: &[u8], loc: usize, recursion_levels_left: usize) should fix it.

tl;dr: FollowChecked catches only memory unsafety invariants, and none logical bugs.

Edit: follow_unchecked definition is a part of my first comment. Really.

Image

image

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

You provided a description, not definition. Please provide a definition, so we can see the whole proposed process of retrieving data.

@ImmemorConsultrixContrarie
Copy link
Contributor

You provided a description, not definition. Please provide a definition, so we can see the whole proposed process of retrieving data.

That's how it looks for u64 (and any other integer/float (actually, floats would use from_bits(to_bits().to_le())):

unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> u64 {
    // SAFETY: this is guaranteed to be safe by the caller.
    unsafe { (*(buf_start.add(loc) as *const u64)).to_le() }
}

For tables it simply stores buf_start and loc in the table. Same for slices. For structs it would look like this:

unsafe fn follow_unchecked(buf_start: *const u8, loc: usize) -> &'a Struct {
    // SAFETY: this is guaranteed to be safe by the caller.
    unsafe { &*(buf_start.add(loc) as *const Struct) }
}

If enums are just a repr(transparent) newtype around integer — NewtypeEnum::new(ReprInt::follow_unchecked()).

For native enums it would be a transmute after following integer. But we can't transmute, lol, because it could be invalid value and UB; invalid values are not forbidden by flatbuffer schema evolution strategy. Thus only newtype enums could be followed, and native enums could be only stored as integer.
Thus newtype enums win, because they could be stored as a real type, not only as integer. And if you need, they also could be converted to native Rust enums, just like integer, using exactly same logic.
Really, it's not just about this new Follow, it's also about current Follow too, lol.

Are there any other types I forgot about?

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

This gives us only the unsafe path for a trusted buffer - we're still missing a verification step.

@ImmemorConsultrixContrarie
Copy link
Contributor

This gives us only the unsafe path for a trusted buffer - we're still missing a verification step.

Lolwut? Read about FollowChecked, mkay?

And how do you want to access a buffer without unsafe and without checks? Even the buffer you just built could be invalid, if you won't make all pushing functions in builder unsafe.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

First of all, let's try to keep to constructive arguments and drop "lol", "lolwut", "mkay" and similar teenage language. Second - FollowChecked is implemented for every field, as I see in your example. To whole point of having a verifier is to not do that, but have a single step of verification, after which every access can skip all checks. The follow_unchecked approach accomplishes the latter, but we still need the former. Or, if I'm seeing it wrong, please provide a full example, from start to finish.

@ImmemorConsultrixContrarie
Copy link
Contributor

First of all, let's try to keep to constructive arguments and drop "lol", "lolwut", "mkay" and similar teenage language. Second - FollowChecked is implemented for every field, as I see in your example. To whole point of having a verifier is to not do that, but have a single step of verification, after which every access can skip all checks. The follow_unchecked approach accomplishes the latter, but we still need the former. Or, if I'm seeing it wrong, please provide a full example, from start to finish.

FollowChecked recursively checks a validness of all getters we generate. For each table field if checks that table field could be followed. For each slice item, it checks that this item could be followed.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

Basing on:

unsafe impl FollowChecked<'_> for u64 {
    #[inline]
    fn follow(buf: &[u8], loc: usize) -> Option<Self::Inner> {
        buf.get(loc..loc + size_of::<Self>()).and_then(|s| {
            from_raw_bytes(s).map(|x| x.to_le())
        })
    }
}

it rather seems FollowChecked is being used by getters to get the actual data. It doesn't verify the validity - it simply reads data.

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

Basing on:

unsafe impl FollowChecked<'_> for u64 {
    #[inline]
    fn follow(buf: &[u8], loc: usize) -> Option<Self::Inner> {
        buf.get(loc..loc + size_of::<Self>()).and_then(|s| {
            from_raw_bytes(s).map(|x| x.to_le())
        })
    }
}

it rather seems FollowChecked is being used by getters to get the actual data. It doesn't verify the validity - it simply reads data.

It only reads valid data. See table macro, and follow implementation for flatbuffer slices (vectors).

Edit: welp, yes, it doesn't have to return Option<Self::Inner>, bool could be enough. But when it returns Option, it's harder to forget about any of the things you should check. Plus, yes, it also could be used to generate fallible getters, even though I can't see a need in it.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

We already have fallible getters, so there's no need to make such changes - we only need to get rid of all those unwraps. The problem with returning a bool, like c++ does, is that it doesn't enforce anything. That said - the same buffer put into such verification can be used by the unchecked api and it will compile just fine, even if the buffer is invalid. If unchecked getters for such "raw" buffer are unsafe, then ok - it's the user's decision. But, if the getters are to be safe, we are breaking the contract of not causing UB via a safe api. Therefore safe+unchecked api should be restricted to this verified buffer view, I talked about earlier. That way there's no possibility to misuse the api and the contract is upheld.

@ImmemorConsultrixContrarie
Copy link
Contributor

Found a problem in the macro for tables: can't check unions. There actually should be a match for a union, and only then we'd know types. Shouldn't be too hard to fix, though.

Updated the table macro and its example in my first comment.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 6, 2020

Okay, there is a lot to respond to and I haven't read all of it yet so sorry if I missed something.

  1. Implicitly trusted buffer with fast access.
  2. Untrusted buffer with checked access.
  3. Validated buffer with fast access.

Imo, we only want APIs 1 and 3. API 2 is kind of like lazy validation. I get it, but its a niche use-case, and less important after you have 1 and 3. Also, we should implement the verifier such that you could choose to only validate a whole "subtree", rather than each node of the tree. That will sufficiently cover the high performance / lazy use case. E.g.

// Imagine we're some high performance networking service that inspects metadata but not
// the rest of the message.
let m: Message = unsafe { flatbuffers::get_root_unchecked(buf) };
let metadata = m.metadata().verify(verifier)?;

Also, I'm assuming today's get_root becomes get_root_unchecked and the future flatbuffers::get_root would be

pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier:: Verifier) -> Result<T::Inner, VerifierError> {
  let tab: T = unsafe { get_root_unchecked(data) };
  return tab.verify(verifier);
}

FollowChecked recursively checks a validness of all getters we generate. For each table field if checks that table field could be followed. For each slice item, it checks that this item could be followed.

That sounds like what is usually called verify, so I guess we're in agreement. Except that I'm following the C++ implementation and making it a method of the generated table rather than part of the follow trait. Its also unclear to me how to manually verify subtrees with the trait implementation.


We already have fallible getters, so there's no need to make such changes - we only need to get rid of all those unwraps. The problem with returning a bool, like c++ does, is that it doesn't enforce anything. That said - the same buffer put into such verification can be used by the unchecked api and it will compile just fine, even if the buffer is invalid. If unchecked getters for such "raw" buffer are unsafe, then ok - it's the user's decision. But, if the getters are to be safe, we are breaking the contract of not causing UB via a safe api. Therefore safe+unchecked api should be restricted to this verified buffer view, I talked about earlier. That way there's no possibility to misuse the api and the contract is upheld.

I think I agree. The principle is that we must not panic or cause UB unless the user typed "unsafe" somewhere and that we must wrap verified buffers in Result since that's fallible. I don't think we want fallible getters, they should all trust the flatbuffer is verified. We can force the choice between verification and unsafe at the root.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

I get it, but its a niche use-case, and less important after you have 1 and 3. Also, we should implement the verifier such that you could choose to only validate a whole "subtree", rather than each node of the tree. That will sufficiently cover the high performance / lazy use case.

Yeah, it's niche and partial validation would cover that. The question is about added complexity to such solution. Right now we can implement 2 simply by removing unwraps from current accessors and passing on the result.

Also, I'm assuming today's get_root becomes get_root_unchecked and the future flatbuffers::get_root would be

pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier:: Verifier) -> Result<T::Inner, VerifierError> {
  let tab: T = unsafe { get_root_unchecked(data) };
  return tab.verify(verifier);
}

That would be ergonomic but the resulting type of both get_root variations should be different to avoid unsound safe api.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 6, 2020

The question is about added complexity to such solution. Right now we can implement 2 simply by removing unwraps from current accessors and passing on the result.

I'm not concerned about implementation complexity. I agree its small. I think its not worth the API complexity.

That would be ergonomic but the resulting type of both get_root variations should be different to avoid unsound safe api.

Yes

pub unsafe fn get_root_unchecked<'a, T: Follow<'a> + 'a>(data: &'a [u8]) -> T::Inner; 
pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier:: Verifier) -> Result<T::Inner, VerifierError>;

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

pub unsafe fn get_root_unchecked<'a, T: Follow<'a> + 'a>(data: &'a [u8]) -> T::Inner; 
pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier:: Verifier) -> Result<T::Inner, VerifierError>;

That's not enough. Let's assume T::Inner contains a x() getter (btw. I'll be dropping the get_ prefix, which we keep using). For a verified buffer x() should be:

fn x(&self) -> Something

But, since we return the same thing from get_root_unchecked, we might end up with such unsoundness:

// obvious invalid buffer
let data = [100u8];
let root = unsafe { get_root_unchecked(&data) };

// UB here in safe Rust
let x = root.x();

The return types of both get_root need to be different so the unchecked version should contain only unsafe accessors.

@ImmemorConsultrixContrarie
Copy link
Contributor

ImmemorConsultrixContrarie commented Oct 6, 2020

// Imagine we're some high performance networking service that inspects metadata but not
// the rest of the message.
let m: Message = unsafe { flatbuffers::get_root_unchecked(buf) };
let metadata = m.metadata().verify(verifier)?;

Welp, FollowChecked in the way that I already implemented, can't do that. If it returns Some, the whole root is safe to access. If it returned None, then getting things with follow_unchecked is UB, and we can't even say at what point it'd be UB to use the root.

However, I think that such partial usage should be expressed in a schema, not in code: probably with deprecated flag on all fields you don't need, as flatc does not allow to use some other way, like partial table (table Human { name: string (id: 8); }).

Also, I'm assuming today's get_root becomes get_root_unchecked and the future flatbuffers::get_root would be

pub fn get_root<'a, T: Follow<'a> + 'a>(data: &'a [u8], verifier:: Verifier) -> Result<T::Inner, VerifierError> {
  let tab: T = unsafe { get_root_unchecked(data) };
  return tab.verify(verifier);
}

Nope, see previous part of this comment. No unsafe in this function, only safe ForwardsUOffset<T>::follow(data, 0) returning Option<Root>.

Edit: also, the bound is T: 'a + FollowChecked<'a>, not T: 'a + Follow<'a>.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

Nope, see previous part of this comment. No unsafe in this function, only safe ForwardsUOffset<T>::follow(data, 0) returning Option.

We're discussing potential api, not your exact proposal.

@krojew
Copy link
Contributor

krojew commented Oct 6, 2020

@krojew We really do have people who care about eliminating unsafe code from their dependencies.

Also, it's risky, at least to start, to fully trust the Verifier.

I understand people not wanting unsafe - I'm one of them. That's why the only unsafe api I find justified is for people implicitly trusting buffers and wanting the fastest access possible. All other use cases should have a safe api, either provided by the verifier or by fallible accessors. Those are not mutually exclusive and fit both our lists of use cases.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 6, 2020

Also, it's risky, at least to start, to fully trust the Verifier.

Agree. I propose a short term state where regardless of verification, we do some checks and panics so we at least don't hit UB (RW 1).
The unsafe skip verification escape hatch will still be there (RW 4, krojew 1).

In the long term, assuming we trust the verifier, we should deprecate, or flag-gate, the checks in normal accessors so the default API does not do any checking after the verification step (rw 2, krojew 3). Users may skip the verification via unsafe (rw 4, krojew 1).

This leads to a separate question: do we offer safe, non-panicking, and non-Verified access (i.e. wrapping everything in Result)?

I'll grant that this kind of lazy check-as-you-go verification is probably the most "rust-y" thing to do since it allows for the finest-grained error handling, and that it'll be more performant for very large messages where only a few fields are read, but I claim that these aren't the most common cases. The normal Flatbuffers verify-once-then-trust model addresses the common cases more simply. Unless users raise issues claiming they need this, and have supporting benchmarks, I don't think we should do it.

@aardappel
Copy link
Collaborator

I'm not going to get into y'alls API debates, but please keep things somewhat ergonomic :)

Also, when implementing, please check the C++ verifier.. a fair bit of work has gone into fixing corner cases, and you'll waste a lot of time if not at least the implementation code derives from this.

Your only hope getting this code somewhat solid is by fuzzing, so I'd plan for that.

@CasperN CasperN added the rust label Oct 9, 2020
@krojew
Copy link
Contributor

krojew commented Oct 15, 2020

I think we need to push this forward. Let's summarize our proposals @rw @CasperN, so we can start discussing the details.

My proposal is to generate two versions of tables:

  1. First with Results/Options for checked access and unsafe for unchecked access. Similar to what we have here.
  2. Second with safe unchecked access, which is only obtainable from a verifier function. This will prevent people from shooting themselves in the foot at compile time.

What are your ideas?

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 15, 2020

I think we should do 2. We can hold off on removing checks until later.

Details:

  • We should implement a Verifier class in the flatbuffers crate, similar to [1]
  • every table should generate something like fn verify(&self, v: &Verifier) -> Result<Self, InvalidFlatbuffer>

[1]

class Verifier FLATBUFFERS_FINAL_CLASS {

@krojew
Copy link
Contributor

krojew commented Oct 15, 2020

  • every table should generate something like fn verify(&self, v: &Verifier) -> Result<Self, InvalidFlatbuffer>

I still think returning another type instead of Self is a better option - we catch errors at compile time, instead of a runtime surprise. Having a second type is really not a problem for a generator and is explicit enough to not cause confusion for the users. @rw @aardappel any opinions on this?

@aardappel
Copy link
Collaborator

I don't think I understand the question, or what the Self type is. But the signature of verify can be whatever it needs to be.. in C++ it just returns bool :)

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 16, 2020

I still think returning another type instead of Self is a better option - we catch errors at compile time, instead of a runtime surprise

I'm not sure what you mean, InvalidFlatbuffer would be an error that's impossible to determine at compile time.

I don't think I understand the question, or what the Self type is.

functions associated with types (methods) can refer to their associated type as Self.

impl Foo {
  fn new() -> Self;  // Self means Foo in this `impl` scope.
}

I was being confusing. fn verify(&self, v: &Verifier) -> Result<Self, InvalidFlatbuffer> doesn't make sense since it has the return type of a fallible factory function, yet takes self as an argument. Perhaps fn verify(&self, v: &Verifier) -> Result<(), InvalidFlatbuffer> makes more sense.

But the signature of verify can be whatever it needs to be.. in C++ it just returns bool :)

Somehow, I think Rust users would prefer more detailed errors, but whatever, this can be discussed when there's an implementation.

@krojew
Copy link
Contributor

krojew commented Oct 19, 2020

I'm not sure what you mean, InvalidFlatbuffer would be an error that's impossible to determine at compile time.

I mean we should only have access to safe and unchecked getters after verification. Which implies verify() cannot return Self, but must return a different type with that api.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 19, 2020

There are a few ways to get Flatbuffers. Via get_root, init_via_table or via an accessor. The accessor can be safe and unchecked assuming the checking happens at the other functions. In my opinion init_via_table should be unsafe, callers must verify the buffer first; and get_root should be safe and verified.

Those factory functions should have Result return types and can defer to the table's verify method to find errors.

Mildly related: We really should make a flatbuffers::GeneratedTable trait. The current implementation associates types using similar names but having a trait will make the structure clearer. We probably need traits for the associated types too but you get the idea.

trait flatbuffers::GeneratedTable {
  type Args;
  type Builder;
  type Object;  // as in Object API.

  /// Callers must verify the table before using this function because accessors may be unchecked
  /// causing panics or UB. The `verify` method will do this, or the caller may trust that the table
  /// was generated correctly.
  unsafe fn init_from_table(&Table) -> Self;
  // Makes sure all offsets are within the buffer and whatever else we want to promise so we can
  // skip checks in accessors. 
  // verify could, equivalently take `self` or `Table` as arguments.
  fn verify(buf: &[u8], loc: usize) -> Option<InvalidFlatbuffer>;
  fn get_root(buf: [&u8]) -> Result<Self, InvalidFlatbuffer>;

  // Builder stuff
  fn new(&mut builder: FlatBufferBuilder) -> Self::Builder;
  fn create(&mut builder: FlatBufferBuilder, args: &Self::Args) -> WipOffset<Self>;

  // Object API stuff
  fn unpack(&self) -> Self::Object;
}

@krojew
Copy link
Contributor

krojew commented Oct 19, 2020

callers must verify the buffer first; and get_root should be safe and verified.

That's the issue I have with this idea - we place a requirement on the user, which can only be verified at runtime. In contrast, using a separate verified type allows us to enforce this at compile time. This is not only safer by design, but very Rust-like. I would personally prefer the compiler telling me I didn't do something, than crash message.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 19, 2020

we place a requirement on the user, which can only be verified at runtime

That's why init_from_table is unsafe and the generated table's internal fields are private. We'll enforce, at all points of construction that either verification has happened, or the constructor is unsafe. There's no need for additional types.

@krojew
Copy link
Contributor

krojew commented Oct 19, 2020

That's why init_from_table is unsafe and the generated table's internal fields are private. We'll enforce, at all points of construction that either verification has happened, or the constructor is unsafe. There's no need for additional types.

Even with this, any error will get caught at runtime. We're still relying on the user to call something before using the table, without any enforcement, and it's irrelevant if this something is marked as unsafe or not. Let's give a visual example. Having a piece of code like this for a table Foo:

fn extract_bar<'a>(foo: Foo<'a>) -> Bar<'a> {
    foo.bar()
}

Can we tell if this code is valid? We can't because we don't know what was called before. This code is safe, yet it may crash and cannot reason about it at all. Now let's consider another example:

fn extract_bar<'a>(foo: VerifiedFoo<'a>) -> Bar<'a> {
    foo.bar()
}

Assuming VerifiedFoo comes from a verification step, by looking at this code, we can already tell it will not crash, because the underlying buffer must have been validated. We don't care what was written before the function call. The compiler will not allow this call from an unverified buffer, hence we don't allow crashes by simple mistakes.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 19, 2020

We're still relying on the user to call something before using the table, without any enforcement, and it's irrelevant if this something is marked as unsafe or not

I disagree. unsafe is absolutely enforcement. Every Rust user knows that typing the letters "unsafe" is kind of suspect, deserves at least looking up the documentation, and that its their fault if something bad happens. As mentioned earlier, unsafe is allowed to cause memory errors elsewhere in the program. That is a standard part of the unsafe contract.

We all can invoke unsafe Vec::from_raw_parts incorrectly, that doesn't mean Vec needs to be replaced by VerifiedVec<T> everywhere else because we don't know how it was constructed. We just expect correct construction in safe code.

@krojew
Copy link
Contributor

krojew commented Oct 19, 2020

Vec example isn't the best one because, with such low-level type (in terms of use cases), we need to take ergonomics into account much more. In our case we case safely generate two distinct types and not loose ergonomics (at least not much). Maybe let's list pros and cons of both solutions and see which are actually meaningful.

2 types pros:

  • access validity enforced at compiled time
  • unsafe code only when actual unsafe operations take place (accessing the data) instead of a step before (assigning a buffer to a table)

2 types cons:

  • need to generate two types for checked and unchecked access

What are pros and cons for your solution?

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 19, 2020

What are pros and cons for your solution?

Pros:

  • less code to generate, less generator code to maintain
  • The implementation matches other languages
  • Smaller API to teach users

Cons:

  • I guess the contract around unsafe is slightly more complicated? There's more action at a distance than your alternative.

I think we agree on the pros and cons of the two approaches. The "what" of the problem, if you will.
The real question is which design values should we prioritize -- the "why" of the problem.

Imo: Simplicity and maintainability is more important than the marginal safety benefits for an advanced and niche use case.

Most Rust users will not use the unsafe API. We should optimize for the common/simple/95% use cases. We should optimize for maintainability since we don't have a lot of maintainers. Those that use unsafe to skip verification, presumably for performance reasons, will also be among our most sophisticated users and can handle the slightly more difficult contract.

This decision, to do less, is also an easily reversible decision. We can always add the VerifiedFoo types later, if it turns out we have many such sophisticated users and the current API is too difficult to use safely. I claim we don't need this complexity, I may be wrong but let's find out from users before we invest in doubling the complexity. (#yagni)

@krojew
Copy link
Contributor

krojew commented Oct 19, 2020

Most Rust users will not use the unsafe API. We should optimize for the common/simple/95% use cases. We should optimize for maintainability since we don't have a lot of maintainers.

That's a good argument in general. I think it's hard to tell how much more work it would take to maintain two types; my guess is - not much, but it's just an assumption.

@CasperN
Copy link
Collaborator Author

CasperN commented Oct 22, 2020

@krojew @ImmemorConsultrixContrarie would one of you like to be assigned this issue? I'd say this should be our highest priority right now. This or the broken namespaces thing, which is causing user pain.

@krojew
Copy link
Contributor

krojew commented Oct 22, 2020

My time is extremely limited for the foreseeable future, so I'm not the best candidate for hot tasks.

@ImmemorConsultrixContrarie
Copy link
Contributor

@krojew @ImmemorConsultrixContrarie would one of you like to be assigned this issue? I'd say this should be our highest priority right now. This or the broken namespaces thing, which is causing user pain.

I kinda already did this, read my first comment in this issue. If you have any questions I'd try to answer them. If you don't like the design, welp, can't help it, won't rewrite into something I don't like.

@krojew
Copy link
Contributor

krojew commented Nov 17, 2020

Just noticed current master is no longer compatible with cfb verifier. This effectively means we shouldn't release a new version without the verifier, otherwise there's no way to use untrusted buffers.

@rw
Copy link
Collaborator

rw commented Nov 20, 2020

@CasperN made a PR for this, see #6269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants