Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe memory zeroing #2626

Open
gnzlbg opened this issue Jan 22, 2019 · 21 comments
Open

Safe memory zeroing #2626

gnzlbg opened this issue Jan 22, 2019 · 21 comments
Labels
A-unsafe Unsafe related proposals & ideas

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 22, 2019

(note: I use the terms safe and valid below with the precise meanings specified here: https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/reference/src/glossary.md#validity-and-safety-invariant)

Motivation

See rust-lang/rust#53491. In a nutshell: std::mem::zeroed() is dangerous - running this on miri (playground):

fn main() {
    let _x: &'static str = unsafe { 
        std::mem::zeroed() 
    };  // Probably instant UB
}

produces an error of the form: "type validation failed: encountered NULL reference". Obviously, in the real world, code like this will be caught in code review, but catching this stops being easy when one has a struct with multiple fields and one has to manually verify that all fields in the struct are valid when all its bits are zero. Layer a couple of user-defined types on top of each other, and a small private change to one of them down the stack can easily make code using mem::zeroed have instant undefined behavior.

When this happens, right now, having a test suite and running it on miri is the only way we have to detect that. However, C FFI is one of the main usages of mem::zeroed and miri has very limited support for that. So even if you have a good test suite, miri won't help you here.

This RFC provides a solution that catches these errors at compile-time, allowing type users to zero-initialize types using safe Rust code reliably and allowing type authors to specify that doing this is a part of the type's API that they are committed to support (where changing this would be an API breaking change).

User-level explanation

Alternative: Zeroed trait like Default

(note: the trait name Zeroed is yet to be bikeshedded - I think it would be better to agree on the approach and the semantics, and when that has consensus, we can bikeshed the name at the end).

(note: I think I prefer the marker trait + const fn approach explained below)

Add a std::zeroed::Zeroed trait, similar to Default, that denotes that the zero bit-pattern is a valid bit-pattern of the type and that this bit-pattern is safe to use. This trait is unsafe to implement - implementing it for &T would make Zeroed::zeroed have undefined behavior.

// in libcore:
mod zeroed {
    /// A trait for types whose all-zeros bit-pattern is valid and safe.
    pub unsafe trait Zeroed {
        /// Instantiates a value with all bytes equal to zero.
        fn zeroed() -> Self where Self: Sized {
            unsafe { mem::MaybeUninit::<Self>::zeroed().into_inner() }
        }
    }
}

Implement Zeroed in core for all libcore types for which this is the case: integers, raw pointers, etc. - do not implement it for references, NonZero{...}, etc.

Add a custom-derive Zeroed that can be used to manually derive this trait for user-defined types without using unsafe Rust (e.g. if all the fields of a struct implement Zeroed). If the struct cannot derive Zeroed that should produce a compile-time error. Whether the all-zeros bit-pattern is valid and safe bit-pattern for a type is an API contract from the writer of the type to its users. This is why manually specifying it instead of using an auto trait feels like a better solution to the problem.

/// A type that is valid to zero-initialize, 
/// but not safe - this type does not derive Zeroed
struct Foo(u32);
impl Foo {
    pub fn new() -> Self { Self(1) }
    pub fn foo(&self) -> NonZeroU32 {
        // If this type was Zeroed, safe Rust code could
        // invoke undefined behavior
        unsafe { NonZeroU32::new_unchecked(self.0) }
    }
}

/// A type that is valid and safe to zero-initialize
#[derive(Zeroed)]
struct Bar(u32);
impl Bar {
    // bar is unsafe because the type can be zeroed
    // (Safety: call me only if self.0 != 0)
    pub unsafe fn bar(&self) -> NonZeroU32 {
        unsafe { NonZeroU32::new_unchecked(self.0) }
    }
    pub fn bar2(&self) -> NonZeroU32 {
        NonZeroU32::new(self.0) // panics if self.0 == 0
    }
}

/// This produces a compilation error, since this type is not valid to zero initialize
#[derive(Zeroed)]
struct Baz(u32, &'static str);
// ERROR: self.0 is not Zeroed

To upgrade code that previously was using mem::zeroed() to Zeroed, one changes:

let x: Foo = unsafe { mem::zeroed() };

to

// potentially adding a: use std::zeroed::Zeroed;
let x = Foo::zeroed();

We should probably add Zeroed to the std::prelude::v1.

After this change we can deprecate std::mem::zeroed with a deprecation warning "use std::zeroed::Zeroed instead".

An RFC for this feature would probably leave this as an unresolved question, but we probably should turn that deprecation message into an error in the next edition. That is, for crates using edition = rust2021 using std::mem::zeroed should error with the deprecation message instead. That is, libcore will contain mem::zeroed forever, so that Rust code using older editions can still use it, but we probably want to add a mechanism to ban using it from code that decides to use a newer edition.

Alternatives

auto trait

We could make Zeroed an auto-trait, but then I don't see how it could be used to denote that a zeroed value is safe to use - we could still use it to denote that the value is valid. This has two consequences:

  • zeroinit would need to be unsafe, since the resulting value might not be safe to use. That is, just because the zero bit pattern does not cause undefined behavior instantaneously does not imply that safe methods on the type might not all have a pre-condition that the bit-pattern is not all zeros. The user of the type might not want to provide a way to safely construct a value with such a bit-pattern, and it would be bad for the user to have to opt-out this auto trait to maintain safety.

  • Being able to zero-initialize a type is something that users of the type should be able to rely on. Once the type author commits to providing this API, it should be at least automatically noticeable when a change in the type breaks this API. With the non-auto trait + derive this happens automatically. With an auto-trait type authors would need to add a test for this (e.g. fn foo() -> T { Zeroed::zeroed() } or similar).

marker trait

We could also make Zeroed a marker trait, and have some function like:

const fn zeroed2<T: marker::Zeroed>() -> T {
    unsafe { mem::MaybeUninit::<T>::zeroed().into_inner() }
}

this approach has the advantage that zeroed2 is a const fn. The disadvantage is that we can't call this function mem::zeroed because such a function already exist, and we'd have to either put it somewhere else, or call it somewhere else (e.g. mem::zeroinit? ).

There is an RFC (rust-lang/const-eval#8) that would solve this problem by allowing us to:

pub trait Zeroed {
    #[default_method_body_is_const]
    fn zeroed() -> Self where Self: Sized { ... }
}

to indicate that the default method impl is const, and then allowing the Zeroed derive to perform a:

impl const Zeroed for $id { ... }

to add a const impl. However, the Zeroed::zeroed trait+trait method approach gives users the flexibility of adding their own zeroed implementations, and this is not a flexibility that I do think that we want. Paying for this flexibility might not be a good idea.

The simplicity of an unsafe to implement marker trait + a const fn somewhere in libcore is definitely appealing.

@comex
Copy link

comex commented Jan 22, 2019

Strong 👍 from me.

Bikeshed: The naming seems a bit confusing. Maybe Zeroable?

Edit: What's up with the placement of where Self: Sized? Is the idea that [T] can implement Zeroed if T does, allowing for some other safe function to zero the entire array? If so, shouldn't that function be included in the specification?

Also, why is this filed as an issue rather than a PR?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 22, 2019

Also, why is this filed as an issue rather than a PR?

I've included a couple of alternatives in the issue - I don't think there is a single best way about how to proceed. Writing an RFC and submitting a PR will be smoother when we agree on a general direction to go. For that, a pre-RFC style issue might be better.

What's up with the placement of where Self: Sized? Is the idea that [T] can implement Zeroed if T does, allowing for some other safe function to zero the entire array? If so, shouldn't that function be included in the specification?

If we implement a Zeroed trait like Default there are a couple of downsides, one is that either the trait has to require Sized, or the zeroed method has to. It is unclear to me whether supporting ?Sized types is worth it, the where clause leaves the door open for supporting those in the future.

The main drawback of the trait method is that zeroed is not a const fn, although there are some extensions that could help with that. Allowing users to "customize" the behavior of Zeroed::zeroed seems a bad thing to do.

I prefer the simplicity of the second alternative (using a marker trait + a const fn), even if that means having to find a new name for mem::zeroed, e.g., mem::zeroed2, mem::zeroinit, or something else.

@Lokathor
Copy link
Contributor

Yeah, if you're going to mess around with how zeroed works you should at least make it const. If that means that the trait version can't be made until trait methods support const fn then it seems fine to wait for that.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Feb 7, 2019

Note that currently mem::zeroed is not a const fn, but I agree that the replacement should be.

@retep998
Copy link
Member

retep998 commented Feb 7, 2019

I'm fully in favor of having a safe and const way to zero types that opt into it.

@danielhenrymantilla
Copy link

I like the idea.

Regarding the alternatives, I think the best combination is the following:

  1. Marker trait Zeroable (not an auto trait given the safe vs valid distinction);

  2. Use an extension trait (e.g. Zeroed) to provide the zeroed associated function notation to enable Foo::zeroed syntax (the extension trait being the one in the prelude):

    /// Marker trait
    pub unsafe trait Zeroable {}
    macro_rules! unsafe_impl_Zeroable_forall {(
        $(
            $(for<$($generics:tt)*>)? $T:ty
        ),* $(,)?
    ) => (
        $(
            unsafe impl $(<$($generics)*>)? Zeroable for $T {}
        )*
    )}
    unsafe_impl_Zeroable_forall! {
        bool, u32, i32, f32, // etc.
        <T : Zeroable> [T; 0],
        <T : Zeroable> [T; 1],
        <T : Zeroable> [T; 2],
        // etc.
    }
    
    /// Extension trait
    pub trait Zeroed : Zeroable + private::Sealed {
        fn zeroed () -> Self;
    }
    impl<T : Zeroable> Zeroed for T {
        #[inline(always)]
        fn zeroed () -> Self
        {
            unsafe { intrinsic::zeroed() } // whichever implementation is best; intrinsic, MaybeUninit, ...
        }
    }
    mod private {
        pub trait Sealed {}
        impl<T : super::Zeroable> Sealed for T {}
    }

    And then add a safe #[derive(Zeroable)] that under the hood generates code checking that each field checks the Zeroable trait bound (else a compilation error follows, but people are free to use unsafe impl Zeroable instead of the derive) so as to make the #[derive]-generated unsafe impl Zeroable for ... {} sound.

Lastly, zeroing memory should mostly matter at Drop time, instead of at init time (we have both Option and MaybeUninit for that), e.g. to clear sensitive data;
Now, the problem of "zeroing on drop" is that data being moved around is potentially copied thus not triggering the zeroing code on the old value (unless compiler magic was used).
So, to guarantee zeroing on drop, we could imagine Zeroable being used to bound a newtype wrapper that would pin its contents and zero on drop.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Mar 5, 2019

Lastly, zeroing memory should mostly matter at Drop time, instead of at init time (we have both Option and MaybeUninit for that), e.g. to clear sensitive data;

This is out-of-scope for this proposal.

@Lokathor
Copy link
Contributor

Lokathor commented Mar 5, 2019

I only care about zeroed memory at init time, and I care nothing about at Drop time. And using MaybeUninit to get a zeroed value is terrible. Why would you want to force people who are using zeroed properly to go from the very simple:

zeroed()

into the needlessly more wordy

MaybeUninit::zeroed().into_inner()

For absolutely no reason?

As to drop time zeroing, we already have a crate for that

EDIT: that was at danielhenrymantilla

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Mar 5, 2019

@Lokathor please try to keep this issue on topic.

@jethrogb
Copy link
Contributor

jethrogb commented Mar 5, 2019

@gnzlbg Agreed. Maybe edit the initial post to make it extra clear this proposal has nothing to do with security?

@hanna-kruppe
Copy link

hanna-kruppe commented Mar 5, 2019

re: the auto trait alternative, a problem not mentioned here AFAICT (but I'm by no means the first person to point this out in general) is that enum Void {} implements all auto traits, but as it's uninhabitable it must not implement the trait.

edit: a more general problem is that enums implement auto traits if all variants' fields implement them, but for the purpose of zeroing we also care about the discriminant values.

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Mar 27, 2019

About alternatives, instead of introducing Zeroed trait, couldn't Default trait be used for that purpose? FFI code needs to know the structure layout anyways to know how many bytes should be zeroed, so requiring a precise definition of a structure including internals is non-issue (it should be here already).

There are three issues with using Default trait for that purpose however.

  1. Raw pointers don't implement the Default trait... for whatever reason (see Default is not implemented for raw pointers (*const and *mut) #2464), but I think they could be changed to implement Default while preserving backwards-compatibility (it's not possible to implement Default for *const UserDefinedStruct).

  2. Default for arrays is only implemented up-to [T; 32], anything bigger than that won't get a default implementation. Probably won't be an issue in most cases, and when it is, a manual implementation could be provided.

  3. Some libraries may not be satisfied with essentially = {0}-like initialization and may actually want a memset (are there such libraries?).

@Lokathor
Copy link
Contributor

Of course it's possible to implement Default for *const T and *mut T, the default is just a null pointer.

@retep998
Copy link
Member

For my purposes in winapi derive(Default) results in abysmally slow compile times and is unusable anyway because of how many types don't implement Default and deriving wouldn't work. Currently I provide optional Default impls that wrap mem::zeroed(), so if mem::zeroed() is going to be replaced I need an equivalent that is equally performant both in compile time and run time.

@danielhenrymantilla
Copy link

@xfix Default is for

some kind of default value, and don't particularly care what it is

Thus using Default instead of a specific Zeroable / Zeroed is not the way to go: the semantics are different and this kind of "trick"/"hack" never pays off. For instance, what if NonNull<T> were to implement Default (it currently has the functionality with a NonNull::dangling() constructor) ? Nothing forbids it (the only reason it hasn't been done is that people already find that kind of footgunny, same as with Default for *const T and *mut T being NULLs), and it would clearly contradict zeroing (it is UB to create a zeroed NonNull<T>)

@retep998 there will always be a generic zeroing mechanism precisely for FFI, which requires such. But it is to become a deprecated non-FFI Rust pattern, given the UB danger of using it in a generic context that could monomorphise to nonnullable types such as NonNull<_> / NonZeroU... / & _ / &mut _ / fn(..) -> _.

@casey
Copy link

casey commented May 25, 2020

An alternative to this that I think merits mention in a potential RFC would be a trait along the same lines but which, when implemented for T, guarantees that all byte strings of the correct length and alignment are valid values of T.

This would allow additional use's beyond initializing values from zeroed memory. For example, it would allow safely initializing a value from an arbitrary (but not undefined) byte sequence of the correct length and alignment, which would be useful for zero-cost serialization and deserialization. It would also allow marking types as safe to pass into safe rust code over FFI boundaries.

I think the downside might be that some types could not implement Arbitrary (or whatever the trait would be called), but could implement Zeroed. For example, Option<T>, if the determinant of None is zero, could implement Zeroed, but could not implement Arbitrary. (Which also applies to other enums, as mentioned above by Hanna.) So any enum with a zero determinant whose fields implement Zeroed could implement Zeroed, but could not normally implement Arbitrary.

Other than that, it seems like they would be around equally complex to implement. Although I could definitely be missing other downsides to Arbitrary, or advantages of Zeroed.

@Lokathor
Copy link
Contributor

the safe transmute wg has been developing ideas for this sort of trait, just in the initial stages, but there's a non-zero amount of progress over time.

@casey
Copy link

casey commented May 25, 2020

@Lokathor Thanks for the pointer! I found mem-markers via the safe transmute zulip, which will be super useful ^_^

@retep998
Copy link
Member

Note that one commonly used type in FFI is Option<extern fn (...)> which cannot implement Arbitrary but can be zeroed just fine.

@snaggen
Copy link

snaggen commented Sep 23, 2020

I'm new to rust was using an thirdparty C-library I wrapped, which uses a lot of nested structs. And after I saw the Default trait, I immediately looked for a Zeroed trait since that felt like the natural way to do things in rust. I was quite surprised that this was not available in the language, so I ended up having the structs being zero by default instead (since they didn't really have any other sane default). But I really support the general idea to have this in the language.

@madsmtm
Copy link
Contributor

madsmtm commented Sep 26, 2023

Update: In the ecosystem there's now zerocopy::FromZeroes and bytemuck::Zeroable, and I think the project-safe-transmute is working on something similar too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-unsafe Unsafe related proposals & ideas
Projects
None yet
Development

No branches or pull requests