Canvas unsafe code in the wild #146

nikomatsakis · 2016-09-01T19:42:57Z

I think we should organize a kind of "canvas" to find examples of how unsafe code is used in the wild. To start, it'd be great to enumerate a list of interesting places to look.

Here is my start at a list. Further nominations welcome. I'll try to keep the list up to date. Moreover, if you feel you've examined the source, open any relate issues and we can check it off.

Standard library:
- btree, esp. the iterators
- hashmap
- ...?
External libraries:
- rayon
- take_mut
- abomonation
- ndarray
- intrusive_collections by @Amanieu, cited in One glossary should be enough #19
- rental

Other thoughts for packages? Is this a fruitful thing to examine?

The text was updated successfully, but these errors were encountered:

strega-nil · 2016-09-02T16:54:57Z

Well, I think pcb would be a nice place to start, it's very simple and has some unsafe code that would be seen as UB by the "correct lifetime" style. (example)

nikomatsakis · 2016-09-03T12:15:08Z

@ubsan do you think that the "interesting patterns" in pcb are thus summarized by #12? If not, maybe you can write-up an issue that covers them?

strega-nil · 2016-09-03T14:01:16Z

I think they're well enough summarized. (see here) referencing pcb

eternaleye · 2016-09-08T00:16:46Z

Another use of unsafe code, this one more type-focused than lifetime-focused: https://github.com/Diggsey/query_interface

Ericson2314 · 2016-09-08T00:20:28Z

https://github.com/nathan7/libfringe is wildly unsafe.

nikomatsakis · 2016-09-08T03:38:40Z

This discussion of how coroutines and scoped threads interact seems relevant, probably more as a negative case (i.e., enumerate where the coroutine abstractions goes wrong):

https://www.reddit.com/r/rust/comments/508pkb/unleakable_crate_safetysanityrefocus/d72703d

arielb1 · 2016-09-08T09:53:50Z

This discussion of how coroutines and scoped threads interact seems relevant, probably more as a negative case (i.e., enumerate where the coroutine abstractions goes wrong):

That is more of a soundness issue than a memory model issue. scoped assumes that stack frames never leak - that a lifetime parameter passed to a function must live until the function exits (through a return or resume), while coroutines flat out violates that assumption.

RalfJung · 2016-09-12T16:51:44Z

I don't entirely understand what is the criterion for listing pieces of the standard library. Certainly RefCell is a very interesting bit of unsafe code, showing how far interior mutability can go and playing games with lifetimes that I have not yet seen in any other piece of unsafe code.

mystor · 2016-09-12T16:55:32Z

The problem with RefCell is that we aren't sure that its implementation won't invoke Undefined Behavior in a future model for what defines undefined behavior in unsafe rust. The fact that it holds a &'a reference which isn't actually valid for all of &'a, but only for either the lifetime of the Ref or RefMut, or 'a, whichever is shorter. This could be considered UB under some models as the lifetimes are "incorrect".

We aren't suggesting removing RefCell, it's just a question of whether its current implementation should be UB.

RalfJung · 2016-09-12T17:00:01Z

What I am asking is why RefCell is not listed above as "how unsafe code is used in the wild".

nikomatsakis · 2016-09-12T17:20:58Z

@mystor @RalfJung I think the only reason I did not list RefCell is that we have #5 already. My goal with this issue is not for it to exist permanently, just to serve as a clearing house of work to be done in terms of researching and then opening issues.

RalfJung · 2016-09-12T17:25:22Z

I see, that makes sense.

Notice that the trouble I am having with RefCell in my semantic model is entirely unrelated to #5. However, I am not sure if the memory model would be affected by that...

arielb1 · 2016-09-12T17:55:40Z

Notice that the trouble I am having with RefCell in my semantic model is entirely unrelated to #5.

What's the trouble again?

nikomatsakis · 2016-09-12T18:04:17Z

@RalfJung if you see some other aspect of ref-cell that's not documented, please do open another issue...

RalfJung · 2016-09-12T18:17:30Z

I think the long version of this needs a blogpost, and I don't know when I will get around to write it -- and I'm still in the process of figuring out the details. I don't necessarily think it's an "issue" in the sense of something that needs to be fixed, it's more of an observation that RefCell does something which nothing else does (nothing else I saw, that is).

I will try a short summary, but beware -- I am thinking in terms of "how to prove the code correct" here, not in terms of a static type system or a memory model.
Fist, let me try to define the "time that a value is shared". This starts when when &x for some not-yet-shared x is executed, and it ends when the lifetime of that "original borrow" ends. For example, in

fn foo(x: &mut i32) {
*x = 13;
{ let y = &*x;
  { let z = &*y; }
}
}

the sharing of the data pointed to by x (and its aliases y and z) lasts for the scope of y. I found that in the base type system, this sharing is always equal to a lifetime, so I like to think of a type to be "shared for a lifetime". The type gets to perform some "logical transitions" when sharing starts. For example, when a Cell<T> is shared, it arranges for the data to be available to anyone running code in the current thread, so that everyone holding a &Cell<T> can access the data. This is un-done when the lifetime of the sharing ends.
Now, for RefCell: When the RefCell<T> is shared, its "content" (the T) is not yet shared. This is because someone might call borrow_mut. So, when and fir which lifetime is the content shared? Sharing starts when borrow is first called successfully, and it ends when either (a) the count reaches 0 again, or (b) the sharing of the containing RefCell ends (whatever happens first). This does not correspond to any lifetime that is actually considered during type-checking the program, and I am having a hard time making it possible to "synthesize" a lifetime with the right proeprties (i.e., the right lifetime inclusions). Of course the same happens in RwLock, but other than that, I am not aware of any type exercising a similar amount of control over the lifetime something is shared for.

arielb1 · 2016-09-12T18:30:38Z

So, when and fir which lifetime is the content shared? Sharing starts when borrow is first called successfully, and it ends when either (a) the count reaches 0 again, or (b) the sharing of the containing RefCell ends (whatever happens first).

That is the union of the active lifetimes of the guards. This is also the lifetime of the sharing of an Rc<T>.

RalfJung · 2016-09-12T19:06:25Z

That is the union of the active lifetimes of the guards.

What exactly do you mean by "active lifetime of the guards"? It's certainly not the lifetime parameter in Ref<'a, T>, as that one can easily outlive the sharing e.g. by calling drop on the guard. But it's also not the time until the guard is dropped, because the guard could be never dropped and still the sharing ends.

This is also the lifetime of the sharing of an Rc.

I think Rc is easer than RefCell though, because the "synthetic lifetime" here does not have to be a sublifetime of something larger, it just has all the guards lifetimes' as sublifetimes. For RefCell, there is the lifetime of the sharing of RefCell itself which upper-bounds the lifetime of the sharing of the content.

arielb1 · 2016-09-12T21:02:22Z

Active lifetime = intersection of data lifetime and lifetimes of all type parameters. Note that Rc with an allocator behaves just like RefCell.

Manishearth · 2016-10-04T16:57:04Z

The Servo-Gecko interface in stylo uses tons of unsafe code (which I'm trying to reduce and refactor). We make some (iffy) assumptions about unsafe code including:

&T can be used in an extern C function
Option<&T> and Option<Arc> are pointer-sized because of nonnull (and can be used in extern C functions in place of possibly-null pointer args
Cell<T> has the same repr as T for a repr(C) T. See Specify #[repr(transparent)] rfcs#1758
&(void enum) isn't optimized away. This is a trick we probably will get rid of.

strega-nil · 2016-10-04T18:52:11Z

@Manishearth I think, eventually, the fourth will be undefined behavior, but the other three seem like correct assumptions. (although ArcInner is not equivalent to #[repr(C)] { *const usize, *const usize, *const T })

Manishearth · 2016-10-04T18:57:23Z

Oh, we're not casting Option<&T> to Option<Arc<T>>, we're casting Option<&T> to Option<Arc<U>>, where the T <-> U relationship (T is opaque) is specified by an unsafe trait that enables this conversion 😄

mystor · 2016-10-04T19:16:58Z

I should clarify that this T @Manishearth is talking about in the casting is the enum Foo {}. in question, namely it looks like:

enum FooVoid {}
pub struct Foo(FooVoid);

Manishearth · 2016-10-04T19:28:13Z

(I'm thinking of replacing Foo with a ZST -- the ctypes lint doesn't like it, but &ZST should be safe through FFI)

arielb1 · 2016-10-04T19:29:00Z

Replace Foo with a ZST and use #[allow] to make the ctypes lint shut up.

Manishearth · 2016-10-04T19:29:40Z

Right, that's the plan

strega-nil · 2016-10-04T23:09:05Z

@Manishearth lol, I was just pointing that out :)

The ctypes lint is currently broken, imho. &() should be allowed (but &(#[repr(i8)] enum { __foo; }) works too).

mystor · 2016-10-05T00:42:51Z

@ubsan That enum unfortunately is not zero sized, so mem::swap can screw it up with &mut references. I think our best option right now is &[i8; 0] which is both zero sized and doesn't trigger the lint.

strega-nil · 2016-10-05T01:20:30Z

@mystor ... eww.

I'd rather do

#[repr(C)]
struct Foo {
    __: [i8; 0]
}

in order to get rid of the chance of conversion to &[i8]. But it's all still a hack to get around () not being C compatible.

mystor · 2016-10-05T02:07:39Z

@ubsan Yup, that's actually almost exactly what I'm doing.

And yes, I agree that it's a hack to get around () not being C compatible. I think we should probably fix that.

burdges · 2016-10-05T04:23:40Z

Just stating the obvious : There are going to be many situations where the only unsafe code consists of calls to std::mem::transmute<X,Y> where X is a packed C struct and Y is [u8; std::mem::size_of<X>] or vice versa. It sounds unfortunate to disable optimizations for that if X does not contain pointers.

arielb1 · 2016-10-05T16:27:19Z

The get_unchecked examples at rust-lang/rust#36976 are convincing me that we do not want to UB on just creating a reference to an invalid value, or even returning such a reference from a function. This would naturally make references to Void be well-defined too (through horribly unsafe).

arielb1 · 2016-10-05T16:32:52Z

BTW, even if invalid values are ignored, my aliasing memory models could end up with the result of an out-of-range [T]::get_unchecked not having a capability to access the out-of-range memory (because they are limited to the capability of the &[T] - i.e. only to the length of it).

bluss · 2017-01-21T15:37:22Z

I implement struct RevSlice<T>([T]) which is a reversed view of a slice. I'm unsure what's needed to put it on sound ground, specifically the conversions:

&[T] -> &RevSlice<T>
&mut [T] -> &mut RevSlice<T>
Box<[T]> -> Box<RevSlice<T>>

Amanieu · 2017-01-21T15:39:09Z

@bluss I think this is a good case for #[repr(transparent)]

bluss · 2017-01-22T13:58:10Z

Another crate:

linked-hash-map uses the same kind of reference to newtype wrapper.
- https://github.com/contain-rs/linked-hash-map/blob/69086c3bc5dfc81d7dd7bef8b99df0cdc4677517/src/lib.rs#L82-L96

llogiq · 2017-03-16T21:18:34Z

bytecount (used by ripgrep&xi) has one line of unsafe to transmute aligned byte slices to usize/SIMD slices. This pattern might perhaps be factored out into its own crate, or perhaps even find a place in libstd.

Manishearth · 2017-03-16T21:20:05Z

Can't byteorder do that?

bluss · 2017-03-17T18:01:48Z

byteorder does not deal with slice conversions.

I imagine that's something like split_aligned_for but even that formulation leaves much to be desired, slice iteratorish formulation seems to codegen much better (hence the other experiments there in the same junkyard and in rawslice).

Shnatsel · 2019-06-23T21:30:30Z

I'm not sure what this issue is supposed to achieve that downloading and grepping crates.io or searching for "unsafe" in Rust code on Github would not. But here are some bugs in unsafe code found in the wild, w/ short analysis. More of them here: https://rustsec.org/advisories/

Interesting or Rust-specific bugs:

SmallVec::insert_many is unsound servo/rust-smallvec#96 - double free if iterator over non-Copy type panics. Fixed via "set length on drop" trick.
inflate crate: unsafe code due to missing safe abstraction (no way to append contents of vector to itself), I've opened an RFC to provide one. Vulnerable, simplified example here. This is actually a recurring bug in DEFLATE decoders, another one with a different implementation also has this exact bug (pending disclosure atm).
Contents of uninitialized memory are leaked into the output on malformed inputs ruuda/claxon#10 exposure of uninitialized memory because of missing abstraction for an output buffer. The missing abstraction actually exists as a crate but nobody uses it because getting rid of 1-2 unsafe blocks per crate doesn't seem to be worth adding another dependency with unsafe code in it. In this case could be patched without regressing performance by requesting zeroed memory from the OS. This vuln triggered documentation improvements. I had to write a custom tool to find this bug!

Boring but serious bugs:

CVE in str::repeat, also victim of appending vector to itself unsafely, didn't check for overflow in capacity calculation.
use-after-free when growing to the same size servo/rust-smallvec#148 - use-after-free due to logic bug, boring all the way
Fix buffer overflow bug in deflate::core::BitBuffer::flush Frommi/miniz_oxide#47 - out-of-bounds write due to missing check. Now that u64::to_le_bytes is stabilized the unsafe is no longer needed. There is also precedent for the reverse pattern with from_le_bytes and TryInto. Safe versions are exactly as fast as the unsafe ones.

JakobDegen · 2023-04-25T16:39:50Z

Discussed in backlog bonanza: Not so sure this issue is providing much value and the discussion here is quite old. Closing

nikomatsakis changed the title ~~Unsafe code in the wild~~ Canvas unsafe code in the wild Sep 1, 2016

RalfJung transferred this issue from rust-lang/rust-memory-model Jun 18, 2019

RalfJung added the C-list Category: A list/collection of some sort. Please help maintain it! label Aug 14, 2019

JakobDegen closed this as completed Apr 25, 2023

Canvas unsafe code in the wild #146

Canvas unsafe code in the wild #146

Comments

nikomatsakis commented Sep 1, 2016

strega-nil commented Sep 2, 2016

nikomatsakis commented Sep 3, 2016

strega-nil commented Sep 3, 2016

eternaleye commented Sep 8, 2016

Ericson2314 commented Sep 8, 2016

nikomatsakis commented Sep 8, 2016

arielb1 commented Sep 8, 2016

RalfJung commented Sep 12, 2016

mystor commented Sep 12, 2016

RalfJung commented Sep 12, 2016

nikomatsakis commented Sep 12, 2016

RalfJung commented Sep 12, 2016

arielb1 commented Sep 12, 2016

nikomatsakis commented Sep 12, 2016

RalfJung commented Sep 12, 2016

arielb1 commented Sep 12, 2016 • edited Loading

RalfJung commented Sep 12, 2016

arielb1 commented Sep 12, 2016 • edited Loading

Manishearth commented Oct 4, 2016

strega-nil commented Oct 4, 2016

Manishearth commented Oct 4, 2016

mystor commented Oct 4, 2016

Manishearth commented Oct 4, 2016

arielb1 commented Oct 4, 2016

Manishearth commented Oct 4, 2016

strega-nil commented Oct 4, 2016 • edited Loading

mystor commented Oct 5, 2016

strega-nil commented Oct 5, 2016

mystor commented Oct 5, 2016

burdges commented Oct 5, 2016

arielb1 commented Oct 5, 2016 • edited Loading

arielb1 commented Oct 5, 2016

bluss commented Jan 21, 2017

Amanieu commented Jan 21, 2017

bluss commented Jan 22, 2017

llogiq commented Mar 16, 2017

Manishearth commented Mar 16, 2017

bluss commented Mar 17, 2017

Shnatsel commented Jun 23, 2019 • edited Loading

JakobDegen commented Apr 25, 2023

arielb1 commented Sep 12, 2016 •

edited

Loading

arielb1 commented Sep 12, 2016 •

edited

Loading

strega-nil commented Oct 4, 2016 •

edited

Loading

arielb1 commented Oct 5, 2016 •

edited

Loading

Shnatsel commented Jun 23, 2019 •

edited

Loading