Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concern about soundness of placement of stack memory #6

Open
RalfJung opened this issue May 16, 2019 · 8 comments
Open

Concern about soundness of placement of stack memory #6

RalfJung opened this issue May 16, 2019 · 8 comments

Comments

@RalfJung
Copy link

@japaric I read your post at https://blog.japaric.io/microamp/ with great interest. Most of that embedded stuff is very foreign to me, and I continue to be amazed by your ways to provide safe abstractions in this world. :)

That said, I have a concern about some of the things you describe there. Specifically:

The stack will be placed at the end of aliased BTCM1 (0x3_0000). Note that the stack grows downwards, towards smaller addresses.

References to stack variables can not be sent to a different core because they have non-static lifetimes and to place a value in a static variable it must only contain static lifetimes ( there’s an implicit T: 'static bound on all static variables).

I am not convinced that the latter is true -- that you will never see references to stack variables if you restrict yourself to T: 'static.

The first, maybe silly example is the following function:

pub fn make_static<T>(x: &T, f: impl FnOnce(&'static T)) -> ! {
    // Casting away the lifetime is okay because we know there is
    // a loop there that makes sure the reference remains valid forever.
    f(unsafe { &*(x as *const T)});
    loop {}
}

I think this function is safe in the sense that it doesn't break the type system and could conceivably be provided by a library. I even have a formal proof that this is compatible with our formal model of the type system.

But of course if a μAMP program uses this function, it could communicate a stack reference to another core, and suddenly the pointer is using the wrong address space.

Now this may sound somewhat artificial, but that doesn't make it less of a problem, at least in principle. We have two unsafely implemented abstractions (make_static and μAMP) which seem fine in isolation, but go bad when used together, and it is not clear "who is right". (Also from a formal perspective, I have no clear idea what is wrong with the function above that would make a correctness proof not go through -- you are using 'static in a way that means much more than just "a lifetime that goes on forever".)

I also am not sure if this example is the only problem: with intrusive collections, we can have elements in a collection that live shorter than the collection. This is certainly challenging to get right with concurrency in mind, but I see no fundamental reason why we couldn't have an intrusive concurrent collection where thread A can add elements (that will automatically remove themselves when the stack gets popped) and thread B can access them while they are in there.

So, generally, the rule you are postulating that T: 'static cannot refer to any thread's stack, is not something that necessarily has to be true, it would be a new piece in Rust's safety contract and I think it is in conflict with some legitimate uses (intrusive collections in particular, where the entire point is that you can add elements to a collection without showing that the elements outlive the collection).

I am curious about your thoughts on this. :)

@japaric
Copy link
Collaborator

japaric commented May 20, 2019

Thanks for commenting. I greatly appreciate people taking the time to poke holes
in my abstractions.

The first, maybe silly example is the following function:

I recall reading a GH (?) discussion about this pattern (manufacturing static
references in divergent functions) well over a year ago and someone mentioned
that it was not unwind safe, which is way I looked for alternatives to safely
producing &'static mut references and ended up using static mut variables.

I think this function is safe

This function is not memory safe in the presence of unwinding. Here's a
counter-example where this "safe" function plus some more safe code from
the standard library breaks Rust aliasing rules.

If you use catch_unwind in make_static then it might be "completely safe" --
at least I can't think of a counter-example in that case (is tkill / tgkill
considered "safe"?) -- but that makes the function depend std. Which then
makes it a non-issue for microamp since it's a no_std-only framework. Also,
one could implement unwind in no_std and not provide a catch_unwind
function (for performance); nothing mandates that both need to be implemented at
the same time.

The other change that may make this function safe is to wrap the &'static
reference in a NotSend newtype. Again, that would make it a non-issue for
microamp because the aliasing problem requires sending a reference across the
core boundary and that requires a : Send bound.

We have two unsafely implemented abstractions (make_static and μAMP) which
seem fine in isolation, but go bad when used together

I'm sure this will not be the last time we see this. :-)

and it is not clear "who is right"

All unsafely implemented abstractions have assumptions attached to them, even if
the author is currently not aware of them. std and no_std each have
different a set of "truths" associated to them -- no_std has less constraints
and less API surface so there are less things one can rely on but also people
are free to implement "standard"-like API more or less in whatever way they want
-- so an abstraction designed with only std in mind is likely to not hold its
ground when used in a no_std context; same thing the other way around.

My approach to safety / soundness is to spent time finding and documenting these
assumptions and then making sure they hold to the best of my ability. The
process is iterative as I'll find new no_std abstractions as time goes by.
However, I only concern myself with abstractions I deem safe and that has a
minimum requirement of not having a soundless hole when used in isolation and
when used in conjunction with the standard / core library.

you are using 'static in a way that means much more than just "a lifetime that
goes on forever".'

You are entirely correct. I'm a practical person :-). What I want is the user to
not send references that point into the stack and the T: 'static bound
(which I did not manually add but was already there due to the implementation)
does the job as far as I know.

I also am not sure if this example is the only problem: with intrusive collections

(Sorry, I don't have time look into this one at the moment)

I am curious about your thoughts on this. :)

At the moment I'm not overly concerned about people making static references out
of stack allocated things in the context of Real Time For the Masses, which is
the main consumer of this library, for the following reasons:

  • There's already a safe API to create &'static mut references out of
    static mut variables that end users are very likely to prefer.

  • Anything that contains an infinite loop or it's never ending is pretty much an
    anti-pattern in this framework.

  • It seems that the multi-core microcontrollers that are more commonly used
    don't do the memory aliasing that the UltraScale+ does so we won't even need
    to restrict sending pointers into the stack for those. The memory layout is
    addressed per target device; it's not mandated by the microamp crate
    itself.

  • As last resort, where required, we can do post compile time analysis in cargo microamp to error on programs that can't be proven safe in the type system.
    (Certainly, this would be no small amount of work but we are already
    considering analyzing the output LLVM-IR to reject uses of abstractions, like
    spin::Mutex, that are deadlock-y and make Worst-Case Execution Time analysis
    intractable)

@RalfJung
Copy link
Author

RalfJung commented May 20, 2019

This function is not memory safe in the presence of unwinding. Here's a
counter-example where this "safe" function plus some more safe code from
the standard library breaks Rust aliasing rules.

Indeed, I had forgotten about that as our formal model does not have unwinding. That is easy to fix though (without depending on libstd):

struct LoopGuard;

impl Drop for LoopGuard {
    fn drop(&mut self) {
        loop {}
    }
}

pub fn make_static<T: 'static>(x: &T, f: impl FnOnce(&'static T)) {
    // Casting away the lifetime is okay because we know there is
    // a loop there that makes sure the reference remains valid forever.
    // The loop will happen even if we unwind.
    let _guard = LoopGuard;
    f(unsafe { &*(x as *const T)});
}

(I'll come back to the rest of your comment later.)

@RalfJung
Copy link
Author

RalfJung commented May 21, 2019

All unsafely implemented abstractions have assumptions attached to them, even if
the author is currently not aware of them.

Sure. But as much as possible we should try to have one coherent framework that encompasses all of them. This is necessary to maintain basic compositionality -- the alternative is to give up on safety or to have an ecosystem split. Both do not seem like acceptable options to me.

std and no_std each have
different a set of "truths" associated to them -- no_std has less constraints
and less API surface so there are less things one can rely on but also people
are free to implement "standard"-like API more or less in whatever way they want
-- so an abstraction designed with only std in mind is likely to not hold its
ground when used in a no_std context; same thing the other way around.

I don't see that. I see nothing about std vs no_std that would have an effect here (and as I showed above, my example does not need std to become a proper safe abstraction even with panic=unwind). In particular, it would be a big soundness gap if a no_std library could not also safely be used in a std "context".

And conversely, I am not aware and I don't think crate authors are aware that safety in no_std would work any differently than in std. If anything, no_std libraries are harder to show safe because they can make less assumptions about their environments (because they can be used wherever a std library can be used, and more).

To be fair, no_std frameworks for specific hardware can make extra assumptions, which can help. But other than for interactions with the allocator or so, and maybe assuming panic=abort, I do not see a way this can be used to actually make more things safe.

We have two unsafely implemented abstractions (make_static and μAMP) which
seem fine in isolation, but go bad when used together

I'm sure this will not be the last time we see this. :-)

And in each case we'll have to find some way to resolve this conflict, or we have to start to build tooling such that cargo can reject compiling programs that link together incompatible libraries.

In the other cases that I am aware of (Rc+thread::scoped being unsound, as well as unions+josephine being unsound), one of the two libraries was decided to be incorrect. The only alternative to making such a decision is to split the ecosystem.

What I want is the user to
not send references that point into the stack and the T: 'static bound
(which I did not manually add but was already there due to the implementation)
does the job as far as I know.

I can see the need for excluding pointers-to-the-stack. But I am afraid using lifetimes is the wrong tool for this job. This sound more like it would need more refined reference/pointer types that carry something like an "address space" (which is a pattern that also comes up in other contexts).

There's already a safe API to create &'static mut references out of
static mut variables that end users are very likely to prefer.

I don't follow. My point is not that people will get an error about a 'static bound and then use something like my make_static. If "very likely" was good enough when it comes to type safety, we might as well all use C++. ;)

@japaric
Copy link
Collaborator

japaric commented May 23, 2019

That is easy to fix though (without depending on libstd):

Isn't this basically "relying on destructors running for memory safety" which I
think most people would agree it's not a great idea?

Your updated function assumes that the only way anything thread-like can be
terminated is by first running the destructors of all its stack allocated values
in all its stack frames and that's what a panicking std::thread does, but I
can write a (no_std) thread abstraction (using Linux system calls or on a
microcontroller) that can synchronously terminate itself without running any
destructor.

// instead of std::Mutex because you can construct it in "const context"
use parking_lot::Mutex;

static X: Mutex<i32> = Mutex::new(0);

my_thread::spawn(|| {
    let x = X.lock();
    let y = Box::new(0);
    let z = false;

    mk_static(&z, |z: &'static bool| {
       // omitted: send `z` to some other thread

        // terminates *only* this thread (`exit` syscall instead of `exit_group`)
       my_thread::exit();

       // `x` is leaked - OK
       // `y` is leaked - OK
       // `LoopGuard` is leaked - unsound
    });
})

This is basically a mem::forget(all_things_on_all_stack_frames) which ought to
be considered safe under the "leaking resources is safe in Rust" guideline. Of
course, this API can / will break abstractions that rely on destructors for
safety.

Alternatively, I can write a #[panic_handler] that terminates the thread but
does not walk the stack drop-ing all the things. I'm not aware of any
specification that mandates that panicking a thread (implemented outside std)
must run destructors -- that just happens to be how panic_unwind works.

I still do not consider your updated function to be sound. If you made it depend
on std, or at the very least panic_unwind, then it might be sound.

More fundamentally, though, to me loop {} seems too indirect a way to
communicate the threading / task / w/e runtime that you want some stack frame to
never be deallocated, at least compared to something like Box::leak.

I see nothing about std vs no_std that would have an effect here

There's a human factor. std users may think that std::thread-like or
panic_unwind-like behavior applies everywhere. This is fine if they link
their libraries to std (or panic_unwind) but becomes a problem if they
mislabel them as #![no_std].

This sound more like it would need more refined reference/pointer types that
carry something like an "address space"

Perhaps that'd be best but it doesn't exist and I'm targeting today's stable.
If I can work around the lack of a (stable) Rust feature to make something work
on stable I will. These workarounds are my least favorite part of Rust but I
prefer them over waiting months, or worst years.

@RalfJung
Copy link
Author

RalfJung commented May 23, 2019

Isn't this basically "relying on destructors running for memory safety" which I
think most people would agree it's not a great idea?

No, this is actually inherently relied on by several libraries to ensure exception safety (e.g. crossbeam nvm that uses catch_unwind, but here's one in libstd that is definitely needed for correctness, here's one in rayon where I cannot tell at a glance if it is needed for safety or just correctness -- but given that unsafe code might rely on functional correctness of its dependencies, that could still be crucial). You can rely on drop glue in your own functions running. What broke "scoped threads" back then was that it relied on drop glue in other functions to run, that doesn't work.

Pin has a "the destructor will run"-style guarantee that is inherently relied upon for intrusive collections, see the module docs. Again, this works because you know other people cannot mem::forget things in your stack frame.

For this reason, the discussion about setjmp/longjmp also had to take into account that using longjmp to skip over stack frames that have drop glues is in general incorrect. Similar concerns were raised in discussions on the forums.

(FWIW, I am a bit surprised that so many people seem to have the incorrect impression that you cannot rely on RAII. Seems like we need a concerted effort to raise awareness that mem::forget means you can have leaks, but basic RAII can still be relied upon? Cc @gnzlbg @Centril @rkruppe -- also WTF why is GitHub so stupid and I cannot ping rust-lang teams here)

Your updated function assumes that the only way anything thread-like can be
terminated is by first running the destructors of all its stack allocated values
in all its stack frames and that's what a panicking std::thread does, but I
can write a (no_std) thread abstraction (using Linux system calls or on a
microcontroller) that can synchronously terminate itself without running any
destructor.

That abstraction is unsafe and violating guarantees relied upon by a large amount of libraries. And that's not just because of the drop stuff I mentioned before.

For example, imagine rayon::join was implemented by running the first closure in the original thread, and the second closure in a new thread. rayon::join also allows sharing pointers to the stack of the original thread being shared with both threads. If the first thread did my_thread::exit(), deallocating its stack, the second thread would have dangling pointers:

fn main() {
  let m = Mutex::new(0);
  rayon::join(
    || my_thread::exit(), // deallocates m because the first closure runs on the main thread
    || m.lock().unwrap(), // *oops* accesses dangling memroy
  );
}

I'm not aware of any
specification that mandates that panicking a thread (implemented outside std)
must run destructors -- that just happens to be how panic_unwind works.

I am not aware of any specification for Rust. ;) More seriously though, when you use unsafe, it is your responsibility to show that this is safe -- not the docs responsibility to list this as unsafe.

@gnzlbg
Copy link

gnzlbg commented May 23, 2019

also WTF why is GitHub so stupid and I cannot ping rust-lang teams here)

You can only ping rust-lang/teams in repositories of the rust-lang org.. I don't know why this is the case, but this is how it is =/

FWIW, I am a bit surprised that so many people seem to have the incorrect impression that you cannot rely on RAII. Seems like we need a concerted effort to raise awareness that mem::forget means you can have leaks, but basic RAII can still be relied upon?

I've seen users confuse "destructors are not guaranteed to run" (e.g. because you can plug off the computer and prevent them from running, because leaking is safe, etc.) with "you can never rely on destructors running". This was the case, for example, in this issue (see that comment by @nikomatsakis and the following ones), where it wasn't considered bad that some destructors were not running because "we don't guarantee that".

While it is true that we don't guarantee that all destructors always run, for each point in a program's execution, we do guarantee that certain destructors have always finished running. Because we do guarantee that, unsafe code can rely on this guarantee, and code that breaks this guarantee is, therefore, not safe (you can only break the guarantee with unsafe code, so that makes any safe abstraction over such unsafe code unsound).

Why do people get this idea? I don't know, maybe because "destructors are not guaranteed to run" is repeated more often than what exactly is guaranteed. Remembering this catch phrase is enough to get by, and easier than remembering the rules. Also, many languages don't have destructors, but finalizers, and you can't really rely on any finalizer having run at any point in the execution of a program. So those who extrapole from finalizers to destructors might assume that the same thing applies there.

@RalfJung
Copy link
Author

I do agree though that I don't know of any place where we positively state when drop is guaranteed to be run, and it seems prudent to do so. I opened rust-lang/nomicon#135 for that.

@RalfJung
Copy link
Author

Here's another case of what I claim is a safe abstraction that relies on drop glue of a guard to run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants