BTreeMap: fix pointer provenance rules #78480

ssomers · 2020-10-28T11:11:41Z

Fixes #78477 and includes #78476

library/alloc/src/collections/btree/node.rs

RalfJung · 2020-10-28T20:51:19Z

library/alloc/src/collections/btree/node.rs

@@ -342,28 +342,36 @@ impl<BorrowType, K, V, Type> NodeRef<BorrowType, K, V, Type> {
    /// # Safety
    /// The node has more than `idx` initialized elements.
    pub unsafe fn key_at(&self, idx: usize) -> &K {


This function is not even used any more in this file... is it even worth keeping (as opposed to expecting the caller to handle MaybeUninit?

Well it's there for search_linear? in search.rs

At first I found it weird to have a function returning a reference to something that you can only use assuming it's initialized. But it's there because I didn't know how else to copy/write a slice into uninitialized elements. And it's somewhat nice to have functions that only do one thing.

Well it's there for search_linear? in search.rs

If that's the only user, it can just to assume_init_ref itself though?

At first I found it weird to have a function returning a reference to something that you can only use assuming it's initialized. But it's there because I didn't know how else to copy/write a slice into uninitialized elements. And it's somewhat nice to have functions that only do one thing.

Well, I do not understand this code enough to suggest other APIs here, or to tell how many of these callers call this function on uninit elements.

Actually, it doesn't seem useful to change key_at after all. To copy a slice from one node to another, miri-track-raw-pointers wants the source pointer to point something it can read entirely, not just the first element. Which is sensible and easy to do, but it leaves key_at only useful to read a key.

The next problem, to which I don't see a way out at the moment, is as_internal_mut():

302 | unsafe { &mut *(self.node.as_ptr() as *mut InternalNode<K, V>) } | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trying to reborrow for Unique at alloc22839+0x140, but parent tag <58889> does not have an appropriate item in the borrow stack |

We disguise a *mut InternalNode as a *mut LeafNode to squeeze it into BoxedNode, and there's no way around that. Now how do we unpack such a party pooper and convince Miri that this is an InternalNode?

-Zmiri-track-pointer-tag tracks the tag to the creator, BoxedNode::from_internal:

BoxedNode { ptr: Unique::from(&mut Box::leak(node).data) }

It does not help to make the reference raw:

BoxedNode { ptr: unsafe { Unique::new_unchecked(&raw mut Box::leak(node).data) } }

Nope. Given

fn from_internal(node: Box<InternalNode<K, V>>) -> Self { let whole = ptr::raw_mut!(*Box::leak(node)); let part = unsafe { ptr::raw_mut!((*whole).data) }; BoxedNode { ptr: unsafe { Unique::new_unchecked(part) } } }

the tag of the internal pointer still refers to:

| 127 | let part = unsafe { ptr::raw_mut!((*whole).data) }; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ created tag 58938

That is strange, I will have to investigate.

It does remind me of a much simpler example I've been trying to unravel.

Same behaviour with this example:

#![feature(ptr_internals, raw_ref_macros)] use std::ptr::{self, NonNull, Unique}; struct Part { _lame: i32, } #[repr(C)] struct Whole { part: Part, extra: i32, } fn main() { let it = Box::new(Whole { part: Part { _lame: 0 }, extra: 42 }); let whole = ptr::raw_mut!(*Box::leak(it)); let part = unsafe { ptr::raw_mut!((*whole).part) }; let unique = unsafe { Unique::new_unchecked(part) }; let raw = NonNull::from(unique); let typed = unsafe { &mut *(raw.as_ptr() as *mut Whole) }; assert!(typed.extra == 42); }

The UB also disappears by removing the _lame field.

Yeah this is a Miri / Stacked Borrows problem, good catch! See rust-lang/miri#1608.

ssomers · 2020-10-29T09:34:20Z

This particular test case now passes, and I cleaned up a bit. Now over to the standard test cases…

PS well it passes on the old master, but CI tries to compile on the latest master.

RalfJung · 2020-10-29T11:14:46Z

This particular test case now passes, and I cleaned up a bit.

That's great! Maybe we should review and land this then? It clearly improves the situation.

Now over to the standard test cases…

FWIW, I didn't even try to run any parts of the stdlib test suite with -Zmiri-track-raw-pointers, so you might run into plenty of problems that way.^^ If the error talks about "untagged" pointers, that is a false positive -- this means a int-ptr cast lost provenance, and in this mode Miri is not equipped to deal with that.

library/alloc/src/collections/btree/node.rs

RalfJung · 2020-10-29T11:18:07Z

library/alloc/src/collections/btree/node.rs

+    ///
+    /// # Safety
+    /// - The node has more than `idx` initialized elements.
+    fn keys_mut(&mut self, start: usize) -> &mut [MaybeUninit<K>]


This function is safe but has a safety comment? What is idx?

Also, the function is actually not safe, as I can do awful things like

foo.keys_mut()[0] = MaybeUninit::uninit();

I fixed that already, wait a second...

Is there any caller that does not use 0 for start? I don't entirely understand why you added that parameter.

Also, the function is actually not safe, as I can do awful things like

But that's basically what slice_remove does, and what MaybeUninit is for, so is that really unsafe? It's only unsafe when you later assume_init it, right?

The concern about the return type remains -- functions returning &mut MaybeUninit are rarely safe because they allow the caller to de-initialize the given memory.

I agree that for internal functions it can be less clear-cut who is responsible for upholding which invariant.

but because it might stump on other references.

it doesn't do anything odd with lifetimes though? So that seems more like something the caller needs to be aware of regardless to me.^^

But in the end it is somewhat arbitrary, so as long as it is properly documented I can live with any of these choices here.

I tagged that unsafe now

No I didn't. I forgot to tag as_leaf_mut, and if I try that, the avalanche means you're tempted to make all bodies unsafe.

So I made the as_X methods safe again, and made the ones returning MaybeUninit references and slices unsafe. It doesn't make much sense, since you can access the MaybeUninit array directly and safely. But you need an unsafe block to do use them anyway.

Maybe the fields shouldn't be open to the entire node code, and the methods returning MaybeUninit stuff should be members of LeafNode/InternalNode instead of NodeRef. I fenced off the BoxedNode field just to see what the reviewer(s) will say.

One class of methods that has been very clear to me from the start, is those on a plain self like first_edge, left_kv: they simply return the same ownership, borrowship, or whatever you call it.

Next to the easy ones are those on a plain self called into_X (where I think the unwritten rule is that X is a part of the original, not a neighbour). Some of them return an 'a lifetime, which stems from the NodeRef's marker::Immut<'a>/ValMut<'a>/Mut<'a>. It didn't understand what that implied for a long time, but they didn't look daunting either, thanks to the plain self.

I now understand why there is both into_key_at and key_at: because the former profits from the lifetime incorporated into marker::Immut<'a>/ValMut<'a>/Mut<'a>, and the latter is defined and used for all 3 of those, and the only way to do that is to define it over some BorrowType without lifetime. Therefore you can write the latter using the former, but not the other way around. into_key_at is easy enough and more sophisticated.

Therefore I turned all the new/changed function into the into_X class, and I like it.

If there's a good place to document that structure with a comment in the code, that seems like it could be helpful. :)

ssomers · 2020-10-29T11:21:00Z

I really don't know how to Miri the other test cases, other than writing each one as a standalone program (for those not peeking into internals). So I cleaned up some more instead.

RalfJung · 2020-10-29T11:24:02Z

I really don't know how to Miri the other test cases,

I can write instructions for this, but it is probably easier to wait until the things that we currently have in the pipeline have landed.

RalfJung · 2020-10-29T11:24:45Z

library/alloc/src/collections/btree/node.rs

+    /// Exposes the data of an internal node for reading.
+    ///
+    /// # Safety
+    /// We have exclusive access to the node (and not just to the `NodeRef`).


Isn't there supposed to be an invariant that a NodeRef<marker::Mut always has exclusive access to the node?

Somewhat, but with at least these exceptions:

temporary ones with reborrow_mut

temporary one with ptr::read in remove_kv_tracking (I have a PR queued to use reborrow_mut instead)

just realized today, the pointer returned by insert_fit

temporary ones with reborrow_mut

Oh, reborrow_mut doesn't use some lifetimes to make sure only one of the two ptrs can be used?
But then I'd say it is reborrow_mut that should be unsafe, with the safety invariant being "do not use the original ref until you stopped using the reborrowed one". (Somewhat like dorman-mut-ref actually...)

Well reborrow_mut is unsafe, with ample warning, and I think the borrow checker makes sure you can only use the top one, but you can still use it to deallocate the node that the original one refers to, and that's not cool…

RalfJung · 2020-10-29T11:26:08Z

library/alloc/src/collections/btree/node.rs

+    ///
+    /// # Safety
+    /// - The node has more than `idx` initialized elements.
+    fn keys_mut(&mut self, start: usize) -> &mut [MaybeUninit<K>]


The concern about the return type remains -- functions returning &mut MaybeUninit are rarely safe because they allow the caller to de-initialize the given memory.

ssomers · 2020-10-29T11:45:04Z

I really don't know how to Miri the other test cases,

That wasn't entirely true. I can Miri at any point, by rebasing the btree code on an older master, which I already did. But I can't -Zmiri-track-raw-pointers it.

RalfJung · 2020-10-29T11:48:31Z

You are using https://github.com/RalfJung/miri-test-libstd, right? So that would be something like

RUST_SRC=/path/to/rustc MIRIFLAGS="-Zmiri-track-raw-pointers" ./run-test.sh alloc btree

But be aware that I have never tried this; even the libtest harness might already be breaking pointer provenance rules for all we know... in fact, until #78484 lands, many uses of HashMap will break with raw ptr tracking.

EDIT: Oh and you'll have to ./miri install Miri master to get the track-raw-pointers flag.

ssomers · 2020-10-29T11:54:20Z

You are using https://github.com/RalfJung/miri-test-libstd, right?

No, never heard of that. I normally did cargo miri test in library/alloc. Or actually, on a symbolic copy with a lot of non-btree tests stripped out.

RalfJung · 2020-10-29T11:56:43Z

I normally did cargo miri test in library/alloc. Or actually, on a symbolic copy with a lot of non-btree tests stripped out.

Wow, I had no idea that that would even work.^^

ssomers · 2020-10-29T19:46:36Z

even the libtest harness might already be breaking pointer provenance rules for all we know

It's not that bad, but of the further two test cases I converted manually, one failed miserably on the .sort() of a simple Vec<u32>, before getting to anything btree.

Anyway, it seems this is about the best I can get out of it. Doesn't fail with the normal Miri based on ffa2e7a.

RalfJung · 2020-10-29T20:15:15Z

Okay so I guess this is ready for review by @Mark-Simulacrum then. :)

marked this pull request as ready for review 28 minutes ago

GH unfortunately doesn't send notifications for this so a reviewer would not notice such a status change.

Mark-Simulacrum · 2020-10-29T20:19:07Z

It's S-waiting-on-review, so I'll see it on my next round of reviews :) Probably this weekend or tomorrow, not sure yet.

Mark-Simulacrum · 2020-11-06T19:38:09Z

Okay. Seems fine.

@bors r+

bors · 2020-11-06T19:38:11Z

📌 Commit 3ce64acedb5c0219309dd2ae6439fb82c4d05778 has been approved by Mark-Simulacrum

bors · 2020-11-09T03:23:36Z

☔ The latest upstream changes (presumably #78889) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

ssomers · 2020-11-09T08:41:39Z

It seems we get a conflict over the same commit, rebased here on a newer master than the original.

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

Mark-Simulacrum · 2020-11-09T13:58:57Z

@bors r+

bors · 2020-11-09T13:58:58Z

📌 Commit 7ca6e8f has been approved by Mark-Simulacrum

Rollup of 12 pull requests Successful merges: - rust-lang#74754 (Add `#[cfg(panic = '...')]`) - rust-lang#76468 (Improve lifetime name annotations for closures & async functions) - rust-lang#77016 (Test clippy on PR CI on changes) - rust-lang#78480 (BTreeMap: fix pointer provenance rules) - rust-lang#78502 (Update Chalk to 0.36.0) - rust-lang#78513 (Infer the default host target from the host toolchain if possible) - rust-lang#78566 (Enable LLVM Polly via llvm-args.) - rust-lang#78580 (inliner: Break inlining cycles) - rust-lang#78710 (rustc_ast: Do not panic by default when visiting macro calls) - rust-lang#78746 (Demote i686-unknown-freebsd to tier 2 compiler target) - rust-lang#78830 (fix `super_visit_with` for `Terminator`) - rust-lang#78844 (Monomorphize a type argument of size-of operation during codegen) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup

…Mark-Simulacrum BTreeMap: fix pointer provenance rules in underfullness Continuing on rust-lang#78480, and for readability, and possibly for performance: avoid aliasing when handling underfull nodes, and consolidate the code doing that. In particular: - Avoid the rather explicit aliasing for internal nodes in `remove_kv_tracking`. - Climb down to the root to handle underfull nodes using a reborrowed handle, rather than one copied with `ptr::read`, before resuming on the leaf level. - Integrate the code tracking leaf edge position into the functions performing changes, rather than bolting it on. r? `@Mark-Simulacrum`

rust-highfive assigned Mark-Simulacrum Oct 28, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 28, 2020

ssomers changed the title ~~BTreeMap: fix aliasing and pointer provenance rules~~ BTreeMap: fix pointer provenance rules Oct 28, 2020

RalfJung reviewed Oct 28, 2020

View reviewed changes

library/alloc/src/collections/btree/node.rs Outdated Show resolved Hide resolved

RalfJung reviewed Oct 28, 2020

View reviewed changes

library/alloc/src/collections/btree/node.rs Outdated Show resolved Hide resolved

ssomers marked this pull request as draft October 28, 2020 12:54

ssomers force-pushed the btree-alias branch 2 times, most recently from b7f4051 to 8c9214c Compare October 28, 2020 20:40

RalfJung reviewed Oct 28, 2020

View reviewed changes

ssomers force-pushed the btree-alias branch 2 times, most recently from 535a99d to 43e0c6a Compare October 29, 2020 09:30

ssomers force-pushed the btree-alias branch from 43e0c6a to 6a257da Compare October 29, 2020 10:44

RalfJung reviewed Oct 29, 2020

View reviewed changes

library/alloc/src/collections/btree/node.rs Show resolved Hide resolved

RalfJung reviewed Oct 29, 2020

View reviewed changes

ssomers force-pushed the btree-alias branch from 6a257da to 515a06b Compare October 29, 2020 11:19

RalfJung reviewed Oct 29, 2020

View reviewed changes

ssomers force-pushed the btree-alias branch 2 times, most recently from c4cbc0c to d0d876d Compare October 29, 2020 19:30

ssomers marked this pull request as ready for review October 29, 2020 19:46

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 6, 2020

m-ou-se mentioned this pull request Nov 7, 2020

Rollup of 12 pull requests #78831

Closed

Dylan-DPC-zz mentioned this pull request Nov 8, 2020

Rollup of 15 pull requests #78870

Closed

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 9, 2020

BTreeMap: fix pointer provenance rules, make borrowing explicit

7ca6e8f

ssomers force-pushed the btree-alias branch from 3ce64ac to 7ca6e8f Compare November 9, 2020 08:16

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 9, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 9, 2020

Dylan-DPC-zz mentioned this pull request Nov 9, 2020

Rollup of 12 pull requests #78904

Merged

bors merged commit 2187f3c into rust-lang:master Nov 10, 2020

rustbot added this to the 1.49.0 milestone Nov 10, 2020

ssomers deleted the btree-alias branch November 10, 2020 13:49

BTreeMap: fix pointer provenance rules #78480

BTreeMap: fix pointer provenance rules #78480

Conversation

ssomers commented Oct 28, 2020 • edited Loading

Choose a reason for hiding this comment

ssomers Oct 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers Oct 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers Oct 29, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers commented Oct 29, 2020 • edited Loading

RalfJung commented Oct 29, 2020

RalfJung Oct 29, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers commented Oct 29, 2020 • edited Loading

RalfJung commented Oct 29, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssomers commented Oct 29, 2020

RalfJung commented Oct 29, 2020 • edited Loading

ssomers commented Oct 29, 2020

RalfJung commented Oct 29, 2020

ssomers commented Oct 29, 2020

RalfJung commented Oct 29, 2020

Mark-Simulacrum commented Oct 29, 2020

Mark-Simulacrum commented Nov 6, 2020

bors commented Nov 6, 2020

bors commented Nov 9, 2020

ssomers commented Nov 9, 2020

Mark-Simulacrum commented Nov 9, 2020

bors commented Nov 9, 2020

ssomers commented Oct 28, 2020 •

edited

Loading

ssomers Oct 28, 2020 •

edited

Loading

ssomers Oct 28, 2020 •

edited

Loading

ssomers Oct 29, 2020 •

edited

Loading

ssomers commented Oct 29, 2020 •

edited

Loading

RalfJung Oct 29, 2020 •

edited

Loading

ssomers commented Oct 29, 2020 •

edited

Loading

RalfJung commented Oct 29, 2020 •

edited

Loading