Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint [ptr_to_temporary] #10962

Closed
wants to merge 1 commit into from
Closed

Conversation

Centri3
Copy link
Member

@Centri3 Centri3 commented Jun 15, 2023

This is a small one, but very nice as even miri doesn't catch this (it catches it if it's used incorrectly, but not created erroneously).
There were a couple cases of this in other tests, I checked them all and they were 100% valid so I think this is (hopefully) bug-free. Still likely needs more tests though, as always :D (I'm pretty bad at them)

Closes #10959

changelog: New lint [ptr_to_temporary]

@rustbot
Copy link
Collaborator

rustbot commented Jun 15, 2023

r? @Alexendoo

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jun 15, 2023
@Centri3 Centri3 force-pushed the ptr_to_temporary branch from 627aa92 to 0c88f66 Compare June 15, 2023 20:22
@Centri3 Centri3 force-pushed the ptr_to_temporary branch from 0c88f66 to caca5bd Compare June 15, 2023 20:33
@Centri3 Centri3 force-pushed the ptr_to_temporary branch from caca5bd to 96c9004 Compare June 15, 2023 21:09
@bors
Copy link
Contributor

bors commented Jun 16, 2023

☔ The latest upstream changes (presumably #10925) made this pull request unmergeable. Please resolve the merge conflicts.

@Jarcho
Copy link
Contributor

Jarcho commented Jun 16, 2023

I'm still not sure how the lint as written is useful. Blanket banning &e as *const _ where e is some temporary expression is overkill. Using it as an argument is fine so long as the function doesn't store the pointer somewhere to be used after the function returns. When bound to a local as in let x = &e as *const _, e will have it's lifetime extended to that of x, so there's no issue with that either.

@asquared31415
Copy link
Contributor

There's still a case of &0 as *const i32 in the tests that this is linting on when it's completely sound to use.
example, run with miri

I don't recall exactly where we say what does and doesn't get promoted to statics, and whether that's a stable promise, so I'm not sure that this lint can be useful in that regard. If those rules ever change, this lint is either going to miss too much (preferable, but not great), or error on code that's perfectly sound.

@Centri3
Copy link
Member Author

Centri3 commented Jun 17, 2023

Using it as an argument is fine so long as the function doesn't store the pointer somewhere

Huh, weird, I've went with the assumption that giving a function a pointer to temporary would cause immediate UB. Guess not.

and whether that's a stable promise

If it's not, I see no reason why that shouldn't be linted against (as such code changing at will due to an implementation detail is likely not the intended behavior).

Regardless, though; I've done some testing and I think, even if it's UB, it's impossible for it to be reproduced as allocating stack memory always involves subtracting from the stack pointer; in the case of allocating a temporary, then allocating actual locals, it'll never be overwritten because the temporary will be at [rsp+04h] and the rest will be at [rsp+08h] and so on. The compiler is free to reorder this however it wants but 2 locals should never be at the same memory region (and this isn't something rustc can change at will; it's up to LLVM to do this, afaict). Whether or not there's an edge-case that makes it UB doesn't matter; Passing a pointer to such a temporary will always be correct (with a modern-day compiler of course), even in called functions as they will subtract from the stack pointer again; not add. (miri still reports this as UB though, which is understandable)

But; returning a temporary, however, is UB and will always be reproduceable, as it'll both free the temporary and add to the stack pointer which can then in the caller overwrite the pointer's contents if you're unlucky. This should be linted even if it's a static, as for the reason stated above. So, I think we should isolate this to returning temporaries.

@y21
Copy link
Member

y21 commented Jun 17, 2023

Can this also catch vec![1].as_ptr() (or, idk if this can be made more general, maybe if a method is called on a temporary that takes &self and returns a raw pointer?), or is this outside the scope for this? This is a case where temporary lifetime extension doesn't happen, and I could see a lint being very useful there

@Centri3
Copy link
Member Author

Centri3 commented Jun 17, 2023

I'll look into that, that would be very useful. If &self and returning a raw pointer is all it takes to deem it as a lint-able temporary that would likely be pretty simple.

@Alexendoo
Copy link
Member

Huh, weird, I've went with the assumption that giving a function a pointer to temporary would cause immediate UB. Guess not.

Ignoring constant promotion and lifetime extension, temporaries usually live until the end of the statement, so they'll be dropped after the function has already returned

called on a temporary that takes &self and returns a raw pointer?

Allocators would be a moderately common counter example here, or any other thing that returns a pointer to something that outlives &self

Constant promotion should be easy enough to ignore, we can throw in an is_const_evaluatable check. Lifetime extension only happens to let statements so linting non-let statements where the pointer escapes the temporary scope would be a good way around that

One thing to be aware of if you're going for returns is block expressions like so https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9bc3a39542f9977e20653e34d50c9f64

In general let's go with whatever MIRI says as the source of truth, if it says it's UB we could lint it, if not we shouldn't

@Centri3
Copy link
Member Author

Centri3 commented Jun 17, 2023

Allocators would be a moderately common counter example here, or any other thing that returns a pointer to something that outlives &self

Good point, we could probably only lint if the temporary implements Drop but that will likely have the same issue, only a subset of those FPs would be solved, that could maybe be a restriction/nursery lint

Constant promotion should be easy enough to ignore, we can throw in an is_const_evaluatable check. Lifetime extension only happens to let statements so linting non-let statements where the pointer escapes the temporary scope would be a good way around that

I don't think we should just ignore constant promotion, again, imo, if code depends on an implementation detail to be correct that should still be disallowed

Also, this returns true on something like std::hint::black_box, which literally causes the compiler to not promote it to a constant, so, I don't think this is the right way to go about this, as it only checks if it's a const fn

In general let's go with whatever MIRI says as the source of truth, if it says it's UB we could lint it, if not we shouldn't

I think for the case of returning pointers to temporaries, we should make it more general to "usage of this pointer may cause Undefined Behavior if it isn't promoted to a constant". Even if it isn't UB in some cases, it's definitely not something intended (nor wanted) by the programmer.

(It's also worth mentioning constant promotion should not happen if it can change the behavior of the program, in this case it (understandably) can, as the program relies on it to be correct, so I don't think ignoring constant promotion is the right way to go about this.)

@Jarcho
Copy link
Contributor

Jarcho commented Jun 17, 2023

Both constant promotion and lifetime extension are defined as part of the language. Even if they weren't in the reference, they would de facto be part of the language; far too much code would break because of it.

@Centri3
Copy link
Member Author

Centri3 commented Jun 17, 2023

So, if I understand correctly, the exact criteria for constant promotion are:

  • no interior mutability
  • no destructors
  • no calls to functions that are not rustc_promotable
  • it will not change the behavior of the code

And this is entirely independent of opt-level/* or any other compiler flags (so, --debug and --release builds will promote the same constants [minus if they're optimized out entirely, of course]). Also, this behavior will not change; whatever's promoted today will still be (or not be) in 20 years.

If all of those are true, then I can see why we shouldn't lint it. It would be entirely pointless. But IMO, if even one of those aren't it should still be linted.

Also,

code depends on an implementation detail

This was a bad way to put it. It's not an implementation detail. I more so meant it depends on something outside of the programmer's control to be correct; which, if it can change at will, is not something I would deem "correct".

* Not 100% true. Have a look at what this code outputs with opt-level 0, 1, 2 and 3. With opt-level 0, it gets a reference to a promoted and puts it on the stack. 1, it's removed entirely and rax is set to 0 to signify success, and anything higher 1. In this case though, since the constant is being optimized out entirely rather than it being put on the stack instead, I'm leaning more towards it being a separate issue. I haven't found any cases where it's constant with --debug and not in --release (or the other way around) that doesn't involve the call being removed entirely, so it may indeed be true.

@Alexendoo
Copy link
Member

Looking at the optimised output of programs containing UB will not be that useful, because LLVM is free to optimise assuming it can't happen

It can necessarily change the behaviour of the code, many things won't compile without it (which is why it can't be opt level dependent) and addresses of references can be observed at runtime

@Centri3
Copy link
Member Author

Centri3 commented Jun 21, 2023

because LLVM is free to optimise assuming it can't happen

Which is what makes it so weird since it should prove that it can happen, it's as if it optimizes out storing 110 (which is the only major difference between the two).

many things won't compile without it

I meant code that would compile without it that's now promoted

In this case, that would be something like &(100 + 10) as *const i32, both compile fine but if it's suddenly no longer promoted it's dropped before the function returns. The opposite would be &(100 + *&(10)) as *const i32, which while 110 is promoted is still UB as it stores a pointer to it on the stack. Though of course, such a change would likely not happen since it would break &(100 + 10) as well (afaik).

and addresses of references can be observed at runtime

Depending on the exact address of something is a terrible idea :D Sure, it's observable at runtime, but the code has the same behavior, and if you're printing the address of something, it's gonna be practically random anyway; the OS is free to allocate the binary into memory wherever it pleases (address space layout randomization), any code that depends on that is incorrect (and honestly, should be linted in correctness too; [minus null of course for obvious reasons] something like p == &0x100 as *const i32 is never going to succeed but compiles anyway as no address can be on the first page of memory)

If there's an exhaustive list of everything required for constant promotion, then I'm happy to implement it that way but I still believe this should be linted, as a configuration option possibly.

@Alexendoo
Copy link
Member

it should prove that it can happen

That's thinking about it from the wrong angle, by definition the compiler can assume UB does not happen. That's precisely the reason it can consider the 110 a dead store - it would be UB to read from a dangling pointer, therefore it isn't read from, therefore there's no reason to store 110 in it. Or something like that.

I think a lint that suggests transforming cases where the promotion is being relied upon for safety to a separate binding would be fine, { let x: &'static _ = ...; &x as *const _ } or w/e, but for this correctness lint it would be nice if we could say "this is almost certainly UB"

@Centri3
Copy link
Member Author

Centri3 commented Jun 21, 2023

I agree. For this lint in particular, let's go with only linting non-promoted returned pointers then have a separate restriction lint (or, of course a configuration option since they're kinda the same lint anyway).

But in regards to the first paragraph, I meant that LLVM has to prove so, in this case the MIR generated is basically identical for that function with both opt-level=1 and opt-level=3, even the IR generated. But yeah the compiler should assume it doesn't because that's up to the programmer to uphold. I'm guessing that's what it does here on opt-level=2+.

@Centri3
Copy link
Member Author

Centri3 commented Jun 26, 2023

I've made it only lint non-promoted temporaries now. It's a bit ugly, though. I tried first recreating promote_consts::Collector and Validator, but this didn't go too well (it kinda worked, but not really). Plus it'd be a MIR lint, so, not really worth the effort just to clean it up a bit.

I also moved it away from casts, this was because it was a MIR lint but I'll move it back later as it no longer operates on MIR

@Centri3 Centri3 force-pushed the ptr_to_temporary branch from 1f10fb5 to aa2b23d Compare July 21, 2023 17:13
@bors
Copy link
Contributor

bors commented Jul 21, 2023

☔ The latest upstream changes (presumably #11003) made this pull request unmergeable. Please resolve the merge conflicts.

@Centri3
Copy link
Member Author

Centri3 commented Jul 22, 2023

I've made it operate on the MIR now for as_ptr calls and it works great! It seems to have no FPs I know of (though one minor FN I know how to fix), but there are surely some. Still WIP, so does anybody have some edge-cases I can add to the tests?

The reason why MIR works better is mostly because we know when locals are dropped just by looking at the statements/terminator, if there's a Drop for the temporary and no StorageDead for the pointer we know it outlives it. In the case there's no Drop in the basic block, it's either not temporary (which we know is false) or has no drop glue. In which case we need to be careful as the dead-order(?) is arbitrary, I think anyway.

^ Actually, this can probably be simplified even further: If there's a drop, we can use the same logic but continue to the target basic block for Drop. For every case I've seen the pointer is dead before the receiver if it's not dangling, or after if it's dangling. But somebody who knows MIR please confirm this ^^

MIR also has no concept of method calls, just function calls, so we can easily lint both Vec::as_ptr and .as_ptr() the same.

The earlier 'static problem was easily solvable by getting the function signature and seeing if there are any late bound or static regions. This is likely wrong but it seems to work. The place's type or typeck results cannot be used unfortunately as these are changed to ReErased.

@Centri3 Centri3 force-pushed the ptr_to_temporary branch from 8c84a12 to 75d9c16 Compare July 22, 2023 21:04
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 26, 2023
Add `sym::iter_mut` + `sym::as_mut_ptr` for Clippy

We currently have `sym::iter` and `sym::iter_repeat`, this PR adds `sym::iter_mut` as it's useful for rust-lang/rust-clippy#11038 and another Clippy lint, it also adds `sym::as_mut_ptr` as it's useful for rust-lang/rust-clippy#10962.
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Jul 28, 2023
Add `sym::iter_mut` + `sym::as_mut_ptr` for Clippy

We currently have `sym::iter` and `sym::iter_repeat`, this PR adds `sym::iter_mut` as it's useful for rust-lang#11038 and another Clippy lint, it also adds `sym::as_mut_ptr` as it's useful for rust-lang#10962.

if let ExprKind::Cast(cast_expr, _) = expr.kind
&& let ExprKind::AddrOf(BorrowKind::Ref, _, e) = cast_expr.kind
&& !is_promotable(cx, e)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should operate on the MIR here, as after a few reborrows(? stuff like _ = &(*_)) there should always be a const _ if it's promoted. Otherwise, it's temporary. This should be a good starting point.

traverse_up_until_owned_inner(body, local_assignments, start.as_ref(), 0)
}

fn traverse_up_until_owned_inner<'tcx>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use an analysis pass instead, and track where this owned data goes.

cc @Jarcho, do you have any ideas of what "owned" should be defined as? Basically, for Drop types, the place that is dropped. We could just use the place that is dropped for these types but a temporary that doesn't implement Drop won't have this, despite being freed. Checking the order of StorageDeads works, as the temporary cannot be dead before the pointer, but there may be some nuances to it I'm missing.

source_info.span,
body.source_scopes[source_info.scope]
.local_data
.clone()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.clone()
.as_ref()

Comment on lines +107 to +120
// Get the final return statement if this is a return statement, or don't lint
let expr = if let ExprKind::Ret(Some(expr)) = expr.kind {
expr
} else if let OwnerNode::Item(parent) = cx.tcx.hir().owner(cx.tcx.hir().get_parent_item(expr.hir_id))
&& let ItemKind::Fn(_, _, body) = parent.kind
&& let block = cx.tcx.hir().body(body).value
&& let ExprKind::Block(block, _) = block.kind
&& let Some(final_block_expr) = block.expr
&& final_block_expr.hir_id == expr.hir_id
{
expr
} else {
return false;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a use for find_all_ret_expressions

@Alexendoo
Copy link
Member

r? @Jarcho for the MIR stuff

@rustbot rustbot assigned Jarcho and unassigned Alexendoo Sep 1, 2023
@Jarcho
Copy link
Contributor

Jarcho commented Sep 3, 2023

Linting uses of as_ptr requires tracking values and not just places. Some simple examples of valid things:

let x = some_vec.as_ptr();
GLOBAL.lock().some_field = some_vec;
return x;

This can be detected by noting some_vec is never dropped and therefore the pointer is still valid.

let x = some_vec.as_ptr();
swap(&mut GLOBAL.lock().some_field, &mut some_vec);
return x;

This is basically the same, except some_vec is still dropped.

Cases where the local isn't used after the pointer is created can still be linted (e.g. return some_vec.as_ptr()). Still has some use, but it needs to be very conservative.

@GuillaumeGomez
Copy link
Member

Ping @Centri3. I'd love very much to have this lint. Do you need help?

@Centri3
Copy link
Member Author

Centri3 commented Jan 21, 2024

I don't have the time nor energy to work on this. I'd love to finish this but I can't remember what needed fixing or anything about this PR for that matter 😅 If you can take a look that'd be much appreciated, however I should be back soon-ish

@xFrednet
Copy link
Member

I'm closing this according to the last comment. Thank you for the time, you already put into this lint!

@GuillaumeGomez, since you expressed interest in a previous comment, you're welcome to pick this PR up :)

@rustbot label +S-inactive-closed -S-waiting-on-author -S-waiting-on-review

@rustbot rustbot added S-inactive-closed Status: Closed due to inactivity and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Mar 29, 2024
@xFrednet xFrednet closed this Mar 29, 2024
@Centri3 Centri3 deleted the ptr_to_temporary branch April 1, 2024 16:10
@Centri3
Copy link
Member Author

Centri3 commented Apr 1, 2024

If anybody tries to continue this, it will need a major rewrite. The current way of traversing backwards is... terse, verbose, and it shouldn't be done this way. There's rustc_mir_dataflow for MIR analysis like this (wish I knew that before), so a full rewrite basically, but the main premise of this will carry over at least.

Since Drop terminators aren't emitted if there isn't drop glue, we (unfortunately) can't use that which would work great if it was present. If it is (so it has a Drop implementation), then we can check IIRC whether a StorageDead is beyond it, or smth. Otherwise StorageDeads can be in any order afaik and that doesn't mean it's UB, since it should just be a hint to the optimizer?

So yeah, we need a bad heuristic for this lint. Ideally there'd be more info, and maybe we can have smth like unleash_the_miri_inside_of_you but for clippy instead in the future? I would love having more stable MIR lints...

The way MIRI detects this ofc only works if it's actually used after being dropped (that meaning, an access after a StorageDead i.e. being dropped) - so we can't do that ig.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive-closed Status: Closed due to inactivity
Projects
None yet
9 participants