Don't cast thread name to an integer for prctl #95626
libc::prctl and the prctl definitions in glibc, musl, and the kernel headers are C variadic functions. Therefore, all the arguments (except for the first) are untyped. It is only the Linux man page which says that prctl takes 4 unsigned long arguments. I have no idea why it says this. In any case, the upshot is that we don't need to cast the pointer to an integer and confuse Miri.
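For illustration, here is a minimal C sketch of the kind of call being described, assuming a libc (glibc or musl) whose header declares prctl as variadic. The helper name set_and_get_thread_name is hypothetical and not from the PR; the trailing zeros are written as unsigned long only to match the width the man page states.

```c
#include <sys/prctl.h>

/* Hypothetical helper (not from the PR): sets the calling thread's name,
   then reads it back into `out`, which must hold at least 16 bytes
   (the kernel's limit for a thread name, including the NUL). */
int set_and_get_thread_name(const char *name, char out[16])
{
    /* The name is passed as a pointer, with no cast to an integer.
       The trailing zeros are explicitly unsigned long. */
    if (prctl(PR_SET_NAME, name, 0UL, 0UL, 0UL) != 0)
        return -1;
    if (prctl(PR_GET_NAME, out, 0UL, 0UL, 0UL) != 0)
        return -1;
    return 0;
}
```

This would not compile against a libc that declares prctl with fixed unsigned long parameters, which is part of what the discussion below is about.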
r? @dtolnay (rust-highfive has picked a reviewer for you, use r? to override) |
My understanding is that variadic functions are special and all scalar types are passed in a consistent way. But I don't actually properly understand what is going on. Some links from the last time this came up:
We ended up ensuring that all variadic arguments of |
I would like to note that #95026, which we will eventually land (plausible delays notwithstanding), will let us simply eliminate this particular call from the codebase by using the pthread abstraction instead. |
Fair. But it might still be worth figuring out what we have to do with variadic functions... |
I don't have all the context here, but I did just do my best to read over the comments in all the PRs you linked. It sounds to me like the previous discussion has interleaved a lot of what i686/x86_64-linux-gnu do and what C guarantees. I can't tell if this is with good reason or not, but it makes me uncomfortable. So I'm going to lay out how I understand the situation, and maybe this will help clarify things, or other people can straighten me out if I misunderstand.
On i686/x86_64-linux-gnu, you get particular behavior due to the calling convention. The first 6 integer or pointer arguments are passed in registers, which means there might as well be an
In C however, the callee varargs API is that you declare a
The reason I'm bringing this up is that due to the calling convention above, on x86_64 you can pass a bunch of nonnegative
I saw that the other PRs were about syscalls and (old?) Miri. I've read over Miri's shims, but I don't understand how a system calling convention could be relevant there. It probably doesn't matter a whole lot because Miri is solidly an interpreter, it's not intercepting system calls after a |
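As a rough sketch of the callee side of the C varargs API mentioned above (the function name sum_ulongs is hypothetical): the callee names a type at each va_arg, and the language only guarantees the result when the caller actually passed an argument of a compatible type, regardless of what a particular calling convention happens to tolerate.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: sums `count` variadic arguments, each of which
   the caller must pass as unsigned long. */
unsigned long sum_ulongs(size_t count, ...)
{
    va_list args;
    unsigned long total = 0;

    va_start(args, count);
    for (size_t i = 0; i < count; i++) {
        /* The type named here is a promise about what the caller passed. */
        total += va_arg(args, unsigned long);
    }
    va_end(args);
    return total;
}
```

A call such as sum_ulongs(3, 1UL, 2UL, 3UL) matches that promise. Passing plain int literals instead would not, even on targets where it appears to work.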
The goal was for Miri to check that the caller passes all arguments with the right type, or at least the right size. Basically, if it is accepted by Miri, it should also work in the real program. |
FWIW, even when ignoring platform-specific properties, it looks like extra trailing arguments are fine -- the number of But of course we don't know how often |
Yes, trailing arguments ought to be ignored. Which is why I'm uncomfortable with the fact that they are in here, because for some other |
We should add a property computed in e.g. AFAIK SysV x86_64 does have this compatibility, at the cost of their rust/library/core/src/ffi/mod.rs Lines 295 to 301 in ac4b345
Some other ABIs that offer C variadic compatibility are similar in complexity (e.g. PowerPC and AArch64's AAPCS, at least AFAICT from But it is important to keep track of when this isn't the case. I believe I saw some 32-bit ARM stuff that was incompatible only for floats (integers in variadic arguments would still use registers), and more recently Apple's non-standard (i.e. not AAPCS) AArch64 ABI has always-on-stack variadics, with their documentation explicitly calling it out as a compatibility hazard. IMO C should've mandated one of:
|
Yeah, it should, but it didn't... so what do we do on the Rust side?^^ We have to match whether the function is variadic or not in the platform (which Miri currently doesn't check, i.e., it basically assumes "full compatibility between variadic and non-variadic"), and we have to match the types used by the |
I don't know what you mean by compatibility between variadic and non-variadic. If a function accepts two In terms of matching types, yes this is just a disaster in C. In many places, the only documentation that exists simply says to pass the result of some macro expansion. But due to ABI requirements, what type that macro expands to should be stable for a target triple. And it is probably reasonably stable for just architecture and operating system. It's just quite unfortunate that the only "documentation" is the headers. |
Compatible: the following code "works", as do its moral equivalents like "casting the variadic
/* foo.c: the definition is variadic */
#include <stdarg.h>
int foo(int a, ...) {
    va_list args;
    va_start(args, a);
    int b = va_arg(args, int);
    va_end(args);
    return a + b;
}
/* foo.h: the declaration the caller sees is not variadic */
int foo(int, int);
/* main.c: calls foo through the non-variadic declaration */
#include <stdio.h>
#include "foo.h"
int main() {
    printf("%d", foo(40, 2));
}
In other words, when they're "compatible", the caller doesn't need to know that the callee is variadic to call it - it just calls it as if it was a function with exactly the callsite's signature. When they're "incompatible", then the caller must know that the callee is variadic, and handle it appropriately. |
@eternaleye I'm not sure it's allowed to do that in C. @eddyb C's purpose has always been to be as portable as possible (in 1980 it was :p), so blaming C serves no purpose.
The libc crate clearly declares it as variadic: https://docs.rs/libc/0.2.121/libc/fn.prctl.html
Normally we could remove them, but I would leave them because the code suggests they check all the args every time. It's very unclear what they are doing: https://github.com/torvalds/linux/blob/7403e6d8263937dea206dd201fed1ceed190ca18/kernel/sys.c#L2342 As for the type, according to the doc:
So we need a
@RalfJung I just don't understand why they did this; I suggest we contact them. It's not the first time people have been confused: https://stackoverflow.com/questions/36551394/correct-way-to-use-prctl. Who likes mailing lists? |
There's a lot of C in the world that's non-conformant but "works in practice". That's why Apple had to be careful to call out the fact that their ABI makes them incompatible - whether it's "allowed" or not, people do it. It also complicates the lives of any third-party compilers that need to link to C - the situation Rust is in - because it means that tricks based on writing the more-friendly signature (e.g. using |
I think we should eventually expand the degree to which Miri understands the vagaries of platform behavior. Doing so will probably be gross to implement, but I think that's just the price you pay for validating that you're using each target triple correctly. Or we could punt on this whole problem, like Miri does with threads. We don't catch all UB, but we catch what we can and narrow in over time. Personally, I'd very strongly prefer that we have fewer false positives. Miri has earned a reputation for having false positives for many reasons including past issues with
As far as I can tell from the man page, this call is fine.
I would be surprised if this is documented at all. So yes, this is not easy to validate, and if we do not have access to the source for the call we can't validate it at all so we'd need some other kind of approach for such calls. For example, we could accept integer or pointer arguments. It might be kind of wobbly, but if that's the interface... 🤷
Feels fair to me. We should be building interfaces where it is possible for people to be confident they are using them correctly, and we shouldn't make excuses. A formal memory model would be a good step in that direction ;) |
That call is weird though, why would it add 2 zeroes but then not also add the third one? If the goal is not to conform to the type from the man page, then why add any zeroes at all? |
FWIW, discussing whom to "blame" is off-topic here. This is for technical discussion, we need to figure out what ABIs/APIs platforms actually provide and what we have to do to be sure that we are calling these operations correctly. Given that the type signatures in the manpage and |
Oh also Cc @rust-lang/libs who might have other (better) ideas for what to do here. :) |
I do not think this will always be possible. These are platform APIs, so in some cases I think "whatever the platform accepts" will end up being the API as best it can be defined. If your code only runs on x86_64-linux-gnu and you're writing an ABI, why would you bother specifying whether this accepts a raw pointer, a pointer-sized integer, or any integer smaller than that? I hope that libs has a better answer.
An honest off-by-one error. |
I checked glibc, musl, and uClibc, and they all blindly read all 4 Glibc is rather blunt about that:
/* Unconditionally read all potential arguments. This may pass
   garbage values to the kernel, but avoids the need for teaching
   glibc the argument counts of individual options (including ones
   that are added to the kernel in the future). */
Musl does the same without comment: uClibc varies a little by arch -- avr32 and c6x both use a variadic definition that reads all 4 |
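The pattern described above can be sketched roughly as follows. This is not the actual glibc or musl source; read_all_prctl is a hypothetical name, and it forwards to the libc prctl rather than issuing the system call itself.

```c
#include <stdarg.h>
#include <sys/prctl.h>

/* Hypothetical wrapper: reads four unsigned long values no matter how
   many arguments the caller supplied, then forwards all of them. */
int read_all_prctl(int option, ...)
{
    va_list ap;

    va_start(ap, option);
    /* If the caller passed fewer arguments, these reads yield garbage
       (and are undefined behavior on paper). */
    unsigned long arg2 = va_arg(ap, unsigned long);
    unsigned long arg3 = va_arg(ap, unsigned long);
    unsigned long arg4 = va_arg(ap, unsigned long);
    unsigned long arg5 = va_arg(ap, unsigned long);
    va_end(ap);

    return prctl(option, arg2, arg3, arg4, arg5);
}
```

A caller that always supplies all four trailing arguments as unsigned long stays within what such a wrapper reads, which is the conservative option discussed in this thread.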
@cuviper thanks for the investigation! I would then argue that
|
Hm, OTOH So maybe fewer arguments are defensible, but the arguments we pass should at least all have the right size? |
One caution here is that |
It's UB on paper, and I would argue it's their fault, but I agree that I don't see any problem with doing that. As glibc says, it's just garbage; I expect they don't use the argument if it's not needed. On systems where it would cause problems (like a system that implements variadics using allocated memory... I don't know of one, but it's allowed by the C standard), I expect the implementation to respect the variadic convention. It's funny: they invented the "variadic but in fact not" function call convention. I think the people writing the man page did their best to find a (strange) middle ground. |
📌 Commit e8a6f53 has been approved by |
We just have to hope nobody ever inlines |
One more for good measure -- Android's int prctl(int, unsigned long, unsigned long, unsigned long, unsigned long) all |
Rollup of 5 pull requests
Successful merges:
- rust-lang#95185 (Stabilize Stdin::lines.)
- rust-lang#95626 (Don't cast thread name to an integer for prctl)
- rust-lang#95709 (Improve terse test output.)
- rust-lang#95735 (Revert "Mark Location::caller() as #[inline]")
- rust-lang#95738 (Switch item-info from div to span)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
But in light of this... what are we doing with those three 0s? We're passing 3 i32s to prctl, which doesn't fill me with confidence. The man page says unsigned long and all the constants in the linux kernel are macros for expressions of the form 1UL << N.
I'm mostly commenting on this because this looks a whole lot like some UB that was found in SQLite a few years ago: https://youtu.be/LbzbHWdLAI0?t=1925 that was related to accidentally passing a 32-bit value from a literal 0 instead of a pointer-sized value. This happens to work on x86 due to the size of pointers and happens to work on x86_64 due to the calling convention.
But also, there is no good reason for an implementation to be looking at those arguments. Some other calls to prctl require that other arguments be zeroed, but not PR_SET_NAME... so why are we even passing them?
I would prefer to end such questions by either passing 3 libc::c_ulong, or not passing those at all, but I'm not sure which is better.
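To make the literal-0 hazard concrete, here is a small C sketch of a sentinel-terminated variadic function (count_strings is a hypothetical name, not code from SQLite or from this PR). In a variadic call a bare 0 is passed as an int, so the terminator has to be cast to the pointer type the callee reads.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: counts the strings passed before a null
   pointer sentinel. Every variadic argument must be a char *. */
int count_strings(const char *first, ...)
{
    va_list args;
    int n = 0;

    va_start(args, first);
    for (const char *s = first; s != NULL; s = va_arg(args, char *))
        n++;
    va_end(args);
    return n;
}
```

A correct call looks like count_strings("a", "b", (char *)NULL). Writing a bare 0 as the terminator passes an int where the callee reads a pointer, which is the same kind of width mismatch as passing i32 zeros where unsigned long is expected.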