-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cpu feature detection #392
Cpu feature detection #392
Conversation
link it into place when actually running on an arm system.
Oh, and I've had to liberally apply |
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I didn't review this carefully, but I do have some suggestions for re-organizing it that will help with the next review. Basically, I think we should move all the platform-specific code out of ring::cpu_features
and into its submodules.
IMO, it makes sense to have the auxv
and cpuinfo submodules
either merged into the arm_linux
submodule (my preference), or as submodules of that submodule, because we only use those on ARM Linux. (This code is actually originally derived from AOSP's cpufeatures library, and it consists mostly of workarounds for bugs in Android's auxv support.)
If you need more samples of /proc/cpuinfo for ARM Android, I suggest that you look http://forum.xda-developers.com/showthread.php?t=2499338 and similar places to find /proc/cpuinfo for various phones.
use std::collections::HashSet; | ||
use std::string::{String, ToString}; | ||
|
||
use super::libc::c_ulong; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this, let's just define a ulong
or unsigned_long
type in the c
submodule. It should be just like the recently-added c::long
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're not worried about maintaining the platform/bitwidth mapping, sounds good to me!
} | ||
|
||
mod auxv; | ||
mod cpuinfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that you wrap these in #[cfg(any(test, all(any(target_arch = "arm", target_arch = "aarch64")), target_os="linux"))]
. That way we can run the tests on all platforms but avoid dead code warnings on non-ARM-Linux platforms in non-test builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with a slightly different cfg
; I think the parentheses didn't nest maybe the way you intended? The target_os
bit ended up excluded from the all
.
} | ||
|
||
let mut reader = &buf[..]; | ||
// TODO determine length to read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this TODO mean?
I think you can/must use cfg(target_pointer_size = "64")
and cfg(target_pointer_size = "32")
to decide whether to do a 32-bit or 64-bit read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's referring to the fact that I don't know at runtime whether to call read_u64 or read_u32.
According to libc's implementation of c_ulong
, solaris is always 64 bits, so using pointer width won't work in that case. Same for c_long
, by the way -- should we be adding this wrinkle to the declarations in c.rs
? Redox OS is also always using 64 bits for longs, and that at least used to be 32-bit only OS.
Is there a way we could express via a feature what the width of ulong
is? That way we could have two method declarations, one that invokes read_u64
and one with read_u32
, that are toggled with the feature flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The important thing is that the auxv code is only used in ARM Linux and we know the ARM Linux ABIs. Therefore, I suggest that instead of using/defining a generic "unsigned long" type in ring::c
, we “simply” use something #[cfg(all(target_pointer_width = "32", any(test, all(target_os = "linux", any(target_arch = "arm", target_arch = "aarch64")))))]
and #[cfg(all(target_pointer_width = "64", any(test, all(target_os = "linux", any(target_arch = "arm", target_arch = "aarch64")))))]
to define a type alias within this module. The idea is that when you run the tests on any 32-bit target, the 32-bit ARM variant gets used, and when you run the tests on a 64-bit target, the 64-bit AAarch64 variants get used, even when those types don't match the target platforms definition of "unsigned long". Does that make sense?
(The main reason we want to support running these tests on non-ARM-Linux platforms is to make them easy to maintain for people that don't have access to such targets.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It ended up being slightly more complicated because even on a 32-bit platform we need the tests to use 64 bits because that's what the test auxv files use. Fortunately, it only incurred a minor syntactic burden in a few places.
let found_aux_type = reader.read_u64::<NativeEndian>() | ||
.map_err(|_| AuxValError::InvalidFormat)?; | ||
let aux_val = reader.read_u64::<NativeEndian>() | ||
.map_err(|_| AuxValError::InvalidFormat)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How terrible would it be to replace these two lines with lines that don't require the byteorder
crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends how much it is problematic to depend on byteorder... :)
At first glance, https://github.com/BurntSushi/byteorder/blob/master/src/lib.rs is not enormous, so we could strip that down to only support the 32 bit and 64 bit cases, and then do some manual buffering rather than use their Read
extension. Several hundred lines in total, assuming we maintain the tests. It certainly can be done, but if we keep anything like the same API it will mean keeping most of that library as an internal copy.
If avoiding this particular dependency is of non-trivial importance, I can trace through the byteorder code and see if there's a more svelte way to extract what we need that would be more svelte.
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
#[allow(dead_code)] // TODO | ||
fn search_procfs_auxv(auxv_types: &[c_ulong]) -> Result<AuxVals, AuxValError> { | ||
let mut auxv = File::open("/proc/self/auxv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that you merge all this logic into auxv::search_auxv
. In particular, that function should take the path to the file to open, and it should open and read the file itself. Then that function would be used by all the tests, and the arm submodule would simply call auxv::search_auxv("/proc/self/auxv", <auxv_types>)
.
Doing this would move all the platform-specific code from the cross-platform cpu_features
module to the target-specific submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Accepting a Read
was a misguided attempt to aid testability; it turned out that I never wanted to stitch together blocks of 16 bytes by hand...
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
#[allow(non_snake_case)] | ||
extern "C" fn GFp_cpuid_setup() { | ||
if let Ok(r) = File::open("/proc/cpuinfo").map(|f| BufReader::new(f)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the suggestion below. All of this code is cross-platform except the reference to /proc/cpuinfo, so we should move it all to the cross-platform cpuinfo subcrate, possibly having parse_cpuinfo
open the file and read it itself.
// TODO is this used in practice? Its C equivalent is the only user of the | ||
// global, and the function name has no uses | ||
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
#[allow(non_snake_case)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In BoringSSL, it is probably used by Google Chrome in order to help decide whether to prefer AES-GCM or ChaCha20-Poly1305 in TLS. We might do something similar later, but for now it is OK to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, removed.
use self::arm::arm_cpuid_setup; | ||
|
||
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
static HAS_BROKEN_NEON: bool = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're going to remove this (see below), but note that this would need to be static mut
so that it can be properly set.
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
use self::auxv::{AuxValError, AuxVals}; | ||
#[cfg(all(target_os = "linux", target_arch = "arm"))] | ||
use self::cpuinfo::CpuInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the above use
s can be moved to the arm
submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; cpu_feature.rs
is now just a place to host arm_linux
.
|
||
mod auxv; | ||
mod cpuinfo; | ||
mod arm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to name this submodule arm_linux
. We'll support many non-Linux ARM targets that won't have /proc/cpuinfo or /proc/self/aux, and we should reserve the name arm
for those non-Linux platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
because of name clashes with the extern vars; needed to make the C macro a little fancier to allow specifying a identifier-safe name for multi-word types.
…s reads from a file
Thanks for the feedback; I think I've addressed everything in round 1. I originally had I don't think I know of any other concrete cpuinfo scenarios that need tests, but I'm happy to add more if you know of other problematic cases. |
|
||
// TODO link to getauxval weakly and use that if available. In particular, | ||
// this is needed by the assumption in hwcap_from_cpuinfo that future architectures | ||
// will have a working getauxval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For getauxval()
, I recommend just defining a wrapper around it in C code that returns an error code if the function isn't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I'm not particularly confident if I put the C bits in the best possible place, though.
little endian 64bit but let main code run in native endianness and type width.
I'd prefer that we make some 32-bit ARM auxv files and use them on 32-bit platforms, and similarly get some 64-bit Aarch64 auxv files and use them on 64-bit platforms, and not test 64-bit files on 32-bit and vice versa. We can probably do a bit of crowdsourcing of you're having trouble getting such files. |
Do you want me to scrap the |
|
||
/* |getauxval| is not available on Android until API level 20. Link it as a weak | ||
* symbol and use other methods as fallback. */ | ||
unsigned long getauxval(unsigned long type) __attribute__((weak)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out how to make weak linkage work on macOS's compiler toolchain, so this and all tests that hit the C bits are linux-only unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's OK. These are really Linux-specific and ARM-specific anyway, AFAICT.
However, let's just put this declaration in cpu-arm-linux.c. There's no reason for it to be in a header file, especially these "public" header files.
@@ -0,0 +1,19 @@ | |||
#include <openssl/cpu.h> | |||
|
|||
#ifdef __linux__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internet claims this is the most widely portable flavor of #define
for linux. Unfortunately there doesn't seem to be a "only glibc" define, but even if we had that we'd still have to deal with the weak linkage anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems right to me. However, I'd prefer to keep this in the cpu-arm-linux.c file, since it really is specific to ARM Linux.
let mut hwcap2 = AuxvUnsignedLongNative::from(0_u32); | ||
if let Some(v) = getauxval_provider.getauxval(auxval_types.AT_HWCAP2) { | ||
hwcap2 = v; | ||
} else if let Some(v) = procfs_auxv.get(&auxval_types.AT_HWCAP2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original C did not check procfs auxv for HWCAP2
if getauxval
was unavailable. I can't imagine why we wouldn't check that, so I put it in, but maybe you can think of why that would be undesirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have asked somebody at Google about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work.
Unfortunately, this has become very difficult to review because it's adding 1,543 lines of code and test data to replace 362 lines of code. I am hoping that the Rust code can be made simpler according to my inline suggestions.
Even with those simplifications, I think the review is going to be difficult. I suggest that you split this up into multiple PRs, like so:
-
Remove alll the code in cpu-arm-linux.c,. Add the code for
getauxval_wrapper()
into cpu-arm-linux.c. -
Add the Rust code for using
getauxval_wrapper()
, without the code for falling back to /proc/self/auxv or /proc/cpuinfo. -
Add the code for falling back to /proc/self/auxv.
-
Add the code for detecting broken NEON using /proc/cpuinfo.
-
Add the code for falling back to /proc/cpuinfo when
getauxval()
and /proc/self/auxv don't work.
We can then do each PR one at a time, and I'll land them on a branch until all 5 are done. Then I'll merge that branch into master.
I realize that is probably a lot of work, but I think it's the best way to ensure the code gets reviewed properly. I'm open to alternative suggestions, however.
#include <openssl/cpu.h> | ||
|
||
#ifdef __linux__ | ||
unsigned long getauxval_wrapper(unsigned long type, char *success) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use this signature:
int getauxval_wrapper(unsigned long type, unsigned long *out_value) {
The idea is that success/failure is indicated by the return value and the output value is made an out parameter. This is consistent with other stuff in ring and BoringSSL.
I suggest we document this function's return value: It returns 1 if getauxval()
is available and 0 if getauxval()
is not available. If getauxval()
is available, it also sets out_value
to the result of getauxval(type)
. That result might be zero, which indicates that getauxval(type)
didn't recognize type
.
I think it's also good to document that this is a workaround for the fact that weak symbols aren't supported well in (stable) Rust.
} | ||
|
||
unsigned long result = getauxval(type); | ||
if (errno != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU, it isn't legal to check errno
unless a function returns an error code first. I suggest that we just remove this if(...) { ... }
completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is weird, and felt wrong even with my pretty rusty C. Unfortunately, 0 is a valid value, so the original getauxval
API was simply broken in that case... In the manpage, they have this:
BUGS
Before the addition of the ENOENT error in glibc 2.19, there was no way to unambiguously distinguish the case where type could not be found from the case where the value corresponding to type was zero.
This is the implementation in glibc's misc/auxval.c
:
unsigned long int
__getauxval (unsigned long int type)
{
#ifdef HAVE_AUX_VECTOR
ElfW(auxv_t) *p;
#endif
if (type == AT_HWCAP)
return GLRO(dl_hwcap);
else if (type == AT_HWCAP2)
return GLRO(dl_hwcap2);
#ifdef HAVE_AUX_VECTOR
for (p = GLRO(dl_auxv); p->a_type != AT_NULL; p++)
if (p->a_type == type)
return p->a_un.a_val;
#endif
__set_errno (ENOENT);
return 0;
}
They also have elf/tst-auxv.c
:
static int
do_test (int argc, char *argv[])
{
errno = 0;
const char *execfn = (const char *) getauxval (AT_NULL);
if (errno != ENOENT)
{
printf ("errno is %d rather than %d (ENOENT) on failure\n", errno,
ENOENT);
return 1;
}
if (execfn != NULL)
{
printf ("getauxval return value is nonzero on failure\n");
return 1;
}
errno = 0;
execfn = (const char *) getauxval (AT_EXECFN);
if (execfn == NULL)
{
printf ("No AT_EXECFN found, AT_EXECFN test skipped\n");
return 0;
}
if (errno != 0)
{
printf ("errno erroneously set to %d on success\n", errno);
return 1;
}
if (strcmp (argv[0], execfn) != 0)
{
printf ("Mismatch: argv[0]: %s vs. AT_EXECFN: %s\n", argv[0], execfn);
return 1;
}
return 0;
}
I took this to mean we should check errno. https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=6619179 seems to agree that while regrettable, this is how it has to be done when every possible return value is potentially valid. Thoughts?
|
||
/* |getauxval| is not available on Android until API level 20. Link it as a weak | ||
* symbol and use other methods as fallback. */ | ||
unsigned long getauxval(unsigned long type) __attribute__((weak)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's OK. These are really Linux-specific and ARM-specific anyway, AFAICT.
However, let's just put this declaration in cpu-arm-linux.c. There's no reason for it to be in a header file, especially these "public" header files.
unsigned long getauxval(unsigned long type) __attribute__((weak)); | ||
|
||
#include <errno.h> | ||
unsigned long getauxval_wrapper(unsigned long type, char *success); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Don't include errno.h here. Instead include it in the .c file.
- The convention we've established in ring is that any function that is only called by Rust code should have the declaration in the C file, like this:
/* Prevent -Wmissing-prototypes warnings. */
unsigned long getauxval_wrapper(unsigned long type, char *success);
So, basically, move these to cpu-arm-linux.c.
@@ -91,7 +91,6 @@ define_type!(long, i32, test_long_metrics, GFp_long_align, GFp_long_size, | |||
define_type!(long, i64, test_long_metrics, GFp_long_align, GFp_long_size, | |||
"The C `long` type. Equivalent to `libc::c_long`."); | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a spurious change that should be undone or done in a separate commit specific to fixing formatting.
#[cfg(all(any(target_arch = "arm", target_arch = "aarch64"), | ||
target_os="linux"))] | ||
#[allow(non_snake_case)] | ||
extern "C" fn GFp_cpuid_setup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to make this a non-extern function and use the normal naming convention, and instead change the caller to call either this, or GFp_cpuid_setup(), depending on platform.
let auxv_types: AuxvTypes<AuxvUnsignedLongNative> = AuxvTypes::new(); | ||
let hwcap_features: ArmHwcapFeatures<AuxvUnsignedLongNative> | ||
= ArmHwcapFeatures::new(); | ||
if let Ok(c) = parse_cpuinfo() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please structure this as something like:
let cpu_info = match parse_cpuinfo() {
Ok(c) -> c;
Err(_) -> { return; }
};
= ArmHwcapFeatures::new(); | ||
if let Ok(c) = parse_cpuinfo() { | ||
// if we can't load procfs auxv, just let it be empty | ||
let procfs_auxv = match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong. If getauxval()
is available, then we should never read from /proc/self/auxv. But this seems to always read from /proc/self/auxv.
return armcap; | ||
} | ||
|
||
fn armcap_for_hwcap2<T: AuxvUnsignedLong>(hwcap2: T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This AuxvUnsignedLong
stuff is all too complicated. Let's simplify the code so that it isn't necessary, even if that means that we can only run these tests on 32-bit Linux.
return ret; | ||
} | ||
|
||
fn hwcap_from_cpuinfo<T: AuxvUnsignedLong>(cpuinfo: &CpuInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put this in the cpuinfo
sub-modulue? It seems like it makes sense for cpuinfo
to expose 4 functions:
- Read CPU info
- given that CPU info, determine if NEON is broken.
- given that GPU info, calculate HW_CAPS
- given that GPU info, calculate HW_CAPS2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on what you expect as a reader from cpuinfo
, I guess -- I kept that module completely hardware agnostic. At least in its current form, it only exists to put a pretty face on the format of /proc/cpuinfo
(just as auxv.rs is only a nice way to access the aux vector), and all arm-specific decisions based on that data are elsewhere. For context, my instinct with this sort of thing is to try to make things general purpose so as to avoid leaking concerns across layers unless there are performance concerns or something like that that would preclude such a design. As it is, you could hoist cpuinfo.rs pretty much as is into its own crate and it wouldn't leak any of its arm-feature-detection heritage, and that feels right to me, but I'm not tied to that structure if it bugs you.
Thanks for the detailed feedback. I certainly understand that this PR has become quite hefty, so I'll split it up. I'll leave this one open for a bit so I can continue to follow up on existing comments, but I'll close it once that's done. |
I'm closing this in favor of #408 and whatever follow-up PRs come from that. |
As per #357. It looked feasible to re-implement basically all of the offending C file, so I did so.
I've sprinkled TODOs where there are things still to resolve, but the main things I'd like your input on are:
getauxval
: In C, it's linked weakly and used if it's available at runtime. I couldn't find how to do this in Rust. Is there a way, or is the best option to leave a weak linkage in C with a little helper function that we call from Rust to then invokegetauxval
if available and otherwise return an error?cpu_feature.rs
is just a guess at how things might look.GFp_has_broken_NEON
yields no usages. Can we ditch that function and its underlyingg_has_broken_neon
global? (GFp_armcap_P
is used all over, so the logic is still used to populate that.)getauxval
(and/proc/self/auxv
) are defined in terms ofunsigned long
. To get things working I usedlibc
'sc_ulong
, but IIRC you're trying to staylibc
-free. How should I representunsigned long
in its many varieties since Replacering::c
with use of the upcoming standard ctypes crate #380 is still extant?