Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace C code for reading and parsing /proc/cpuinfo and /proc/self/auxv with Rust code #357

Closed
briansmith opened this issue Nov 19, 2016 · 10 comments

Comments

@briansmith
Copy link
Owner

briansmith commented Nov 19, 2016

cpu-arm-linux.c contains code that determines the CPU capabilities using getauxval() on newer versions of Android and by reading from /proc/cpuinfo or /proc/self/auxv if getauxval() isn't available. This is the last bit of C code that does file I/O and one of the last remaining users of the C standard library in ring.

Regarding the part that parses /proc/cpuinfo, here's an example of the bad CPU that reports NEON is available but where NEON is actually broken, from https://bugs.chromium.org/p/chromium/issues/detail?id=606629#c12:

Processor	: ARMv7 Processor rev 0 (v7l)
processor	: 0
BogoMIPS	: 13.50

processor	: 1
BogoMIPS	: 13.50

Features	: swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 
CPU implementer	: 0x51
CPU architecture: 7
CPU variant	: 0x1
CPU part	: 0x04d
CPU revision	: 0

Hardware	: SAMSUNG M2
Revision	: 000e

Note that the spaces between the field name and the colon are actualy tabs.

We should have a test that verifies that the parser for the above output is correctly recognized as not supporting NEON, and probably other tests. We don't actually need to run the test on one of the buggy phones; instead we can have the test for this run on in all #[cfg(all(test, feature = "use_heap"))] builds on any CPU. In particular, this means that the code that opens the file and parses it needs to be callable from tests that pass in alternative file paths. Then we can put the above output into a test vector file.

@briansmith
Copy link
Owner Author

briansmith commented Nov 19, 2016

See https://rbcommons.com/s/OpenH264/r/383/:

Do full register loads instead of single-lane loads in DeblockLumaEq4H_neon

Instead of loading the registers one lane at a time, load full
registers and then transpose them.

This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex A8).

This also avoids an issue which seems like a cpu bug, present
on Sony Xperia T (cpu implementer 0x51 architecture 7 variant 0x1
part 0x04d). On such a device, it seemed like the "vswp q9, q10"
could start executing before the previous
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
had finished and written back their result. Changing the
"vswp q9, q10" into "vswp q10, q9", or into separate
"vswp d18, d20; vswp d19, d21" (or the other way around) seemed to
avoid the issue. This happened occasionally (a couple times per
100000 invocations or so).

It might be the case that we can work around the bug another way. I think we should just delete this file I/O code and pursue other, simpler, avenues of workarounds.

@briansmith
Copy link
Owner Author

OK, it looks like getauxval() isn't available until KitKat (Android 4.4) and there are still about 25% of devices in use that use older versions. So it is indeed worthwhile to port the BoringSSL code that reads from /proc/cpuinfo and /proc/self/auxv to (idiomatic) Rust.

@marshallpierce
Copy link
Contributor

I'm happy to tackle the i/o and parsing. Is the end goal to set GFp_armcap_P in pure Rust the way GFp_cpuid_setup does, or instead should there be a Rust > C > Rust callstack with just the I/O bits in Rust?

@briansmith
Copy link
Owner Author

I'm happy to tackle the i/o and parsing.

Awesome!

Is the end goal to set GFp_armcap_P in pure Rust the way GFp_cpuid_setup does, or instead should there be a Rust > C > Rust callstack with just the I/O bits in Rust?

The end goal is to do as much of this in Rust that we can do in a reasonable way without using unstable features. The most immediate goal is to get rid of the use of non-Rust I/O and non-Rust memory management, in particular the STRING_PIECE stuff.

@briansmith
Copy link
Owner Author

As I mentioned in the current PR., I think we should just implement getauxval() ourselves, by reading past the end of environ, which is exactly what the initialization code for getauxval() does on Android and what Android was doing for other reasons even prior to implementing getauxval(). See:

If we implement an analog in Rust then we can avoid weak linkage (which Rust doesn't support, forcing us to use a C wrapper), and we can avoid problems such as https://code.google.com/p/android/issues/detail?id=161012.

@briansmith
Copy link
Owner Author

The patches to actualy set AT_HWCAP2 to a useful value on ARM are at torvalds/linux@8258a98 and torvalds/linux@a092aed. Not sure what versions of Android got those, but presumably that happened after the Nexus 6P was released, according to https://bugs.chromium.org/p/boringssl/issues/detail?id=46#c3.

See also https://lkml.org/lkml/2013/12/23/162.

@briansmith
Copy link
Owner Author

As I mentioned in the current PR., I think we should just implement getauxval() ourselves, by reading past the end of environ

Of course, that doesn't work because it isn't safe (thread-safe or otherwise) due to the setenv fiasco.

@marshallpierce
Copy link
Contributor

The deal-breaker for walking the stack from environ is that a putenv entirely breaks it. See https://bitbucket.org/marshallpierce/rust-auxv/src/7e33cf2186ba66d354d434a7e9cf51bd014708b7/examples/elf_stack_broken_modified_environ.rs?at=master&fileviewer=file-view-default. Reading stale data from racing with someone setenv'ing an existing environment variable wouldn't be so bad; we'd still read a non-zero value and progress correctly. But if environ is wrong there's no way to recover from that since we can't get access to the libc-internal pointer to auxv that it gets from the kernel.

@briansmith
Copy link
Owner Author

@marshallpierce Yes, I agree and that's what I was referring to when I referenced the setenv fiasco.

I did some more research, and now I'm fine with just completely skipping the fallback for reading from /proc/self/auxv and /proc/cpuinfo. Instead let's just use the non-NEON code when getauxval isn't available, and when /proc/cpuinfo says we're running on that broken CPU.

For Android ARM builds

getauxval() was added to the official API in API level 20. However, it is available via weak/dynamic linking since Jelly Bean: either 4.1.x (API level 16), 4.2.x (API level 17) or 4.3.x (API level 18). According to http://androidxref.com/4.3_r2.1/search?q=&defs=getauxval&refs=&path=&hist=&project=bionic, it is in 4.3 (API level 18). However, based on similar searches for older versions, e.g. 4.2.2_r1, 4.3 is the first version it is in.

According to https://developer.android.com/about/dashboards/index.html, 88% of devices have Android version 4.3 or later. That number is going to grow over time.

For non-Android ARM builds

For non-Android ARM builds, we should simply require that libc implements getauxval(), which means glibc version 2.16 or later, or a recent musl. Note that glibc doesn't report the right error code until glibc 2.19.

@briansmith
Copy link
Owner Author

Closing this. As of ring 0.14 we require getauxval() to be available and working for all Linux and Android targets, and we've removed all the /proc/cpuinfo and /proc/self/auxv parsing code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants