Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal target feature unsafe #2212

Closed

Conversation

hsivonen
Copy link
Member

@hsivonen hsivonen commented Nov 9, 2017

RFC 2045 defines [#target_feature] as applying only to unsafe functions. This RFC allows [#target_feature] to apply safe functions outside trait implementations but makes it unsafe to call a function that has instruction set extensions enabled that the caller doesn't have enabled (even if the callee isn't marked unsafe). Taking a function pointer to a safe function that has [#target_feature] is prohibited.

[#target_feature] applying only to functions that are declared unsafe makes Rust's safe/unsafe distinction less useful, because it causes unnecessarily many things to become unsafe. Specifically, it causes operations that depend on instruction set extensions, such as SIMD operations to become unsafe wholesale and it causes the entire part of the program that uses instruction set extensions (compared to the program-wide baseline) to become unsafe.

Worse, the logic that causes instruction set extension-using operation like SIMD operations to become unsafe (they might execute UB unless the programmer has properly checked at run time that the instruction set extension is supported but the host CPU) means that it's impossible to create efficient safe abstraction over the unsafe operations without cheating about the notion of safety.

Rendered

@hsivonen hsivonen mentioned this pull request Nov 9, 2017
@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 9, 2017

Worse, the logic that causes instruction set extension-using operation like SIMD operations to become unsafe (they might execute UB unless the programmer has properly checked at run time that the instruction set extension is supported but the host CPU) means that it's impossible to create efficient safe abstraction over the unsafe operations without cheating about the notion of safety.

The target feature RFC provides:

  • if cfg!(target_feature = "...") { ... }: a compile-time check with zero run-time cost that allows building safe abstractions
  • if cfg_target_feature!("...") { ... }: a zero-cost abstraction that does a run-time check and allows building safe abstractions

So maybe could you explain what you mean by "it's impossible to create efficient safe abstraction over the unsafe operations without cheating about the notion of safety."?

Taking a function pointer to a safe function that has [#target_feature] is prohibited.

Does this mean that if you write #[target_feature] fn function you cannot take a function pointer to it?

A very common SIMD idiom is to, on program initialization, perform run-time feature detection, and set a global function pointer to the most efficient implementation of some algorithm.

The target feature proposal allows this. If this proposal disallows it that would create an ecosystem split (e.g. I can't use some algorithm because somebody didn't wanted to annotate a target feature function with unsafe).

@hsivonen
Copy link
Member Author

hsivonen commented Nov 9, 2017

So maybe could you explain what you mean by "it's impossible to create efficient safe abstraction over the unsafe operations without cheating about the notion of safety."?

I mean that it's not possible to safely wrap individual operations without also putting the check inside the wrapper, which isn't efficient. I'll amend the PR.

Does this mean that if you write #[target_feature] fn function you cannot take a function pointer to it?

Yes, but you could still write #[target_feature] unsafe fn and take a function pointer to it.

A very common SIMD idiom is to, on program initialization, perform run-time feature detection, and set a global function pointer to the most efficient implementation of some algorithm.

You'd have to make the entry point to the algorithm unsafe as is the case without this RFC.

A function pointer erases information about set of instruction set extensions, so the set of instruction set extensions of the callee and the caller can't be compared statically by the compiler when a function is called through a function pointer.

@hsivonen
Copy link
Member Author

hsivonen commented Nov 9, 2017

I guess, alternatively, taking a function pointer to a safe function that has #[target_feature] could be allowed in an unsafe block.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 9, 2017

I mean that it's not possible to safely wrap individual operations without also putting the check inside the wrapper, which isn't efficient. I'll amend the PR.

How isn't cfg!(target_feature) efficient? It is a compile-time check and has zero run-time cost.

I guess, alternatively, taking a function pointer to a safe function that has #[target_feature] could be allowed in an unsafe block.

Another alternative might be to just say that function pointers to "safe" #[target_feature] functions are unsafe:

#[target_feautre = "sse4.2"] fn bar() -> (); 
let a: fn()->() = bar; // FAILS
let b: unsafe fn() -> () = bar; // OK

@BurntSushi BurntSushi added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Nov 9, 2017
@BurntSushi
Copy link
Member

BurntSushi commented Nov 9, 2017

I don't quite have the compiler/language chops to evaluate this properly, but if we can get away with it, then this would be great! I have a couple of small comments:

However, the precedent in Rust is to use unsafe for all kinds of unsafe instead of having a taxonomy of different checks that unsafe waives.

Can you enumerate the different "kinds of unsafe" that Rust has today? As I understand it, they all boil down to UB (which is consistent with the "type of unsafety" this RFC is trying to prevent).

This complicates the notion of unsafe a bit by requiring the caller context to be designated as unsafe in a case where the callee isn't declared unsafe.

I think it's more galactic then that. I'm pretty sure this qualifies as a new unsafe superpower, since none of the four existing superpowers cover this RFC's use of unsafe.

cc @rust-lang/libs @rust-lang/lang @rust-lang/compiler

@hsivonen
Copy link
Member Author

hsivonen commented Nov 9, 2017

@gnzlbg :

How isn't cfg!(target_feature) efficient?

It is, but I don't see how it can simultaneously be true that (in the absence of this RFC) the vendor operations have to be unsafe and cfg!(target_feature) can make them safe at no cost.

I think of it this way:

The proposal to stabilize vendor operations currently says that they'd all be unsafe. The reason why they are unsafe is that under the target_feature RFC, only unsafe functions are allowed to emit instructions that aren't part of the program-level baseline. The reason why that's the case is that those non-baseline instructions could be UB if the program hasn't already dynamically checked that the host CPU supports them.

Suppose I want to safely abstract over Intel _mm_add_epi8 and ARM vaddq_u8 both of which would be unsafe per above. (Let's, for the purpose of this example, ignore that I really would like to get the abstraction for this particular operation directly from LLVM.) If I could do this statically with cfg!(target_feature) in my wrapper for these, why couldn't the cfg!(target_feature) check go inside the vendor functions for Intel _mm_add_epi8 and ARM vaddq_u8 instead making them themselves not have to be unsafe?

@BurntSushi :

Can you enumerate the different "kinds of unsafe" that Rust has today?

On the language level, there are the four you linked to and, with the target_feature merged, the fifth mentioned below. AFAICT, the rest that exist conceptually (materializing a &str from &[u8] without a run-time check, etc.) reduce to being able to call unsafe functions.

I'm pretty sure this qualifies as a new unsafe superpower, since none of the four existing superpowers cover this RFC's use of unsafe.

I'd argue that the target_feature RFC was the one that introduced a new superpower: To execute instructions that aren't permitted by the program's baseline configuration.

(By precedent of Rust's default 32-bit x86 target enabling SSE2, even though Rust doesn't emit an actual cpuid check at the program entry point, executing the instructions that are permitted by the program's baseline configuration is treated as safe even if there exists a CPU that doesn't support all the instructions in the program's baseline configuration.)

@BurntSushi
Copy link
Member

I'd argue that the target_feature RFC was the one that introduced a new superpower: To execute instructions that aren't permitted by the program's baseline configuration.

While correct, that is orthogonal to my point. My point is that the target_feature RFC does not require any additions to that "unsafe superpower" list. This RFC does.

@hsivonen
Copy link
Member Author

hsivonen commented Nov 9, 2017

My point is that the target_feature RFC does not require any additions to that "unsafe superpower" list. This RFC does.

Good point. Indeed, the target_feature RFC is covered by the point about calling unsafe functions.

But then, dereferencing a raw pointer could be removed from the list by making it a method on pointer types instead of part of the language syntax. That being the case, the distinction between having to have an item on the list vs. being covered by the item about calling unsafe functions is more syntactic than galactic.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 9, 2017

It is, but I don't see how it can simultaneously be true that (in the absence of this RFC) the vendor operations have to be unsafe and cfg!(target_feature) can make them safe at no cost.

I show it below.

Suppose I want to safely abstract over Intel _mm_add_epi8 and ARM vaddq_u8 both of which would be unsafe per above.

The target_feature RFC shows how to write safe wrappers over stdsimd. For example, a zero-cost wrapper can be written like this:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
  #[cfg(target_feature = "sse2")] { unsafe { _mm_add_epi8(x, y) } }
  #[cfg(target_feature = "neon")] { unsafe { vaddq_u8(x, y) } }
  #[cfg(not(any(target_feature = "sse2", target_feature = "neon")))] {
     // fallback implementation, unimplemented!(), ...
     compile_error!("generic_add_u8x8 needs sse2 or neon");
  }
}

That's a zero cost safe generic_add_u8x8 for sse2 and neon. This is not just a zero-cost abstraction, it is truly zero overhead: it adds no run-time cost whatsoever.

why couldn't the cfg!(target_feature) check go inside the vendor functions for Intel _mm_add_epi8 and ARM vaddq_u8 instead making them themselves not have to be unsafe?

As discussed in the target_feature RFC, a "useful" library would need to provide a fallback implementation instead of failing to compile or panicking on a target without sse2 or neon. A different useful library might even want to delay those checks until run-time, so that if one compiles a binary for all targets that have sse, but the host the binary runs on happens to support sse2, the faster intrinsic is used.

The target_feature RFC and stdsimd satisfy both use cases, it is not their job to make a choice for their users nor to provide fallback implementations of all intrinsics.

So maybe when the RFC states that:

it's impossible to create efficient safe abstraction [using the old RFC] over the unsafe operations without cheating about the notion of safety.

it should elaborate a bit more about exactly what is meant here, e.g., by providing an example of such a safe inefficient abstraction.

Providing users the tool to create safe efficient abstraction is target_feature's and stdsimd's job. If they fail in any way it would be good to know.

@est31
Copy link
Member

est31 commented Nov 14, 2017

Thanks @hsivonen for opening the RFC!

Motivation section

The motivation section and the PR description are a bit negative about the #[target-feature] RFC, especially the unsafe requirement it places on all such functions.

During the discussions for that RFC, I have pushed towards that unsafety requirement because I wanted to have assistance by the language itself to ensure that you don't accidentially call a function with a wider #[target-feature] setting. SIGILL might be "safe" in the sense that it doesn't violate any of the memory safety features of Rust, but it would still affect the developer experience through being a potential run-time footgun. It would mean a hard to debug error as you'd need to have a debugger and can't just set RUST_BACKTRACE, etc. My basic idea was: if it is possible to implement reasoning about #[target-feature] safety into the compiler, why not do it? My initial attempt was to get the minimal target feature RFC do something like what this RFC does: a set of rules of what can be done with target-feature functions and what can't be done safely. This didn't get much positive feedback though, as I think back then there was generally little awareness that preventing SIGILL through safe Rust at compile time is a good idea. Therefore, I shifted my position towards supporting the unsafe proposal that was already floating around.

Later in the discussion it was revealed that different AVR CPUs apparently map the same instructions to different behaviour, so this gave additional support for the unsafe proposal. This is probably the main contribution to why the RFC ended up with the unsafe requirement inside.

Now when arguing for the unsafe requirement, I never wanted it to be the end state. Instead I argued for unsafe so that we can have a second discussion about implementing actual reasoning inside the compiler. If target_feature would have been stabilized without the unsafe requirement, that chance would have been gone :).

Before RFC 2045, it was impossible to write #[target-feature] in stable Rust. I think RFC 2045 should be seen as a step towards a good target-feature situation on Rust, doing the minimal things to get any target-feature support to stable Rust. It would be great if the motivation section could include some statement like this, stating that this RFC is an improvement of RFC 2045.

A new unsafe superpower

Yes, I think the RFC adds a new unsafe superpower. But that is nothing bad! It is much better than shrugging off this potential footgun as something "safe", ignoring that on AVR there is UB involved. If you can implement avoidance of SIGILL inside the compiler, why shouldn't you?

Regarding function pointers

I think one can allow people to take function pointers of #[target-feature] functions, if you somehow represent the target-feature attribute inside the function pointer type. So basically preventing you from doing the following (if you don't like the attribute syntax, propose a better one! This is a bikeshed-able point :p):

#[target-feature(enable = "avx2")]
fn foo(a: u32) -> (u32, u32) { (a, a) }

fn main() {
    let ptr_a: #[target-feature(enable = "avx2")] fn(u32) ->(u32, u32) = foo; // works
    let ptr_b: fn(u32) ->(u32, u32) = foo; // ERROR: mismatched types
}

The proposed way of using this would be through pointer casting:

fn foo_generic(a: u32) -> (u32, u32) { (a, a) }

#[target-feature(enable = "avx2")]
fn foo_avx2(a: u32) -> (u32, u32) { (a, a) }

lazy_static! {
    static ref FOO: fn(u32) -> (u32, u32) = {
        use std::mem::strip_target_feature;
        if runtime_check_that_avx2_available {
            let ptr: #[target-feature(enable = "avx2")] fn(u32) -> (u32, u32) = foo_avx2;
            unsafe { strip_target_feature(ptr) }
        } else {
            foo_generic
        }
    };
}

fn main() {
    FOO(42);
}

You will still be required to say unsafe somewhere but you can defer it to only the function that determines the function pointer!

Why a new function strip_target_feature and not transmute? It is due to the ABI. From what I've heard is that some target feature extensions change the ABI, e.g. how you pass floats. Now the proposal was that the compiler automatically deals with the new ABI when it sees a call to a function with a superset of the currently present target-feature set. However, if we are casting pointers, we can only have one ABI.

So you have to add a new built in function unsafe fn strip_target_feature<U,T>(f: U) -> T {} (please suggest better names) which first creates a wrapper doing the ABI transition around the fn with the target features, and then returns the address to that wrapper. You'll have to tell people to use that function instead of transmute through docs and maybe through a lint.

You could theoretically add such functionality to transmute itself, but this would violate one of the promises of transmute: that it is a trivial byte copy. This gets even more of an issue if the fn pointer is nested deep inside some type you pass to transmute. Also, many people assume that if you can do transmute, you can also do pointer casting.

Also, strip_target_feature would have the following advantages:

  • It would be more convenient to use by not just being able to take function pointers, but also functions itself. fn foo() -> () {} let f: fn(u32)->u32 = transmute(foo) doesn't compile while fn foo() -> () {} let ptr: fn() -> () = foo; let f: fn(u32) -> u32 = transmute(ptr); compiles.
  • It would only work with function pointers and would also check the function parameters so you won't accidentally forget to add some parameter or mix up functions, adding even more safety

A library for dynamic checking could expose the functionality in a safe manner through a macro used like:

target_feature_function_pointers! {
    #[pointers_to(fallback = foo_generic, avx2 = foo_avx2)]
    FOO: fn(u32) -> u32,
    #[pointers_to(fallback = bar, avx2 = bar_avx2, sse = bar_sse)]
    BAR: fn(u32) -> u32,
}

Then you can use FOO(), BAR() inside your code as they were normal function pointers
Internally it would use e.g. the lazy_static! macro.

Another alternative to strip_target_feature would be to make function pointers to a target-feature function point to their wrapper. However that would hide non-trivial (compile-time) functionality inside a coercion and it might mean a slowdown when you call such function pointers from a context where the target-feature is available, as you'd first have to move stuff out of your fast SSE registers in order to call the wrapper then put stuff back in inside the wrapper. Additionally, you would want to make the coercion unsafe and I'm not aware of any coercions that only work in unsafe contexts (only of casts).

Summary

This RFC is an important step in requiring less unsafe in use around #[target-feature]'d functions. It introduces the concept of instruction set extensions that are attached to each function and makes it safe to call functions with only a subset of instruction set extensions.

So it allows you to safely split up your target-feature code into multiple functions. That is really great!

If you fix the function pointers issue e.g. by adopting my proposal from above, you would also enable dynamic dispatch functionality usable from safe code.

@parched
Copy link

parched commented Nov 16, 2017

@gnzlbg Unless I'm mistaken, the improvement this RFC proposes is that you should be able to write

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
  #[cfg(target_feature = "sse2")] { _mm_add_epi8(x, y) }
  #[cfg(target_feature = "neon")] { vaddq_u8(x, y) }
  #[cfg(not(any(target_feature = "sse2", target_feature = "neon")))] {
     // fallback implementation, unimplemented!(), ...
     compile_error!("generic_add_u8x8 needs sse2 or neon");
  }
}

instead. Currently with blanket unsafe, if __mm_add_epi8 actually required SSE3 instead of SSE2, you would get no error from the compiler because you already wrapped it in unsafe, but with this RFC without unsafe you would get an error. I think that's a useful enhancement.

Another alternative might be to just say that function pointers to "safe" #[target_feature] functions are unsafe.

Good idea, I think that should be added to this RFC.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 17, 2017

@parched

the improvement this RFC proposes is that you should be able to write

Yes, that's it, and I think it is a really nice enhancement; it makes the intrinsics easier and safer to use. Performance-wise, however, there should be no difference.

@hsivonen
Copy link
Member Author

hsivonen commented Nov 17, 2017

The target_feature RFC shows how to write safe wrappers over stdsimd. For example, a zero-cost wrapper can be written like this:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
  #[cfg(target_feature = "sse2")] { unsafe { _mm_add_epi8(x, y) } }
  #[cfg(target_feature = "neon")] { unsafe { vaddq_u8(x, y) } }
  #[cfg(not(any(target_feature = "sse2", target_feature = "neon")))] {
     // fallback implementation, unimplemented!(), ...
     compile_error!("generic_add_u8x8 needs sse2 or neon");
  }
}

The above works as a safe abstraction if sse2 or neon is enabled for the whole program, right? What if this safe abstraction is part of a crate that's supposed to work even when called after a run-time check for sse2 or neon when the whole program doesn't have them enabled? In that case, the above would, it seems to me, yield a compilation error.

With this RFC, I'd expect something like:

#[cfg(target = 'x86')]
#[target_feature(enable = "sse2")]
fn generic_add_u8x16(x: u8x16, y: u8x16) -> u8x16 {
  _mm_add_epi8(x, y)
}

#[cfg(target = 'arm')]
#[target_feature(enable = "neon")]
fn generic_add_u8x16(x: u8x16, y: u8x16) -> u8x16 {
  vaddq_u8(x, y)
}

...which would allow the baseline compilation flags to target 32-bit x86 without SSE2 and ARMv7 without NEON but allow generic_add_u8x16 to be called without unsafe after the program somewhere doing an unsafe transition to an SSE2 or NEON-enabled part of the code.

That is, the safe abstraction would abstract over different SIMD ISAs but not over the presence or absence of SIMD support.

By efficiency/performance, I mean putting a CPUID check inside the safe abstraction so that instead of the application checking CPUID and taking an unsafe action to assert that it has indeed checked that certain ISA features are supported, the safe library has to keep re-checking a CPUID result on a per-library entry-point basis.

(Note: If I don't show up on GitHub for two weeks, it doesn't mean that I've forgotten about this. My apologies in advance.)

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 17, 2017

The above works as a safe abstraction if sse2 or neon is enabled for the whole program, right?

Incorrect, this also works as a safe abstraction if they aren't enabled for the whole program.

What if this safe abstraction is part of a crate that's supposed to work even when called after a run-time check for sse2 or neon when the whole program doesn't have them enabled?

Then one can write this abstraction to use cfg_target_feature! instead of cfg(target_feature), which does compile-time feature detection first, and falls-back to run-time feature detection if the feature is not enabled at compile-time.

In that case, the above would, it seems to me, yield a compilation error.

This depends on which fall-back implementation the library author is interested in providing. It could fail to compile, which is safe, panic, which is also safe, or do something else as long as that is safe. This doesn't fail to compile, is safe, portable, and introduces zero run-time cost:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
  #[cfg(target_feature = "sse2")] { unsafe { _mm_add_epi8(x, y) } }
  #[cfg(target_feature = "neon")] { unsafe { vaddq_u8(x, y) } }
  #[cfg(not(any(target_feature = "sse2", target_feature = "neon")))] {
    // I could use x + y here, but this is how it works in general: 
    let mut r = u8x8::splat(0);
    for i in 0..8 {
        r.replace(x.extract(i) + y.extract(i), i);
    }
    r
  }
}

This doesn't fail to compile, it is safe, portable, and does feature detection at compile-time first, and if that fails, dispatches at run-time:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
  #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
     if cfg_target_feature!("sse2") {
        unsafe { _mm_add_epi8(x, y) }
     } 
  }
  #[cfg(any(target_arch = "arm", target_arch = "aarch64"))] {
      if cfg_target_feature!("neon") {
        unsafe { vaddq_u8(x, y) }
      } 
  }
  else { x + y }
}

With this RFC, I'd expect something like:

This expectation looks incorrect to me. At least, with this RFC "as is", calling generic_add_u8x16 in a context without sse2 or neon enabled will fail to compile unless the user writes unsafe { ... } in which case if that code-path is reached on a host that does not support sse2 or neon it will invoke undefined behavior.

By efficiency/performance, I mean putting a CPUID check inside the safe abstraction so that instead of the application checking CPUID and taking an unsafe action to assert that it has indeed checked that certain ISA features are supported

Incorrect, SIMD libraries do not need to do this. They can if they want to, but for example the faster crate does not.

, the safe library has to keep re-checking a CPUID result on a per-library entry-point basis.

This is partially incorrect. One library could do indeed do this, and on every library call, perform the check (which just fetches a cached value), trusting that the compiler would hoist these checks (run-time feature detection is only executed once per program, sub-sequent calls to the run-time feature detection library just read a cached value).

But this is not the only option. The RFC also allows a different library to set some function pointers on initialization and have all its code just blindly call those function pointers, without any checks.


I recommend you to either read the RFC and the associated discussion, or just give stdsimd a try. All of this is already available on nightly Rust.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 5, 2018

@hsivonen I just want to ping you that we haven't forgotten about this; during the impl period we were just more busy getting stdsimd ready for inclusion into std. Linking this RFC in an internal forum post thread might help giving it some more exposure.

@hsivonen
Copy link
Member Author

I feel I need this explained to me like I'm five. I think I'm in particular missing something important about cfg_target_feature!.

The above works as a safe abstraction if sse2 or neon is enabled for the whole program, right?

Incorrect, this also works as a safe abstraction if they aren't enabled for the whole program.

If SSE2 isn't enabled for the whole program, can the abstraction both be safe and use SSE2 instructions without having the "is SSE2 available" check within generic_add_u8x8()?

This doesn't fail to compile, is safe, portable, and introduces zero run-time cost:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
 #[cfg(target_feature = "sse2")] { unsafe { _mm_add_epi8(x, y) } }
 #[cfg(target_feature = "neon")] { unsafe { vaddq_u8(x, y) } }
 #[cfg(not(any(target_feature = "sse2", target_feature = "neon")))] {
   // I could use x + y here, but this is how it works in general: 
   let mut r = u8x8::splat(0);
   for i in 0..8 {
       r.replace(x.extract(i) + y.extract(i), i);
   }
   r
 }
}

If the program as a whole is compiled for i586 without SSE2 but SSE2 is available at run time, which code path does the above take and how does it know to take it?

My goal here is that it would take the first path without a CPUID check at the time of the call while still being safe--the unsafety of the application pledging to the compiler that CPUID has been checked having been at a distance.

This doesn't fail to compile, it is safe, portable, and does feature detection at compile-time first, and if that fails, dispatches at run-time:

#[inline(always)]
fn generic_add_u8x8(x: u8x8, y: u8x8) -> u8x8 {
 #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
    if cfg_target_feature!("sse2") {
       unsafe { _mm_add_epi8(x, y) }
    } 
 }
 #[cfg(any(target_arch = "arm", target_arch = "aarch64"))] {
     if cfg_target_feature!("neon") {
       unsafe { vaddq_u8(x, y) }
     } 
 }
 else { x + y }
}

Does this do a CPUID dispatch on every call to generic_add_u8x8()? If not, by what mechanism not?

@nikomatsakis
Copy link
Contributor

Would it be fair to say that this RFC is moving the point where a target-feature-fn is made unsafe from the fn declaration site to the call site (or reification site, in the case of a function pointer?). If so, I think that makes a lot of sense and I don't immediately see any problems with that setup.

One thing that was not crystal clear to me from skimming the RFC and the summary is what the recommended style would be. I would think that the recommended style would be to use target-feature and declare all of your target-feature functions as "safe". Of course, actually calling them from any context that does not itself contain target-feature would be unsafe (because the callee gets to assume their target-features are available, and it's the caller's job to validate that).

Then you would have wrappers that do the dynamic testing or whatever and use unsafe { foo() } in order to call the "root" function (any target-feature functions that foo() invokes would be safe, presuming it doesn't attempt to use yet more instructions that weren't originally tested).

Is this about right?

If so, is there a reason this design was not pursued in the first place? (I've not been following this debate as closely as perhaps I should've been.)

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 28, 2018

@hsivonen

If SSE2 isn't enabled for the whole program, can the abstraction both be safe and use SSE2 instructions without having the "is SSE2 available" check within generic_add_u8x8()?

First, generic_add_u8x8 is always safe to call, independently of what the CPU supports at run-time. If the whole program is compiled with SSE2, it will always use SSE2 without any run-time checks. If the whole program is not compiled with SSE2, it is still safe to call this function. If the CPU where the binary runs on supports SSE2, it will use SSE2, and if not, it will use whatever features the whole program supports (e.g. SSE, MMX, etc.). Whether the CPU supports SSE2 in this case is detected once at run-time and cached. This function will always have a branch that checks this cache for this. Since this function can always be inlined, the optimizer can hoist these branch out of multiple functions, loops, etc.

That is, the answer is yes (in general): this function is safe, will use SSE2 at run-time if available (and SSE or something else otherwise), and in most cases you won't have a check per function call because the check will be hoisted in SIMD code.

In some cases, however, you will have the check. First, note that to do something depending on the CPU features enabled at run-time, this check needs to happen somewhere in your program at run-time. Second, it is possible to write a generic_add_u8x8 function that does not have this branch ever, but this has other trade-offs (the target_feature RFC has an example of this: the ifunc proc macro).

The way this works is to do run-time feature detection once (e.g. on binary initialization), and then based on that change globally which function will be called, by either swapping symbols at link-time, or changing the value of a global function pointer. The consequence in both cases is the same: in this approach, the function cannot ever be inlined (because if you inline it then you can't swap symbols/addresses).

The target_feature RFC let's you pursue both approaches.

There is a third approach: to never have the check, which is not safe. The target_feature let's you do this by just opening an unsafe { ... } block.

If the program as a whole is compiled for i586 without SSE2 but SSE2 is available at run time, which code path does the above take and how does it know to take it?

The generic path (without SSE2): cfg(target_feature) does compile-time feature detection, so if SSE2 is not enabled at compile-time, that path won't be taken.

My goal here is that it would take the first path without a CPUID check at the time of the call while still being safe--the unsafety of the application pledging to the compiler that CPUID has been checked having been at a distance.

You can do this by setting a function pointer on binary initialization (using cfg_target_feature! for run-time feature detection), and then just calling the function via the function pointer. The trade-off is that the call via the function pointer can't be inlined.

Does this do a CPUID dispatch on every call to generic_add_u8x8()? If not, by what mechanism not?

The mechanism is the #[inline(always)] annotation. So the answer here is "No, it does not do a CPUID dispatch on every call to generic_add_u8x8, it does it in some cases". For example, if the binary is compiled with SSE2 globally enabled, it never does a CPUID check (it is resolved at compile-time). If not, cpuid is executed only once per binary, the rest of the times a cache is read (testing a bit in an usize). Even then, the cache is not read on every call. The branch reads an immutable global value, so it can be hoisted out of chunks of code that all test the branch at compile-time (otherwise this would be an implementation bug).

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 28, 2018

@nikomatsakis

If so, is there a reason this design was not pursued in the first place? (I've not been following this debate as closely as perhaps I should've been.)

IIRC it complicated the implementation (which had other issues until recently), the RFC (where we hadn't agreed over fundamental issues like whether calling an intrinsic should be unsafe or not, whether we needed run-time feature detection or not, etc.), and also it was unclear what to do with trait object methods, function pointers, etc. or whether those problems were even worth solving.

Everybody agreed that it would be nicer to only have to write unsafe when you actually needed it, but improving this could be done in a backwards compatible way and we ended up focusing on non-ergonomic issues first.

The major problem that pushed this back is that "features" sometimes (e.g. for SIMD) come in hierarchies. That is, when one marks a function with avx2 but not sse2 it would be weird to require unsafe for calling an sse2 function but not if you call an avx2 function. So now to get this right one needs to define feature hierarchies, which is hard if one doesn't even know which features one wants to stabilize (the target_feature RFC proposed how features ought to work without proposing any feature to circumvent this issue).

@hsivonen
Copy link
Member Author

@gnzlbg

Whether the CPU supports SSE2 in this case is detected once at run-time and cached.

Thanks. I was missing the part about the automatic cache and potential hoisting of the check.

@nikomatsakis

Would it be fair to say that this RFC is moving the point where a target-feature-fn is made unsafe from the fn declaration site to the call site (or reification site, in the case of a function pointer?).

The call site of the function that uses more ISA extensions than the caller, yes. (Without more checks or unsafe as long and the calls deeper in the stack don't add ISA extensions.)

One thing that was not crystal clear to me from skimming the RFC and the summary is what the recommended style would be. I would think that the recommended style would be to use target-feature and declare all of your target-feature functions as "safe". Of course, actually calling them from any context that does not itself contain target-feature would be unsafe (because the callee gets to assume their target-features are available, and it's the caller's job to validate that).

Then you would have wrappers that do the dynamic testing or whatever and use unsafe { foo() } in order to call the "root" function (any target-feature functions that foo() invokes would be safe, presuming it doesn't attempt to use yet more instructions that weren't originally tested).

Is this about right?

Yes.

@nikomatsakis
Copy link
Contributor

OK, so, I've been thinking more about this. It still seems to me that this RFC describes the way that I would expect safety to "obviously" work when it comes to target-feature; however, this also feels like something of a "nice to have" sort of refinement. I'd hate to slow down the progress of SIMD just on this point. Ultimately, I wouldn't expect this feature to have a big impact on the "usability" of SIMD, since

(a) feature testing will probably be encapsulated in some library that would handle the transition from safe-to-unsafe
(b) once you are writing an unsafe fn (i.e., in today's design), you can call other unsafe fns without any annotation

So the main case being addressed here is accidentally invoking the wrong target-feature fn from your fn, right (as well as encapsulating other uses of unsafe)? This is certainly important, but not make or break.

So probably what I'm saying is that I'd like to hear a bit more from those folks who've been driving the SIMD design to completion about where this fits in the "priority of things". Maybe this is better treated as an idea to 'file away for later'?

(Relatedly, are there concerns about this RFC "on the merits" -- that is, that do not have to do with prioritization?)

@nikomatsakis
Copy link
Contributor

Also, @hsivonen, is your intention that the setting of -Ctarget-cpu would be taken into account when deciding if code is safe or unsafe?

@hsivonen
Copy link
Member Author

hsivonen commented Mar 2, 2018

So the main case being addressed here is accidentally invoking the wrong target-feature fn from your fn, right (as well as encapsulating other uses of unsafe)?

The main case being addressed is not having a situation where either stuff becomes unsafe wholesale (the whole part of the program that uses non-baseline instruction set extensions) or there's cheating about what's safe or there are non-programmer-controlled run-time checks.

Also, @hsivonen, is your intention that the setting of -Ctarget-cpu would be taken into account when deciding if code is safe or unsafe?

Yes, mostly because SSE2 should work without ceremony with the default i686 target and the hoop jumping should be up to the users of the i586 target.

@nikomatsakis
Copy link
Contributor

@rfcbot fcp postpone

It seems like discussion has stalled here. Although I am sympathetic to the aims of this RFC -- as I have previously expressed -- I also have the feeling that the time may not be ripe here. This feels like it's a kind of "side concern" to the main thrust of getting SIMD up and going in some form on stable; it also seems like something that would be better addressed once that work is more mature. Therefore, I move to postpone, and propose that we re-address this once some core part of SIMD is stable (or at least much closer).

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Mar 15, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 15, 2018

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@est31
Copy link
Member

est31 commented Mar 21, 2018

I disagree with postponement. For me, safety is the core distinguisher of Rust from languages like Go, C or C++ (Go is not threadsafe!), and being able to write fast code safely is important. I'm saying this as someone who argued for making target_feature require unsafe in the first place. Even though this particular RFC wouldn't get SIMD into a state where you could use it without any unsafe at all (so I'd have to adjust the marketing), it feels like an important step into that direction to me. Even if it won't be implemented right away I believe it is definitely implementable by contributors.

This feels like it's a kind of "side concern" to the main thrust of getting SIMD up and going in some form on stable

Merging the RFC doesn't mean that it has to slow down SIMD stabilisation. In fact, I believe that SIMD is highly useful without this feature as well.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 21, 2018

@est31 I think the main problem is that without a prototype implementation of this RFC it is hard to get a feeling for it.

Iterating on a prototype upstream behind a feature gate, and then revisiting this RFC once things are clearer sounds like a more productive path to me. Somebody just needs to pick the torch and do it.

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Mar 22, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 22, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 1, 2018

The final comment period is now complete.

@Centril
Copy link
Contributor

Centril commented Apr 1, 2018

Postponed as per completed FCP.

Thanks @hsivonen for the RFC!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants