Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missoptimization of functions with C calling convention #33868

Closed
Florob opened this issue May 25, 2016 · 14 comments
Closed

Missoptimization of functions with C calling convention #33868

Florob opened this issue May 25, 2016 · 14 comments
Labels
A-FFI Area: Foreign function interface (FFI) P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.

Comments

@Florob
Copy link
Contributor

Florob commented May 25, 2016

When trying to look at how aggregates are passed to functions I came across the following:
The code presented below erroneously outputs 488447261 instead of 20.
This is using the 1.9.0 preview release (or anything newer) and seems to be a regression from 1.8.0.
The behaviour only occurs when using the C calling conventions and optimizations, eliminating either factor gives the correct result.

#[repr(C)]
struct S {
    a: u32,
    b: f32,
    c: u32
}

extern "C" fn test(s: S) -> u32 {
    s.a + s.b as u32 + s.c
}

fn main() {
    println!("{}", test(S { a: 12, b: 3.4, c: 5 }));
}
@retep998
Copy link
Member

Which target is this for? There are multiple calling conventions across operating systems and architectures after all.

@Florob
Copy link
Contributor Author

Florob commented May 25, 2016

This is on x86_64-unknown-linux-gnu.

@michaelwoerister
Copy link
Member

I can confirm this for rustc 1.10.0-nightly (dd6e8d45e 2016-05-23)

@nagisa nagisa added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label May 25, 2016
@nagisa
Copy link
Member

nagisa commented May 25, 2016

Works on stable.

@nagisa
Copy link
Member

nagisa commented May 25, 2016

cc @eddyb could it be your call trans refactor?

@nagisa
Copy link
Member

nagisa commented May 25, 2016

Seems like we invoke some UB:

; Function Attrs: noinline norecurse nounwind readnone uwtable
define i32 @test({ i64, i64 }) unnamed_addr #0 {
entry-block:
  ret i32 undef
}

is what the function ends up being optimised to.

@nagisa nagisa added I-wrong A-FFI Area: Foreign function interface (FFI) labels May 25, 2016
@nagisa
Copy link
Member

nagisa commented May 25, 2016

This is what I managed to reduce the issue to.

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

This suffices:

#[no_mangle]
pub extern "C"  fn test(s: S) -> u32 {
    s.c
}

Bad IR:

define i32 @test({ i64, i64 }) unnamed_addr #0 !dbg !176 {
entry-block:
  %s = alloca %S
  %1 = bitcast %S* %s to { i64, i64 }*
  store { i64, i64 } %0, { i64, i64 }* %1, align 4
  call void @llvm.dbg.declare(metadata %S* %s, metadata !202, metadata !203), !dbg !204
  %2 = getelementptr inbounds %S, %S* %s, i32 0, i32 2
  %3 = load i32, i32* %2, !dbg !205
  ret i32 %3, !dbg !207
}

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

The old, working functionality uses a shim:

  %1 = alloca { i64, i64 }
  store { i64, i64 } %0, { i64, i64 }* %1
  %2 = bitcast { i64, i64 }* %1 to %S*
  %3 = call i32 @_ZN4test10__rust_abiE(%S* noalias nocapture dereferenceable(12) %2)
  ret i32 %3

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

We are storing 16 bytes of data in a 12-byte bag and that is killing us.

@nagisa
Copy link
Member

nagisa commented May 25, 2016

For the reference, define i32 @test(i64 %s.coerce0, i32 %s.coerce1) #0 { is the signature generated by clang for an identical function. Alternatively we could use {i64, i32} instead of {i64, i64}.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 25, 2016

@arielb1 what was the signature of the shim function? nevermind

@nikomatsakis nikomatsakis added the P-high High priority label May 25, 2016
@eddyb
Copy link
Member

eddyb commented May 25, 2016

Ah, this is the problem @dotdash was aware of, isn't it?
Where none of our current system works perfectly right with respect to cast arguments/returns of different sizes but some of the old code was long-winded enough to avoid the problem.

EDIT: That was actually more about calls, rather than definitions, and I can't get calls to misbehave (we do the same thing there that the old code does AFAICT). One thing that we might need to look into is MIR trans not doing a cast returned value memcpy.

@alexcrichton
Copy link
Member

The core team discussed this issue today and our conclusions were as follows:

  • Definitely seems like something we should fix immediately, hence the P-high.
  • We're very likely to backport the fix to beta (unless it turns out to require a compiler refactor, which seems unlikely)
  • We're moving forward with the 1.9 release tomorrow as this has been on nightly for at least 6 weeks and hasn't been discovered yet. That's at least a signal that the breakage isn't major enough to warrant blocking or delaying the entire release.
  • Once the fix is backported to beta and has some testing on the beta branch we can consider whether to make a point release (e.g. 1.9.1). If this bug is widespread in practice then we can definitely do that, but if the impact continues to be low we may not do that.

@bors bors closed this as completed in 97bf80f May 26, 2016
alexcrichton pushed a commit to alexcrichton/rust that referenced this issue May 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-FFI Area: Foreign function interface (FFI) P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

No branches or pull requests

8 participants