Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use out pointer more eagerly (return value optimisation) #7298

Closed
huonw opened this issue Jun 22, 2013 · 5 comments
Closed

Use out pointer more eagerly (return value optimisation) #7298

huonw opened this issue Jun 22, 2013 · 5 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@huonw
Copy link
Member

huonw commented Jun 22, 2013

The following pattern occurs reasonably often in the standard lib:

fn new() -> SomeStruct {
   let mut foo = SomeStruct { empty... };
   // ... initialise foo ...
   foo
}

which currently uses a memcpy at the end to copy the result into the out pointer, rather than just directly placing foo into it to begin with.

An example:

pub fn foo() -> [uint, .. 8] {
    [0, .. 8]
}
pub fn bar() -> [uint, .. 8] {
    let a = [0, .. 8];
    a
}

Compiling with -O -S --emit-llvm:

define void @_ZN3foo15_5274bbbd7427d03_00E([8 x i64]* nocapture, { i64, %tydesc*, i8*, i8*, i8 } addrspace(1)* nocapture) #1 {
static_allocas:
  %2 = bitcast [8 x i64]* %0 to i8*
  call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 64, i32 8, i1 false)
  ret void
}

define void @_ZN3bar15_5274bbbd7427d03_00E([8 x i64]* nocapture, { i64, %tydesc*, i8*, i8*, i8 } addrspace(1)* nocapture) #1 {
static_allocas:
  %2 = alloca [8 x i64], align 8
  %3 = bitcast [8 x i64]* %2 to i8*
  call void @llvm.memset.p0i8.i64(i8* %3, i8 0, i64 64, i32 8, i1 false)
  %4 = bitcast [8 x i64]* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %4, i8* %3, i64 64, i32 8, i1 false)
  ret void
}
@mstewartgallus
Copy link
Contributor

So I thought that this might be due to being only at optimization level 2 but even at optimization level 3 the code is still the same. By the way what does optimization level 3 actually enable over 2?

define void @_ZN3foo17_87e6ad83abde57f23_00E([8 x i64]* nocapture, { i64, %tydesc*, i8*, i8*, i8 } addrspace(1)* nocapture) #1 {
static_allocas:
   %2 = bitcast [8 x i64]* %0 to i8*
   call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 64, i32 8, i1 false)
   ret void
}

define void @_ZN3bar17_87e6ad83abde57f23_00E([8 x i64]* nocapture, { i64, %tydesc*, i8*, i8*, i8 } addrspace(1)* nocapture) #1 {
static_allocas:
   %2 = alloca [8 x i64], align 8
   3 = bitcast [8 x i64]* %2 to i8*
   call void @llvm.memset.p0i8.i64(i8* %3, i8 0, i64 64, i32 8, i1 false)
   %4 = bitcast [8 x i64]* %0 to i8*
   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %4, i8* %3, i64 64, i32 8, i1 false)
   ret void
}

@huonw
Copy link
Member Author

huonw commented Jun 23, 2013

If I'm reading back::passes correctly, --opt-level=3 adds argpromotion and loop-vectorize.

thestinger added a commit that referenced this issue Sep 11, 2013
This brings Rust in line with how `clang` handles return pointers.

Example:

    pub fn bar() -> [uint, .. 8] {
        let a = [0, .. 8];
        a
    }

Before:

    ; Function Attrs: nounwind uwtable
    define void @_ZN3bar17ha4635c6f704bfa334v0.0E([8 x i64]* nocapture, { i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone) #1 {
    "function top level":
      %a = alloca [8 x i64], align 8
      %2 = bitcast [8 x i64]* %a to i8*
      call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 64, i32 8, i1 false)
      %3 = bitcast [8 x i64]* %0 to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 64, i32 8, i1 false)
      ret void
    }

After:

    ; Function Attrs: nounwind uwtable
    define void @_ZN3bar17ha4635c6f704bfa334v0.0E([8 x i64]* noalias nocapture sret, { i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone) #1 {
    "function top level":
      %2 = bitcast [8 x i64]* %0 to i8*
      call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 64, i32 8, i1 false)
      ret void
    }

Closes #9072
Closes #7298
@thestinger
Copy link
Contributor

This has regressed in LLVM after the issue was originally closed. Despite having both sret and dereferenceable, this does not optimize out.

@thestinger thestinger reopened this Aug 16, 2014
@luqmana
Copy link
Member

luqmana commented Aug 22, 2014

So it turn out that it's the lifetime markers that are stopping LLVM from optimizing this into just one memset. I have patch for LLVM upstream: http://reviews.llvm.org/D5020

@luqmana luqmana mentioned this issue Oct 4, 2014
@luqmana luqmana added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 4, 2014
bors added a commit that referenced this issue Oct 5, 2014
Update our LLVM snapshot to master (as of ~ Wed Oct 1 18:49:58 2014 +0000). 

Since my patches have landed upstream this fixes #13429 and #7298.
@luqmana
Copy link
Member

luqmana commented Oct 5, 2014

Closed by #17776.

@luqmana luqmana closed this as completed Oct 5, 2014
flip1995 added a commit to flip1995/rust that referenced this issue Jul 29, 2021
…ednet,flip1995

Switch CI to new metadata collection

r? `@xFrednet`

Things we have to keep in mind:

- This removes the template files and the scripts used for deployment from the checkout. This was added in rust-lang#5517. I don't think we ever needed those there. Not sure though.
- ~~As a result, we can't remove the python scripts yet. We have to wait until this hits a stable Clippy release.~~ I'll just break the next stable deploy and do it by hand once.
- This should be merged together with rust-lang#7279. Me and `@xFrednet` will coordinate the switch
- ...?

I still have to try out some things:

- [x] Is it worth caching? Yes
- [x] ~~Is it worth to do a release build?~~ Nope
- [x] Does it actually work? With a few changes, yes
- [ ] ...?

changelog: Clippy now uses a lint to generate its documentation 🎉
flip1995 pushed a commit to flip1995/rust that referenced this issue Jul 29, 2021
Rollup of 3 pull requests

Successful merges:

 - rust-lang#7279 (Adapting the lint list to Clippy's new metadata format)
 - rust-lang#7298 (Switch CI to new metadata collection)
 - rust-lang#7420 (Update lint documentation to use markdown headlines)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

changelog: rollup
flip1995 pushed a commit to flip1995/rust that referenced this issue Aug 12, 2021
Updated changelog for 1.55

This has again been a bit of work, but I'm happy to notice that my English is still improving, and I'm getting faster at these things. That's a very nice side effect of contributing and getting feedback on reviews 😊

Moving on, there were a few things that I was unsure about:
* The PR rust-lang#86717 changes an old entry in the change log, is this worth mentioning? I've left it out for now.
* The stabilization of `cargo clippy --fix` is quite awesome and important IMO. It sadly gets a bit lost in the *Other* entry, as it's the last one. Do we maybe want to move it somewhere else or change the headline order for this release?
* I've listed the introduction of the new `suspicious` group under the *Moves and Deprecations* section. Is this alright, or should it be moved to the *Other* section as well?
* Last but definitely not least, some fun! I've used the 🎉 emoji in the `cargo clippy --fix` entry, is this okay?

Sorry for the bombardment of questions xD

---

The PR already includes the entries for the new metadata collection and website updates. These are not merged yet, but should probably be to make this correct. This might also require the commit hashes to be updated (Not sure on this, though). It would actually be super fitting to get this into this release as we also stabilize `--fix`. TODOs:
* [x] Merge metadata collection PRs:
  1. rust-lang#7279
  2. rust-lang#7298
  3. rust-lang#7420 (Hope to not get any merge conflicts)

---

[Rendered 📰](https://github.com/xFrednet/rust-clippy/blob/changelog-1-55/CHANGELOG.md)

r? `@flip1995`

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants