Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poorly optimized assembly generation around Ipv4Addr, Ipv6Addr #77583

Closed
TyPR124 opened this issue Oct 5, 2020 · 6 comments · Fixed by #83831
Closed

Poorly optimized assembly generation around Ipv4Addr, Ipv6Addr #77583

TyPR124 opened this issue Oct 5, 2020 · 6 comments · Fixed by #83831
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@TyPR124
Copy link
Contributor

TyPR124 commented Oct 5, 2020

I tried this code:

pub fn ipv4_bitand_mask(ipv4: Ipv4Addr, mask: u32) -> Ipv4Addr {
    let ipv4_u32 = u32::from_ne_bytes(ipv4.octets());
    Ipv4Addr::from((ipv4_u32 & mask).to_ne_bytes())
}

I expected to see this happen: assembly generated should be equivalent to:

pub fn ipv4_bitand_mask_unsafe(ipv4: Ipv4Addr, mask: u32) -> Ipv4Addr {
    let ipv4_u32: u32 = unsafe { std::mem::transmute(ipv4) };
    unsafe { std::mem::transmute(ipv4_u32 & mask) }
}

Instead, this happened: the generated assembly retains calls to Ipv4::octets and Ipv4::from, leaving the assembly poorly optimized

I've played with different ways of implementing the safe version, but cannot find anyway get it to optimize further.

Good assembly:

example::ipv4_bitand_mask_unsafe:
        mov     eax, edi
        and     eax, esi
        ret

Bad assembly:

example::ipv4_bitand_mask:
        push    rbx
        sub     rsp, 16
        mov     ebx, esi
        mov     dword ptr [rsp + 8], edi
        lea     rdi, [rsp + 8]
        call    qword ptr [rip + std::net::ip::Ipv4Addr::octets@GOTPCREL]
        and     eax, ebx
        mov     edi, eax
        call    qword ptr [rip + <std::net::ip::Ipv4Addr as core::convert::From<[u8; 4]>>::from@GOTPCREL]
        add     rsp, 16
        pop     rbx
        ret

Playground and GodBolt (both include an IPv6 example as well)

Adding '-Z mir-opt-level=2' (or higher) does optimize away the call to octets, but not the call to from.

I get the same optimization behavior when building a binary, at least on my own platfom (x86-64 Windows MSVC)

Meta

rustc --version --verbose:

rustc 1.49.0-nightly (beb5ae474 2020-10-04)
binary: rustc
commit-hash: beb5ae474d2835962ebdf7416bd1c9ad864fe101
commit-date: 2020-10-04
host: x86_64-pc-windows-msvc
release: 1.49.0-nightly
LLVM version: 11.0
@TyPR124 TyPR124 added the C-bug Category: This is a bug. label Oct 5, 2020
@jonas-schievink
Copy link
Contributor

Maybe it's just missing #[inline]s?

@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 5, 2020
@TyPR124
Copy link
Contributor Author

TyPR124 commented Oct 5, 2020

I will try to test with adding #[inline] in the appropriate places in std

@scottmcm
Copy link
Member

scottmcm commented Oct 6, 2020

This is a trivial method on a concrete type, so yeah, probably just needs #[inline]. Or you want to turn on LTO.

(If it were generic it wouldn't be necessary, but for concrete things the bodies aren't in the meta.)

@TyPR124
Copy link
Contributor Author

TyPR124 commented Oct 6, 2020

So I found when a crate exports a function with #[inline], it does get optimized where called (though I'm . On the other hand, the same logic inside function will not be inlined: this basic program, for instance, retains assembly calls to octets() and from():

fn main() {
    let ip: Ipv4Addr = std::env::args().nth(1).unwrap().parse().unwrap();
    let ipbeu32 = u32::from_be_bytes(ip.octets());
    let net = Ipv4Addr::from((ipbeu32 & 0xffffff00u32.to_be()).to_ne_bytes());
    println!("{}", net);
}

Edit: copy the right "basic program" this time...

With #[inline] on the methods in libstd, the two calls are optimized away.

Is this worth a PR to add #[inline] to libstd? If so, is there anything that shouldn't have #[inline] in the API? Seems to me like nearly everything would be better inlined, at least for IPv4, but I'm far from an expert.

@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Oct 7, 2020

Feel free to open a PR I guess, your reviewer will decide whether it should be merged or not.

@mati865
Copy link
Contributor

mati865 commented Mar 14, 2021

Ping @TyPR124, do you want to work on fixing this issue?

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
…ine-for-ip, r=m-ou-se

Add `#[inline]` to IpAddr methods

Add some inlines to trivial methods of IpAddr
Closes rust-lang#77583
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
…ine-for-ip, r=m-ou-se

Add `#[inline]` to IpAddr methods

Add some inlines to trivial methods of IpAddr
Closes rust-lang#77583
jyn514 pushed a commit to jyn514/rust that referenced this issue Apr 5, 2021
…ine-for-ip, r=m-ou-se

Add `#[inline]` to IpAddr methods

Add some inlines to trivial methods of IpAddr
Closes rust-lang#77583
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
…ine-for-ip, r=m-ou-se

Add `#[inline]` to IpAddr methods

Add some inlines to trivial methods of IpAddr
Closes rust-lang#77583
@bors bors closed this as completed in a3d0fa8 Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants