Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline constructors and field getters #48

Merged
merged 2 commits into from
Mar 27, 2023
Merged

Conversation

jmeggitt
Copy link
Contributor

Recently, I have been running into some performance issues while reading BGP table dumps. After doing a number of profiles using VTune on Windows and valgrind on Linux, I found one of performance issues is caused by a missed optimization involving Ipv6Net::new. There are a number of other performance issues with my program that I still need to look at, however this one is by far the easiest to fix.

The crux of the issue is that Rust uses thin LTO by default so inlining is not performed across codegen units unless explicitly requested via #[inline] (or when constructed for a generic type, but that isn't the case here). While it is possible to avoid this by enabling fat LTO within a crate's Cargo.toml, this setting is ignored when compiling dependencies making this solution ineffective for library developers.

As you can see in this screenshot of a profile I ran in VTune, Ipv6Net::new took up a massive 10% (2.972 seconds) of the total program runtime. This large amount of CPU time is only possible because my workload of going through BGP data consists almost entirely of reading IPv6 prefixes. However, all of the work being done by this function is unnecessary when viewed in the context of the caller. The majority of the time spent by this function is constructing the return value from the function arguments. When inlined, the compiler is able to construct the Ipv6Net in place so these moves are not required. Thanks to branch prediction the impact of the prefix_len check is minimal, but when inlined the compiler is able to reliably remove the entire check.
image

In this pull request I propose adding #[inline(always)] to the constructors (Ipv6Net::new and Ipv4Net::new) and getters (Ipv6Net::addr, Ipv6Net::prefix_len, Ipv4Net::addr, and Ipv4Net::prefix_len). Additionally I added #[inline(always)] to Ipv6::max_prefix_len and Ipv4::max_prefix_len as they were the only other const functions in the crate and I saw no downside in doing so.

@krisprice
Copy link
Owner

Thanks @jmeggitt - happy to add this. Is #[inline(always)] necessary or does a simple #[inline] achieve the same?

@jmeggitt
Copy link
Contributor Author

@krisprice I did a quick test and it looks like #[inline] is sufficient to get the compiler to inline the function. Since #[inline(always)] might be a bit too heavy handed, I changed it to #[inline] in this pull request.

@krisprice krisprice merged commit 80b8bd9 into krisprice:master Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants