Inline constructors and field getters #48
Merged
Recently, I have been running into performance issues while reading BGP table dumps. After profiling with VTune on Windows and valgrind on Linux, I found that one of these issues is caused by a missed optimization involving `Ipv6Net::new`. There are other performance issues in my program that I still need to look at, but this one is by far the easiest to fix.

The crux of the issue is that Rust only performs thin local LTO by default, so a function is not inlined across crate boundaries unless that is explicitly requested via `#[inline]` (or the function is generic, but that isn't the case here). While it is possible to avoid this by enabling fat LTO in a crate's `Cargo.toml`, this setting is ignored when the crate is compiled as a dependency, making this solution ineffective for library developers.

As you can see in this screenshot of a profile I ran in VTune, `Ipv6Net::new` took up a massive 10% (2.972 seconds) of the total program runtime. A share this large is only possible because my workload of going through BGP data consists almost entirely of reading IPv6 prefixes. However, all of the work done by this function is unnecessary when viewed in the context of the caller. Most of the time in this function is spent constructing the return value from the function arguments. When inlined, the compiler is able to construct the `Ipv6Net` in place, so these moves are not required. Thanks to branch prediction the impact of the `prefix_len` check is minimal, but when inlined the compiler is able to reliably remove the entire check.

In this pull request I propose adding `#[inline(always)]` to the constructors (`Ipv6Net::new` and `Ipv4Net::new`) and getters (`Ipv6Net::addr`, `Ipv6Net::prefix_len`, `Ipv4Net::addr`, and `Ipv4Net::prefix_len`). Additionally, I added `#[inline(always)]` to `Ipv6::max_prefix_len` and `Ipv4::max_prefix_len`, as they were the only other `const` functions in the crate and I saw no downside in doing so.
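
For illustration, here is a minimal sketch of the pattern this change applies. `Net6` and `PrefixLenError` are hypothetical stand-ins invented for this example, not the crate's actual types or the actual diff; the sketch only assumes a small constructor with a prefix length check plus trivial getters.

```rust
use std::net::Ipv6Addr;

/// Hypothetical stand-in for a network type such as `Ipv6Net`;
/// this is not the crate's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Net6 {
    addr: Ipv6Addr,
    prefix_len: u8,
}

/// Hypothetical error returned when the prefix length exceeds the maximum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PrefixLenError;

impl Net6 {
    /// Small constructor. The inline hint makes the body available to
    /// callers in other crates, so the optimizer can build the value in
    /// place and may drop the check when `prefix_len` is a known constant.
    #[inline(always)]
    pub fn new(addr: Ipv6Addr, prefix_len: u8) -> Result<Net6, PrefixLenError> {
        if prefix_len > Self::max_prefix_len() {
            return Err(PrefixLenError);
        }
        Ok(Net6 { addr, prefix_len })
    }

    /// Trivial getter: returns a copy of the stored address.
    #[inline(always)]
    pub fn addr(&self) -> Ipv6Addr {
        self.addr
    }

    /// Trivial getter: returns the stored prefix length.
    #[inline(always)]
    pub fn prefix_len(&self) -> u8 {
        self.prefix_len
    }

    /// Maximum valid prefix length for an IPv6 network.
    #[inline(always)]
    pub const fn max_prefix_len() -> u8 {
        128
    }
}
```

A caller in another crate would use these exactly as before (for example `Net6::new(addr, 32)?.prefix_len()`); the attribute changes nothing about the API, only what the optimizer is able to see at the call site.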