cmd/compile: boost inlining into FORs
As Than McIntosh has already mentioned, it is common practice to boost
inlining into FORs, since a callsite inside a loop could be "hotter".
This patch implements that.

The implementation uses a stack of FORs to recognise calls that are
inside a loop. The stack is maintained alongside the inlnode traversal
and holds the ancestor FORs of the node currently being visited by
inlnode.

A FOR whose cost is >= inlineBigForCost (105) is considered "big".
No boost is applied in such FORs.

Updates golang#17566

Results on GO1 are below. Binary size did not increase significantly:
10454800 -> 10475120, which is less than 0.3%.

goos: linux
goarch: amd64
pkg: test/bench/go1
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz
name                      old time/op    new time/op    delta
BinaryTree17-8              2.15s ± 1%     2.17s ± 1%     ~     (p=0.065 n=6+6)
Fannkuch11-8                2.70s ± 0%     2.69s ± 0%   -0.25%  (p=0.010 n=6+4)
FmtFprintfEmpty-8          31.9ns ± 0%    31.4ns ± 0%   -1.61%  (p=0.008 n=5+5)
FmtFprintfString-8         57.0ns ± 0%    57.1ns ± 0%   +0.26%  (p=0.013 n=6+5)
FmtFprintfInt-8            65.2ns ± 0%    63.9ns ± 0%   -1.95%  (p=0.008 n=5+5)
FmtFprintfIntInt-8          103ns ± 0%     102ns ± 0%   -1.01%  (p=0.000 n=5+4)
FmtFprintfPrefixedInt-8     119ns ± 0%     118ns ± 0%   -0.50%  (p=0.008 n=5+5)
FmtFprintfFloat-8           169ns ± 0%     174ns ± 0%   +2.75%  (p=0.008 n=5+5)
FmtManyArgs-8               445ns ± 0%     447ns ± 0%   +0.46%  (p=0.002 n=6+6)
GobDecode-8                4.37ms ± 1%    4.40ms ± 0%   +0.62%  (p=0.009 n=6+6)
GobEncode-8                3.07ms ± 0%    3.04ms ± 0%   -0.78%  (p=0.004 n=5+6)
Gzip-8                      195ms ± 0%     195ms ± 0%     ~     (p=0.429 n=5+6)
Gunzip-8                   28.2ms ± 0%    28.2ms ± 0%     ~     (p=0.662 n=5+6)
HTTPClientServer-8         45.0µs ± 1%    45.4µs ± 1%     ~     (p=0.093 n=6+6)
JSONEncode-8               8.01ms ± 0%    8.03ms ± 0%   +0.31%  (p=0.008 n=5+5)
JSONDecode-8               35.3ms ± 1%    35.1ms ± 0%   -0.72%  (p=0.008 n=5+5)
Mandelbrot200-8            4.50ms ± 0%    4.49ms ± 1%     ~     (p=0.937 n=6+6)
GoParse-8                  3.03ms ± 1%    3.00ms ± 1%     ~     (p=0.180 n=6+6)
RegexpMatchEasy0_32-8      55.4ns ± 0%    53.2ns ± 3%   -3.92%  (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8       178ns ± 0%     175ns ± 1%   -1.57%  (p=0.004 n=5+6)
RegexpMatchEasy1_32-8      50.1ns ± 0%    48.3ns ± 5%     ~     (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8       271ns ± 1%     262ns ± 1%   -3.26%  (p=0.004 n=6+5)
RegexpMatchMedium_32-8      949ns ± 0%     886ns ± 7%     ~     (p=0.329 n=5+6)
RegexpMatchMedium_1K-8     27.1µs ± 7%    28.1µs ± 6%     ~     (p=0.394 n=6+6)
RegexpMatchHard_32-8       1.28µs ± 2%    1.29µs ± 0%     ~     (p=0.056 n=6+6)
RegexpMatchHard_1K-8       38.5µs ± 0%    38.4µs ± 0%   -0.25%  (p=0.009 n=6+5)
Revcomp-8                   397ms ± 0%     396ms ± 0%     ~     (p=0.429 n=6+5)
Template-8                 48.1ms ± 1%    48.1ms ± 0%     ~     (p=0.222 n=5+5)
TimeParse-8                 213ns ± 0%     213ns ± 0%     ~     (p=0.210 n=4+6)
TimeFormat-8                295ns ± 1%     259ns ± 0%  -12.22%  (p=0.002 n=6+6)
[Geo mean]                 40.5µs         40.1µs        -1.00%

name                      old speed      new speed      delta
GobDecode-8               176MB/s ± 1%   174MB/s ± 0%   -0.61%  (p=0.009 n=6+6)
GobEncode-8               250MB/s ± 0%   252MB/s ± 0%   +0.79%  (p=0.004 n=5+6)
Gzip-8                    100MB/s ± 0%   100MB/s ± 0%     ~     (p=0.351 n=5+6)
Gunzip-8                  687MB/s ± 0%   687MB/s ± 0%     ~     (p=0.662 n=5+6)
JSONEncode-8              242MB/s ± 0%   242MB/s ± 0%   -0.31%  (p=0.008 n=5+5)
JSONDecode-8             54.9MB/s ± 1%  55.3MB/s ± 0%   +0.71%  (p=0.008 n=5+5)
GoParse-8                19.1MB/s ± 1%  19.3MB/s ± 1%     ~     (p=0.143 n=6+6)
RegexpMatchEasy0_32-8     578MB/s ± 0%   601MB/s ± 3%   +4.10%  (p=0.004 n=5+6)
RegexpMatchEasy0_1K-8    5.74GB/s ± 1%  5.85GB/s ± 1%   +1.90%  (p=0.002 n=6+6)
RegexpMatchEasy1_32-8     639MB/s ± 0%   663MB/s ± 4%     ~     (p=0.082 n=5+6)
RegexpMatchEasy1_1K-8    3.78GB/s ± 1%  3.91GB/s ± 1%   +3.38%  (p=0.004 n=6+5)
RegexpMatchMedium_32-8   33.7MB/s ± 0%  36.2MB/s ± 7%     ~     (p=0.268 n=5+6)
RegexpMatchMedium_1K-8   37.9MB/s ± 6%  36.5MB/s ± 6%     ~     (p=0.411 n=6+6)
RegexpMatchHard_32-8     24.9MB/s ± 2%  24.8MB/s ± 0%     ~     (p=0.063 n=6+6)
RegexpMatchHard_1K-8     26.6MB/s ± 0%  26.7MB/s ± 0%   +0.25%  (p=0.009 n=6+5)
Revcomp-8                 640MB/s ± 0%   641MB/s ± 0%     ~     (p=0.429 n=6+5)
Template-8               40.4MB/s ± 1%  40.3MB/s ± 0%     ~     (p=0.222 n=5+5)
[Geo mean]                175MB/s        177MB/s        +1.05%
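
To illustrate the stack-of-FORs idea from the description, here is a minimal, self-contained sketch. It is not the code from this patch: `Node`, `walker`, `budgetsFor`, `baseBudget` and `boostedBudget` are hypothetical names and values chosen for illustration, and only `inlineBigForCost` (105) comes from the commit message. The sketch also assumes that only the innermost enclosing FOR decides whether the boost applies, which the message does not state.

```go
package main

import "fmt"

// Kind distinguishes the few node types this sketch needs.
type Kind int

const (
	Block Kind = iota
	For
	Call
)

// Node is a toy stand-in for a compiler IR node, not the real one.
type Node struct {
	Kind Kind
	Name string // callee name, used by Call nodes
	Cost int    // hypothetical precomputed cost, used by For nodes
	Body []*Node
}

const (
	inlineBigForCost = 105 // threshold quoted in the commit message
	baseBudget       = 80  // assumed ordinary inlining budget
	boostedBudget    = 160 // hypothetical boosted budget, not from the commit
)

// walker carries the stack of FORs that enclose the node being visited.
type walker struct {
	forStack []*Node
	budgets  map[string]int // callee name -> budget chosen at its callsite
}

// budget returns the boosted budget when the innermost enclosing FOR
// exists and is not "big", and the base budget otherwise.
func (w *walker) budget() int {
	if len(w.forStack) == 0 {
		return baseBudget
	}
	innermost := w.forStack[len(w.forStack)-1]
	if innermost.Cost >= inlineBigForCost {
		return baseBudget
	}
	return boostedBudget
}

// visit walks the tree, pushing a FOR before its children are visited
// and popping it afterwards, so the stack always mirrors the ancestors.
func (w *walker) visit(n *Node) {
	if n.Kind == Call {
		w.budgets[n.Name] = w.budget()
	}
	if n.Kind == For {
		w.forStack = append(w.forStack, n)
	}
	for _, child := range n.Body {
		w.visit(child)
	}
	if n.Kind == For {
		w.forStack = w.forStack[:len(w.forStack)-1]
	}
}

// budgetsFor walks the tree under root and reports the budget per callsite.
func budgetsFor(root *Node) map[string]int {
	w := &walker{budgets: map[string]int{}}
	w.visit(root)
	return w.budgets
}

func main() {
	root := &Node{Kind: Block, Body: []*Node{
		{Kind: Call, Name: "outside"},
		{Kind: For, Cost: 20, Body: []*Node{{Kind: Call, Name: "inSmall"}}},
		{Kind: For, Cost: 200, Body: []*Node{{Kind: Call, Name: "inBig"}}},
	}}
	got := budgetsFor(root)
	for _, name := range []string{"outside", "inSmall", "inBig"} {
		fmt.Println(name, got[name])
	}
}
```

With these assumed values, only the call inside the small FOR gets the boosted budget. The calls outside any loop and inside the "big" FOR keep the base budget.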