Optimize Baseline interpreter by using padded code #315

Merged
merged 4 commits into from
Jun 4, 2021

chfast
Member

@chfast chfast commented May 4, 2021

During code analysis the code is copied and padded with 33 zero bytes. This guarantees that push data is always available in the code buffer and the code ends with STOP. This allows optimizing the interpreter loop and PUSH instructions.

Performance gains are up to ~5%.

This is also needed to enable other optimizations: implementing the interpreter with "computed goto" or "tail calls".
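
For illustration, the padding step described above could be sketched roughly as follows. This is a minimal sketch, not the code from this PR: `pad_code` and `padding_size` are hypothetical names, and the reading of 33 as "32 bytes of PUSH32 data plus one STOP byte" is an assumption based on the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constant (assumption): 32 bytes cover the data of a PUSH32
// placed at the very end of the code, and 1 more byte acts as STOP (0x00).
constexpr std::size_t padding_size = 33;

// Hypothetical helper: returns a copy of the code followed by 33 zero bytes.
std::vector<std::uint8_t> pad_code(const std::uint8_t* code, std::size_t code_size)
{
    assert(code != nullptr || code_size == 0);

    // The vector is value-initialized, so the padding bytes are already zero.
    std::vector<std::uint8_t> padded(code_size + padding_size);
    if (code_size != 0)
        std::memcpy(padded.data(), code, code_size);
    return padded;
}
```

With this layout, even a code that consists of a single PUSH32 opcode has 32 readable data bytes after it, followed by a zero byte that decodes as STOP.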

@chfast chfast marked this pull request as draft May 4, 2021 13:40
Base automatically changed from baseline_api to master May 4, 2021 15:33
@chfast chfast force-pushed the baseline_padded_code branch 3 times, most recently from 1085606 to 915e7b5 Compare May 31, 2021 21:11
@chfast chfast marked this pull request as ready for review May 31, 2021 21:14
@chfast chfast requested review from axic and gumb0 May 31, 2021 21:14
@codecov

codecov bot commented May 31, 2021

Codecov Report

Merging #315 (5bf1e76) into master (3a2dbeb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5bf1e76 differs from pull request most recent head 6538e6a. Consider uploading reports for the commit 6538e6a to get more accurate results

@@           Coverage Diff           @@
##           master     #315   +/-   ##
=======================================
  Coverage   99.78%   99.78%           
=======================================
  Files          29       29           
  Lines        4108     4112    +4     
=======================================
+ Hits         4099     4103    +4     
  Misses          9        9           
Flag Coverage Δ
consensus 91.08% <93.02%> (-0.12%) ⬇️
unittests 99.78% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
lib/evmone/baseline.cpp 99.82% <100.00%> (+<0.01%) ⬆️

@chfast chfast force-pushed the baseline_padded_code branch from 915e7b5 to fbf2d80 Compare June 1, 2021 07:30
@chfast
Member Author

chfast commented Jun 1, 2021

Haswell 4.4 GHz, clang-12

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/analyse/main/blake2b_huff_mean                               +0.0094         +0.0094             5             5             5             5
baseline/execute/main/blake2b_huff/empty_mean                         +0.0067         +0.0067            21            21            21            21
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0306         -0.0306           342           332           342           332
baseline/execute/main/blake2b_huff/5610nulls_mean                     -0.0339         -0.0339           664           642           664           642
baseline/execute/main/blake2b_huff/8415nulls_mean                     -0.0301         -0.0302           969           940           969           940
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0342         -0.0342          7501          7245          7501          7245
baseline/analyse/main/blake2b_shifts_mean                             +0.0557         +0.0557             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0475         -0.0475          3504          3337          3504          3337
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0473         -0.0473          7181          6841          7181          6841
baseline/execute/main/blake2b_shifts/8415nulls_mean                   -0.0521         -0.0522         10804         10241         10804         10241
baseline/execute/main/blake2b_shifts/65536nulls_mean                  -0.0534         -0.0533         86116         81517         86110         81516
baseline/analyse/main/sha1_divs_mean                                  +0.1272         +0.1272             0             1             0             1
baseline/execute/main/sha1_divs/empty_mean                            -0.0270         -0.0270            57            55            57            55
baseline/execute/main/sha1_divs/1351_mean                             -0.0322         -0.0322          1169          1131          1169          1131
baseline/execute/main/sha1_divs/2737_mean                             -0.0274         -0.0274          2272          2210          2272          2210
baseline/execute/main/sha1_divs/5311_mean                             -0.0274         -0.0274          4436          4314          4436          4314
baseline/execute/main/sha1_divs/65536_mean                            -0.0324         -0.0324         54213         52456         54214         52456
baseline/analyse/main/sha1_shifts_mean                                +0.1276         +0.1276             0             1             0             1
baseline/execute/main/sha1_shifts/empty_mean                          -0.0464         -0.0464            35            34            35            34
baseline/execute/main/sha1_shifts/1351_mean                           -0.0423         -0.0423           729           698           729           698
baseline/execute/main/sha1_shifts/2737_mean                           -0.0463         -0.0463          1425          1359          1425          1359
baseline/execute/main/sha1_shifts/5311_mean                           -0.0429         -0.0429          2778          2659          2778          2659
baseline/execute/main/sha1_shifts/65536_mean                          -0.0432         -0.0432         33867         32403         33867         32403
baseline/analyse/main/weierstrudel_mean                               +0.0547         +0.0547             7             7             7             7
baseline/execute/main/weierstrudel/0_mean                             -0.0259         -0.0259           184           179           184           179
baseline/execute/main/weierstrudel/1_mean                             -0.0490         -0.0490           411           391           411           391
baseline/execute/main/weierstrudel/3_mean                             -0.0525         -0.0525           641           607           641           607
baseline/execute/main/weierstrudel/9_mean                             -0.0589         -0.0589          1324          1246          1324          1246
baseline/execute/main/weierstrudel/14_mean                            -0.0598         -0.0598          1898          1785          1898          1785
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0612         +0.0612            57            60            57            60
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.0660         +0.0660            57            61            57            61
baseline/analyse/micro/beginsubs_0xffff_mean                          +0.1499         +0.1499            23            26            23            26
baseline/execute/micro/beginsubs_0xffff_mean                          +0.1569         +0.1570            23            27            23            27
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.0934         +0.0934            79            87            79            87
baseline/execute/micro/jumpdests_0xffff_mean                          -0.0085         -0.0085           197           196           197           196
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.0916         +0.0916            30            32            30            32
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0336         -0.0336         13879         13413         13879         13413
baseline/analyse/micro/push1s_0xffff_mean                             +0.0561         +0.0561            60            63            60            63
baseline/execute/micro/push1s_0xffff_mean                             +0.0617         +0.0617            61            64            61            64
baseline/analyse/micro/push32s_0xffff_mean                            +0.8915         +0.8915             4             7             4             7
baseline/execute/micro/push32s_0xffff_mean                            +0.8083         +0.8083             5             8             5             8
baseline/analyse/micro/zeros_0xffff_mean                              +0.1475         +0.1475            23            26            23            26
baseline/execute/micro/zeros_0xffff_mean                              +0.1590         +0.1590            23            27            23            27

@chfast
Member Author

chfast commented Jun 1, 2021

AMD Zen3, GCC-9

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/execute/main/blake2b_huff/empty_mean                         +0.0166         +0.0165            21            22            21            22
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0229         -0.0229           364           356           364           356
baseline/execute/main/blake2b_huff/5610nulls_mean                     +0.0003         +0.0003           702           702           701           702
baseline/execute/main/blake2b_huff/8415nulls_mean                     +0.0165         +0.0165          1017          1034          1017          1033
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0313         -0.0313          7964          7714          7962          7713
baseline/analyse/main/blake2b_shifts_mean                             +0.0203         +0.0204             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0331         -0.0331          3464          3350          3463          3349
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0173         -0.0173          6988          6867          6986          6865
baseline/execute/main/blake2b_shifts/8415nulls_mean                   +0.0079         +0.0080         10440         10522         10436         10520
baseline/execute/main/blake2b_shifts/65536nulls_mean                  +0.0137         +0.0136         81730         82851         81702         82814
baseline/analyse/main/sha1_divs_mean                                  +0.1506         +0.1506             0             0             0             0
baseline/execute/main/sha1_divs/empty_mean                            +0.0048         +0.0048            62            62            62            62
baseline/execute/main/sha1_divs/1351_mean                             +0.0503         +0.0503          1225          1287          1225          1287
baseline/execute/main/sha1_divs/2737_mean                             -0.0202         -0.0202          2528          2477          2527          2476
baseline/execute/main/sha1_divs/5311_mean                             +0.0218         +0.0218          4763          4867          4762          4865
baseline/execute/main/sha1_divs/65536_mean                            +0.0189         +0.0189         57662         58750         57648         58736
baseline/analyse/main/sha1_shifts_mean                                +0.0789         +0.0789             0             0             0             0
baseline/execute/main/sha1_shifts/empty_mean                          -0.0820         -0.0820            42            39            42            39
baseline/execute/main/sha1_shifts/1351_mean                           -0.0306         -0.0306           816           791           816           791
baseline/execute/main/sha1_shifts/2737_mean                           -0.0421         -0.0422          1636          1567          1636          1567
baseline/execute/main/sha1_shifts/5311_mean                           -0.0703         -0.0703          3263          3033          3262          3033
baseline/execute/main/sha1_shifts/65536_mean                          -0.0854         -0.0854         39815         36414         39805         36406
baseline/analyse/main/weierstrudel_mean                               +0.0393         +0.0393             6             6             6             6
baseline/execute/main/weierstrudel/0_mean                             +0.0061         +0.0061           183           184           183           184
baseline/execute/main/weierstrudel/1_mean                             -0.0151         -0.0150           384           378           383           378
baseline/execute/main/weierstrudel/3_mean                             -0.0270         -0.0270           599           582           599           582
baseline/execute/main/weierstrudel/9_mean                             -0.0310         -0.0310          1265          1226          1265          1225
baseline/execute/main/weierstrudel/14_mean                            +0.0151         +0.0150          1740          1766          1739          1766
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0484         +0.0483            47            49            47            49
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.0317         +0.0317            47            49            47            49
baseline/analyse/micro/beginsubs_0xffff_mean                          -0.1700         -0.1700            29            24            29            24
baseline/execute/micro/beginsubs_0xffff_mean                          -0.1863         -0.1864            29            23            29            23
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.1298         +0.1298            52            58            52            58
baseline/execute/micro/jumpdests_0xffff_mean                          +0.3098         +0.3098           136           178           136           178
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.1358         +0.1358            20            22            20            22
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0678         -0.0679         15723         14656         15720         14653
baseline/analyse/micro/push1s_0xffff_mean                             +0.0289         +0.0289            50            51            50            51
baseline/execute/micro/push1s_0xffff_mean                             +0.0382         +0.0382            50            52            50            52
baseline/analyse/micro/push32s_0xffff_mean                            +0.5021         +0.5021             3             5             3             5
baseline/execute/micro/push32s_0xffff_mean                            +0.5170         +0.5169             3             5             3             5
baseline/analyse/micro/zeros_0xffff_mean                              -0.1850         -0.1851            29            23            29            23
baseline/execute/micro/zeros_0xffff_mean                              -0.1842         -0.1842            29            24            29            24

@chfast
Member Author

chfast commented Jun 1, 2021

AMD Zen3, clang-12

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/execute/main/blake2b_huff/empty_mean                         -0.0370         -0.0370            19            19            19            19
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0683         -0.0683           302           281           302           281
baseline/execute/main/blake2b_huff/5610nulls_mean                     -0.0627         -0.0627           589           552           589           552
baseline/execute/main/blake2b_huff/8415nulls_mean                     -0.0531         -0.0531           849           804           849           804
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0865         -0.0866          6762          6176          6760          6175
baseline/analyse/main/blake2b_shifts_mean                             +0.1486         +0.1485             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0525         -0.0525          3040          2880          3039          2879
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0184         -0.0184          5966          5856          5964          5854
baseline/execute/main/blake2b_shifts/8415nulls_mean                   -0.0462         -0.0462          9185          8761          9182          8758
baseline/execute/main/blake2b_shifts/65536nulls_mean                  -0.0633         -0.0633         71635         67103         71606         67076
baseline/analyse/main/sha1_divs_mean                                  +0.0992         +0.0993             0             0             0             0
baseline/execute/main/sha1_divs/empty_mean                            -0.0591         -0.0591            52            49            52            49
baseline/execute/main/sha1_divs/1351_mean                             -0.0138         -0.0138          1053          1038          1052          1038
baseline/execute/main/sha1_divs/2737_mean                             -0.0530         -0.0530          2053          1945          2053          1944
baseline/execute/main/sha1_divs/5311_mean                             -0.0611         -0.0611          4058          3810          4057          3809
baseline/execute/main/sha1_divs/65536_mean                            -0.0602         -0.0601         49490         46512         49475         46500
baseline/analyse/main/sha1_shifts_mean                                +0.1668         +0.1668             0             0             0             0
baseline/execute/main/sha1_shifts/empty_mean                          -0.0005         -0.0005            30            30            30            30
baseline/execute/main/sha1_shifts/1351_mean                           -0.0641         -0.0641           653           611           653           611
baseline/execute/main/sha1_shifts/2737_mean                           -0.0561         -0.0560          1264          1193          1263          1193
baseline/execute/main/sha1_shifts/5311_mean                           -0.0489         -0.0489          2438          2318          2437          2318
baseline/execute/main/sha1_shifts/65536_mean                          +0.0386         +0.0386         28738         29848         28730         29840
baseline/analyse/main/weierstrudel_mean                               +0.2038         +0.2038             5             6             5             6
baseline/execute/main/weierstrudel/0_mean                             +0.0210         +0.0209           147           150           147           150
baseline/execute/main/weierstrudel/1_mean                             -0.0305         -0.0305           340           330           340           330
baseline/execute/main/weierstrudel/3_mean                             -0.0107         -0.0107           505           500           505           500
baseline/execute/main/weierstrudel/9_mean                             -0.0164         -0.0164          1032          1015          1032          1015
baseline/execute/main/weierstrudel/14_mean                            -0.0370         -0.0370          1508          1452          1507          1451
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0256         +0.0256            47            49            47            49
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.1753         +0.1752            48            56            48            56
baseline/analyse/micro/beginsubs_0xffff_mean                          +0.3575         +0.3576            22            30            22            30
baseline/execute/micro/beginsubs_0xffff_mean                          +0.0418         +0.0418            29            31            29            31
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.0521         +0.0521            89            94            89            94
baseline/execute/micro/jumpdests_0xffff_mean                          +0.0916         +0.0916           182           199           182           198
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.0563         +0.0563            34            36            34            36
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0137         -0.0137         12623         12450         12620         12447
baseline/analyse/micro/push1s_0xffff_mean                             +0.0290         +0.0291            50            51            50            51
baseline/execute/micro/push1s_0xffff_mean                             +0.1634         +0.1635            51            59            51            59
baseline/analyse/micro/push32s_0xffff_mean                            +0.4993         +0.4993             3             5             3             5
baseline/execute/micro/push32s_0xffff_mean                            +0.5775         +0.5776             4             6             4             6
baseline/analyse/micro/zeros_0xffff_mean                              +0.3677         +0.3678            22            30            22            30
baseline/execute/micro/zeros_0xffff_mean                              +0.0356         +0.0357            30            31            30            31

lib/evmone/baseline.cpp (outdated)
uint8_t buffer[Len];
// This cannot overflow code buffer because code ends with valid STOP instruction.
Member

I think here it's not relevant that it's a STOP?

Suggested change
- // This cannot overflow code buffer because code ends with valid STOP instruction.
+ // This cannot overflow code buffer because code is padded with 0s.

Member

I think the original comment is more correct, as the loop is looking for STOP

Member

I thought the comment here is supposed to explain why memcpy cannot overflow.
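
To make the point about the fixed-size copy concrete, here is a minimal sketch of reading push data without a bounds check. It is not the PR's implementation: `load_push_data` is a hypothetical helper, it handles only up to 8 data bytes to keep the example small, and it assumes the caller passes a pointer into a code buffer padded as described in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads Len bytes of push data that follow the opcode at pc.
// Assumption: at least Len bytes are readable after pc, which the zero padding
// guarantees even when the original code is truncated in the middle of push data.
template <std::size_t Len>
std::uint64_t load_push_data(const std::uint8_t* pc)
{
    static_assert(Len >= 1 && Len <= 8, "this sketch only handles up to 8 data bytes");
    assert(pc != nullptr);

    std::uint8_t buffer[Len];
    std::memcpy(buffer, pc + 1, Len);  // fixed-size copy, no bounds check needed

    // Push data is big-endian: the first data byte is the most significant one.
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < Len; ++i)
        value = (value << 8) | buffer[i];
    return value;
}
```

In this sketch it is the zero padding, not the fact that zero decodes as STOP, that keeps the `memcpy` in bounds; a truncated PUSH2 with one data byte `0xab` simply reads as `0xab00`.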

  auto pc = code;
- while (pc != code_end)
+ while (true) // Guaranteed to terminate because code must end with STOP.
Member

Suggested change
- while (true) // Guaranteed to terminate because code must end with STOP.
+ while (true) // Guaranteed to terminate because padded code ends with STOP.
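
As a rough illustration of why the loop can drop the `pc != code_end` check, here is a toy sketch that walks padded code linearly and stops at the first STOP. It is not the interpreter from this PR: `count_instructions` and the `OP_*` constants are hypothetical names, jumps are ignored, and every non-PUSH opcode is treated as a one-byte instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t OP_STOP = 0x00;
constexpr std::uint8_t OP_PUSH1 = 0x60;
constexpr std::uint8_t OP_PUSH32 = 0x7f;

// Hypothetical helper: counts the instructions before the first STOP.
// Assumption: padded_code ends with 33 zero bytes, as described in this PR.
std::size_t count_instructions(const std::uint8_t* padded_code)
{
    assert(padded_code != nullptr);

    std::size_t count = 0;
    auto pc = padded_code;
    while (true)  // Terminates because padded code ends with STOP.
    {
        const std::uint8_t op = *pc;
        if (op == OP_STOP)
            return count;

        ++count;
        if (op >= OP_PUSH1 && op <= OP_PUSH32)
            pc += op - OP_PUSH1 + 1;  // Skip the push data.
        ++pc;
    }
}
```

Even when the last byte of the original code is PUSH32, skipping its 32 data bytes lands on the 33rd padding byte, which is zero and therefore STOP.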

test/bench/helpers.hpp (resolved)
@chfast chfast force-pushed the baseline_padded_code branch 3 times, most recently from ea36730 to 76a3fb9 Compare June 2, 2021 14:26
return CodeAnalysis{std::move(map)};

// i is the needed code size including the last push data (can be bigger than code_size).
std::unique_ptr<uint8_t[]> padded_code{new uint8_t[i + 1]};
Member

make_unique would zero-initialize, as opposed to this?

Member Author

Yes

Member

I would maybe add a comment that this leaves it uninitialized.
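
The difference discussed in this thread can be sketched as follows. This is an illustrative sketch with hypothetical helper names (`make_padded_copy`, `make_zeroed`), not the PR's code: `new uint8_t[n]` default-initializes the array and leaves its contents indeterminate, while `std::make_unique<uint8_t[]>(n)` value-initializes it to zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical helper: allocates without zeroing, then writes every byte exactly once.
std::unique_ptr<std::uint8_t[]> make_padded_copy(
    const std::uint8_t* code, std::size_t code_size, std::size_t padded_size)
{
    assert(padded_size >= code_size);

    // Default-initialized: the bytes are indeterminate until written below.
    std::unique_ptr<std::uint8_t[]> padded{new std::uint8_t[padded_size]};
    if (code_size != 0)
        std::memcpy(padded.get(), code, code_size);
    std::memset(padded.get() + code_size, 0, padded_size - code_size);  // zero only the padding
    return padded;
}

// For contrast: std::make_unique value-initializes, so every byte starts as zero.
std::unique_ptr<std::uint8_t[]> make_zeroed(std::size_t size)
{
    return std::make_unique<std::uint8_t[]>(size);
}
```

Skipping the zero-initialization avoids writing the code bytes twice, at the cost of having to fill the padding explicitly. Since C++20, `std::make_unique_for_overwrite` expresses the same intent without a raw `new`.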

Member

@gumb0 gumb0 left a comment

Just a random side-thought: but with a static +33 bytes always allocated it potentially could be optimized to do a single allocation in analyze for both JumpdestMap and padded code together.

chfast added 4 commits June 4, 2021 14:25
Guaranteed to terminate because padded code ends with STOP.
This cannot overflow code buffer because code ends with valid STOP
instruction.
@chfast chfast force-pushed the baseline_padded_code branch from 76a3fb9 to 6538e6a Compare June 4, 2021 12:25
@chfast
Member Author

chfast commented Jun 4, 2021

Just a random side-thought: but with a static +33 bytes always allocated it potentially could be optimized to do a single allocation in analyze for both JumpdestMap and padded code together.

Yes, this was supposed to be a TODO, so I have added it now.

This was my original plan, but there are additional complications:

  • We need a custom bitmap implementation or a custom allocator for std::vector<bool>.
  • The bitmap data must additionally be 4- or 8-byte aligned. Now I can see it may be easier to put the jumpdest bitmap at the front of the allocated buffer.
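
One possible shape of the single-allocation idea, with the bitmap at the front of the buffer, is sketched below. All names here (`CombinedAnalysis`, `allocate_combined`) are hypothetical and this is not what the PR implements; the sketch sidesteps `std::vector<bool>` with a hand-rolled bitmap and gets 8-byte alignment by allocating the buffer as 64-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical layout: one buffer of 64-bit words holding the jumpdest bitmap
// at the front, followed by the padded code bytes.
struct CombinedAnalysis
{
    std::unique_ptr<std::uint64_t[]> storage;
    std::size_t bitmap_words = 0;
    std::size_t code_size = 0;

    std::uint8_t* padded_code() noexcept
    {
        return reinterpret_cast<std::uint8_t*>(storage.get() + bitmap_words);
    }

    void set_jumpdest(std::size_t index) noexcept
    {
        assert(index < code_size);
        storage[index / 64] |= std::uint64_t{1} << (index % 64);
    }

    bool is_jumpdest(std::size_t index) const noexcept
    {
        return index < code_size && ((storage[index / 64] >> (index % 64)) & 1) != 0;
    }
};

CombinedAnalysis allocate_combined(const std::uint8_t* code, std::size_t code_size)
{
    constexpr std::size_t padding = 33;  // same padding size as in this PR
    const std::size_t bitmap_words = (code_size + 63) / 64;
    const std::size_t code_words = (code_size + padding + 7) / 8;

    CombinedAnalysis a;
    // Value-initialized: both the bitmap and the code padding start as zeros.
    a.storage = std::make_unique<std::uint64_t[]>(bitmap_words + code_words);
    a.bitmap_words = bitmap_words;
    a.code_size = code_size;
    if (code_size != 0)
        std::memcpy(a.padded_code(), code, code_size);
    return a;
}
```

Placing the bitmap first means its alignment follows directly from the allocation, and the code bytes, which need no particular alignment, can start right after it.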

@chfast chfast merged commit 9182e3d into master Jun 4, 2021
@chfast chfast deleted the baseline_padded_code branch June 4, 2021 12:49