Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New nightly shows mysterious changes of binary size #214

Closed
matklad opened this issue Jul 14, 2018 · 16 comments
Closed

New nightly shows mysterious changes of binary size #214

matklad opened this issue Jul 14, 2018 · 16 comments

Comments

@matklad
Copy link

matklad commented Jul 14, 2018

For https://github.com/matklad/tom/tree/f81bede5244eda62dac6b362784b6529d17f26e6, I observe an interesting difference between nightly-2018-07-07 and nightly-2018-07-13.

On 07-07, I see a huge unused data section in WASM. On 07-13 the section is gone, but the total size of the binary is nonetheless slightly bigger. This is after wasm-opt (see build.sh)

STR:

% git clone https://github.com/matklad/tom/ && cd tom && git checkout f81bede5244eda62dac6b362784b6529d17f26e6

% twiggy --version
twiggy-opt 0.2.0

% rustup override add nightly-2018-07-07
info: using existing install for 'nightly-2018-07-07-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/matklad/projects/tom' set to 'nightly-2018-07-07-x86_64-unknown-linux-gnu'

  nightly-2018-07-07-x86_64-unknown-linux-gnu unchanged - rustc 1.29.0-nightly (e06c87544 2018-07-06)

% rustc -vV
rustc 1.29.0-nightly (e06c87544 2018-07-06)
binary: rustc
commit-hash: e06c875442e91cc2c597135d1e807a69e73eee26
commit-date: 2018-07-06
host: x86_64-unknown-linux-gnu
release: 1.29.0-nightly
LLVM version: 6.0

% ./build.sh
    Finished release [optimized] target(s) in 0.03s

% ls main_bg.wasm
.rw-r--r-- 553k matklad 15 Jul  0:31 main_bg.wasm

% twiggy garbage main_bg.wasm
 Bytes  │ Size % │ Garbage Item
────────┼────────┼─────────────
 161804 ┊ 29.22% ┊ data[35] // OUCH!!!
  55289 ┊  9.98% ┊ data[8]
  28765 ┊  5.19% ┊ code[107]
  17948 ┊  3.24% ┊ data[34]
  14409 ┊  2.60% ┊ code[74]
   7761 ┊  1.40% ┊ code[183]
   7685 ┊  1.39% ┊ code[116]
   7531 ┊  1.36% ┊ code[362]
   4938 ┊  0.89% ┊ code[142]
   4197 ┊  0.76% ┊ code[112]


// Switching to newer nightly...

% rustup override add nightly-2018-07-13
info: using existing install for 'nightly-2018-07-13-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/matklad/projects/tom' set to 'nightly-2018-07-13-x86_64-unknown-linux-gnu'

  nightly-2018-07-13-x86_64-unknown-linux-gnu unchanged - rustc 1.29.0-nightly (64f7de921 2018-07-12)

% rustc -vV
rustc 1.29.0-nightly (64f7de921 2018-07-12)
binary: rustc
commit-hash: 64f7de92166f1f7d940575feff57ac33bc384550
commit-date: 2018-07-12
host: x86_64-unknown-linux-gnu
release: 1.29.0-nightly
LLVM version: 7.0 // newer LLVM

% ./build.sh
   Compiling tom_wasm v0.0.0 (file:///home/matklad/projects/tom/tom_wasm)
    Finished release [optimized] target(s) in 4.36s
    
% ls main_bg.wasm
.rw-r--r-- 574k matklad 15 Jul  0:32 main_bg.wasm

% twiggy garbage main_bg.wasm
 Bytes │ Size % │ Garbage Item
───────┼────────┼─────────────
 52691 ┊  9.18% ┊ data[8] // Hey, we've lost that big of data. However, total size (^) is bigger
 28875 ┊  5.03% ┊ code[107]
 14194 ┊  2.47% ┊ code[74]
  8048 ┊  1.40% ┊ code[361]
  7708 ┊  1.34% ┊ code[183]
  7642 ┊  1.33% ┊ code[116]
  6456 ┊  1.12% ┊ data[3420]
  5720 ┊  1.00% ┊ data[3492]
  5696 ┊  0.99% ┊ data[302]
  5664 ┊  0.99% ┊ data[3434]

@matklad
Copy link
Author

matklad commented Jul 14, 2018

The suspect is upgrade to LLVM 7.0: rust-lang/rust#51966

@matklad
Copy link
Author

matklad commented Jul 14, 2018

Hm, so maybe wasm-opt just introduces noise here? In the example above, I use wasm-opt after bindgen, and that actually produces a module with empty exports.

If I try to use wasm-opt before bindgen, then it fails with 07-07 [parse exception: duplicate function name: core::ptr::drop_in_place::h8605fda05156bf8b]Fatal: error in parsing input, but works with 07-13.

If I completely remove wasm-opt from the picture, I still get the same picture: smaller file with old toolchain with 30% garbage. The absolute file size is slightly bigger though.

@alexcrichton
Copy link
Contributor

This is likely due to new debuginfo sections, but if you reinstall wasm-bindgen-cli it'll automatically remove them in non-debug mode. Does that account for the increase in size?

@matklad
Copy link
Author

matklad commented Jul 15, 2018

I've just tried wasm-bindgen[-cli] from master, and I still get 574k after wasm-opt.

@matklad
Copy link
Author

matklad commented Jul 15, 2018

Here's another experiment, sans wasm-opt this time:

% rustup override add nightly-2018-07-07
info: using existing install for 'nightly-2018-07-07-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/matklad/projects/tom' set to 'nightly-2018-07-07-x86_64-unknown-linux-gnu'

  nightly-2018-07-07-x86_64-unknown-linux-gnu unchanged - rustc 1.29.0-nightly (e06c87544 2018-07-06)

% ./build.sh
+ cargo build --manifest-path ./tom_wasm/Cargo.toml --target wasm32-unknown-unknown --release
   Compiling tom_wasm v0.0.0 (file:///home/matklad/projects/tom/tom_wasm)
    Finished release [optimized] target(s) in 4.35s
+ TARGET_DIR=tom_wasm/target/wasm32-unknown-unknown/release
+ cp tom_wasm/target/wasm32-unknown-unknown/release/tom_wasm.wasm tom_wasm/target/wasm32-unknown-unknown/release/main.wasm
+ wasm-bindgen --no-modules --no-typescript tom_wasm/target/wasm32-unknown-unknown/release/main.wasm --out-dir .

% ls main_bg.wasm
.rw-r--r-- 621k matklad 15 Jul  9:31 main_bg.wasm

% twiggy top -n 20 main_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        195479 ┊    31.46% ┊ data[1]
         59148 ┊     9.52% ┊ data[0]
         46340 ┊     7.46% ┊ "function names" subsection
         31039 ┊     5.00% ┊ <regex_syntax::ast::parse::ParserI<'s, P>>::parse_with_comments::h84231033ad49714c
         15439 ┊     2.48% ┊ m_lexer::LexerBuilder::rule::hbc202faabb884591
         15079 ┊     2.43% ┊ regex::re_unicode::Regex::find_at::h3056ffa2c67ce47c
          8230 ┊     1.32% ┊ <regex_syntax::hir::translate::TranslatorI<'t, 'p> as regex_syntax::ast::visitor::Visitor>::visit_post::h670d5fe9dc58bed8
          8067 ┊     1.30% ┊ regex::compile::Compiler::c::h12a3db04236ec27c
          7900 ┊     1.27% ┊ <regex_syntax::ast::parse::ParserI<'s, P>>::parse_escape::h1d6b379ec15db3e8
          5567 ┊     0.90% ┊ regex::exec::ExecNoSync::find_nfa::hd01c761c4b6e01df
          5075 ┊     0.82% ┊ regex::literal::Matcher::new::hca44ed6364b1e10e
          4896 ┊     0.79% ┊ parse
          4495 ┊     0.72% ┊ regex_syntax::ast::visitor::HeapVisitor::visit_class_post::he0a5d7714cc29b10
          4133 ┊     0.67% ┊ regex_syntax::hir::literal::suffixes::h27141ba2cc17b3a5
          4055 ┊     0.65% ┊ regex_syntax::hir::literal::prefixes::hf5ce36098e39e4c3
          3562 ┊     0.57% ┊ <&'a T as core::fmt::Display>::fmt::h6e620a657a832f4a
          3511 ┊     0.57% ┊ regex_syntax::hir::literal::Literals::unambiguous_prefixes::h01bfcd8f738c9e5f
          3446 ┊     0.55% ┊ dlmalloc::dlmalloc::Dlmalloc::malloc::hed95c61c3b177d80
          2967 ┊     0.48% ┊ regex::compile::Compiler::compile::h3df653bfcf1fa7cd
          2625 ┊     0.42% ┊ regex_syntax::unicode::class::h1ffff106b6605e65


% rustup override add nightly-2018-07-13
info: using existing install for 'nightly-2018-07-13-x86_64-unknown-linux-gnu'
info: override toolchain for '/home/matklad/projects/tom' set to 'nightly-2018-07-13-x86_64-unknown-linux-gnu'

  nightly-2018-07-13-x86_64-unknown-linux-gnu unchanged - rustc 1.29.0-nightly (64f7de921 2018-07-12)

% ./build.sh
+ cargo build --manifest-path ./tom_wasm/Cargo.toml --target wasm32-unknown-unknown --release
    Finished release [optimized] target(s) in 0.02s
+ TARGET_DIR=tom_wasm/target/wasm32-unknown-unknown/release
+ cp tom_wasm/target/wasm32-unknown-unknown/release/tom_wasm.wasm tom_wasm/target/wasm32-unknown-unknown/release/main.wasm
+ wasm-bindgen --no-modules --no-typescript tom_wasm/target/wasm32-unknown-unknown/release/main.wasm --out-dir .

% ls main_bg.wasm
.rw-r--r-- 658k matklad 15 Jul  9:30 main_bg.wasm

% twiggy top -n 20 main_bg.wasm
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         56849 ┊     8.63% ┊ "function names" subsection
         56274 ┊     8.54% ┊ data[0]
         31132 ┊     4.73% ┊ _$LT$regex_syntax..ast..parse..ParserI$LT$$u27$s$C$$u20$P$GT$$GT$::parse_with_comments::hf16ef911303afec5 (.llvm.5613980570953111795)
         15474 ┊     2.35% ┊ m_lexer::LexerBuilder::tokens::he6b17c52fc4bc827
         14940 ┊     2.27% ┊ regex::re_unicode::Regex::find_at::h066e95c0974f568e
          8416 ┊     1.28% ┊ _$LT$regex_syntax..ast..parse..ParserI$LT$$u27$s$C$$u20$P$GT$$GT$::parse_escape::hab30c508fe412a28
          8197 ┊     1.24% ┊ _$LT$regex_syntax..hir..translate..TranslatorI$LT$$u27$t$C$$u20$$u27$p$GT$$u20$as$u20$regex_syntax..ast..visitor..Visitor$GT$::visit_post::h8093a239f5cfbc33
          7999 ┊     1.21% ┊ regex::compile::Compiler::c::hab908c1e03973ed5
          6457 ┊     0.98% ┊ data[3407]
          5721 ┊     0.87% ┊ data[3479]
          5697 ┊     0.87% ┊ data[289]
          5665 ┊     0.86% ┊ data[3421]
          5549 ┊     0.84% ┊ regex::exec::ExecNoSync::find_nfa::h2c64f172cab8f537
          5417 ┊     0.82% ┊ data[3379]
          5297 ┊     0.80% ┊ data[3632]
          5273 ┊     0.80% ┊ data[3657]
          5169 ┊     0.78% ┊ data[3431]
          5113 ┊     0.78% ┊ data[3616]
          5097 ┊     0.77% ┊ data[3473]
          5057 ┊     0.77% ┊ data[3659]

% 

@matklad
Copy link
Author

matklad commented Jul 15, 2018

And, to further rule-out debug symbols hypothesis, adding --keep-debug flag to wasm-bindgen increases the size to 900k.

@fitzgen
Copy link
Member

fitzgen commented Jul 16, 2018

Aside: we should really get wasm code size into https://perf.rust-lang.org/ -- @alexcrichton do you know who I would talk to about that?

@alexcrichton
Copy link
Contributor

I'm trying to drill into what's going on here and it definitely seems like it's static data related, although I can't quite figure out how. I've been able to roughly reproduce the numbers that @matklad is getting, although not exactly. I may be using a different version of tom, not sure!

In any case I first tried twiggy diff but something is up with the symbols which makes this not super useful. After that I ran wasm-opt over both binaries (before and after). In both cases about 80k was shaved off and they're ~25k different (larger after LLVM 7). Executing twiggy diff again yields:

 Delta Bytes │ Item
─────────────┼───────────
     -162164 ┊ data[38]
      -28560 ┊ code[106]
      +28461 ┊ code[107]
      +21634 ┊ <total>
      -17928 ┊ data[37]
      -14639 ┊ code[101]
      +14425 ┊ code[102]
      +14157 ┊ code[74]
      -13522 ┊ code[73]
       -7563 ┊ code[115]
       +7497 ┊ code[116]
       -7476 ┊ code[182]
       +6496 ┊ code[183]
       +6456 ┊ data[3420]
       +5720 ┊ data[3492]
       +5696 ┊ data[302]
       +5664 ┊ data[3434]
       +5416 ┊ data[3392]
       +5292 ┊ data[3645]
       +5272 ┊ data[3670]
       +5168 ┊ data[3444]
       -5161 ┊ code[310]
       +5150 ┊ code[309]
       +5112 ┊ data[3629]
       +5096 ┊ data[3486]
       +5056 ┊ data[3672]
       +4896 ┊ data[3408]
       +4888 ┊ data[3406]
       +4864 ┊ data[3400]

Interpreting this looks like 21k bytes were added. Each pair of code outputs is pretty similar, and I'm assumign that these roughly match up to the same function in both executables (the symbols are tweaked slightly). The "code" segments, however, are all over the place. It looks like LLVM 7 took the one massive data chunk and split it up into a lot of little pieces (which I think makes sense?). Somehow though 21k got added on, and I'm not sure how to account for that.

It's worth remembering though that we started passing the --stack-first option to LLD which means that all offsets into global data are larger than before, so take a bit more to encode in ULEB-128 encoding. If I remove this option the final binary is 5k smaller, so that doesn't account for everything here though.

Overall I'm not sure where the other 16k came from. I'd probably just chalk it up to "LLVM changed thousands of lines of code" and something got lost in the middle.

@alexcrichton
Copy link
Contributor

@fitzgen I believe simulacrum is the one you want to ping for that! (or perhaps rust-lang/rustc-perf#145)

@matklad
Copy link
Author

matklad commented Jul 16, 2018

Turns out I've initially misinterpreted results of twiggy garbage: I thought that it shows all garbage (while it actually shows top 10 by default). So, it is not the case that "we've lost chunk of garbage weighing 30% of the binary (yay!), but gained these presents elsewhere", it is "the big chunk of garbage is split into the number of smaller ones, and we've gained some weight elsewhere", which is much less mysterious.

That is, twiggy garbage -n 99999 main_bg.wasm | wc -l went from 1082 to 5557.

@matklad
Copy link
Author

matklad commented Jul 17, 2018

Having looked at all the garbage, twiggy reports that approx 78% of the file is garbage (for both rustc versions). That does not sound reasonable at all, and I suspect that maybe its twiggy's bug?

@fitzgen
Copy link
Member

fitzgen commented Jul 17, 2018

Almost certainly, yeah.

Without relocations, we can't precisely know which data segments are truly used where, which is a big source of blindness.

@Hywan
Copy link
Contributor

Hywan commented Jul 17, 2018

My WASM binary went from 101kb to 1.4Mb. Gloups :-).(https://github.com/Hywan/gutenberg-parser-rs)

The binary (as zip for Github).

@Hywan
Copy link
Contributor

Hywan commented Jul 17, 2018

Here are some insights:

$ twiggy top gutenberg_post_parser.debug.wasm | more                                                    
 Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        497896 ┊    34.55% ┊ custom section '.debug_str'
        401840 ┊    27.88% ┊ custom section '.debug_info'
        186106 ┊    12.91% ┊ custom section '.debug_line'
        121625 ┊     8.44% ┊ custom section '.debug_ranges'
         94344 ┊     6.55% ┊ custom section '.debug_pubnames'
         69273 ┊     4.81% ┊ custom section '.debug_pubtypes'
         26935 ┊     1.87% ┊ custom section '.debug_abbrev'
         23197 ┊     1.61% ┊ custom section '.debug_loc'
          4373 ┊     0.30% ┊ gutenberg_post_parser::parser::block::h083d81593f7d2e78
          3310 ┊     0.23% ┊ "function names" subsection
          1368 ┊     0.09% ┊ gutenberg_post_parser::parser::block_list::hf1229adfce36159c
          1328 ┊     0.09% ┊ gutenberg_post_parser_wasm::into_bytes::hbf62699fc3d62afe
           994 ┊     0.07% ┊ root
           917 ┊     0.06% ┊ gutenberg_post_parser::parser::namespaced_block_name::hcf3c434aea2086d0

@Hywan
Copy link
Contributor

Hywan commented Jul 17, 2018

Note: Tested with the latest rustc and latest wasm-opt version.

I've updated wasm-gc because I remember a patch about .debug_ section, and boom, 16kb. Sixteen kilobytes. How is it possible? That's crazy! 🎉

So… 101kb to 16kb. Nothing to say :-).

@alexcrichton
Copy link
Contributor

I think the time for debugging this has since passed, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants