
major regression in rustc memory usage #6637

Closed
thestinger opened this issue May 20, 2013 · 17 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@thestinger
Contributor

You can compare the stage0 and stage1 memory usage at 3a481c0. The peak RES usage compiling librustc has gone from ~1800MiB to ~2400-2500MiB.

@graydon: you'll probably be interested in this

@graydon
Contributor

graydon commented May 21, 2013

I am interested in that! But comparing stage0 and stage1 is not much to go on: they differ by whatever has happened since the last snapshot. If you (or someone -- I can try to set up a run if nobody else wants to) bisect from the previous snapshot-registering commit to this point, you should find the culprit.

@bstrie
Contributor

bstrie commented May 21, 2013

Does this have to do with the fact that stage0 is built with optimizations, but stage1 isn't?

@graydon
Contributor

graydon commented May 23, 2013

No, both versions are opt.

I bisected down to either 1daaf78 or 030c666: the latter won't build and the former is definitely bad. It doesn't matter much, though; they're both from the same pull request.

Paging @alexcrichton ?

@catamorphism
Contributor

@graydon It seems unlikely that either of those commits could explain all of it, though? I noticed huge slowdowns in, for example, typeck and trans, and those commits only appear to affect lint.

@alexcrichton
Member

I'd be surprised if either of those were the root cause of this. @catamorphism is right in that those should predominantly affect only linting.

I can try to investigate this, though, to make sure that it's not those commits.

@graydon
Contributor

graydon commented May 23, 2013

@catamorphism This is about a memory regression, not a speed regression. It only takes one typo to cause a cycle-leak. I'm not certain it's this rev, but it's the one bisect found. As in: the rev before uses 1.8gb to build stage1, this rev uses 2.5gb. I'm not sure what else to say about that; I'll try re-confirming or slicing the history graph differently tomorrow. Gotta do some laundry now.

I'm not aware of this being directly connected to a speed regression. Can you pick out a time when the build was fast for you, so that I can bisect that?

@alexcrichton
Member

@graydon one thing that would be useful, if possible, would be heap graphs over time for both versions. If I knew in which stage of the build the memory usage started deviating significantly from what it was before, I could probably see what the problem was.

@graydon
Contributor

graydon commented May 23, 2013

@alexcrichton massif can produce those graphs. If you like, I can make a pair of them tomorrow; if you have it installed locally, it will work the same for you as for me.

@graydon
Contributor

graydon commented May 23, 2013

That is, run valgrind --tool=massif <command>, followed by ms_print massif.out.<pid>

@alexcrichton
Member

I ran out of time tonight, but I did manage to get one dump of the bad version of the compiler (https://gist.github.com/alexcrichton/5633336). I was having trouble making heads or tails of the data, though...

@jdm
Contributor

jdm commented May 23, 2013

From profile 55, which showed 2,283,406,322B of memory usage, we see:

  • 22% under middle::astencode::decode_side_tables
  • 12% under llvm::User::operator new, all from various places regarding trans calling into LLVM
  • 8% from various LLVM things related to modules passes
  • 7% under things that massif thinks are too small to care about profiling, so we should change some default values there
  • 7% from the LLVM slab allocator for symbols
  • 4% under fold::extensions::meth_18501::fold_expr
  • 3% from the ast_map hashmap from map_expr
  • 3% under fold::noop_fold_path

etc. The massif docs have a nice clear explanation of how to read the results. The peak snapshot is usually the most interesting one to analyze.

@graydon
Contributor

graydon commented May 23, 2013

When reading the output, it's important to differentiate between local and exchange heap allocations; they are reported separately.

@alexcrichton
Member

OK, so here's the result of some more investigation.

Before this patch (rev 918bfa7):


    GB
1.748^                                                                       :
     |                                      @@                            @:#:
     |                                    @@@                            :@:#:
     |                                    @@@                        :::::@:#:
     |                                    @@@                    :::@:::::@:#:
     |                                    @@@               :::@@:::@:::::@:#:
     |                                  ::@@@         :::::::: @ :::@:::::@:#:
     |                                :@: @@@     :::::: : ::: @ :::@:::::@:#:
     |                             @@::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                             @ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                            @@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                            @@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                    ::@:@@::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |          ::::::@:::::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    ::::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    : ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    : ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |   @: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |   @: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |  @@: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   172.6

After this patch (rev 1daaf78):

    GB
2.359^                                                                       :
     |                                                                    @:#:
     |                                                                @:@:@:#:
     |                                                          ::::@@@:@:@:#:
     |                                                    ::::::: : @@@:@:@:#:
     |                                             @@@@::@::: ::: : @@@:@:@:#:
     |                                     @@:::@@@@ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                @@@@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                              @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                              @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                            ::@@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                     :::::::: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |       :::::::@::@@::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@:::: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   172.8

Can you spot the difference? As it turns out, about 700MB was freed in that drop in the before graph, which accounts for the difference between the two runs. Of this, nearly all came from smallintmap::insert during middle::astencode::decode_side_tables. So, from what I can tell, this just means that something which is being decoded isn't being deallocated?

I'm still having trouble understanding how this relates to the lint patch. That patch definitely didn't touch anything in trans, much less anything in AST encoding/decoding. Any ideas?

@graydon
Contributor

graydon commented May 28, 2013

Looking at your before/after logs, it looks like the sidetable decoding costs are present in both:

pre:

...
#-----------
snapshot=32
#-----------
time=99635562745
mem_heap_B=1706284326
mem_heap_extra_B=93736826
mem_stacks_B=0
heap_tree=detailed
n7: 1706284326 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 633714405 0x2E37219: rust_exchange_alloc::realloc(void*, unsigned long) (rust_exchange_alloc.cpp:32)
  n1: 633714405 0x2E2D67C: vec_reserve_shared (rust_util.h:75)
   n3: 633714405 0x1659D: vec::rustrt::vec_reserve_shared::_fd87ae5bf33686b::_07pre
    n2: 536870960 0x1375924: smallintmap::__extensions__::insert_60798::_4ecd04f78c24ce::_07pre
     n1: 536870960 0x1592486: middle::astencode::decode_side_tables::anon::expr_fn_79872
      n1: 536870960 0x50C70D: ebml::reader::docs::_29cfc8699845a3a::_07pre
       n1: 536870960 0xDA1A8F: middle::astencode::decode_inlined_item::_cd72efb8a779f80::_07pre
        n1: 536870960 0xD9FC48: middle::trans::inline::maybe_instantiate_inline::anon::expr_fn_19981
...

post:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 60 180,036,106,317    2,511,642,448    2,394,469,533   117,172,915            0
 61 181,210,964,676    2,524,188,480    2,406,285,136   117,903,344            0
95.33% (2,406,285,136B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->24.66% (622,355,697B) 0x2E2D219: rust_exchange_alloc::realloc(void*, unsigned long) (rust_exchange_alloc.cpp:32)
| ->24.66% (622,355,697B) 0x2E2367C: vec_reserve_shared (rust_util.h:75)
|   ->24.66% (622,355,697B) 0x1659D: vec::rustrt::vec_reserve_shared::_bcc196118a2caa62::_07pre
|     ->21.27% (536,870,960B) 0x1373874: smallintmap::__extensions__::insert_60769::_4ecd04f78c24ce::_07pre
|     | ->21.27% (536,870,960B) 0x1588B36: middle::astencode::decode_side_tables::anon::expr_fn_80017
|     | | ->21.27% (536,870,960B) 0x50B71D: ebml::reader::docs::_29cfc8699845a3a::_07pre
|     | |   ->21.27% (536,870,960B) 0xDA053F: middle::astencode::decode_inlined_item::_2fc881d8e6edb7c7::_07pre

So I don't think it's that (I'm curious why the log formats differ between runs, though -- different ms_print binaries?).
It's very interesting to know we're spending so much on decoded ASTs. We should fix that! 600MB there is absurd enough on its own.

Anyway, the "drop off" you see in the first graph but not the second is supposed to be the point where we drop all the pre-LLVM state we can. If you look in librustc/driver/driver.rs you'll see something like:

    let (llmod, link_meta) = {
        ...
    };
    ...
    link::write::run_passes(sess, llmod, sess.opts.output_type, &outputs.obj_filename);

The stuff in the block is supposed to get dropped, leaving only llmod and link_meta to pass on to LLVM. This is not happening, so the result is that we retain it for too long.

This is what I expected was happening. As to why, I'd guess we're making a new cycle that keeps everything alive. We don't have a tracing GC at this point (working on it!), so even one bad cycle will keep very large things alive-and-leaked until task death.

I'll keep reading through the code and trying to find such a new cycle, but I'd bet that's what we're seeing. Any further hints from your knowledge of "what the code does" would be appreciated.
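
To make the failure mode concrete, here is a minimal sketch in today's Rc/RefCell terms (the 2013 compiler used @-managed boxes rather than Rc, but the effect is the same): once two reference-counted values point at each other, neither count reaches zero, so whatever they own stays allocated until task or process death. This is illustrative only, not the actual rustc code:

    use std::cell::RefCell;
    use std::rc::Rc;

    // Stand-in for some large compiler-side structure (e.g. decoded ASTs).
    struct Ctx {
        big: Vec<u8>,
        // Owning back-pointer; two Ctx values pointing at each other form a cycle.
        other: RefCell<Option<Rc<Ctx>>>,
    }

    fn main() {
        let a = Rc::new(Ctx { big: vec![0u8; 1 << 20], other: RefCell::new(None) });
        let b = Rc::new(Ctx { big: vec![0u8; 1 << 20], other: RefCell::new(None) });

        // Build the cycle: a -> b and b -> a.
        *a.other.borrow_mut() = Some(b.clone());
        *b.other.borrow_mut() = Some(a.clone());

        println!("each side holds {} bytes", a.big.len());

        // When a and b go out of scope, each refcount drops to 1, not 0, so
        // both megabyte buffers remain allocated until the process exits.
    }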

@graydon
Contributor

graydon commented May 28, 2013

Update: I talked to @alexcrichton on IRC and it seems likely this is a cycle between the visitors and the lint context. Meanwhile, I made a small patch (#6776) that removes the smallintmap pathology in decoding ASTs, which saves about 400MB.

Please keep looking into this one, though!
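
For context on the "smallintmap pathology", here is a hedged, simplified reconstruction (not the 2013 source): SmallIntMap stored values in a vector indexed directly by the integer key, so a single insert at a large, sparse key -- such as an AST node id -- forces the vector to grow to that key, allocating mostly empty slots:

    // Hypothetical, simplified sketch of a vector-backed small-int map.
    struct SmallIntMap<V> {
        slots: Vec<Option<V>>,
    }

    impl<V> SmallIntMap<V> {
        fn new() -> Self {
            SmallIntMap { slots: Vec::new() }
        }

        fn insert(&mut self, key: usize, value: V) {
            // Growing to key + 1 slots is fine for dense, small keys, but the
            // decoded side tables are keyed by node ids that can be huge and
            // sparse, so one insert can reserve a vast run of None slots.
            if self.slots.len() <= key {
                self.slots.resize_with(key + 1, || None);
            }
            self.slots[key] = Some(value);
        }
    }

    fn main() {
        let mut m: SmallIntMap<u64> = SmallIntMap::new();
        // One entry at a large node id allocates (node id + 1) slots.
        m.insert(5_000_000, 42);
        println!("slots allocated for a single entry: {}", m.slots.len());
    }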

bors added a commit that referenced this issue May 28, 2013
Via my super-accurate timing (`/usr/bin/time -l`), I get 2431119360 => 1884430336 == ~520MB

Closes #6637.
@alexcrichton
Member

Just confirmed this after 6264df5. The stage0 peak is around 2.2GB, and the stage1 peak is around 1.5GB.

@thestinger
Contributor Author

Yay!
