
major regression in rustc memory usage #6637

Closed
thestinger opened this issue May 20, 2013 · 17 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@thestinger
Contributor

You can compare the stage0 and stage1 memory usage at 3a481c0. The peak RES usage compiling librustc has gone from ~1800MiB to ~2400-2500MiB.

@graydon: you'll probably be interested in this

@graydon
Contributor

graydon commented May 21, 2013

I am interested in that! But comparing stage0 and stage1 is not much to go on: they differ by whatever has happened since the last snapshot. If you (or someone -- I can try to set up a run if nobody else wants to) bisect from the previous snapshot-registering commit to this point, you should find the culprit.

@bstrie
Contributor

bstrie commented May 21, 2013

Does this have to do with the fact that stage0 is built with optimizations, but stage1 isn't?

@graydon
Contributor

graydon commented May 23, 2013

No, both versions are opt.

I bisected down to either 1daaf78 or 030c666: the latter won't build and the former is definitely bad. It doesn't matter much, though; they're both from the same pull request.

Paging @alexcrichton ?

@catamorphism
Contributor

@graydon It seems unlikely that either of those commits could explain all of it, though? I noticed huge slowdowns in, for example, typeck and trans, and those commits only appear to affect lint.

@alexcrichton
Member

I'd be surprised if either of those were the root cause of this. @catamorphism is right in that those should predominantly affect only linting.

I can try to investigate this, though, to make sure that it's not those commits.

@graydon
Contributor

graydon commented May 23, 2013

@catamorphism This is about a memory regression, not a speed regression. It only takes one typo to cause a cycle-leak. I'm not certain it's this rev, but it's the one bisect found. As in: the rev before uses 1.8gb to build stage1, this rev uses 2.5gb. I'm not sure what else to say about that; I'll try re-confirming or slicing the history graph differently tomorrow. Gotta do some laundry now.

I'm not aware of this being directly connected to a speed regression. Can you pick out a time when the build was fast for you, so that I can bisect that?

@alexcrichton
Member

@graydon one thing that would be useful, if possible, would be heap graphs over time for both versions. If I knew in which stage of the build the memory usage started deviating significantly from what it was before, I could probably see what the problem was.

@graydon
Contributor

graydon commented May 23, 2013

@alexcrichton massif can produce those graphs. If you like, I can make a pair of them tomorrow; if you have it installed locally, it will work the same for you as for me.

@graydon
Contributor

graydon commented May 23, 2013

That is, run valgrind --tool=massif <command>, followed by ms_print massif.out.<pid>

@alexcrichton
Member

I ran out of time tonight, but I did manage to get one dump of the bad version of the compiler (https://gist.github.com/alexcrichton/5633336). I was having trouble making heads or tails of the data, though...

@jdm
Contributor

jdm commented May 23, 2013

From profile 55, which showed 2,283,406,322B of memory usage, we see:

  • 22% under middle::astencode::decode_side_tables
  • 12% under llvm::User::operator new, all from various places regarding trans calling into LLVM
  • 8% from various LLVM things related to modules passes
  • 7% under things that massif thinks are too small to care about profiling, so we should change some default values there
  • 7% from the LLVM slab allocator for symbols
  • 4% under fold::extensions::meth_18501::fold_expr
  • 3% from the ast_map hashmap from map_expr
  • 3% under fold::noop_fold_path

etc. The massif docs have a nice clear explanation of how to read the results. The peak snapshot is usually the most interesting one to analyze.

@graydon
Contributor

graydon commented May 23, 2013

When reading the output, it's important to differentiate between local and exchange heap allocations; they are reported separately.

@alexcrichton
Member

OK, so here's the result of some more investigation.

Before this patch (rev 918bfa7):


    GB
1.748^                                                                       :
     |                                      @@                            @:#:
     |                                    @@@                            :@:#:
     |                                    @@@                        :::::@:#:
     |                                    @@@                    :::@:::::@:#:
     |                                    @@@               :::@@:::@:::::@:#:
     |                                  ::@@@         :::::::: @ :::@:::::@:#:
     |                                :@: @@@     :::::: : ::: @ :::@:::::@:#:
     |                             @@::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                             @ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                            @@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                            @@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |                    ::@:@@::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |          ::::::@:::::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    ::::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    : ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |    : ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |   @: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |   @: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
     |  @@: ::::: ::::@:: ::@:@ ::@@ ::@: @@@ :::@: :::: : ::: @ :::@:::::@:#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   172.6

After this patch (rev 1daaf78):

    GB
2.359^                                                                       :
     |                                                                    @:#:
     |                                                                @:@:@:#:
     |                                                          ::::@@@:@:@:#:
     |                                                    ::::::: : @@@:@:@:#:
     |                                             @@@@::@::: ::: : @@@:@:@:#:
     |                                     @@:::@@@@ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                    @@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                                @@@@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                              @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                              @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                            ::@@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |                     :::::::: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |       :::::::@::@@::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@:::: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
     |   @@: :: ::: @::@ ::: ::: :: @@@ @@@@ : :@@ @ @ : @::: ::: : @@@:@:@:#:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   172.8

Can you spot the difference? As it turns out, about 700MB was freed in that drop in the before graph, which accounts for the difference between the two runs. Of this, nearly all came from smallintmap::insert during middle::astencode::decode_side_tables. So, from what I can tell, this just means that something which is being decoded isn't being deallocated?

I'm still having trouble understanding how this relates to the lint patch. That patch definitely didn't touch anything in trans, much less anything in AST encoding/decoding. Any ideas?

@graydon
Contributor

graydon commented May 28, 2013

Looking at your before/after logs, it looks like the sidetable decoding costs are present in both:

pre:

...
#-----------
snapshot=32
#-----------
time=99635562745
mem_heap_B=1706284326
mem_heap_extra_B=93736826
mem_stacks_B=0
heap_tree=detailed
n7: 1706284326 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 633714405 0x2E37219: rust_exchange_alloc::realloc(void*, unsigned long) (rust_exchange_alloc.cpp:32)
  n1: 633714405 0x2E2D67C: vec_reserve_shared (rust_util.h:75)
   n3: 633714405 0x1659D: vec::rustrt::vec_reserve_shared::_fd87ae5bf33686b::_07pre
    n2: 536870960 0x1375924: smallintmap::__extensions__::insert_60798::_4ecd04f78c24ce::_07pre
     n1: 536870960 0x1592486: middle::astencode::decode_side_tables::anon::expr_fn_79872
      n1: 536870960 0x50C70D: ebml::reader::docs::_29cfc8699845a3a::_07pre
       n1: 536870960 0xDA1A8F: middle::astencode::decode_inlined_item::_cd72efb8a779f80::_07pre
        n1: 536870960 0xD9FC48: middle::trans::inline::maybe_instantiate_inline::anon::expr_fn_19981
...

post:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 60 180,036,106,317    2,511,642,448    2,394,469,533   117,172,915            0
 61 181,210,964,676    2,524,188,480    2,406,285,136   117,903,344            0
95.33% (2,406,285,136B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->24.66% (622,355,697B) 0x2E2D219: rust_exchange_alloc::realloc(void*, unsigned long) (rust_exchange_alloc.cpp:32)
| ->24.66% (622,355,697B) 0x2E2367C: vec_reserve_shared (rust_util.h:75)
|   ->24.66% (622,355,697B) 0x1659D: vec::rustrt::vec_reserve_shared::_bcc196118a2caa62::_07pre
|     ->21.27% (536,870,960B) 0x1373874: smallintmap::__extensions__::insert_60769::_4ecd04f78c24ce::_07pre
|     | ->21.27% (536,870,960B) 0x1588B36: middle::astencode::decode_side_tables::anon::expr_fn_80017
|     | | ->21.27% (536,870,960B) 0x50B71D: ebml::reader::docs::_29cfc8699845a3a::_07pre
|     | |   ->21.27% (536,870,960B) 0xDA053F: middle::astencode::decode_inlined_item::_2fc881d8e6edb7c7::_07pre

So I don't think it's that (I'm curious why the log formats differ between runs, though -- different ms_print binaries?).
It's very interesting to know we're spending so much on decoded ASTs. We should fix that! 600MB there is absurd enough on its own.

Anyway, the "drop off" you see in the first graph but not the second is supposed to be the point where we drop all the pre-LLVM state we can. If you look in librustc/driver/driver.rs you'll see something like:

    let (llmod, link_meta) = {
        ...
    };
    ...
    link::write::run_passes(sess, llmod, sess.opts.output_type, &outputs.obj_filename);

The stuff in the block is supposed to get dropped, leaving only llmod and link_meta to pass on to LLVM. This is not happening, so the result is that we retain it for too long.

This is what I expected was happening. As to why, I'd guess we're making a new cycle that keeps everything alive. We don't have a tracing GC at this point (working on it!), so even one bad cycle will keep very large things alive-and-leaked until task death.

I'll keep reading through the code and trying to find such a new cycle, but I'd bet that's what we're seeing. Any further hints from your knowledge of "what the code does" would be appreciated.
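
To make the failure mode concrete, here is a minimal sketch in today's Rc/RefCell terms (the 2013 compiler used @-managed boxes rather than Rc, but the effect is the same): once two reference-counted values point at each other, neither count reaches zero, so whatever they own stays allocated until task or process death. This is illustrative only, not the actual rustc code:

    use std::cell::RefCell;
    use std::rc::Rc;

    // Stand-in for some large compiler-side structure (e.g. decoded ASTs).
    struct Ctx {
        big: Vec<u8>,
        // Owning back-pointer; two Ctx values pointing at each other form a cycle.
        other: RefCell<Option<Rc<Ctx>>>,
    }

    fn main() {
        let a = Rc::new(Ctx { big: vec![0u8; 1 << 20], other: RefCell::new(None) });
        let b = Rc::new(Ctx { big: vec![0u8; 1 << 20], other: RefCell::new(None) });

        // Build the cycle: a -> b and b -> a.
        *a.other.borrow_mut() = Some(b.clone());
        *b.other.borrow_mut() = Some(a.clone());

        println!("each side holds {} bytes", a.big.len());

        // When a and b go out of scope, each refcount drops to 1, not 0, so
        // both megabyte buffers remain allocated until the process exits.
    }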

@graydon
Contributor

graydon commented May 28, 2013

Update: I talked to @alexcrichton on IRC and it seems likely this is a cycle between the visitors and the lint context. Meanwhile, I made a small patch (#6776) that removes the smallintmap pathology in decoding ASTs, which saves about 400MB.

Please keep looking into this one, though!
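
For context on the "smallintmap pathology", here is a hedged, simplified reconstruction (not the 2013 source): SmallIntMap stored values in a vector indexed directly by the integer key, so a single insert at a large, sparse key -- such as an AST node id -- forces the vector to grow to that key, allocating mostly empty slots:

    // Hypothetical, simplified sketch of a vector-backed small-int map.
    struct SmallIntMap<V> {
        slots: Vec<Option<V>>,
    }

    impl<V> SmallIntMap<V> {
        fn new() -> Self {
            SmallIntMap { slots: Vec::new() }
        }

        fn insert(&mut self, key: usize, value: V) {
            // Growing to key + 1 slots is fine for dense, small keys, but the
            // decoded side tables are keyed by node ids that can be huge and
            // sparse, so one insert can reserve a vast run of None slots.
            if self.slots.len() <= key {
                self.slots.resize_with(key + 1, || None);
            }
            self.slots[key] = Some(value);
        }
    }

    fn main() {
        let mut m: SmallIntMap<u64> = SmallIntMap::new();
        // One entry at a large node id allocates (node id + 1) slots.
        m.insert(5_000_000, 42);
        println!("slots allocated for a single entry: {}", m.slots.len());
    }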

bors added a commit that referenced this issue May 28, 2013
Via my super-accurate timing (`/usr/bin/time -l`), I get 2431119360 => 1884430336 == ~520MB

Closes #6637.
@alexcrichton
Member

Just confirmed this after 6264df5. The stage0 peak is around 2.2GB, and the stage1 peak is around 1.5GB.

@thestinger
Contributor Author

Yay!
