
Bug: a macro never hit recursion limit, cargo build run... forever! #51754

Open
limira opened this issue Jun 24, 2018 · 25 comments
Labels
  • A-macros · Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
  • C-bug · Category: This is a bug.
  • I-compiletime · Issue: Problems and improvements with respect to compile times.
  • I-hang · Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc.
  • T-compiler · Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@limira (Contributor) commented Jun 24, 2018

I know what my mistake is, but I think there is a bug in Rust!
Edit: we just need better help from the compiler to easily spot mistakes like this.

Edit: the problem seems to involve only the first rule, so the new minimal buggy code is:

macro_rules! there_is_a_bug {
    ( $id:ident: $($tail:tt)* ) => {
        there_is_a_bug! { $($tail:tt)* } // I did it wrong here: `:tt`
    };
}
fn main() {
    there_is_a_bug! { something: more {} }
}

cargo build will run forever! (well, I just waited for about ten minutes!)

rustup show
stable-x86_64-unknown-linux-gnu (default)
rustc 1.27.0 (3eda71b00 2018-06-19)
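
For reference, here is a sketch of what the rule presumably should have been (the name fixed!, the added base-case rule, and the adjusted input are mine, purely for illustration): in a macro's expansion, a metavariable is written without its fragment specifier, so the stray :tt must go. With that fix the input shrinks on every call and the recursion terminates.

```rust
macro_rules! fixed {
    () => {}; // base case so the recursion can stop
    ( $id:ident: $($tail:tt)* ) => {
        fixed! { $($tail)* } // `$($tail)*`, not `$($tail:tt)*`
    };
}

fn main() {
    fixed! { something: more: } // consumes one `ident:` pair per call
}
```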
@jonas-schievink (Contributor) commented:

there_is_a_bug! { $($tail:tt)* } will triple the size of $tail on each invocation; you might be hitting exponential blowup here, which the recursion limit can't catch.

@limira (Contributor, Author) commented Jun 24, 2018

... I think I only partially understand what you are saying! My impression is that macro expansion goes depth-first, so I think it should hit the recursion limit very soon.

I tried with trace_macros!(true); but cargo build just keeps running without any output. In the actual code, which is much bigger than there_is_a_bug!, it is really hard to find what causes the problem, because the compiler is unable to help in this case!

@limira limira changed the title from "Bug: a macro never hit recursion limit" to "Bug: a macro never hit recursion limit, cargo build run... forever!" Jun 24, 2018
@stokhos stokhos added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-build labels Jun 24, 2018
@ExpHP (Contributor) commented Jun 29, 2018

So, expansion does go depth first. What's blowing up is the argument list:

there_is_a_bug! { }
// expands to 
there_is_a_bug! { :tt }
// expands to 
there_is_a_bug! { : :tt tt :tt }
// expands to 
there_is_a_bug! { : :tt : :tt tt :tt tt :tt : :tt tt :tt }
// ...

In order to emit a recursion error here without first blowing the heap, I guess it would have to produce the inner token stream lazily. I'm not sure how much this could impact performance?

I tried with trace_macros!(true); but cargo build just keep running without any output.

Arrhhh! :) The old trace_macros circa 1.0 could have helped here (the one which simply dumped the invocations directly to stdout, as opposed to the new one which goes through the diagnostic APIs).

@limira (Contributor, Author) commented Jun 29, 2018

Oh, thank you very much! What still remains as a question for me is: why does the memory usage increase so slowly!? When running cargo build on the minimal code above, I opened System Monitor and watched rustc. I expected its memory usage to increase drastically, but contrary to my expectation, it increases... slowly. Still, I am glad that it does not eat all my memory, because when my system (Fedora 27) runs out of memory I don't know what to do except cut its power.

@ExpHP (Contributor) commented Jun 30, 2018

What still remains as a question for me is: why does the memory usage increase so slowly!?

This is an interesting question, so I tried taking a look into it using perf. Unfortunately there isn't much I can say with any confidence!

amateur hour theatre: `perf` edition

perf stat shows:

 Performance counter stats for 'rustc src/main.rs':

    30,455,642,992      instructions              #    0.61  insn per cycle           (50.05%)
    50,185,555,208      cycles                                                        (29.97%)
    17,998,100,630      L1-dcache-loads                                               (39.92%)
       610,971,632      L1-dcache-load-misses     #    3.39% of all L1-dcache hits    (49.93%)
        77,208,094      dTLB-load-misses                                              (49.93%)
     1,831,023,187      cache-references                                              (39.98%)
     5,621,259,040      branch-instructions                                           (50.00%)
         9,967,961      branch-misses             #    0.18% of all branches          (50.05%)
     8,744,794,469      L1-icache-loads                                               (40.08%)
        59,622,536      L1-icache-misses          #    0.68% of all L1-icache hits    (40.04%)

3.3% cache misses sounds like maybe a lot for what is mostly a bunch of similar code running without allocating memory at a fast rate; but I'm not sure.


The three hottest functions identified by a basic perf record are:

% samples  object      demangled name
47.93%    libsyntax   <Vec<T> as Clone>::clone
24.47%    libsyntax   <Vec<T> as Drop>::drop
16.79%    libsyntax   <Option<&'a T>>::cloned

What are the Ts? Well, to be honest, I don't know. Looking at the disassembly output my best guess is that in all three cases, T is ext::tt::macro_parser::NamedMatch.

Aside: I see calls to a function called <Span as Clone>::clone@plt, which point to this funny looking stub in the .plt section. It's hard for me to understand why <Span as Clone>::clone has any presence in an optimized binary!


What's calling those three functions so often?

Unfortunately, perf record -g is giving me useless callstacks with no information, even on an (optimized) compiler built with debug info. That said, there aren't very many clones in macro_parser.rs. One thing that does stand out to me is

impl<'a> MatcherPos<'a> {
    /// Add `m` as a named match for the `idx`-th metavar.
    fn push_match(&mut self, idx: usize, m: NamedMatch) {
        let matches = Rc::make_mut(&mut self.matches[idx]);
        matches.push(m);
    }
}

which potentially clones a Vec<NamedMatch> (in Rc::make_mut) just to push a single item. However I can't find any "smoking gun" evidence that this is the problem.
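
To make the cost concrete, here is a minimal standalone sketch (ordinary Rc, not compiler code) of the behavior in question: when the Rc is shared, Rc::make_mut must deep-clone the entire vector before the one-element push can go through.

```rust
use std::rc::Rc;

fn main() {
    // A shared handle, standing in for another matcher position.
    let shared: Rc<Vec<u32>> = Rc::new((0..1_000).collect());
    let mut mine = Rc::clone(&shared);

    // The strong count is 2 here, so make_mut deep-clones all
    // 1,000 elements just so we can push a single item.
    Rc::make_mut(&mut mine).push(1_000);

    assert_eq!(mine.len(), 1_001);
    assert_eq!(shared.len(), 1_000); // the original is untouched
}
```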

@ExpHP (Contributor) commented Jun 30, 2018

By the way, when I mentioned "the old trace_macros," this is actually a thing you can do:

rustup toolchain add 1.5.0
cargo +1.5.0 build

If your crate begins with

#![feature(trace_macros)]
trace_macros!(true);

then it will do the tracing to stdout, even though #![feature] isn't supposed to work on stable compilers. (What happens is that macro expansion occurs before the compiler checks whether you are actually allowed to use #![feature] flags!)

@ExpHP (Contributor) commented Jun 30, 2018

One last thing: here's a flamegraph generated from a stage 1 debug compiler. This is a non-optimized compiler, since the optimized debug compiler was producing callstacks that provided zero context for any of the functions of interest (the second item in the callstack was <unknown> at some raw address).

Flamegraph: perf.svg.gz

It suggests the following:

  • As I suspected, push_match is frequently being called in places where it must perform an O(n) clone (34661 samples for Vec::clone) just to push a single item onto a vec (46 samples for Vec::push).
  • macro_parser::parse very frequently drops an ArrayVec<MatcherPos> which requires several layers of deinitialization code to decrement refcounts in NamedMatches.

What I do not know, however, is whether these are performance bugs that can be fixed, or if they are concessions that were made in favor of optimizing hotter codepaths.

@limira (Contributor, Author) commented Jul 1, 2018

By the way, when I mentioned "the old trace_macros," this is actually a thing you can do

Thanks for the info! Maybe the current trace_macros!(true); needs to be improved to help in cases like this without having to go back in time.

@ExpHP (Contributor) commented Jul 1, 2018

I'm going to create a separate issue for the lack of trace_macros! output, so that this issue can be about the performance of this macro's expansion (which I think ought to hit some sort of error more quickly).

@ExpHP (Contributor) commented Jul 1, 2018

(I think this should be tagged with I-slow or something similar, can anyone add that?)

@oli-obk oli-obk added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jul 1, 2018
@oli-obk (Contributor) commented Jul 1, 2018

This seems like a perfect fit for immutable data structures: only clone if something is actually modified; otherwise just keep adding stuff to the end.
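
For readers unfamiliar with the idea, here is a minimal sketch of structural sharing (a toy persistent list, not the compiler's actual data structure; it prepends rather than appends, but the sharing principle is the same): divergent branches reuse the common prefix instead of copying it.

```rust
use std::rc::Rc;

enum List<T> {
    Nil,
    Cons(T, Rc<List<T>>),
}

/// O(1) "push": the new node points at the old list; nothing is cloned.
fn push<T>(tail: &Rc<List<T>>, value: T) -> Rc<List<T>> {
    Rc::new(List::Cons(value, Rc::clone(tail)))
}

fn main() {
    let base = push(&push(&Rc::new(List::Nil), 1), 2); // [2, 1]

    // Two "matcher positions" diverge from the same base without
    // copying it: each branch costs one small allocation.
    let branch_a = push(&base, 3); // [3, 2, 1]
    let branch_b = push(&base, 4); // [4, 2, 1]

    if let List::Cons(head, _) = &*branch_a { assert_eq!(*head, 3); }
    if let List::Cons(head, _) = &*branch_b { assert_eq!(*head, 4); }
}
```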

@ExpHP (Contributor) commented Jul 2, 2018

Sounds like a good idea at least from a textbook perspective; though I wonder about cache locality, and whether it really is necessary to have all of these intermediate values hanging around with one less item in the first place. The multiple levels of recursively-called drops in the flame-graph indicate to me that refcounts drop to zero on a regular basis.

(that said, I myself don't really understand what the method is doing or why it is written this way)

@ExpHP (Contributor) commented Jul 4, 2018

I still have a perverse fascination with why there's so much wasted sharing here, but I still don't really understand what matches contains, even with all of the nice documentation there. (I'd like to do some "println reverse engineering" to help ground my interpretations, but I don't currently have a machine that isn't busy running some heavy computations!)

cc @mark-i-m who added much of this documentation recently and probably has a much better grasp of what the code is doing. Any thoughts?

@mark-i-m (Member) commented Jul 4, 2018

@ExpHP I assume you mean this?

matches: Vec<Rc<Vec<NamedMatch>>>,

This field contains token trees that match against metavariables. Consider the following example:

// Macro definition
macro_rules! foo {
    ($($id:ident),*) => {}
}

// Invocation
let w;
let x;
let y;
let z;
foo!(w, x, y, z);

In this example $id is a metavariable. It will match against four different token trees: w, x, y, and z, so the contents of matches would be those token trees.

Does that make sense?


Regarding the inefficiencies of the code, it's been a while since I looked at it, so I'm not sure off the top of my head if/why such things might be necessary. I don't think the original authors made much effort to make it efficient. Recently, @nnethercote went through and reduced the number of allocations in #50855.

@ExpHP (Contributor) commented Jul 5, 2018

Thanks for responding!

I think my greater confusion is with respect to when the Rc<Vec<_>>s are ever meaningfully shared. Are we sharing them between multiple branching candidate paths? Are we building a DAG? (this is what seems to be suggested by the Rc<Vec<NamedMatch>> recursively contained within NamedMatch).

The comments suggest[^1] there is sharing between the subparser state in a repetition and the top level state, yet it seems that everything in match_lo..=match_hi is guaranteed to be modified (and thus replicated if the refcount exceeds 1).


[^1] I say this due to the phrase

the matches field is shared (hence the Rc) among all "nested" matchers

though the next paragraph seems to contradict it:

Also, note that while we are descending into a sequence, matchers are given their own matches vector.

@mark-i-m (Member) commented Jul 7, 2018

Ah, I see what you mean.

I'm not 100% sure, but I think there are two different reasons why that exists:

  1. MatcherPos values are often cloned to produce the next position. We want to avoid cloning lots of Vecs and TokenTrees. Those clones appear in these two locations:

let mut new_pos = item.up.clone().unwrap();

let mut new_item = item.clone();

  2. NamedMatch::MatchedSeq can contain a collection of matches for a single metavariable that are shared by at least two possible matcher positions.

In particular, consider the following example. We want to match foo foo foo against $($x:ident)+. After processing the first foo, the matcher position looks like this (where o is the "dot"): $($x:ident o)+. There are two possible next matcher positions:
1. Start a new repetition: $(o $x:ident)+
2. Done repeating: $($x:ident)+ o

Both of these new matcher positions are added to cur_items. In the first case, we just reuse item after updating it:

next_items.push(item);

and

cur_items.push(item);

In the second case, we clone the up matcher (the context right before we entered the sequence), modify it so that the "dot" is after the sequence, and share the new matches we found with the new position. That occurs here:

let mut new_pos = item.up.clone().unwrap();
// Add matches from this repetition to the `matches` of `up`
for idx in item.match_lo..item.match_hi {
    let sub = item.matches[idx].clone();
    let span = span.with_lo(item.sp_lo);
    new_pos.push_match(idx, MatchedSeq(sub, span));
}
// Move the "dot" past the repetition in `up`
new_pos.match_cur = item.match_hi;
new_pos.idx += 1;
cur_items.push(new_pos);

Does that make sense?

@ExpHP (Contributor) commented Jul 7, 2018

Thanks @mark-i-m, that was a huge help!

I have identified the problem. Furthermore, I now realize that this is a much broader performance issue than recursive macro calls (recursion is actually unrelated). For instance, it also affects the following:

macro_rules! idents {
    ($($ident:ident)*) => { };
}

idents!{
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 //     ...  65536 copies of "a" in total ...
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
}

My current understanding of the issue is as follows:

  • Any repetition will incur O(n^2) work as it matches n copies of the repetition, if it meets both of the following conditions:

    • the first thing in the repetition is a named matcher
    • the repetition is at the end of the macro pattern

    ...in other words, most (though not all) tt munchers!
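
    (Back-of-the-envelope, assuming the k-th match must clone a vector of about k elements before its push: matching n tokens copies roughly 1 + 2 + ... + n = n(n+1)/2 elements, so the 65536 copies of a above cost on the order of 2.1 billion element copies.)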

The reason:

  • Every time inner_parse_loop reaches the end of the repetition (Graphic: $($t:tt o)*), it yields two items:
    1. The EOF position, in eof_items. (Graphic: $($t:tt)* o)
    2. The "repeating" position, in bb_items. (Graphic: $(o $t:tt)*)
  • Back in fn parse(), it checks for EOF. Assuming this fails, eof_items is no longer needed (but it is not yet dropped!)
  • It parses the nonterminal and pushes the match onto the bb_item:
    item.push_match(

    However, the Rc<Vec<NamedMatch>> is shared with eof_item, so this implicitly clones the Vec.
  • Then, it finally drops eof_items at the end of the loop { }.

With a small hack to drop eof_items before modifying bb_item, @limira's code sample now only takes 10 or so seconds to eat all of my memory and send my hard drive thrashing.

...hopefully, some working code compiles faster as well!
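
The essence of that "small hack", as a standalone sketch (plain Rc with hypothetical names, not the actual compiler change): once the stale handle is dropped, Rc::make_mut sees a unique owner and mutates in place instead of cloning.

```rust
use std::rc::Rc;

fn main() {
    let mut bb_matches: Rc<Vec<u32>> = Rc::new((0..1_000).collect());
    let eof_matches = Rc::clone(&bb_matches); // the stale EOF copy

    // Dropping the stale handle first makes `bb_matches` unique...
    drop(eof_matches);

    let addr_before = Rc::as_ptr(&bb_matches);
    Rc::make_mut(&mut bb_matches).push(1_000); // ...so this mutates in place
    assert_eq!(Rc::as_ptr(&bb_matches), addr_before); // no clone happened
}
```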

@limira (Contributor, Author) commented Jul 8, 2018

This is both good and bad news! The bad news is that if I do something like this in the future, my system (Fedora) will become completely unresponsive (I really miss Windows when it comes to this situation). Maybe it's time to learn how to use the OOM killer!

When the memory usage increases drastically, maybe we should assume that something is incorrect (and then stop and report an error)?

@ExpHP (Contributor) commented Jul 8, 2018

@limira I personally use the limits built into Linux. People call me crazy for doing this, but having a guard rail is nice. (So thankfully, I was joking when I said the code snippet sent my hard drive thrashing!)

I don't know if this is the smartest way to do it, but:

/etc/security/limits.conf

    lampam           hard    as              8000000

When I need to temporarily override it, I switch to root to issue a ulimit -v command.

$ ulimit -v
8000000
$ sudo su
# ulimit -v unlimited
# su lampam   # su again to create a nested session; 'exit' would undo the new limit
$ ulimit -v
unlimited

The reason this saves your system from locking up is that it limits how much can be moved to swap (and in my (limited) experience it is heavy usage of swap that is usually responsible for the disk thrashing and the apparent system lockup when too much memory is used by a single application on Linux).

@mark-i-m (Member) commented Jul 8, 2018

Maybe it's time to learn how to use OOM Killer!

OOM killer is a feature of the Linux kernel, not the compiler. When the kernel can't allocate any more physical memory, it begins swapping (which is the reason your system becomes largely unresponsive). (Control-c or killing the rustc process some other way would help). If the kernel also runs out of swap space, it chooses some process to "OOM kill", usually the one with the most memory usage.

I think so far the policy of rustc has been to allow the compiler's memory usage to be unbounded, though there are some efforts to detect infinite loops in const fn evaluation.

@limira (Contributor, Author) commented Jul 8, 2018

@ExpHP: Thanks for the tip! I will try it.
@mark-i-m: Thanks for your info!

As noted in this comment (above), my buggy macro there_is_a_bug! suffers exponential blowup (it triples in size on each call). Let's consider the smallest exponential blowup, where the input merely doubles in size. In a recursive macro, each invocation consumes c (constant) tokens but doubles the number of tokens it passes to the next call. If the initial number of tokens involved is just n, then after (about?) 32 recursive calls, the number of tokens that must be processed could be n * 2^32 - 32*c. That is an unreasonable number compared to the initial token count n. Therefore, I think the solution can be very simple:

  1. Define a constant (like recursion_limit), maybe named max_macro_grow_factor = 1024.
  2. Record the initial number of tokens: n.
  3. At each recursive call, check the new number of tokens: m.
  4. If m > n * max_macro_grow_factor, then:
  • stop the macro-expansion process
  • report an error to the user
  • suggest that they check the rules involved
  • suggest increasing max_macro_grow_factor to 2048 if they are sure there is no problem with their macro.

I think this is better than just letting the compiler blindly run into the exponential blowup.
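
A minimal sketch of the proposed check (every name here, such as MAX_MACRO_GROW_FACTOR and GrowthGuard, is hypothetical; this is not actual rustc code):

```rust
/// Hypothetical limit on how much a macro's input may grow,
/// relative to the token count of the outermost invocation.
const MAX_MACRO_GROW_FACTOR: usize = 1024;

struct GrowthGuard {
    initial_tokens: usize, // step 2: the initial number of tokens, n
}

impl GrowthGuard {
    fn new(initial_tokens: usize) -> Self {
        GrowthGuard { initial_tokens }
    }

    /// Steps 3-4: called with the token count m of each recursive call.
    fn check(&self, current_tokens: usize) -> Result<(), String> {
        let limit = self.initial_tokens.saturating_mul(MAX_MACRO_GROW_FACTOR);
        if current_tokens > limit {
            return Err(format!(
                "macro input grew from {} to {} tokens; check the rules \
                 involved, or raise the limit if the growth is intentional",
                self.initial_tokens, current_tokens
            ));
        }
        Ok(())
    }
}

fn main() {
    // Simulate a macro that doubles its input on every call.
    let guard = GrowthGuard::new(4);
    let mut tokens: usize = 4;
    for depth in 0.. {
        if let Err(msg) = guard.check(tokens) {
            println!("error after {} calls: {}", depth, msg);
            break; // fires after ~10 doublings, as predicted above
        }
        tokens *= 2;
    }
}
```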

@Ixrec (Contributor) commented Jul 8, 2018

Food for the bikeshed: The current recursion_limit seems like a limit on the depth of the macro call stack, while this new suggestion seems like a limit on the breadth of the macro call stack (where "breadth" is not number of calls, but total number of tokens in those calls).

@limira (Contributor, Author) commented Jul 8, 2018

A mistake like the one I made here is rare, but it could happen to anyone. Before the fix by ExpHP, I waited more than 10 minutes (on my system) while cargo build just kept running. After the fix, ExpHP built my code and it ran out of memory in about 10 seconds (on his system). In both cases, there is no message from the compiler. With max_macro_grow_factor = 1024, and assuming the macro simply doubles its size, the compiler would stop after only about 10 calls, and then trace_macros!(true) could jump in to give some help.

bors pushed a commit that referenced this issue Jul 8, 2018
For a fuller description of the performance issue fixed by this:

#51754 (comment)
bors added a commit that referenced this issue Jul 8, 2018
Fix macro parser quadratic complexity in small repeating groups

Observed in #51754, and more easily demonstrated with the following:

```rust
macro_rules! stress {
    ($($t:tt)+) => { };
}

fn main() {
    stress!{
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        //    ... 65536 copies of "a" total ...
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
    }
}
```
which takes 50 seconds to compile prior to the fix and <1s after.

I hope this has a visible impact on the compile times for real code.  (I think it is most likely to affect incremental TT munchers that deal with large inputs, though it depends on how they are written)

For a fuller description of the performance issue:  #51754 (comment)

---

There is no test (yet) because I'm not sure how easily to measure this for regressions.
@ExpHP (Contributor) commented Jul 8, 2018

Note that, even with my PR, there are still other macros that experience exponential blowup:

(do not compile this!)

macro_rules! blowup {
  // (the brackets work around the performance bug)
  ([$($t:tt)+]) => { blowup!{ [$($t:tt)+] } };
}

fn main() {
  blowup!([a]);
}

I started a thread on internals for discussion on possible ways the compiler could help with such rogue macros:

https://internals.rust-lang.org/t/thought-saving-users-from-themselves-in-macros-that-explode/7907

@limira (Contributor, Author) commented Jul 9, 2018

We definitely need help from the compiler in these situations! Thanks for the push!

bors added a commit that referenced this issue Jul 17, 2018
Fix macro parser quadratic complexity in small repeating groups (same commit message as the Jul 8 commit above)
@nikic nikic added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Dec 15, 2018
@varkor varkor added the I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. label Dec 20, 2018
@jonas-schievink jonas-schievink added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) and removed T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-build labels Apr 21, 2019
@Mark-Simulacrum Mark-Simulacrum removed the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Sep 2, 2019
@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 14, 2020
@JohnTitor JohnTitor added the C-bug Category: This is a bug. label Mar 15, 2020