Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash Encountered With Emscripten 1.36.3 #1

Open
awtcode opened this issue Dec 5, 2016 · 12 comments
Open

Crash Encountered With Emscripten 1.36.3 #1

awtcode opened this issue Dec 5, 2016 · 12 comments

Comments

@awtcode
Copy link

awtcode commented Dec 5, 2016

Hi,

I tried using the optimizer and saw a crash in Emscripten 1.36.3. Here's part of the error log:

thread 'main' has overflowed its stack
DEBUG:root:EMCC_WASM_BACKEND tells us to use asm.js backend

splitting up js optimization into 124 chunks, using 40 cores (total: 301.94 MB)

Traceback (most recent call last):

File "E:\emsdk\emscripten\1.36.3\em++", line 13, in

emcc.run()

File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1731, in run

JSOptimizer.flush()

File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1643, in flush

run_passes(chunks[i], 'js_opts_' + str(i), just_split='receiveJSON' in chunks[i], just_concat='emitJSON' in chunks[i])

File "E:\emsdk\emscripten\1.36.3\emcc.py", line 1613, in run_passes

final = shared.Building.js_optimizer(final, passes, debug_level >= 4, JSOptimizer.extra_info, just_split=just_split, just_concat=just_concat)

File "E:\emsdk\emscripten\1.36.3\tools\shared.py", line 1741, in js_optimizer

ret = js_optimizer.run(filename, passes, NODE_JS, debug, extra_info, just_split, just_concat)

File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in run

return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, source_map, extra_info, just_split, just_concat))

File "E:\emsdk\emscripten\1.36.3\tools\tempfiles.py", line 64, in run_and_clean

return func()

File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 544, in

return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, source_map, extra_info, just_split, just_concat))

File "E:\emsdk\emscripten\1.36.3\tools\js_optimizer.py", line 446, in run_on_js

filenames = pool.map(run_on_chunk, commands, chunksize=1)

File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 250, in map

return self.map_async(func, iterable, chunksize).get()

File "E:\emsdk\python\2.7.5.3_64bit\lib\multiprocessing\pool.py", line 557, in get

raise self._value

AssertionError: Error in optimizer (return code 255):

ninja: build stopped: subcommand failed.

@aidanhs
Copy link
Owner

aidanhs commented Dec 6, 2016

Very interesting, the rust issue appears to be this line: thread 'main' has overflowed its stack.

I'm 99% sure I know what the issue is (a very deeply nested set of asm.js statements means that the naive freeing mechanism overflows the stack when attempting to recurse and free everything) so I'll fix that and get back to you.

@awtcode
Copy link
Author

awtcode commented Dec 8, 2016

Thanks, do you know if there is a quick way for me to identify the particular set of asm.js statements that is causing the problem? The reason why I would like to do this is because I see the 'callstack size exceeded' problem on Chrome and I would like to know the code path that triggers this problem.

@aidanhs
Copy link
Owner

aidanhs commented Dec 12, 2016

Interesting! The ayzim problem will be because of an incredibly deeply nested expression/set of conditionals in a function, and I assumed that a 'callstack size exceeded' would be due to recursion. But I've just checked, and a deeply nested expression in a single function with no recursion can also cause it (code example at bottom) so they probably are the same problem!

I think omitting the eliminate pass would help (comment out JSOptimizer.queue += [get_eliminate()] in emcc.py - the EMCC_DEBUG=1 line starting with DEBUG:root:applying js optimization passes: should no longer contain eliminate) as that's the thing that creates incredibly deeply nested expressions. Unfortunately it'll make your code quite a bit bigger.

I'll have a think about a way to identify a deep nesting. I think it should be relatively easy to add a small trace in ayzim to give you some information once I've fixed the recursive destructor.


Generate nonrecursive js to cause callstack size exceeded:

n = 6000
print 'function xx(x) {'
for i in range(n):
    print 'if (x === 6000) { return x } else { x = x*2;'
for i in range(n):
    print '}'
print 'return false; }'
print 'console.log(xx(4))'

@kripken
Copy link

kripken commented Dec 12, 2016

I'm curious, why is deep nesting worse here (rust/ayzim vs c++/old optimizer)?

@aidanhs
Copy link
Owner

aidanhs commented Dec 12, 2016

The old optimizer doesn't actually free memory, it just leaks it. So given something like x+y which becomes something like BinaryOp("+", Name("x"), Name("y")), the old optimizer will do nothing, whereas ayzim will clear up three AST nodes, recursing to get down to the Name nodes.

In general there isn't huge amounts of freeing...except at program exit, when everything must go! So I could probably fix this the lazy way by adding a process exit just before the end of main.

@kripken
Copy link

kripken commented Dec 12, 2016

But the old optimizer will still recurse those nodes, not to free them, but to do all the other stuff it does? Why is freeing special here?

@aidanhs
Copy link
Owner

aidanhs commented Dec 12, 2016

There's very little recursion in the old optimizer, courtesy of the traverse* functions in simpleast which keeps function node traversal to a fixed shallow depth. The one place that I can think of recursion actually happening when traversing nodes is traverseInOrder, a function inside the eliminate pass which calls itself. However, because it's executed early on inside the eliminate pass (for information gathering), it runs before eliminate has created the extremely deeply nested expressions so gets away with it.

@kripken
Copy link

kripken commented Dec 12, 2016

Oh right, thanks, I forgot we de-recursed traversal in C++. I had the earlier JS version in mind which is recursive.

@aidanhs
Copy link
Owner

aidanhs commented Dec 17, 2016

Ok, I found a couple more places that recurse (parsing and emitting output) both in ayzim and the emscripten optimizer. But it appears the emscripten optimizer has a workaround for stack size problems on Windows!

@awtcode I'd like to get to the bottom of this and waste as little of your time as possible, so I've built a new binary: https://aidanhs.com/ayzim-opt-awtcode.exe

  1. I've bumped up the size of the stack in ayzim to 10MB (like the emscripten optimizer), so it may all just work fine
  2. I've enabled optimization step timing, so if it still crashes it'd be great if you could run with EMCC_CORES=1 (to avoid interleaved output) and paste the last 20 lines or so of the debugging output - this will hopefully tell me what operation crashed
  3. to help you out with the 'callstack size exceeded' problem, part of the debugging output will now tell you the function with the deepest nesting of AST nodes in that optimization 'chunk'. Here's me manually running this binary on the whole sqlite llvm-generated asm.js file for demonstration purposes:
aidanhs@a-3 MINGW64 ~/Desktop/rust/ayzim (aphs-stack-debug)
$ ./target/release/ayzim-opt.exe ../../sqlite.js asm eliminate >/dev/null
starting reading and parsing
    reading and parsing took 641 milliseconds
starting asm
    asm took 0 milliseconds
starting eliminate
    EL stages: a:292196 fe:442499 vc:38026 se:202936 (ss:465066) cv:370165 r:292639
    eliminate took 2342 milliseconds
==== DEPTH 69 FOR _sqlite3VdbeHalt
starting final emit
    final emit took 372 milliseconds

The line saying ==== DEPTH 69 FOR _sqlite3VdbeHalt reports that the _sqlite3VdbeHalt function was the 'deepest' function after optimization, with a depth of 69. You'll see 124 printouts like these (one per chunk) - I struggle to imagine anything above 1000, but my experiments suggest that something was blowing past 1500 to cause a stack overflow the originally released ayzim! (the new one can handle over 10x that it appears)

If anything jumps out, you can use DEAD_FUNCTIONS to kill suspiciously large functions and see if that helps Chrome at least parse your JS.

If you get a chance at some point to take a look at any of the above I'd be very interested to hear back. It'd be easy to extend the reporting to be on every function (rather than one per chunk, which I just did to stop it being overwhelming), so let me know if you have any thoughts on something that you might find helpful.

@awtcode
Copy link
Author

awtcode commented Dec 18, 2016

Thanks @aidanhs , will try it out and let you know the results asap :)

@awtcode
Copy link
Author

awtcode commented Dec 20, 2016

Hi, I reran my code with the new binary and this is what I saw:

starting reading and parsing
thread 'worker' panicked at 'called Result::unwrap() on an Err value: Error { repr: xxx(xxx { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }) }', C:\bot\slave\nightly-dist-rustc-win-msvc-64\build\src\libcore\result.rs:837
note: Run with RUST_BACKTRACE=1 for a backtrace.
Traceback (most recent call last):

This seems to be a new error though.

@aidanhs
Copy link
Owner

aidanhs commented Dec 20, 2016

Hmm, that's unexpected. Your file clearly has something non-utf8 in it. I didn't think non-ascii asm.js would be generated, let alone non-utf8, so clearly I need to examine these assumptions. It could be something withing the generated asm.js itself, or something to do with encoding on windows. Do you have anything non-ascii in your program at all? Function names, variables, strings? Alternatively, do you use non-default-US encoding on Windows?

As a compounding bit of confusion, it's actually not even getting as far as it was last time! The place this error comes from is before anywhere the stack could have possibly overflowed, so either something changed on the machine ayzim is being run on, me rebuilding ayzim on a different machine had adverse effects, or the Rust updates and dependency updates had adverse effects.

Just to double check I'm correct in my assumption in where this is coming from, could you run with RUST_BACKTRACE=1 as an environment variable if you get a chance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants