Segfault in JuliaParser #14929

Closed
Keno opened this issue Feb 4, 2016 · 17 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@Keno
Member

Keno commented Feb 4, 2016

@carnaval and I were dissecting a segfault in JuliaParser (+my custom changes) after pulling latest julia master. This is what I've found since:

The cause is a corrupt pointer value being passed to jl_type_error_rt.

Using rr, the execution trace leading up to this point is:

movabs $0x7f57ab3f76d0,%rax
# rcx is 0 here
callq  *%rax #julia_take_token_23569
# rcx is 0xddacc4a6af0d4800
mov    -0x1f0(%rbp),%rax
cmp    $0x0,%rax
jne <>
mov    %rbx,%rdx
add    $0xffffffffffffb880,%rdx
movabs $0x7f57ab3cc00c,%rdi
movabs $0x7f57ab50e00d,%rsi 
movabs $0x7f59b1554802,%rax
callq *%rax # to jl_type_error_rt

At this point, got in jl_type_error_rt is 0xddacc4a6af0d4800. So far I haven't looked at anything else, so it may still be that rcx is somehow used by llvm to pass the return value, meaning the problem is elsewhere, but the assembly trace does look suspicious.
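For readers unfamiliar with the name: got is the value that failed a type check, and on the Julia side it surfaces as the got field of a TypeError. The sketch below is a minimal illustration in present-day Julia, not the 2016 master discussed here. It assumes a failed typeassert goes through the same runtime error path, and capture_type_error is a hypothetical helper name.

```julia
# Trigger a runtime type error and inspect the exception it produces.
# Assumption: present-day Julia; field names may differ on older versions.
function capture_type_error()
    try
        typeassert(1.0, Int)   # 1.0 is a Float64, so the assertion fails
    catch err
        return err
    end
    return nothing             # only reached if no exception was thrown
end

err = capture_type_error()
println(typeof(err))     # the exception type
println(err.expected)    # the type that was required
println(err.got)         # the offending value
```

In a healthy process got is a valid object pointer. In the trace above that slot holds 0xddacc4a6af0d4800 instead.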

@Keno
Member Author

Keno commented Feb 4, 2016

Relevant LLVM IR: https://gist.github.com/Keno/cb751c61e3fc7b3037c9

@Keno
Member Author

Keno commented Feb 4, 2016

My working theory is that we're in

if258:                                            ; preds = %ok.260
  %598 = call %jl_value_t* @julia_take_token_23684(%jl_value_t* %1) #1
  %599 = load volatile %jl_value_t*, %jl_value_t** %9, align 8
  %600 = icmp eq %jl_value_t* %599, null
  br i1 %600, label %err262, label %ok.263

and are jumping to the wrong basic block

err262:                                           ; preds = %if258
  call void @jl_undefined_var_error(%jl_value_t* inttoptr (i64 140220473766056 to %jl_value_t*))
  unreachable
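As background on the IR above: a null check followed by a call to jl_undefined_var_error is the usual shape of a guard on a local variable that may not have been assigned yet. A minimal sketch of Julia source that needs such a guard, written for present-day Julia, with hypothetical function names that are not taken from JuliaParser:

```julia
# `value` is only assigned on one path, so reading it needs a runtime check.
function read_maybe_unset(flag::Bool)
    if flag
        value = 42
    end
    return value   # throws UndefVarError when flag is false
end

# Helper: report whether the unassigned path raises UndefVarError.
function unset_path_throws()
    try
        read_maybe_unset(false)
    catch err
        return err isa UndefVarError
    end
    return false
end

println(read_maybe_unset(true))
println(unset_path_throws())
```

The theory in this comment is that the compiled code should take such an error branch but ends up at the type-error call instead.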

@carnaval
Contributor

carnaval commented Feb 4, 2016

yep it's definitely not a type check. it's still weird since the nullcheck should happen on the predecessor BB so as soon as it does the cmp it's already wrong, even before the branch.

the code generator got really confused there

@Keno
Member Author

Keno commented Feb 4, 2016

> yep it's definitely not a type check. it's still weird since the nullcheck should happen on the predecessor BB so as soon as it does the cmp it's already wrong, even before the branch.

Not sure what you mean

@carnaval
Contributor

carnaval commented Feb 4, 2016

I just don't think it can be a labeling problem (~ got confused somehow between the two jump targets) since the cmp instruction is already wrong there

@carnaval
Contributor

carnaval commented Feb 4, 2016

nevermind, read the first block wrong. I thought the typecheck was the intended thing to do.

carry on :)

@vtjnash
Sponsor Member

vtjnash commented Feb 4, 2016

if you suspect llvm, it seems worthwhile to me to try implementing -O0 mode

@Keno
Member Author

Keno commented Feb 7, 2016

I updated the gist with the annotated assembly code of the object file. Specifically, we jump to .Ltmp744 from LBB27_142. Now to track this through the LLVM IR and figure out when this happens.

@Keno
Member Author

Keno commented Feb 7, 2016

I'm having a hard time reproducing the generated assembly outside of julia itself. I tried:

cat backup2-after.bc | ./julia-master/usr/bin/llc -debug-pass=Structure -O3 -mcpu=westmere -mattr=-sse4a,-avx512bw,+cx16,-tbm,-adx,-fma4,-avx512vl,-prfchw,-bmi2,-avx512pf,-fsgsbase,-avx,-avx512cd,-rtm,+popcnt,-fma,-bmi,+aes,-rdrnd,+sse4.1,+sse4.2,-avx2,-avx512er,+sse,-lzcnt,+pclmul,-avx512f,-f16c,+ssse3,+mmx,+cmov,-xop,-rdseed,-movbe,-hle,-sha,+sse2,+sse3,-avx512dq -code-model=large -relocation-model=pic -fast-isel -O3 -o - -enable-tail-merge=0 - > indirect2.s

I can however reproduce by loading the module into julia and then writing it to assembly. Any ideas why that might be?

@Keno
Member Author

Keno commented Feb 7, 2016

Hmm, running our passes, dumping out to bitcode and then running the rest of the passes does seem to not reproduce the bug. Very odd!

@Keno
Member Author

Keno commented Feb 7, 2016

Turns out it's our fault, not the backend's. The IR was misleading because it generates the correct thing when we regenerate it, but generates undefs the first time around. Will have to figure out why.

@Keno
Member Author

Keno commented Feb 7, 2016

Ok, this is starting to look like a type inference bug, probably somewhat complicated by the fact that the function in question calls itself. When I ran code_typed though, everything inferred correctly; it's only during codegen that things go bad. cc @JeffBezanson
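For anyone following along, this is roughly what checking inference with code_typed looks like on a self-recursive function. It is only a sketch in present-day Julia, where code_typed returns pairs of lowered code and return type (the 2016 version returned typed ASTs instead). countdown is a hypothetical stand-in, not the parser function in question.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Hypothetical stand-in for a function that calls itself.
countdown(n::Int) = n <= 0 ? 0 : countdown(n - 1)

# Each result pairs the inferred code with its inferred return type.
for (code, rettype) in code_typed(countdown, Tuple{Int})
    println("inferred return type: ", rettype)
    println(code)
end

# The same return type information, without the code listing.
inferred = Base.return_types(countdown, Tuple{Int})
println(inferred)
```

A clean result here only shows what inference produces when asked directly in that session. As the comment notes, the result codegen actually consumed can differ.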

@Keno
Member Author

Keno commented Feb 7, 2016

https://gist.github.com/Keno/9103e343bbc9a66231c2 compares the inferred AST we're looking at during codegen and the one obtained by starting a fresh julia session and calling code_typed.

@Keno
Member Author

Keno commented Feb 7, 2016

@JeffBezanson The function in question is here: https://github.com/JuliaLang/JuliaParser.jl/blob/kf/loctrack/src/parser.jl#L1685. You should be able to reproduce everything by checking out the kf/loctrack branch of JuliaParser. You might also need various other packages of mine, but they should all be on GitHub in appropriate versions.

@Keno
Member Author

Keno commented Feb 7, 2016

Also, you might want to apply #14967 before trying to reproduce; otherwise it's very difficult to actually see the bug where it appears.

Keno added the bug label Feb 7, 2016
@Keno
Member Author

Keno commented Feb 7, 2016

The suspicious behavior seems to start at

:ex = Expr(:call, JuliaParser.Parser.parse_RtoL, ps::JuliaParser.Parser.ParseState, ts::JuliaParser.Lexer.TokenStream{JuliaParser.Lexer.SourceLocToken}, JuliaParser.Parser.parse_cond, JuliaParser.Parser.EQ_OPS)::Union{},
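A note on the ::Union{} annotation: Union{} is the empty type, so inferring it for a call means inference concluded the call can never return normally. If the call does return at run time, code after it may have been compiled as unreachable, which would be consistent with the undefs mentioned earlier (that link is an interpretation, not something the thread states). A minimal sketch in present-day Julia of a case where Union{} is the correct answer, using a hypothetical function name:

```julia
# A function that can only exit by throwing has return type Union{}.
always_throws() = error("never returns")

rt = Base.return_types(always_throws, Tuple{})[1]
println(rt)
```

Seeing the same annotation on a call to parse_RtoL, which is expected to return a parsed expression, is what makes the line above suspicious.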

@Keno
Copy link
Member Author

Keno commented Mar 5, 2016

Fixed by #15300.

Keno closed this as completed Mar 5, 2016