LLVM ERROR: Broken function found, compilation aborted! when precompiling Base.permutedims #31156
Comments
I see a crash in libuv, which makes me think you're trying to print something without libuv being initialized properly.
|
I thought the error you showed me was solved by calling
diff --git a/precompile_wrapper.jl b/precompile_wrapper.jl
index 9ebfb2e..a8e8554 100644
--- a/precompile_wrapper.jl
+++ b/precompile_wrapper.jl
@@ -1,8 +1,8 @@
atexit_hook_copy = copy(Base.atexit_hooks) # make backup
# clean state so that any package we use can carelessly call atexit
empty!(Base.atexit_hooks)
-Base.__init__()
-Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
+# Base.__init__()
+# Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
using REPL
Base.REPL_MODULE_REF[] = REPL
I uploaded the scripts I'm using here: https://gist.github.com/tkf/bb020c3e2d64d049696c7e549f0120ad/ab16ec9f94598a64f9e242f5d8b7ba1a9f7fc94e They are the same as the ones included in the first post; I'm linking them just in case I made a copy-and-paste mistake. I can reproduce
Note also that this script can generate
$ git diff
diff --git a/precompile_wrapper.jl b/precompile_wrapper.jl
index 9ebfb2e..f6789aa 100644
--- a/precompile_wrapper.jl
+++ b/precompile_wrapper.jl
@@ -6,7 +6,7 @@ Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
using REPL
Base.REPL_MODULE_REF[] = REPL
-Base.precompile(Tuple{typeof(Base.permutedims), Array{Bool, 3}, Array{Int64, 1}})
+# Base.precompile(Tuple{typeof(Base.permutedims), Array{Bool, 3}, Array{Int64, 1}})
Base._atexit() # run all exit hooks we registered during precompile
empty!(Base.atexit_hooks) # don't serialize the exit hooks we run + added
$ rm -f sys.a
$ ./compile.bash
+ julia --output-o=sys.a -g1 --startup-file=no --code-coverage=none --history-file=yes --inline=yes --math-mode=ieee --handle-signals=yes --startup-file=no --warn-overwrite=no --compile=yes --depwarn=yes --cpu-target=native --track-allocation=none --sysimage-native-code=yes --sysimage=/home/takafumi/opt/julia/julia-1.1.0/lib/julia/sys.so --compiled-modules=yes --optimize=2 ./precompile_wrapper.jl
$ file sys.a
sys.a: current ar archive
FYI
$ julia -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
$ julia-1.0 -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
$ ~/repos/watch/julia/usr/bin/julia -e 'using InteractiveUtils; versioninfo()'
Julia Version 1.2.0-DEV.339
Commit 00f257d603 (2019-02-16 01:47 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake) |
Bump. I still can reproduce the bug with current |
It is still reproducible with 1.4.0-DEV.297 (a68237f). |
I'm seeing this when using the new PackageCompiler on aarch64 on 1.3.1
|
I can reproduce the error with the first example
|
Some further testing.
These don't fail:
|
Stating the obvious to be complete... in the regular REPL the equivalent methods work. i.e. And I noticed that the blame on the permutedims functions shows no change in at least 2 years |
@timholy I wondered if you had any insight given you seem to have driven the Does the fact that this happens for |
On MacOS I see the same as Keno:
|
Maybe some error happens before/during |
Note that it is enough to only have a bare
in the |
and |
Yeah, it returns true when using PackageCompiler as well. It isn't until the code is being written to the object file that LLVM asserts. |
What happens when you use different optimization flags? -O0 vs -O1 vs -O2 vs -O3 |
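A minimal sketch of how the four optimization levels could be swept, assuming a build command much simpler than the `compile.bash` used in this thread (which passes many more flags). The helper name `build_cmd` is hypothetical; the flags `--output-o`, `--optimize`, and `--startup-file` are the ones that appear in the original command line. The commands are only constructed and printed, not run.

```julia
# Hypothetical helper: build a simplified system-image command for one
# optimization level. The real compile.bash in this thread uses more flags.
build_cmd(level) = `julia --output-o=sys.a --optimize=$level --startup-file=no ./precompile_wrapper.jl`

# One command per level, -O0 through -O3.
build_cmds = [build_cmd(level) for level in 0:3]

# Print the commands instead of running them; replace `println` with `run`
# to actually attempt each build.
foreach(println, build_cmds)
```

To compare Julia versions as well, the `julia` executable in the command could be swapped for another binary such as the `julia-1.0` used earlier in the thread.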
With
|
@tkf Just out of interest, how did you figure out this was caused by I'm trying to figure out a way to narrow in on other causes of this, as it seems to not just be limited to |
I just tried in a build of 1.3.1 with
Now trying with |
@ianshmean Can you do the same thing (running with -O0 vs -O1 vs -O2 vs -O3) in Julia 1.0, 1.1, 1.2, and 1.3 to see if we can figure out when this bug was introduced? |
From https://discourse.julialang.org/t/debugging-aot-compile-errors/20829/2 it looks like it started in 1.1 I'll try to cover the versions |
Also, you said that this bug does occur for Can you see if it occurs for |
Indeed. Bool, UInt8 = bad |
With Bad
Good
|
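To check which element types are affected, one option is to generate one precompile statement per type and paste each into `precompile_wrapper.jl` in turn. The sketch below only builds the statement text; the helper name `precompile_line` and the particular list of element types are assumptions for illustration, and nothing here triggers the system image build itself.

```julia
# Hypothetical helper: text of a precompile statement for permutedims on a
# 3-dimensional array with element type T, mirroring the statement in the
# original report.
precompile_line(T::Type) =
    "Base.precompile(Tuple{typeof(Base.permutedims), Array{$T, 3}, Array{Int64, 1}})"

# Illustrative selection of element types to try, one build per line.
eltypes = (Bool, UInt8, Int8, UInt16, Int32, Int64, Float32, Float64)

for T in eltypes
    println(precompile_line(T))
end
```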
And I don't see any more from LLVM with |
Given |
I dumped the bad IR, on Julia master, as well as the IR that's normally generated at run time: https://gist.github.com/maleadt/f93ba85a91ba0860e00d883ff4052a8c |
I'm using JuliaLang/PackageCompiler.jl#333 to blacklist anything that causes this issue, and after blacklisting
|
After @vtjnash suggested in slack that it could be the vectorizer optimizations, @KristofferC suggested I disable those lines and build master, so I commented out Line 234 in 6d86384
Line 237 in 6d86384
And it worked. The example now passes with |
Unfortunately it still hit the |
@ianshmean You said that this is architecture-dependent, right? Can you post which architectures it works on, and which architectures you get the error on? |
The MWE in this thread errors on both aarch64 and amd64 for me. That's not platform specific. However, the package I'm PackageCompiling only hits this error on aarch64.
And frustratingly neither set contains a permutedims on an 8-bit array type |
To summarize. My take is that there's a bug in either/both of these that fixes the permutedims 8-bit example (I haven't built Julia with these independently disabled, but can do that if it's helpful):
And another bug somewhere else that I tried to use If anyone has a suggestion, I'm happy to explore |
Pipe it to a file instead, so that you don't get slowed down by your terminal. |
I haven't yet piped the output out.. But, I figured out the 2 out of 6406 precompile statements that were invoking the
Details on the bisector approach I used: JuliaLang/PackageCompiler.jl#295 (comment) |
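For readers who want to try something similar, here is a generic bisection sketch. It is not the code from the linked PackageCompiler comment. It assumes a predicate `fails(batch)` that returns `true` when building a system image with that batch of precompile statements aborts, and it assumes each failure can be pinned on a single statement. The names `find_culprits` and `demo_fails` are hypothetical.

```julia
# Recursively narrow a list of precompile statements down to the ones that
# make `fails` return true on their own. `fails` stands in for "build a
# system image with these statements and check whether LLVM aborts".
function find_culprits(statements::Vector{String}, fails::Function)
    isempty(statements) && return String[]
    fails(statements) || return String[]          # the whole batch is clean
    length(statements) == 1 && return statements  # isolated one culprit
    mid = div(length(statements), 2)
    left = find_culprits(statements[1:mid], fails)
    right = find_culprits(statements[mid+1:end], fails)
    return vcat(left, right)
end

# Stand-in predicate for demonstration only: pretends that any statement
# mentioning an 8-bit element type breaks the build.
demo_fails(batch) = any(s -> occursin("Bool", s) || occursin("UInt8", s), batch)

statements = ["stmt $i" for i in 1:100]
statements[17] = "permutedims Array{Bool, 3}"
statements[80] = "permutedims Array{UInt8, 3}"

culprits = find_culprits(statements, demo_fails)
println(culprits)
```

With a real build as the predicate, each call is expensive, so the clean-batch early return is what keeps the number of builds manageable when only a handful of statements out of thousands are bad.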
Of the three known:
Fails on Ubuntu aarch64: [1,2,3] |
Here's a pure LLVM reproducer: https://gist.github.com/Keno/60d900bf197bfda75e2f9f72dec4411f Reproduce with |
For those wondering how to do that, the steps are basically:
|
Awesome. Also, for the precompile statements failing on aarch64, I just got IR dumps (and posted how I did each at the top). Note that each is caused by a differently numbered
https://gist.github.com/ianshmean/18202bed7aa6ecc433f344bbce1d8dd2
https://gist.github.com/ianshmean/79de0928d534eaee5c68edb168bb6f92 |
This is what comes out of bugpoint, BTW. It's usually easiest to do further debugging on the bugpoint reduced output, because it reduces the amount of code that gets executed, so debugging with
|
Candidate patch: https://reviews.llvm.org/D75072 |
This imports the patch I put up in https://reviews.llvm.org/D75072 and should fix #31156. We should probably hold off on merging this for a few days while upstream review is ongoing. In the meantime, this branch should be convenient to try. Make sure to remember to build LLVM from source, not BB.
Backport to LLVM9 is in #34860 for your trial convenience. |
It worked! 🎉🎉🎉🎉 I built #34860 with Thank you so much @Keno ! Notes:
Test precomp statements
|
@ianshmean Were you able to fix the stdlib problem? |
Yes. Deleting I posted a comment over on #34860 (comment) |
* Add patch for 31156 This imports the patch I put up in https://reviews.llvm.org/D75072 and should fix JuliaLang#31156. We should probably hold off on merging this for a few days while upstream review is ongoing. In the meantime, this branch should be convenient to try. Make sure to remember to build LLVM from source, not BB. * set LLVM BB build to release 3 update LLVM checksums delete old LLVM.v9.0.1-1 checksums Co-authored-by: Keno Fischer <keno@juliacomputing.com>
This fixes a case where loop-reduce introduces ptrtoint/inttoptr for non-integral address space pointers. Over the past several years, we have gradually improved the SCEVExpander to actually do something sensible for non-integral pointer types. However, that obviously relies on the expander knowing what the type of the SCEV expression is. That is usually the case, but there is one important case where it's not: The type of an add expression is just the type of the last operand, so if the non-integral pointer is not the last operand, later uses of that SCEV may not realize that the given add expression contains non-integral pointers and may try to expand it as integers.
One interesting observation is that we do get away with this scheme in shockingly many cases. The reason for this is that SCEV expressions often have an `scUnknown` pointer base, which our sort order on the operands of add expressions sorts behind basically everything else, so it usually ends up as the last operand.
One situation where this fails is included as a test case. This test case was bugpoint-reduced from the issue reported at JuliaLang/julia#31156. What happens here is that the pointer base is an scAddRec from an outer loop, plus an scUnknown integer offset. By our sort order, the scUnknown gets sorted after the scAddRec pointer base, thus making an add expression of these two operands have integer type. This then confuses the expander into attempting to expand the whole thing as integers, which will obviously fail when reaching the non-integral pointer.
I considered a few options to solve this, but here's what I ended up settling on: The AddExpr class gains a new subclass that explicitly stores the type of the expression. This subclass is used whenever one of the operands is a non-integral pointer. To reduce the impact for the regular case (where the SCEV expression contains no non-integral pointers), a bit flag is kept in each flag expression to indicate whether it is of non-integral pointer type (this should give the same answer as asking if getType() is non-integral, but performing that query may involve a pointer chase and requires the DataLayout). For add expressions that flag is also used to indicate whether we're using the subclass or not. This is slightly inefficient, because it uses the subclass even in the (not uncommon) case where the last operand does actually accurately reflect the non-integral pointer type. However, it didn't seem worth the extra flag bit and complexity to do this micro-optimization.
I had hoped that we could additionally restrict mul exprs from containing any non-integral pointers, and also require add exprs to only have one operand containing such pointers (but not more), but this turned out not to work. The reason for this is that SCEV wants to form differences between pointers, which it represents as `A + B*-1`, so we need to allow both multiplication by `-1` and addition with multiple non-integral pointer arguments. I'm not super happy with that situation, but I think it exposes a more general problem with non-integral pointers in LLVM. We don't actually have a way to express the difference between two non-integral pointers at the IR level. In theory this is a problem for SCEV, because it means that we can't materialize such SCEV expression. However, in practice, these expressions generally have the same base pointer, so SCEV will appropriately simplify them to just the integer components. Nevertheless it is a bit unsatisfying.
Perhaps we could have an intrinsic that takes the byte difference between two pointers to the same allocated object (in the same sense as is used in getelementptr), which should be a sensible operation even for non-integral pointers. However, given the practical considerations above, that's a project for another time. For now, simply allowing the existing pointer-diff pattern for non-integral pointers seems to work ok.
Differential Revision: https://reviews.llvm.org/D75072
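The "type of the last operand" problem described in the commit message can be illustrated with a toy model. This is not LLVM's API or data structures: the `Operand` struct, the two-level `rank`, and both `add_is_pointer_*` functions are assumptions made up for this sketch, and they only mimic the sort order and typing rule as the message describes them.

```julia
# Toy model only: illustrative types, not LLVM's SCEV classes.
struct Operand
    kind::Symbol       # :AddRec or :Unknown
    is_pointer::Bool   # true if the operand has non-integral pointer type
end

# Operands of kind :Unknown sort after everything else, as described above.
rank(op::Operand) = op.kind == :Unknown ? 2 : 1

# Old rule: an add expression takes the type of its last (sorted) operand.
function add_is_pointer_old(ops::Vector{Operand})
    sorted = sort(ops; by = rank)
    return last(sorted).is_pointer
end

# Sketch of the patched rule: pointer-ness is recorded explicitly whenever
# any operand is a non-integral pointer.
add_is_pointer_new(ops::Vector{Operand}) = any(op -> op.is_pointer, ops)

# Common case: the pointer base is :Unknown, so it ends up last.
common = [Operand(:Unknown, true), Operand(:AddRec, false)]

# Failing case: the pointer base is an outer-loop :AddRec and the integer
# offset is :Unknown, so the integer operand ends up last.
failing = [Operand(:AddRec, true), Operand(:Unknown, false)]

println(add_is_pointer_old(common))
println(add_is_pointer_old(failing))
println(add_is_pointer_new(failing))
```

In the failing case the old rule reports an integer-typed add expression even though one operand is a pointer, which is the mismatch that leads the expander to emit ptrtoint/inttoptr.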
I get
LLVM ERROR: Broken function found, compilation aborted!
when including
Base.precompile(Tuple{typeof(Base.permutedims), Array{Bool, 3}, Array{Int64, 1}})
for system image compilation. I can reproduce this in Julia 1.0 to 1.2.
First I created compile.bash:
and ./precompile_wrapper.jl:
Then run:
(precompile_wrapper.jl is taken from run_julia_code.jl generated by PackageCompiler.jl)