Openblas segfault with Cbc #50
Comments
Thanks for reporting this. A few things to try that will help diagnose the problem:
and let us know if there are any segfaults. Also, within Julia:
If the Homebrew package is installed,
|
julia linprog.jl No error. Pkg.status()
julia> Homebrew.installed("coinmp") Also, here is a copy of ENV |
Ok, so it looks like it's an issue with the MIP solver, Cbc. I don't have direct access to a 10.8 mac to try to reproduce this. @staticfloat, any chance you could test and see if you get the same crash? Is there a way to force homebrew to compile from source instead of using the binary tap? For good measure, you should remove the CoinMP package, which was renamed to Cbc (just delete the directory in ~/.julia), but I don't think this will fix the issue. |
Actually the version in METADATA wasn't using homebrew, so it's strange that homebrew reports it as installed. I've just bumped it. I would try:
If you have a system installation of homebrew:
The coinmp package in homebrew-science is not compatible with the version we use; we have a number of patches. To reinstall using the latest build file, let's clear any existing binaries:
Then
|
Cbc works just fine for me on 10.8, using the binaries provided by Homebrew.jl. Could it be that you have a CoinMP package installed by homebrew-science and that is getting loaded? @mlubin is there any way we can (inside of Julia) detect that the configuration of CoinMP is incorrect? We have hooks inside of BinDeps now that let us reject a binary after loading it if it fails some test. |
That seems like a plausible explanation. If @glennfulford confirms the issue, I can add this hook by checking for the bugs that I've patched :). |
Hi, I tried your suggestions (deleted coinmp, deleted reinstalled Cbc as The Cbc binary is definitely there in .julia/Homebrew/deps/usr/bin, it runs
|
What does this output: julia> run(`which Cbc`) |
julia> using MathProgBase
julia> run(
|
That's strange. Do you have an installation of coinmp anywhere else on the system? The binary search path may not be the same as the library search path. Try |
Here is some of my output from gdb. I have cut out a lot of similar (gdb) run
warning: Could not find object file
warning: Could not find object file
warning: Could not find object file
warning: Could not find object file
warning: Could not find object file
--SNIP== lots of warnings like this warning: Could not find object file
warning: Could not find object file
warning: Could not find object file
. done
command line - CoinMP -log 1 -solve -quit (default strategy 1)
Program received signal SIGSEGV, Segmentation fault.
|
Do you have a compiler set up? We can try compiling Cbc from source: |
Do you have a compiler set up? Yes, I normally build Julia from source. OK, I did what you suggested; it's compiling now. Seems to be compiling OK,
|
Ok, I think it compiled ok and it put new binaries into I still get the that test on linprog.jl appears to run ok, but
Welcome to the CBC MILP Solver
command line - CoinMP -log 1 -solve -quit (default strategy 1)
|
The fact that it prints out the welcome lines really makes me think that it's finding some other version of coinmp on your system, because by default I've disabled that output in the patched version. Could you check your system for libcoinmp.dylib/libcoinmp.so in common library locations like /usr/lib and /usr/local/lib? |
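The check requested above could be sketched as a small shell helper. It searches only the directories named in the comment rather than the whole disk. The function name `find_coinmp_libs` and the depth limit are illustrative assumptions, not part of any package.

```shell
# Hypothetical helper: list CoinMP shared libraries found under the given
# directories. Name matching is case-insensitive (libCoinMP / libcoinmp).
find_coinmp_libs() {
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 2 \
            \( -iname 'libcoinmp*.dylib' -o -iname 'libcoinmp*.so*' \) \
            2>/dev/null
    done
    # An empty listing simply means nothing was found.
    return 0
}

# Search the common library locations mentioned above.
find_coinmp_libs /usr/lib /usr/local/lib
```

An empty listing suggests no stray copy lives in those directories. It does not rule out copies elsewhere on the library search path.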
Yeah, I tried looking using find / -name "coinmp" but it was taking ages,
|
I looked in /usr/lib and /usr/local/lib, but didn't find any files with
Also, I tried /Library/Caches/Homebrew/coinmp-1.6.0.tgz
|
Hmm, we'll have to keep digging then. On line 47 of deps.jl, could you insert
After this is done, go into Cbc/deps/src/CoinMP-1.7.0 and run
By the way, I realize this is a lot of work to get JuMP working. I'd like to figure out the issue with Cbc, which could affect other users, but you could also try installing GLPK and using that as the solver in the meantime. If Cbc is also installed, it will be chosen by default, but you can explicitly set the solver when building a model object: |
This is strange, when I tried to build using Pkg.fixup("Cbc") with
WARNING: An exception occured while building binary dependencies.
in build at pkg/entry.jl:434
|
I went ahead and tried running gdb anyway, since the binaries were created,
IHBI-M-052237:test fulford$ gdb --args ~/julia/julia mixintprog.jl
(gdb) run
SNIP---lots of warnings
Reading symbols for shared libraries . done
Program exited with code 01.
|
Could you clear the Cbc/deps/src directory again and paste the complete output of Pkg.fixup? There's not enough to tell what went wrong. |
Dear Miles, sorry about that. https://gist.github.com/glennfulford/6895971 Hopefully you can read it, otherwise let me know and I will post it here. Glenn https://gist.github.com/glennfulford
|
Thanks for this. I'm going to try reproducing the error on a mac here. Could you reply with the output of the following commands:
In parallel, I have one more variation to try with the configure line: |
which g++
|
The output from running Pkg.fixup("Cbc") with configure option https://gist.github.com/glennfulford/6896963 I've also included the full
|
Thanks. It looks like the library compiled fine this time. After the SIGSEGV line at the end, could you run the command bt on the gdb command line and paste the output? |
output of backtrace...
logLevel was changed from 1 to 1
Continuous objective value is -16.5 - 0.00 seconds
Cgl0004I processed model has 1 rows, 5 columns (5 integer) and 5 elements
Objective coefficients multiple of 1
Cutoff increment increased from 1e-05 to 0.999
Program received signal SIGSEGV, Segmentation fault.
0x0000000105a7f1e4 in .L11 ()
(gdb) bt
0 0x0000000105a7f1e4 in .L11 ()
1 0x0000000104b2119a in dgetf2_k ()
2 0x0000000104b24ff2 in dgetrf_parallel ()
3 0x0000000104b2479f in dgetrf_parallel ()
4 0x0000000104b2479f in dgetrf_parallel ()
5 0x0000000104b2479f in dgetrf_parallel ()
6 0x0000000104b2479f in dgetrf_parallel ()
7 0x0000000104b2479f in dgetrf_parallel ()
8 0x0000000104b2479f in dgetrf_parallel ()
9 0x0000000104813984 in dgetrf_ ()
10 0x000000010e02fc99 in CoinDenseFactorization::factor (this=0x100000000)
at CoinDenseFactorization.cpp:202
11 0x000000010cd3622d in ClpFactorization::factorize (this=0x10eaad160,
model=0x103a27000, solveType=0, valuesPass=false) at
ClpFactorization.cpp:1588
12 0x000000010cde6066 in ClpSimplex::internalFactorize (this=0x103a27000,
solveType=0) at ClpSimplex.cpp:1717
13 0x000000010cdfc07a in ClpSimplex::startup (this=0x103a27000,
ifValuesPass=0, startFinishOptions=7) at ClpSimplex.cpp:8271
14 0x000000010ce0dd29 in ClpSimplexDual::startupSolve (this=0x103a27000,
ifValuesPass=0, saveDuals=0x0, startFinishOptions=7) at
ClpSimplexDual.cpp:239
15 0x000000010ce1a63e in ClpSimplexDual::dual (this=0x103a27000,
ifValuesPass=0, startFinishOptions=7) at ClpSimplexDual.cpp:588
16 0x000000010cdf1d93 in ClpSimplex::dual (this=0x103a27000,
ifValuesPass=0, startFinishOptions=7) at ClpSimplex.cpp:5235
17 0x000000010c94c105 in OsiClpSolverInterface::crunch (this=0x10eab20c0)
at OsiClpSolverInterface.cpp:6688
18 0x000000010c94a2b3 in OsiClpSolverInterface::resolve (this=0x10eab20c0)
at OsiClpSolverInterface.cpp:1116
19 0x000000010e617b5c in CbcModel::resolve (this=0x10fb83600,
solver=0x10eab2358) at CbcModel.cpp:12139
20 0x000000010e5f9a57 in CbcModel::resolve (this=0x10fb83600, parent=0x0,
whereFrom=0, saveSolution=0x0, saveLower=0x0, saveUpper=0x0) at
CbcModel.cpp:8932
21 0x000000010e5ef86c in CbcModel::branchAndBound (this=0x10fb83600,
doStatistics=0) at CbcModel.cpp:1993
22 0x000000010e492431 in CbcMain1 (argc=5, argv=0x7fff5fbfd830,
model=@0x10f0a8600, callBack=0x10e476770 <dummyCallBack(CbcModel*, int)>)
at CbcSolver.cpp:5594
23 0x000000010e4a738a in CbcMain1 (argc=5, argv=0x7fff5fbfd830,
model=@0x10f0a8600) at CbcSolver.cpp:1116
24 0x000000010c9250e7 in CbcSolveProblem (hCbc=0x10ea95ff0,
pProblem=0x10cae62f0, pOption=0x10ea18de0, Method=0) at CoinCbc.cpp:948
25 0x000000010c92576d in CbcOptimizeProblem (pProblem=0x10cae62f0,
pResult=0x10ea20400, pSolver=0x10ea18c90, pOption=0x10ea18de0, Method=0) at
CoinCbc.cpp:1047
26 0x000000010c91bf7f in CoinOptimizeProblem (hProb=0x10ca91820, Method=0)
at CoinMP.cpp:590
27 0x000000010a551358 in ?? ()
28 0x000000010a551241 in ?? ()
29 0x000000010a551173 in ?? ()
30 0x000000010006f41c in jl_apply (f=0x10fc78f40, args=0x7fff5fbfdc18,
nargs=1) at julia.h:1026
31 0x0000000100071d0c in jl_trampoline (F=0x10fc78f40,
args=0x7fff5fbfdc18, nargs=1) at builtins.c:820
32 0x00000001000638bc in jl_apply (f=0x10fc78f40, args=0x7fff5fbfdc18,
nargs=1) at julia.h:1026
33 0x00000001000657f8 in jl_apply_generic (F=0x103d48dc0,
args=0x7fff5fbfdc18, nargs=1) at gf.c:1355
34 0x000000010a54b78e in ?? ()
35 0x000000010a54aa52 in ?? ()
36 0x000000010006f41c in jl_apply (f=0x10f168fa0, args=0x7fff5fbfde88,
nargs=8) at julia.h:1026
37 0x0000000100071d0c in jl_trampoline (F=0x10f168fa0,
args=0x7fff5fbfde88, nargs=8) at builtins.c:820
38 0x00000001000638bc in jl_apply (f=0x10f168fa0, args=0x7fff5fbfde88,
nargs=8) at julia.h:1026
39 0x00000001000657f8 in jl_apply_generic (F=0x10f106860,
args=0x7fff5fbfde88, nargs=8) at gf.c:1355
40 0x00000001000dd14c in jl_apply (f=0x10f106860, args=0x7fff5fbfde88,
nargs=8) at julia.h:1026
41 0x00000001000dd00e in do_call (f=0x10f106860, args=0x10f07e5b8,
nargs=8, locals=0x7fff5fbfe7a0, nl=3) at interpreter.c:57
42 0x00000001000dae4c in eval (e=0x10f147100, locals=0x7fff5fbfe7a0, nl=3)
at interpreter.c:175
43 0x00000001000daed5 in eval (e=0x10f1470e0, locals=0x7fff5fbfe7a0, nl=3)
at interpreter.c:182
44 0x00000001000dca57 in eval_body (stmts=0x10f07e500,
locals=0x7fff5fbfe7a0, nl=3, start=0, toplevel=1) at interpreter.c:447
45 0x00000001000dcd00 in jl_interpret_toplevel_thunk_with
(lam=0x10f10b8c0, loc=0x0, nl=3) at interpreter.c:483
46 0x00000001000dcd83 in jl_interpret_toplevel_thunk (lam=0x10f10b8c0) at
interpreter.c:490
47 0x00000001000f1e4c in jl_toplevel_eval_flex (e=0x10f146f80, fast=1) at
toplevel.c:406
48 0x00000001000f2b08 in jl_parse_eval_all (fname=0x1037e3800
"/Users/fulford/.julia/MathProgBase/test/mixintprog.jl") at toplevel.c:439
49 0x00000001000f2d19 in jl_load (fname=0x1037e3800
"/Users/fulford/.julia/MathProgBase/test/mixintprog.jl") at toplevel.c:472
50 0x00000001000f2ddb in jl_load_ (str=0x102256cf0) at toplevel.c:483
51 0x00000001028598a6 in ?? ()
52 0x000000010006f41c in jl_apply (f=0x1037947c0, args=0x7fff5fbfede8,
nargs=1) at julia.h:1026
53 0x0000000100071d0c in jl_trampoline (F=0x1037947c0,
args=0x7fff5fbfede8, nargs=1) at builtins.c:820
54 0x00000001000638bc in jl_apply (f=0x1037947c0, args=0x7fff5fbfede8,
nargs=1) at julia.h:1026
55 0x00000001000657f8 in jl_apply_generic (F=0x1037946c0,
args=0x7fff5fbfede8, nargs=1) at gf.c:1355
56 0x000000010285900d in ?? ()
57 0x000000010006f41c in jl_apply (f=0x103435100, args=0x7fff5fbff1b8,
nargs=1) at julia.h:1026
58 0x0000000100071d0c in jl_trampoline (F=0x103435100,
args=0x7fff5fbff1b8, nargs=1) at builtins.c:820
59 0x00000001000638bc in jl_apply (f=0x103435100, args=0x7fff5fbff1b8,
nargs=1) at julia.h:1026
60 0x00000001000657f8 in jl_apply_generic (F=0x1034350c0,
args=0x7fff5fbff1b8, nargs=1) at gf.c:1355
61 0x0000000102855ab1 in ?? ()
62 0x000000010006f41c in jl_apply (f=0x10366a6a0, args=0x7fff5fbff3e8,
nargs=1) at julia.h:1026
63 0x0000000100071d0c in jl_trampoline (F=0x10366a6a0,
args=0x7fff5fbff3e8, nargs=1) at builtins.c:820
64 0x00000001000638bc in jl_apply (f=0x10366a6a0, args=0x7fff5fbff3e8,
nargs=1) at julia.h:1026
65 0x00000001000657f8 in jl_apply_generic (F=0x10366a660,
args=0x7fff5fbff3e8, nargs=1) at gf.c:1355
66 0x0000000102800f04 in ?? ()
67 0x0000000102800bf0 in ?? ()
68 0x000000010006f41c in jl_apply (f=0x103155060, args=0x0, nargs=0) at
julia.h:1026
69 0x0000000100071d0c in jl_trampoline (F=0x103155060, args=0x0, nargs=0)
at builtins.c:820
70 0x00000001000638bc in jl_apply (f=0x103155060, args=0x0, nargs=0) at
julia.h:1026
71 0x00000001000657f8 in jl_apply_generic (F=0x103154fe0, args=0x0,
nargs=0) at gf.c:1355
72 0x0000000100001aec in jl_apply (f=0x103154fe0, args=0x0, nargs=0) at
julia.h:1026
73 0x0000000100001d9e in true_main (argc=2, argv=0x7fff5fbff970) at
repl.c:252
74 0x00000001000e62df in julia_trampoline (argc=2, argv=0x7fff5fbff970,
pmain=0x100001bd0 <true_main>) at init.c:814
75 0x000000010000226b in main (argc=2, argv=0x7fff5fbff970) at repl.c:292
(gdb)
|
Very interesting! It seems like an openblas issue, not a Cbc issue. Try running the test like |
It doesn't crash.
julia> lufact(rand(10,10))
|
Just to double check: this is performed on a Julia executable that does not
|
I'm not sure if you wanted the output of
IHBI-M-052237:~ fulford$ vmmap julia | grep dylib
|
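One way to narrow a listing like the one above to the libraries of interest is to filter it by name. The sketch below assumes each line of the memory-map listing ends with the library path and that paths contain no spaces; `loaded_libs_matching` is a hypothetical helper name.

```shell
# Hypothetical helper: read a memory-map listing on stdin and print the unique
# shared-library paths whose names contain the given pattern.
# Assumption: the path is the last whitespace-separated field on each line.
loaded_libs_matching() {
    pattern=$1
    awk '{ print $NF }' \
        | grep -i -- "$pattern" \
        | grep -E '\.(dylib|so)' \
        | sort -u
}

# Example use on OS X (not run here):
#   vmmap julia | loaded_libs_matching coinmp
#   vmmap julia | loaded_libs_matching openblas
```

If more than one path shows up for the same library name, that would point to a second copy being picked up from somewhere else on the system.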
Elliot, I am not sure, but I think it does include Miles' fixes. How can I
|
The combination lock example is now working. That is with the version of
|
Yes, we need you to test the lufact call on a "bad" version of Julia, one
|
Elliot, do you mean the bad version of Cbc? My julia version should be Going back to the version of Cbc compiled without the --enable-lapack flag Hope that makes sense.
|
I'm working on generating an input file for cbc so we can try testing it stand-alone without julia. |
Ok, try putting the following text in a file called
With the cbc version compiled without the
We'll see if this crashes or not. |
No crash for this case.
$ ~/.julia/Cbc/deps/usr/bin/cbc test.mps
(I made sure this was the one withOUT the --without-lapack fix (i.e. it DID
Result - Optimal solution found
Objective value: -16.00000000
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.05
|
Thanks for checking. It seems like we can only reproduce the crash with cbc through julia. I'm not sure how to test it any further without spending too much of your time chasing down possible causes. I'll try to reproduce the issue locally on a recent Macbook Pro. In the meantime feel free to actually use JuMP now with the |
I've renamed the issue now that we've determined the culprit. |
I was able to reproduce the issue on an Intel i5 macbook pro with a julia build from source. Thanks for sticking with this, @glennfulford. @staticfloat, @xianyi, how do we debug this from here? Can I build a debug version of openblas with julia? |
This is quite a lot for xianyi to wade through. I would suggest opening an
|
Good point. On the julia side though, how can I get more debug info from openblas? |
I'm pretty sure you can just add DEBUG=1 to OpenBLAS's make incantation.
|
OK, good, it wasn't just me doing something stupid. I am happy to test on my
|
Hi Guys, Could you try to build OpenBLAS with USE_OPENMP=1? Xianyi |
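Both build suggestions in this thread, `DEBUG=1` and `USE_OPENMP=1`, are variables passed to OpenBLAS's make. A minimal sketch, assuming a standalone OpenBLAS source checkout at a location of your choosing (the helper name `build_openblas` and the example path are hypothetical, and how Julia's own build forwards these options is not covered here):

```shell
# Hypothetical helper: run make in an OpenBLAS source tree, passing any extra
# make variables (for example DEBUG=1 or USE_OPENMP=1) straight through.
build_openblas() {
    src=$1
    shift
    # Refuse to run if the directory does not look like a source checkout.
    if [ ! -f "$src/Makefile" ]; then
        echo "no Makefile found in $src" >&2
        return 1
    fi
    make -C "$src" "$@"
}

# Example calls (not run here; the path is an assumption):
#   build_openblas "$HOME/src/OpenBLAS" DEBUG=1
#   build_openblas "$HOME/src/OpenBLAS" USE_OPENMP=1
```

Keeping the two options as separate invocations makes it easier to tell whether the debug build or the OpenMP threading change is what alters the crash behaviour.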
I get this segfault too on 10.9. |
@mlubin Can we avoid this whole thing by distributing a bottle where Cbc is compiled without lapack support? |
This has been disabled in the OS X binaries. @ViralBShah @glennfulford, could you test? I'd suggest running
|
Note that in a future |
Yes, it worked. I also had to delete MathProgBase and reinstall it. Thanks, Glenn.
|
I am in the process of migrating to a new mac - so it's gonna take me a little bit to get everything up and running. |
Thanks again @glennfulford! I consider this resolved now. |
Hi,
I've tried to run the combination lock example
http://iaindunning.com/2013/combination-locks.html
require("comblock")
Segmentation fault: 11
Further investigation shows the segmentation fault is occurring somewhere in the
solve command.
julia> versioninfo()
Julia Version 0.2.0-prerelease+3937
Commit 4666764* 2013-10-05 23:49:45 UTC
Platform Info:
System: Darwin (x86_64-apple-darwin12.5.0)
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm