Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CSDP_jll #56

Merged
merged 21 commits into from
May 2, 2020
Merged

Use CSDP_jll #56

merged 21 commits into from
May 2, 2020

Conversation

blegat
Copy link
Member

@blegat blegat commented Apr 10, 2020

Closes #50

@coveralls
Copy link

coveralls commented Apr 10, 2020

Coverage Status

Coverage remained the same at 47.122% when pulling 65e19bb on bl/jll into 958078c on master.

@blegat
Copy link
Member Author

blegat commented Apr 12, 2020

The symbols do not seem to be available in the symbolic library:

$ nm -D /home/blegat/.julia/artifacts/ebf551cab93728f370e779c804d83c37388e0717/lib/libcsdp.so
00000000002007a0 B __bss_start
                 w __cxa_finalize
00000000002007a0 D _edata
00000000002007b0 B _end
00000000000005c8 T _fini
                 w __gmon_start__
00000000000003f8 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
$ objdump -T /home/blegat/.julia/artifacts/ebf551cab93728f370e779c804d83c37388e0717/lib/libcsdp.so

/home/blegat/.julia/artifacts/ebf551cab93728f370e779c804d83c37388e0717/lib/libcsdp.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
00000000000003f8 l    d  .init	0000000000000000              .init
0000000000000000  w   D  *UND*	0000000000000000              _ITM_deregisterTMCloneTable
00000000002007a0 g    D  .data	0000000000000000  Base        _edata
00000000000005c8 g    DF .fini	0000000000000000  Base        _fini
0000000000000000  w   D  *UND*	0000000000000000              __gmon_start__
00000000002007b0 g    D  .bss	0000000000000000  Base        _end
00000000002007a0 g    D  .bss	0000000000000000  Base        __bss_start
0000000000000000  w   D  *UND*	0000000000000000              _Jv_RegisterClasses
0000000000000000  w   D  *UND*	0000000000000000              _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*	0000000000000000  GLIBC_2.2.5 __cxa_finalize
00000000000003f8 g    DF .init	0000000000000000  Base        _init

https://travis-ci.org/github/JuliaOpt/CSDP.jl/jobs/674044667#L708-L709

Might be fixed by JuliaPackaging/Yggdrasil#835

@ViralBShah
Copy link

Can you try again?

@blegat
Copy link
Member Author

blegat commented Apr 14, 2020

It works on Julia v1.4. I have tried making it work on Julia v1.0 with BinaryProvider but the generated script does not work.
First, the line

LibraryProduct(["libcsdp"], :libcsdp)

produced the error

┌ Error: Error building `CSDP`: 
│ ERROR: LoadError: MethodError: no method matching LibraryProduct(::String, ::Symbol)
│ Closest candidates are:LibraryProduct(::AbstractString, !Matched::AbstractString, !Matched::Symbol) at /home/blegat/.julia/packages/BinaryProvider/kcGxO/src/Products.jl:101LibraryProduct(::AbstractString, !Matched::Array{S<:AbstractString,1}, !Matched::Symbol) where S<:AbstractString at /home/blegat/.julia/packages/BinaryProvider/kcGxO/src/Products.jl:106
│ Stacktrace:
│  [1] top-level scope at none:0
│  [2] include at ./boot.jl:317 [inlined]
│  [3] include_relative(::Module, ::String) at ./loading.jl:1044
│  [4] include(::Module, ::String) at ./sysimg.jl:29
│  [5] include(::String) at ./client.jl:392
│  [6] top-level scope at none:0in expression starting at /home/blegat/.julia/dev/CSDP/deps/build.jl:6

with BinaryProvider v0.5.8.
So I added prefix, as first argument and on my local computer, I get:

/home/blegat/Documents/Install/julia-1.0.3/bin/julia: symbol lookup error: /home/blegat/.julia/dev/CSDP/deps/usr/lib/libcsdp.so: undefined symbol: dpotrf_

and

$ ldd /home/blegat/.julia/dev/CSDP/deps/usr/lib/libcsdp.so
	linux-vdso.so.1 (0x00007ffe5cdf3000)
	libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007fe24d8cb000)
	libopenblas.so => /usr/lib/libopenblas.so (0x00007fe24c707000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007fe24c5c1000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fe24c59f000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fe24c3d9000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fe24c3d4000)
	/usr/lib64/ld-linux-x86-64.so.2 (0x00007fe24db82000)

Is it expected that the blas library used is the system one ?
On Travis, the build does not work as prefix seems to be nothing:
https://travis-ci.org/github/JuliaOpt/CSDP.jl/jobs/674762571#L467

@ViralBShah
Copy link

I am not familiar much with BP scripts. Maybe @odow can help?

@ViralBShah
Copy link

See how Clp.jl does it in: jump-dev/Clp.jl#77

Worst case, can always make it Julia 1.3+ and drop BinaryProvider support - but it would be good to try a little more to see if Julia 1.0 support can be retained.

@odow
Copy link
Member

odow commented Apr 18, 2020

I would just leave the existing builder in-place. But throw an error on Windows and say use Julia 1.3+.

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

I'd rather have all Julia versions using the same binaries if possible, Windows is failing on Julia v1.4 to anyway, see AppVeyor log.
For Julia pre v1.3, it seems I forgot about the following line in YggDrasil README:

Remember that you will also need the build.jl files for all direct and indirect dependencies.

I added an OpenBLAS32 build file but it segfaults both on my local computer and on Travis:
https://travis-ci.org/github/JuliaOpt/CSDP.jl/jobs/676547136#L455-L461
@staticfloat Any idea what is wrong with the build file ? Is there any package using the OpenBLAS32 dll with BinaryProvider ?

# We also added `prefix, ` after `LibraryProduct(`.
using BinaryProvider # requires BinaryProvider 0.3.0 or later
# Needed to add this as this method was not defined.
function BinaryProvider.CompilerABI(; libgfortran_version=nothing)
Copy link
Member Author

@blegat blegat Apr 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staticfloat Any idea why the generated build.jl uses this method ? I don't see it defined in any version of BinaryProvider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because that's the signature of Pkg.BinaryPlatforms.CompilerABI

@ViralBShah
Copy link

Personally, I don't think it is worthwhile to get the new binaries running on the older versions. Maybe at least get a release done leaving the old things as is and Julia 1.3+ using the new binaries?

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

Currently, I don't think that having a Windows binary that segfault is better for the user than having a Windows installer that does not work so the EXCEPTION_ACCESS_VIOLATION on Windows with Julia v1.4 is what is causing this PR to be blocked from merging. The issue for Julia v1.0 would is indeed not blocking as we can just roll back to BinDeps.

@ViralBShah
Copy link

Wait I misunderstood. Didn't realize that the new windows binaries are segfaulting. Yes, if so, we should figure it out. I thought the old ones were segfaulting.

@ViralBShah
Copy link

How were the old windows binaries built? Is the build script available somewhere?

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

@EQt built them and uploaded them in https://github.com/EQt/winlapack. I don't know whether he still has the script he used for generating them.

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

@staticfloat Note that if I comment out the line https://github.com/JuliaPackaging/BinaryProvider.jl/blob/v0.5.8/src/Products.jl#L174
then ] build works and ] test works too on Julia v1.0.3.

@ViralBShah
Copy link

@EQt The build recipe is here, in case you notice something we are not doing right on windows: https://github.com/JuliaPackaging/Yggdrasil/blob/master/C/Coin-OR/CSDP/build_tarballs.jl

@ViralBShah
Copy link

ViralBShah commented Apr 18, 2020

Here are the build logs for windows:

https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=2936&view=logs&jobId=602716fc-3b5e-5c3d-c4ed-bfa60db0fb79&j=230eaf81-f6df-5bf9-0e5e-386ceb49c8a5&t=6df7ab18-181c-5cec-157d-5914cb36d787

@giordano I don't see any complaints for libgfortran, but perhaps we need to add it and do the expansion? Any thoughts if that might help on Windows?

Appears it might help:
https://ci.appveyor.com/project/JuliaOpt/csdp-jl/builds/32269990/job/7dwe9806drnj53lb#L113

Error: Error building `CSDP`: 
│ [ Info: Downloading https://github.com/JuliaBinaryWrappers/OpenBLAS32_jll.jl/releases/download/OpenBLAS32-v0.3.9+2/OpenBLAS32.v0.3.9.x86_64-w64-mingw32-libgfortran5.tar.gz to C:\projects\csdp-jl\deps\usr\downloads\OpenBLAS32.v0.3.9.x86_64-w64-mingw32-libgfortran5.tar.gz...
│ ERROR: LoadError: LoadError: Unrecognied libgfortran version nothing.
│ Stacktrace:
│  [1] error(::String) at .\error.jl:33
│  [2] CompilerABI(; libgfortran_version::Nothing) at C:\projects\csdp-jl\deps\build_OpenBLAS32.v0.3.9.jl:17
│  [3] CompilerABI() at C:\projects\csdp-jl\deps\build_OpenBLAS32.v0.3.9.jl:10
│  [4] top-level scope at C:\projects\csdp-jl\deps\build_CSDP.v6.2.0.jl:19
│  [5] include(::String) at .\client.jl:439```

@giordano
Copy link
Contributor

The error is coming from this file that tries to install OpenBLAS32, not CSDP

@odow
Copy link
Member

odow commented Apr 18, 2020

Actually maybe we can just drop 1.0 support. It's not a widely used solver, so we can aim for minimal maintenance.

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

The error is coming from this file that tries to install OpenBLAS32, not CSDP

What do you mean ? OpenBLAS32 is a dependency of CSDP and so it is installed similarly to the FFMPEG and Cairo examples from the Yggdrasil README.

Actually maybe we can just drop 1.0 support. It's not a widely used solver, so we can aim for minimal maintenance.

In my experience, CSDP is still the most reliable SDP solver so it's used on Travis to test packages that rely on solving SDP. Having the same binary for all versions is appealing in that regards. Figuring out how to do this for CSDP has a fixed cost but once it does not seem to difficult to replicate the same think for all solvers once we understand it for CSDP. It's just one line to generate the build.jl's but they are a few caveats that are the reason for this fixed cost.

@ViralBShah
Copy link

Should I merge the libgfortran expansion of CSDP in oder to see whether it works on Windows for Julia 1.3+?

@giordano
Copy link
Contributor

I think there is something wrong in the new scripts, I don't see how building CSDP for libgfortran would help

@ViralBShah
Copy link

ViralBShah commented Apr 18, 2020

I thought there are two issues: 1) the issue with the scripts for older Julia releases prior to 1.3 and 2) Windows failures with the new binaries in Julia 1.4 with the JLL. I was hoping to fix the second with gfortran expansion - but it is possible I have completely misunderstood what is going on here.

@blegat
Copy link
Member Author

blegat commented Apr 18, 2020

Thanks ! Note that the segfault is thrown when you call locate whether it is from write_deps or not. So for instance, if the libraries are already there (e.g. it's the second time you call ] build), it will segfault earlier when satisfied is called, before we call write_deps.

@giordano
Copy link
Contributor

it will segfault earlier when satisfied is called, before we call write_deps.

Which satisfied are you referring to?

@blegat
Copy link
Member Author

blegat commented Apr 19, 2020

Which satisfied are you referring to?

The one for which you added , isolate = true in this PR. I checked the log of JuliaPackaging/BinaryProvider.jl#192 and I understand, we are making sure that isolate = true is passed to every call of locate.
Out of curiosity, why do we pass , ignore_platform=true to install ?

@blegat
Copy link
Member Author

blegat commented Apr 19, 2020

Using the unreleased v0.5.9 of BinaryProvider, it now works with Julia v1.0 and Linux and we now have details on why dlopen fails on Mac OS thanks to JuliaPackaging/BinaryProvider.jl#164 which is included in the backport:

│ [ Info: Found a valid dl path libcsdp.dylib while looking for libcsdp
│ [ Info: /Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib/libcsdp.dylib matches our search criteria of libcsdp
│ ┌ Info: /Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib/libcsdp.dylib cannot be dlopen'ed
│ │   dlopen_result =
│ │    could not load library "/Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib/libcsdp.dylib"
│ │    dlopen(/Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib/libcsdp.dylib, 1): Library not loaded: @rpath/libgomp.1.dylib
│ │      Referenced from: /Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib/libcsdp.dylib
│ └      Reason: image not found
│ [ Info: Found a valid dl path libopenblas.0.3.9.dylib while looking for libcsdp
│ [ Info: Found a valid dl path libopenblas.0.dylib while looking for libcsdp
│ [ Info: Found a valid dl path libopenblas.dylib while looking for libcsdp
│ [ Info: Could not locate libcsdp inside /Users/travis/build/JuliaOpt/CSDP.jl/deps/usr/lib
│ ERROR: LoadError: LibraryProduct(nothing, ["libcsdp"], :libcsdp, "Prefix(/Users/travis/build/JuliaOpt/CSDP.jl/deps/usr)") is not satisfied, cannot generate deps.jl!

Any idea why libgomp is not found ?

EDIT: It seems to be because we need to add the dep CompilerSupportLibraries

@blegat
Copy link
Member Author

blegat commented Apr 19, 2020

Travis is green for both Mac and Linux and both Julia v1.0 and v1.4 with JuliaPackaging/BinaryProvider.jl#192 🎉
Only issue left is the failure on Windows.

@giordano
Copy link
Contributor

Did this ever work on Windows?

@blegat
Copy link
Member Author

blegat commented Apr 19, 2020

It used to with the dll manually built by @EQt

@giordano
Copy link
Contributor

@ViralBShah you see why it's so useful that BinaryBuilder tarballs come with the log? 😉

@ViralBShah
Copy link

Is this the same look as what we see on AZP during building? Or something else?

@giordano
Copy link
Contributor

Logs in Azure Pipelines are not persistent and finding the log corresponding to a very specific build is cumbersome

@ViralBShah
Copy link

ViralBShah commented Apr 19, 2020

We could tarball them separately and upload them to JuliaBinaryWrappers along with the release. Should move this discussion to the other issue.

@giordano
Copy link
Contributor

v0.5.9 of BinaryProvider is now tagged

Project.toml Outdated
@@ -14,7 +15,8 @@ SemidefiniteModels = "169818f4-1a3d-53bf-95b3-11177825b1e3"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[compat]
BinDeps = "0.8, 0.9, 1"
BinaryProvider = "0.5"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to require v0.5.9 specifically:

Suggested change
BinaryProvider = "0.5"
BinaryProvider = "0.5.9"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be good to bump BP to 0.6.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That version requires Julia v1.3, which is not very helpful here. Yes, history of BinaryProvider is a bit messy 🙂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - that is a bit messy.

@ViralBShah
Copy link

@giordano Do you see anything in the CSDP build file that might be leading to the windows failure? It's a fairly simple C library.

https://github.com/JuliaPackaging/Yggdrasil/blob/master/C/Coin-OR/CSDP/build_tarballs.jl

@giordano
Copy link
Contributor

giordano commented Apr 29, 2020

Not sure, maybe we need @odow's level of debugging expertise to figure out what's the problem 🙂

@ViralBShah
Copy link

ViralBShah commented Apr 29, 2020

I started a build without openmp on windows, just in case it helps. CSDP build docs claim openmp works (but they are specifically talking only about mingw64)

JuliaPackaging/Yggdrasil#950

@ViralBShah
Copy link

Crashes without openmp. Reverted that PR on Yggdrasil.

@blegat
Copy link
Member Author

blegat commented May 2, 2020

I found a Windows computer and tried to chase the segfault. It seems to only occur on examples/example.jl which means it might be the same segfault as #39. Here is what happened:

  1. pre-Julia v1.0: CSDP was working fine on all OS (using @EQt's dll on Windows)
  2. pre-Julia v1.0: At some point, the dll stopped working, Windows stopped working.
  3. Julia v1.0: Sporadic segfaults on occur, see Sporadic malloc errors with CSDP 4.1 #39. It's tricky to reproduce.
  4. To avoid the segfault, we add the functions we are missing in the API (Refactory readprob with functions usable for Julia wrapper coin-or/Csdp#12) to rewrite the MOI wrapper so that it's easily verifiable to be free of segfaults: Simplify thanks for new API in CSDP #48. The old examples/example.jl is not updated though even if it uses the old API that was causing sporadic segfaults.
  5. On Windows, all tests pass except this example. This means that on Windows, this sporadic segfault seem to reliable happen which explains that we have the segfault even for this simple example. So the issue should be fixed by Simplify thanks for new API in CSDP #48, we probably just need to remove the old API.

@codecov-io
Copy link

Codecov Report

Merging #56 into master will decrease coverage by 0.06%.
The diff coverage is 17.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   54.50%   54.43%   -0.07%     
==========================================
  Files          10       10              
  Lines         655      654       -1     
==========================================
- Hits          357      356       -1     
  Misses        298      298              
Impacted Files Coverage Δ
src/CSDP.jl 100.00% <ø> (ø)
src/debug-mat.jl 0.00% <0.00%> (ø)
src/declarations.h.jl 19.84% <18.57%> (ø)
src/blockmat.jl 60.31% <0.00%> (-0.32%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 958078c...65e19bb. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Cant build on windows
6 participants