Add Clp_jll #77

Merged
merged 21 commits into from
Apr 18, 2020

Conversation

odow
Member

@odow odow commented Apr 3, 2020

But I can confirm that I have this passing locally!

x-ref: JuliaPackaging/Yggdrasil#509

Closes #73
Closes #70

@giordano

giordano commented Apr 3, 2020

Note that this should fix #73, and probably #70 too

@odow odow closed this Apr 3, 2020
@odow odow reopened this Apr 3, 2020
@codecov-io

codecov-io commented Apr 3, 2020

Codecov Report

Merging #77 into master will increase coverage by 4.93%.
The diff coverage is 77.25%.


@@            Coverage Diff             @@
##           master      #77      +/-   ##
==========================================
+ Coverage   40.52%   45.45%   +4.93%     
==========================================
  Files           3        4       +1     
  Lines         575      660      +85     
==========================================
+ Hits          233      300      +67     
- Misses        342      360      +18     
Impacted Files              Coverage Δ
src/Clp.jl                  100.00% <ø> (ø)
src/ClpCInterface.jl        22.68% <61.53%> (+3.76%) ⬆️
src/MOIWrapper.jl           76.44% <78.17%> (+5.85%) ⬆️
src/ClpSolverInterface.jl   62.76% <100.00%> (+2.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6121a5...ea97c12. Read the comment docs.

@ViralBShah

ViralBShah commented Apr 4, 2020

Given how sensitive the solvers are to having the right versions of everything, I think we should tightly bind versions for all the jll dependencies. The reason is that we will want to start upgrading the dependencies in the artifacts without Clp.jl and GLPK.jl pulling them in yet.

Edit: I think this binding of dependencies that Clp needs should be done in Clp_jll. This package is doing the right thing by depending on the exact version of libClp.

@ViralBShah

I assume that even if Clp 1.17 is not compatible with 1.16, we should not be getting the double free kind of malloc related errors in Julia.

@odow
Member Author

odow commented Apr 5, 2020

> we should not be getting the double free kind of malloc related errors in Julia.

Agreed.

On the plus side, Cbc is green: jump-dev/Cbc.jl#133. Which suggests that it might be an API issue regarding how we're calling Clp.

I couldn't see any obvious offenders when scanning Clp_C_Interface.cpp and Clp_C_Interface.h:
https://github.com/coin-or/Clp/compare/releases%2F1.16.11...releases%2F1.17.5?diff=split

@ViralBShah

ViralBShah commented Apr 6, 2020

Right, but cbc was already using Clp 1.17 even before. So it is good to know it works and suggests our libraries are correctly built.

I will take a look at the clp interface, which before was using 1.16 and see if I can discover what changed.

@ViralBShah

I see CoinBigIndex being used. Do you know if this is set to 64-bit on 64-bit architectures? It is unclear whether this affects the API.

I see this comment in Clp_C_Interface.h:

/* accidentally used a bool for Clp_modifyCoefficient, so need to include stdbool.h
 * Clp_modifyCoefficient signature will change to use int with Clp 1.18

@odow
Member Author

odow commented Apr 6, 2020

Yes, I saw the int->CoinBigIndex changes. It's hard-coded to const CoinBigIndex = Int32 in the Julia wrapper. I think that is the default unless you compile with a different flag. Presumably, if it were actually Int64, Mac would also fail? I can try it and see what happens.
https://github.com/JuliaOpt/Clp.jl/blob/c6121a52c6c980afa231d88944b71d90884494c0/src/ClpCInterface.jl#L210-L213

For now, we don't wrap Clp_modifyCoefficient, so that isn't a problem for us.

@ViralBShah

Reading through CoinUtils, it seems like CoinBigIndex should be Int32 since I don't see any indication of setting a different value.
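
To illustrate why the width of CoinBigIndex matters across the C boundary, here is a minimal sketch. It is not taken from Clp.jl, the array contents are made up, and no Clp function is called. It shows what happens when a buffer written with 64-bit indices is read back with a 32-bit index type.

```julia
# Assumption mirrored from the discussion: the wrapper hard-codes a 32-bit index type.
const CoinBigIndex = Int32

# Hypothetical column-start array, as a library built with 64-bit indices would write it.
starts64 = Int64[0, 2, 5]

# Reading the same bytes with the 32-bit type yields twice as many entries,
# so the offsets no longer line up with what the library stored.
misread = collect(reinterpret(CoinBigIndex, starts64))

# When both sides agree on the width, the values round-trip unchanged.
starts32 = CoinBigIndex[0, 2, 5]
roundtrip = collect(reinterpret(CoinBigIndex, starts32))

println("misread:   ", misread)
println("roundtrip: ", roundtrip)
```

If the widths disagreed in practice, a failure of this kind would be expected on every platform, which matches the reasoning above that macOS would fail too.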

@ViralBShah

ViralBShah commented Apr 6, 2020

Do you think that using our own ASL instead of the one Coin-OR wants to use could be an issue here?

Edit: It looks like Coin-OR is using the same ASL from netlib.

@mtanneau
Contributor

mtanneau commented Apr 6, 2020

I tried running the tests locally; I can pin the failure down to a call to Clp_deleteModel.

However, it does not reliably occur during a particular testset.
Initially, it occurred during MOIT.solve_constant_obj. I excluded it from the tests, and the same error occurred, simply on another testset.
I was also not able to reproduce the bug by running these testsets individually.

@odow
Member Author

odow commented Apr 6, 2020

@mtanneau I was zeroing in on deleteModel as well. Can you try this diff:

diff --git a/src/MOIWrapper.jl b/src/MOIWrapper.jl
index e3be6ba..8c00a0f 100644
--- a/src/MOIWrapper.jl
+++ b/src/MOIWrapper.jl
@@ -70,22 +70,13 @@ end
 
 function MOI.empty!(model::Optimizer)
     old_model = model.inner
-
-    # Create new Clp object
     model.inner = Clp.ClpModel()
-
     # Copy parameters from old model into new model
-    for (option, (getter, setter)) in CLP_OPTION_MAP
-        value = getter(old_model)
-        setter(model.inner, value)
+    for (_, (getter, setter)) in CLP_OPTION_MAP
+        setter(model.inner, getter(old_model))
     end
-
     model.optimize_called = false
-
-    # Free old Clp object
-    Clp.ClpCInterface.delete_model(old_model)
-
-    return nothing
+    return
 end

@mtanneau
Contributor

mtanneau commented Apr 6, 2020

Still happening, rarely in the same place. Sometimes a dozen testsets can run, sometimes only a few.

I tried this diff

diff --git a/src/ClpCInterface.jl b/src/ClpCInterface.jl
index 6ca604b..82bb7ca 100644
--- a/src/ClpCInterface.jl
+++ b/src/ClpCInterface.jl
@@ -225,17 +225,21 @@ mutable struct ClpModel
     function ClpModel()
         p = @clp_ccall newModel Ptr{Cvoid} ()
         prob = new(p)
-        finalizer(delete_model, prob)
+        # finalizer(delete_model, prob)
         return prob
     end
 end

since Clp_deleteModel kept being called, even though I had removed the explicit call to delete_model in MOI.empty!.

No real change, but I'm wondering whether the GC might be involved in all this.

@mtanneau
Contributor

mtanneau commented Apr 6, 2020

finalizer(delete_model, prob)

(also I don't know what this line actually does, but I'm guessing it's GC-related)
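
For context on that line: in Julia, finalizer(f, x) registers f to be called on x when the garbage collector reclaims x, which is why Clp_deleteModel can still run without an explicit delete_model call. The sketch below is a hypothetical stand-in, not Clp.jl's code: Handle, release!, and FREED are invented names, and no C library is called. It shows one common way to make an explicit release and a finalizer safe to combine, namely nulling the pointer after the first release.

```julia
# Counts how many times the pretend C object was released.
const FREED = Ref(0)

mutable struct Handle
    ptr::Ptr{Cvoid}
    function Handle()
        h = new(Ptr{Cvoid}(1))     # pretend non-null pointer from a C constructor
        finalizer(release!, h)     # run release! when h is garbage collected
        return h
    end
end

function release!(h::Handle)
    h.ptr == C_NULL && return      # already released: do nothing
    FREED[] += 1                   # a real wrapper would ccall the C destructor here
    h.ptr = C_NULL                 # guard against a second release
    return
end

h = Handle()
release!(h)    # explicit release, as MOI.empty! did via delete_model
finalize(h)    # runs the registered finalizer right away; the guard makes it a no-op
println("releases performed: ", FREED[])
```

Without the C_NULL guard, the explicit call and the finalizer would both release the same object, which is the double-free pattern discussed earlier in this thread. Whether that is the cause here is not established.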

@odow
Member Author

odow commented Apr 6, 2020

This is suspicious: https://travis-ci.org/github/JuliaOpt/Clp.jl/jobs/671755091#L431-L434

7f394344d000-7f39436b0000 r-xp 00000000 08:01 2055349                    /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39436b0000-7f39438af000 ---p 00263000 08:01 2055349                    /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39438af000-7f39438b7000 rw-p 00262000 08:01 2055349                    /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39438b7000-7f3943922000 rw-p 002ab000 08:01 2055349                    /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5

Looks like it built 1.14.5 and not 1.17.5?

@ViralBShah

In one of the intermediate builds of Clp_jll I had a bad libClp version accidentally. But that should not be happening anymore.

@odow
Member Author

odow commented Apr 6, 2020

Well, this seems to be the culprit:

Oscars-MBP:artifacts oscar$ ls */lib/libClp*
ls: */lib/libClp*: No such file or directory
Oscars-MBP:artifacts oscar$ ~/julia1.3 --project=/tmp/clp_jll
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.1 (2019-12-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(clp_jll) pkg> add Clp_jll
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `/private/tmp/clp_jll/Project.toml`
  [06985876] + Clp_jll v1.17.5+2
  Updating `/private/tmp/clp_jll/Manifest.toml`
  [ae81ac8f] + ASL_jll v0.1.1+2
  [06985876] + Clp_jll v1.17.5+2
  [be027038] + CoinUtils_jll v2.11.4+1
  [e66e0078] + CompilerSupportLibraries_jll v0.3.3+0
  [d00139f3] + METIS_jll v4.0.3+0
  [656ef2d0] + OpenBLAS32_jll v0.3.9+1
  [7da25872] + Osi_jll v0.108.6+1
  [2a0f44e3] + Base64 
  [ade2ca70] + Dates 
  [8ba89e20] + Distributed 
  [b77e0a4c] + InteractiveUtils 
  [76f85450] + LibGit2 
  [8f399da3] + Libdl 
  [56ddb016] + Logging 
  [d6f4376e] + Markdown 
  [44cfe95a] + Pkg 
  [de0858da] + Printf 
  [3fa0cd96] + REPL 
  [9a3f8284] + Random 
  [ea8e919c] + SHA 
  [9e88b42a] + Serialization 
  [6462fe0b] + Sockets 
  [8dfed614] + Test 
  [cf7118a7] + UUIDs 
  [4ec0a83e] + Unicode 

(clp_jll) pkg> st Clp_jll
    Status `/private/tmp/clp_jll/Project.toml`
  [06985876] Clp_jll v1.17.5+2

julia> exit()
Oscars-MBP:artifacts oscar$ ls */lib/libClp*
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.la
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.la
Oscars-MBP:artifacts oscar$ 

Although, weirdly

(clp_jll) pkg> add Clp_jll@1.16
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
 Installed Clp_jll ─ v1.16.11+1
  Updating `/private/tmp/clp_jll/Project.toml`
  [06985876] ↓ Clp_jll v1.17.5+2 ⇒ v1.16.11+1
  Updating `/private/tmp/clp_jll/Manifest.toml`
  [06985876] ↓ Clp_jll v1.17.5+2 ⇒ v1.16.11+1

(clp_jll) pkg> st Clp_jll
    Status `/private/tmp/clp_jll/Project.toml`
  [06985876] Clp_jll v1.16.11+1

julia> exit()
Oscars-MBP:artifacts oscar$ ls */lib/libClp*
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.la
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.la
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.1.13.11.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.1.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.la
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.1.13.11.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.1.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.la
Oscars-MBP:artifacts oscar$ 

and even

(clp_jll) pkg> st Cbc_jll
    Status `/private/tmp/clp_jll/Project.toml`
  [38041ee0] Cbc_jll v2.10.5+1
  [06985876] Clp_jll v1.17.5+2

julia> exit()
Oscars-MBP:artifacts oscar$ ls */lib/libCbc*
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbc.3.10.5.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbc.3.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbc.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbc.la
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbcSolver.3.10.5.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbcSolver.3.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbcSolver.dylib
88899b2937994046dd51b993acccdd5ab6aa5079/lib/libCbcSolver.la
Oscars-MBP:artifacts oscar$ 

@ViralBShah should the .X.Y.Z.dylib versions line up with the Git releases?
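
The shell listings above can also be reproduced from Julia. This is a minimal sketch, assuming the depot layout `<depot>/artifacts/<hash>/lib` seen in the listings; find_libs is a hypothetical helper, not part of Pkg.

```julia
# Return relative paths of files under <artifacts_dir>/<hash>/lib whose
# names start with `prefix`, sorted for stable output.
function find_libs(artifacts_dir::AbstractString, prefix::AbstractString = "libClp")
    hits = String[]
    for hash in readdir(artifacts_dir)
        libdir = joinpath(artifacts_dir, hash, "lib")
        isdir(libdir) || continue            # skip artifacts without a lib folder
        for f in readdir(libdir)
            startswith(f, prefix) && push!(hits, joinpath(hash, "lib", f))
        end
    end
    return sort(hits)
end

# Usage: scan the first depot, if it has an artifacts directory.
if !isempty(DEPOT_PATH)
    artifacts = joinpath(first(DEPOT_PATH), "artifacts")
    isdir(artifacts) && foreach(println, find_libs(artifacts))
end
```

Passing "libCbc" as the second argument gives the equivalent of the last listing.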

@ViralBShah

I am using this commit: coin-or/Clp@29a3d29

Isn't that the right one?

@odow
Member Author

odow commented Apr 6, 2020

Certainly looks like the right commit. Any explanation for the weird filename versioning?
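
One possible explanation, offered as an assumption rather than something confirmed in this thread: libtool-based builds name shared libraries from a -version-info triple current:revision:age rather than from the release number, and on Linux the file suffix is conventionally (current - age).age.revision. The sketch below only illustrates that mapping. libtool_suffix is a hypothetical helper, and the triple passed in was not read from the Clp build files.

```julia
# Map a libtool -version-info triple (current:revision:age) to the
# "major.age.revision" suffix conventionally used for the library file on Linux.
function libtool_suffix(current::Integer, revision::Integer, age::Integer)
    0 <= age <= current || throw(ArgumentError("age must be between 0 and current"))
    return string(current - age, ".", age, ".", revision)
end

# Hypothetical triple chosen so the result matches the suffix seen in the listings.
suffix = libtool_suffix(15, 5, 14)
println("libClp.so.", suffix)
```

Under that convention, a file suffix such as 1.14.5 would not be expected to equal the release number 1.17.5.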

@odow
Member Author

odow commented Apr 17, 2020

I had printing turned on locally. The time is spent inside the solver (you can see the iterations take noticeably longer). Will test other instances.

@ViralBShah

ViralBShah commented Apr 17, 2020

CoinBLAS literally just packages the reference BLAS. Are we using openblas with only 1 thread? That seems to be important for many Coin-OR libraries. In fact, I think they even try to detect it and change it internally (which probably will cause other issues for us).
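
One way to test the single-thread question is sketched below. It assumes the OpenBLAS build in use honors the standard OPENBLAS_NUM_THREADS environment variable, and that the variable is set before the library is loaded. The commented-out line marks where the solver package would be loaded. Whether this changes Clp's timings is not established in this thread.

```julia
# Must run before any package that loads the OpenBLAS library is imported.
ENV["OPENBLAS_NUM_THREADS"] = "1"

# using Clp   # would go here, after the variable is set

# Confirm what a library initialized from this process would see.
seen = get(ENV, "OPENBLAS_NUM_THREADS", "unset")
println("OPENBLAS_NUM_THREADS = ", seen)
```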

@ViralBShah

I think the old Builders also had -DNDEBUG and I wonder if disabling all debugging makes a huge difference. I can try it out after we look into things a bit more here in the next round.

@mlubin
Member

mlubin commented Apr 17, 2020

A quick search through the Clp source shows a pretty liberal use of assert. No idea if that could cause a 2x slowdown, however.

@ViralBShah

Yeah that can do it

@odow
Member Author

odow commented Apr 17, 2020

2x performance gap holds pretty consistently across instances. For example, on "ftp://ftp.numerical.rl.ac.uk/pub/cuter/netlib/80BAU3B.SIF" (this one solved in presolve, however)

Oscars-MBP:Clp oscar$ ~/julia1.3 --project=/tmp/clp ~/Desktop/bench_clp.jl
    Status `/private/tmp/clp/Project.toml`
  [e2554f3b] Clp v0.7.1
Coin0506I Presolve 1779 (-483) rows, 8572 (-1227) columns and 18802 (-2200) elements
Clp0006I 0  Obj 495454.62 Primal inf 83504.592 (650) Dual inf 249967.53 (1077)
Clp0006I 110  Obj -1.5738521e+13 Primal inf 1.0640635e+13 (832)
Clp0006I 220  Obj -5.7821056e+11 Primal inf 8.7888851e+11 (1015)
Clp0006I 330  Obj -460880.02 Primal inf 421961.28 (983)
Clp0006I 440  Obj 50199.373 Primal inf 151279.89 (911)
Clp0006I 550  Obj 201455.24 Primal inf 103045.14 (885)
Clp0006I 660  Obj 294304.42 Primal inf 75760.282 (868)
Clp0006I 770  Obj 408741.76 Primal inf 56909.697 (835)
Clp0006I 880  Obj 458784.44 Primal inf 65138.018 (820)
Clp0006I 990  Obj 486771.23 Primal inf 57126.469 (817)
Clp0006I 1100  Obj 533777.08 Primal inf 49571.77 (790)
Clp0006I 1210  Obj 576732.85 Primal inf 80090.654 (780)
Clp0006I 1320  Obj 619384.97 Primal inf 62168.075 (777)
Clp0006I 1430  Obj 653545.23 Primal inf 34004.801 (763)
Clp0006I 1540  Obj 714717.59 Primal inf 46151.088 (751)
Clp0006I 1650  Obj 757672.53 Primal inf 21918.045 (686)
Clp0006I 1760  Obj 787526.67 Primal inf 16891.015 (659)
Clp0006I 1870  Obj 810076.62 Primal inf 20391.899 (653)
Clp0006I 1980  Obj 860401.58 Primal inf 19470.978 (585)
Clp0006I 2090  Obj 891417.6 Primal inf 10094.134 (563)
Clp0006I 2200  Obj 913805.98 Primal inf 9835.4044 (512)
Clp0006I 2310  Obj 926009.94 Primal inf 5813.686 (487)
Clp0006I 2420  Obj 938313.33 Primal inf 3970.137 (453)
Clp0006I 2530  Obj 953457.54 Primal inf 2856.4934 (424)
Clp0006I 2640  Obj 962422 Primal inf 11275.913 (393)
Clp0006I 2750  Obj 970689.75 Primal inf 1224.0817 (350)
Clp0006I 2860  Obj 977424.81 Primal inf 910.74223 (318)
Clp0006I 2970  Obj 983027.01 Primal inf 256.42084 (254)
Clp0006I 3080  Obj 986609.46 Primal inf 73.92618 (209)
Clp0006I 3190  Obj 987165.06 Primal inf 3.4073631 (142)
Clp0006I 3300  Obj 987224.11 Primal inf 0.014131438 (64)
Clp0006I 3380  Obj 987224.19
Clp0000I Optimal - objective value 987224.19
Coin0511I After Postsolve, objective 987224.19, infeasibilities - dual 351.40318 (7), primal 0 (0)
Coin0512I Presolved model was optimal, full model needs cleaning up
Clp0000I Optimal - objective value 987224.19
Clp0032I Optimal objective 987224.1924 - 3380 iterations time 0.152, Presolve 0.04
  0.150701 seconds
  3.800487 seconds (13.55 M allocations: 625.098 MiB, 8.18% gc time)
Oscars-MBP:Clp oscar$ ~/julia1.3 --project=. ~/Desktop/bench_clp.jl 1.17.6+1
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Updating git-repo `https://github.com/JuliaBinaryWrappers/Clp_jll.jl.git`
 Resolving package versions...
  Updating `~/.julia/dev/Clp/Project.toml`
 [no changes]
  Updating `~/.julia/dev/Clp/Manifest.toml`
 [no changes]
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Updating git-repo `https://github.com/JuliaBinaryWrappers/Clp_jll.jl.git`
 Resolving package versions...
  Updating `~/.julia/dev/Clp/Project.toml`
 [no changes]
  Updating `~/.julia/dev/Clp/Manifest.toml`
 [no changes]
Project Clp v0.7.1
    Status `~/.julia/dev/Clp/Project.toml`
  [b99e7846] BinaryProvider v0.5.8
  [06985876] Clp_jll v1.17.6+1 #01c4b8a (https://github.com/JuliaBinaryWrappers/Clp_jll.jl.git)
  [b8f27783] MathOptInterface v0.9.13
  [fdba3010] MathProgBase v0.7.8
  [8f399da3] Libdl 
  [37e2e46d] LinearAlgebra 
  [2f01184e] SparseArrays 
Coin0506I Presolve 1779 (-483) rows, 8572 (-1227) columns and 18802 (-2200) elements
Clp0006I 0  Obj 493096.66 Primal inf 83620.824 (650) Dual inf 249760.54 (1076)
Clp0006I 110  Obj -1.5738521e+13 Primal inf 1.0640635e+13 (832)
Clp0006I 220  Obj -5.7821056e+11 Primal inf 8.7888851e+11 (1015)
Clp0006I 330  Obj -460880.02 Primal inf 421961.28 (983)
Clp0006I 440  Obj 50199.373 Primal inf 151279.89 (911)
Clp0006I 550  Obj 201455.24 Primal inf 103045.14 (885)
Clp0006I 660  Obj 294304.42 Primal inf 75760.282 (868)
Clp0006I 770  Obj 408741.76 Primal inf 56909.697 (835)
Clp0006I 880  Obj 458784.44 Primal inf 65138.018 (820)
Clp0006I 990  Obj 486771.23 Primal inf 57126.469 (817)
Clp0006I 1100  Obj 533777.08 Primal inf 49571.77 (790)
Clp0006I 1210  Obj 576732.85 Primal inf 80090.654 (780)
Clp0006I 1320  Obj 619384.97 Primal inf 62168.075 (777)
Clp0006I 1430  Obj 653545.23 Primal inf 34004.801 (763)
Clp0006I 1540  Obj 714717.59 Primal inf 46151.088 (751)
Clp0006I 1650  Obj 757672.53 Primal inf 21918.045 (686)
Clp0006I 1760  Obj 787526.67 Primal inf 16891.015 (659)
Clp0006I 1870  Obj 810076.62 Primal inf 20391.899 (653)
Clp0006I 1980  Obj 860401.58 Primal inf 19470.978 (585)
Clp0006I 2090  Obj 891417.6 Primal inf 10094.134 (563)
Clp0006I 2200  Obj 913805.98 Primal inf 9835.4044 (512)
Clp0006I 2310  Obj 926009.94 Primal inf 5813.686 (487)
Clp0006I 2420  Obj 938313.33 Primal inf 3970.137 (453)
Clp0006I 2530  Obj 953457.54 Primal inf 2853.2144 (423)
Clp0006I 2640  Obj 962247 Primal inf 5660.8752 (392)
Clp0006I 2750  Obj 970685.22 Primal inf 1234.7429 (353)
Clp0006I 2860  Obj 977351.64 Primal inf 936.45569 (323)
Clp0006I 2970  Obj 983021.28 Primal inf 261.07734 (256)
Clp0006I 3080  Obj 986583.44 Primal inf 80.201399 (211)
Clp0006I 3190  Obj 987165.06 Primal inf 3.4805738 (145)
Clp0006I 3300  Obj 987224.11 Primal inf 0.015285705 (66)
Clp0006I 3382  Obj 987224.19
Clp0000I Optimal - objective value 987224.19
Coin0511I After Postsolve, objective 987224.19, infeasibilities - dual 351.40318 (7), primal 0 (0)
Coin0512I Presolved model was optimal, full model needs cleaning up
Clp0006I 0  Obj 987224.19
Clp0000I Optimal - objective value 987224.19
Clp0032I Optimal objective 987224.1924 - 3382 iterations time 0.352, Presolve 0.07
  0.358659 seconds
  4.204421 seconds (13.42 M allocations: 618.203 MiB, 3.48% gc time)
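
The two timing lines at the end of each run suggest an inner timer around the solve and an outer @time around the whole script body. The actual bench_clp.jl is not shown in this thread, so the following is only a guess at its shape, with a dummy workload standing in for the Clp solve. fake_solve is a hypothetical name, and bench is written here for illustration only.

```julia
# Stand-in workload; a real script would build a model and call the solver here.
fake_solve() = sum(sqrt(i) for i in 1:1_000_000)

function bench(solve = fake_solve)
    t = @elapsed solve()      # time spent inside the "solver" call only
    println(t, " seconds")
    return t
end

@time bench()                 # outer timer: also counts compilation and GC
```

Comparing the inner number across builds isolates time spent in the library from Julia-side overhead, which is the comparison made in the logs above (0.15 s versus 0.36 s).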

@ViralBShah

New binaries published. Retry?

@ViralBShah

ViralBShah commented Apr 17, 2020

On my laptop, it still takes about 50 seconds with the new binaries, and about 30 seconds with the older ones. The mystery deepens...

Could it be that Clp 1.17 is slower than 1.16?

@ViralBShah

ViralBShah commented Apr 17, 2020

I tried 1.16, and it is even slower. 65 seconds. So definitely something about the way we are building Clp.

@ViralBShah

ViralBShah commented Apr 17, 2020

@giordano @staticfloat I am seeing that the binaries produced by BinaryBuilder run at half the speed of the ones produced by the old ClpBuilder.

Is this likely to be because of architecture specific optimizations in the compiler in the old builder? The other thing the old ClpBuilder did was link everything statically. Can that explain this difference? Any other ideas for why the performance is so different? If anything, we are linking against a fast BLAS in the BB binaries.

@giordano

I didn't follow the development of the old version of BinaryBuilder, but I'm not sure it was optimising more than the current version. One thing that we do now, though, is to not allow the use of flags like -Ofast, -ffast-math, etc., which, in addition to making the code run faster, are also unsafe and can introduce nasty bugs. However, I don't think this is the case for the libraries involved here.

@ViralBShah

Yes, you can see the old builder script I linked above, which I assume is how the current binaries are produced. It pretty much does not do anything like -Ofast. Do compilers do something like -mtune=native automatically, which we are disabling?

@juan-pablo-vielma Any insights you might have here would be valuable.

@juan-pablo-vielma
Contributor

Took a quick look.
I see a missing -DCOIN_USE_MUMPS_MPI_H, but maybe it's on the MUMPS builder?
How about ASL? Is the new builder using it? I am not sure how Clp uses it though.
Do you have pairs of logs for problems that take a few minutes? One crazy idea is static v/s dynamic linking.

@ViralBShah

Yes, we do not need the MUMPS MPI thing, because I have already built a sequential mumps and applied their patches. @odow suggested we don't use ASL at all, and hence we removed it. Passes all tests.

Static vs. dynamic linking is the big one right now, but calls to LAPACK and MUMPS are chunky enough that it shouldn't matter. Here are the build logs:

https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=3160&view=results

@juan-pablo-vielma
Contributor

Maybe looking at more solve logs? In the pair that @odow posted above everything is identical up to iteration 2530 and then things change. Not sure if there is a way to get more detailed logs from Clp

@ViralBShah

ViralBShah commented Apr 17, 2020

The other thing that is different is we are using openblas in the new builders.

@ViralBShah

On macOS the new binaries are 2x slower, but on Linux they are about 10% slower.

@ViralBShah

@staticfloat helped me debug that the optimization flags were being clobbered, and -O3 wasn't being passed. Soon, we should have new builds and hopefully this is solved.

@ViralBShah

Ok, new binaries published. Please try it out. For me, the optimized binaries are twice as fast as the older ones!

# With Clp_jll
julia> @time bench()
 17.777809 seconds
 19.452482 seconds (4.72 M allocations: 196.026 MiB, 0.42% gc time)

# With ClpBuilder
julia> @time bench()
 27.312733 seconds
 31.796835 seconds (16.40 M allocations: 766.499 MiB, 0.98% gc time)
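
For reference, the ratio between the two inner timings quoted above can be computed directly. This uses only the numbers printed in the comment; the variable names are made up for the example.

```julia
t_clp_jll    = 17.777809   # seconds, "With Clp_jll"
t_clpbuilder = 27.312733   # seconds, "With ClpBuilder"

# How many times faster the Clp_jll run is on this one benchmark.
speedup = t_clpbuilder / t_clp_jll
println("speedup ≈ ", round(speedup, digits = 2))
```

On this single run the ratio comes out at roughly 1.5.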

@odow
Member Author

odow commented Apr 17, 2020

Works for me!

@odow
Member Author

odow commented Apr 17, 2020

Thank you for your outstanding help. You've probably learnt far more than you wanted to about how Clp works.

@ViralBShah

ViralBShah commented Apr 17, 2020

> Thank you for your outstanding help. You've probably learnt far more than you wanted to about how Clp works.

It's always fun to learn more about numerical software. And I am glad that these efforts paid off at least for some class of problems. Thanks for providing all the help and understanding.

@ViralBShah

I think it will be much easier to update the Coin-OR solvers going forward.

@odow odow merged commit 13d37fd into master Apr 18, 2020
@odow odow deleted the od/jll branch April 18, 2020 02:00
@mtanneau mtanneau mentioned this pull request Apr 23, 2020
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Library not loaded : @rpath/libgfortran.4.dylib Building Clp on Julia 1.3.0-rc3.0
7 participants