
Performance regressions from v0.18 #1403

Closed
IainNZ opened this issue Aug 5, 2018 · 22 comments

IainNZ (Collaborator) commented Aug 5, 2018

test/perf/speed.jl

On Julia 0.6.3, JuMP release-0.18 branch:

IAINMBP:perf idunning$ jl speed.jl
PMEDIAN BUILD MIN=0.972930426  MED=1.147175941
PMEDIAN WRITE MIN=3.00756461  MED=3.071367754
CONT5 BUILD   MIN=0.379140023  MED=0.524470491
CONT5 WRITE   MIN=2.137859765  MED=2.267543146

On JuMP master, with the write branch no-op'ed because writeLP/writeMPS are gone:

IAINMBP:perf idunning$ j6 speed.jl
PMEDIAN BUILD MIN=7.588425246  MED=7.81608915
PMEDIAN WRITE MIN=3.3e-8  MED=4.0e-8
CONT5 BUILD   MIN=3.352726934  MED=3.427871697
CONT5 WRITE   MIN=3.2e-8  MED=3.7e-8

IAINMBP:perf idunning$ j7 speed.jl
PMEDIAN BUILD MIN=5.852603898  MED=6.03251401
PMEDIAN WRITE MIN=3.2e-8  MED=3.8e-8
CONT5 BUILD   MIN=2.542352364  MED=2.594608011
CONT5 WRITE   MIN=3.2e-8  MED=3.7e-8

IAINMBP:perf idunning$ j1 speed.jl
PMEDIAN BUILD MIN=5.816850944  MED=6.002762153
PMEDIAN WRITE MIN=3.7e-8  MED=3.7e-8
CONT5 BUILD   MIN=2.570388408  MED=2.58887144
CONT5 WRITE   MIN=3.2e-8  MED=3.7e-8

So definitely an improvement in 0.7/1.0 over 0.6.

(Edited 2018/8/19 to add 0.7 and 1.0.)

IainNZ (Collaborator) commented Aug 5, 2018

test/perf/macro.jl

On Julia 0.6.3, JuMP release-0.18 branch:

Test 1
  Running N=20...
    N=20 min 0.008989889
    N=20 min 0.026527314
  Running N=50...
    N=50 min 0.185324626
    N=50 min 0.6867126
  Running N=100...
    N=100 min 2.064309912
    N=100 min 5.455418631

On Julia 0.6.3, JuMP master

Test 1
  Running N=20...
    N=20 min 0.109180035
    N=20 min 0.165221182
  Running N=50...
    N=50 min 1.659526812
    N=50 min 3.319008649
  Running N=100...

Gave up after a couple of minutes...

IainNZ (Collaborator) commented Aug 5, 2018

For vector_speedtest.jl, a similar 10x regression.

On master:

n = 10
Vector with sum(): 4.9004e-5
Vector with vecdot() : 2.2591e-5
2D Matrix with sum(): 5.0597e-5
2D Matrix with bigvecdot(): 2.5236e-5
3D Matrix with sum(): 0.000272857
3D Matrix with vecdot(): 0.000234558
n = 50
Vector with sum(): 2.0794e-5
Vector with vecdot() : 2.0431e-5
2D Matrix with sum(): 0.000479182
2D Matrix with bigvecdot(): 0.00049265
3D Matrix with sum(): 0.042774696
3D Matrix with vecdot(): 0.05988021
n = 100
Vector with sum(): 3.1725e-5
Vector with vecdot() : 3.4341e-5
2D Matrix with sum(): 0.082652038
2D Matrix with bigvecdot(): 0.002823639
3D Matrix with sum(): 0.711038097
3D Matrix with vecdot(): 1.303699624
n = 200
Vector with sum(): 5.2218e-5
Vector with vecdot() : 6.4438e-5
2D Matrix with sum(): 0.011597388
2D Matrix with bigvecdot(): 0.013507154
3D Matrix with sum(): 9.78205693
3D Matrix with vecdot(): 15.292601086

Gave up here.

blegat (Member) commented Aug 6, 2018

It would be interesting to see the results against test/JuMPExtensions.jl, to see whether this is related to MOI or to, e.g., the use of OrderedDict in GenericAffExpr.

blegat (Member) commented Aug 6, 2018

It might also be interesting to see the impact of removing the copy here: https://github.com/JuliaOpt/MathOptInterface.jl/blob/253e16c23882fbdcbcd48ed5381b5be216db3657/src/Utilities/model.jl#L272

ExpandingMan (Contributor) commented:

I'm not sure what's happening in these tests, but one thing that will definitely be needed at some point is a custom OrderedDict constructor, if not a custom implementation. Creating an OrderedDict is expensive, and currently nothing is done to alleviate those costs. It has to be cheap to create a GenericAffExpr if the linear algebra operations in operators.jl are to be tractable.
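
A minimal sketch of how one might quantify that construction cost (using DataStructures.OrderedDict; the two-vector pair is just a baseline for comparison, not JuMP code):

using BenchmarkTools, DataStructures

# An empty OrderedDict allocates its internal slot/key/value arrays up front;
# a pair of empty vectors is far cheaper to create.
@btime OrderedDict{Int,Float64}();
@btime (Int[], Float64[]);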

IainNZ (Collaborator) commented Aug 19, 2018

Updated first post with 0.7/1.0 numbers - definitely better, but still way off.

blegat (Member) commented Aug 22, 2018

I just ran a quick benchmark on Julia 1.0. I get the same timings as in the first post for JuMP.Model; with a JuMPExtension.MyModel, I get:

PMEDIAN BUILD MIN=2.334072162  MED=4.033828604
PMEDIAN WRITE MIN=0.0  MED=0.0
CONT5 BUILD   MIN=0.711627309  MED=0.932109656
CONT5 WRITE   MIN=0.0  MED=0.0

Note that the main difference between a JuMPExtension.MyModel and a JuMP.Model is that the former stores the constraints in GenericAffExpr form and does not use MOI, while JuMP.Model stores the constraints in MOI in MOI.ScalarAffineFunction form.
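
For reference, the two storage layouts being compared look roughly like this (a sketch; field names are illustrative, not the exact JuMP/MOI definitions of the time):

# JuMPExtension.MyModel path: affine functions keyed by JuMP variables.
# GenericAffExpr{Float64,VariableRef} holds roughly:
#     terms    :: OrderedDict{VariableRef,Float64}
#     constant :: Float64

# JuMP.Model path: constraints stored inside MOI as flat term lists.
# MOI.ScalarAffineFunction{Float64} holds roughly:
#     terms    :: Vector{MOI.ScalarAffineTerm{Float64}}  # (coefficient, variable index) pairs
#     constant :: Float64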

mlubin added this to the 0.19 milestone Sep 22, 2018
IainNZ (Collaborator) commented Nov 10, 2018

Running these again at JuMP commit 096e65253aa8c57b4f1cc0e019babbd49e537908 (from Nov 10) and MOI commit 4cf5a73020f5bc678d7548a96740baab2dc29985 (also from Nov 10), with Julia v1.0.0.

IAINMBP:JuMP idunning$ j1 --project=. test/perf/speed.jl
PMEDIAN BUILD MIN=4.576138518  MED=4.678331768
PMEDIAN WRITE MIN=3.2e-8  MED=3.8e-8
CONT5 BUILD   MIN=2.18141304  MED=2.245001559
CONT5 WRITE   MIN=2.7e-8  MED=3.7e-8

So roughly 4-4.5x slowdown versus JuMP 0.18 on Julia 0.6.3.

IAINMBP:JuMP idunning$ j1 --project=. test/perf/macro.jl
Test 1
  Running N=20...
    N=20 min 0.041255921
    N=20 min 0.092756913
  Running N=50...
    N=50 min 0.856259862
    N=50 min 2.375251773
IAINMBP:JuMP idunning$ j1 --project=. test/perf/vector_speedtest.jl
n = 10
Vector with sum(): 2.4799e-5
Vector with vecdot() : 0.027370036
2D Matrix with sum(): 2.5529e-5
2D Matrix with bigvecdot(): 2.0713e-5
3D Matrix with sum(): 0.00019005
3D Matrix with vecdot(): 0.000189322
n = 50
Vector with sum(): 1.4541e-5
Vector with vecdot() : 1.8622e-5
2D Matrix with sum(): 0.000432729
2D Matrix with bigvecdot(): 0.000402655
3D Matrix with sum(): 0.062735353
3D Matrix with vecdot(): 0.047802687
n = 100
Vector with sum(): 2.6186e-5
Vector with vecdot() : 3.075e-5
2D Matrix with sum(): 0.002175957
2D Matrix with bigvecdot(): 0.00193332
3D Matrix with sum(): 0.952785796
3D Matrix with vecdot(): 0.896865249
n = 200
Vector with sum(): 5.1098e-5
Vector with vecdot() : 5.4872e-5
2D Matrix with sum(): 0.015075274
2D Matrix with bigvecdot(): 0.015303574
3D Matrix with sum(): 8.964894618
3D Matrix with vecdot(): 21.701227773

IainNZ (Collaborator) commented Nov 10, 2018

Reducing speed.jl down to its core:

using JuMP

function minipmed()
    I, J = 5000, 100
    model = Model()
    x = @variable(model, [1:I, 1:J], Bin)
    @show I * J
    @objective(model, Min, sum(abs(j - i) * x[i, j] for i in 1:I, j in 1:J))
end

@time minipmed()
@time minipmed()
@time minipmed()
@time minipmed()
@time minipmed()

On Julia 1.0.0, JuMP master vs 0.18:

master

IAINMBP:JuMP idunning$ j1 --project=. minipmed.jl
I * J = 500000
  2.410414 seconds (9.54 M allocations: 797.234 MiB, 17.41% gc time)
I * J = 500000
  0.655159 seconds (5.00 M allocations: 568.758 MiB, 43.67% gc time)
I * J = 500000
  0.655618 seconds (5.00 M allocations: 568.758 MiB, 43.80% gc time)
I * J = 500000
  0.707328 seconds (5.00 M allocations: 574.758 MiB, 44.61% gc time)
I * J = 500000
  0.662921 seconds (5.00 M allocations: 568.758 MiB, 44.68% gc time)

0.18

IAINMBP:JuMP18 idunning$ j1 --project=. ../dev/JuMP/minipmed.jl
I * J = 500000
  0.898920 seconds (3.34 M allocations: 197.492 MiB, 19.03% gc time)
I * J = 500000
  0.076578 seconds (500.23 k allocations: 59.124 MiB)
I * J = 500000
  0.188457 seconds (500.23 k allocations: 59.124 MiB, 57.91% gc time)
I * J = 500000
  0.081805 seconds (500.23 k allocations: 59.124 MiB)
I * J = 500000
  0.153297 seconds (500.23 k allocations: 59.124 MiB, 49.59% gc time)

IainNZ (Collaborator) commented Nov 10, 2018

Worth noting that if you comment out the objective, it's 3M allocations. So for 500k variables: 6 allocations each to create them (3M), plus 4 allocations each in the sum (2M), for the 5M total.

IainNZ (Collaborator) commented Nov 10, 2018

This variant:

function minipmed()
    I, J = 5000, 100
    model = Model()
    x = @variable(model, [1:I, 1:J], Bin)
    @show I * J
    y = 0 * x[1, 1]
    for i in 1:I, j in 1:J
        JuMP.add_to_expression!(y, 1.0 * abs(j - i), x[i, j])
    end
    @objective(model, Min, y)
    # @objective(model, Min, sum(abs(j - i) * x[i, j] for i in 1:I, j in 1:J))
end

Also has 5M allocations, suggesting add_to_expression! is allocating 4 times per call?
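
One way to check the per-call count directly (a small sketch, not from the thread, using Base's @allocated on a warmed-up call):

using JuMP

model = Model()
@variable(model, x[1:2])
y = 0 * x[1]
JuMP.add_to_expression!(y, 1.0, x[2])  # warm up so compilation isn't counted
println(@allocated JuMP.add_to_expression!(y, 2.0, x[1]))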

IainNZ (Collaborator) commented Nov 10, 2018

In add_or_set!, commenting out the get doesn't drop the 4 extra allocs, but commenting out the set does, implying that just setting the value costs 4 allocs.
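
For context, the pattern under discussion looks roughly like this (a hypothetical reconstruction of add_or_set!, not the actual JuMP source):

using DataStructures

# Add `value` to the coefficient stored for `key`, inserting it if absent.
function add_or_set!(dict::OrderedDict{K,V}, key::K, value::V) where {K,V}
    dict[key] = get(dict, key, zero(V)) + value  # the setindex! here is where the allocations show up
    return dict
end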

IainNZ (Collaborator) commented Nov 10, 2018

Defining const OrderedDict = Dict and simplifying the VariableRef hash have no effect.

IainNZ (Collaborator) commented Nov 10, 2018

To drill down into just the OrderedDict time:

using JuMP

function minipmed2(I, J, model, x)
    y = 0 * x[1, 1]
    for i in 1:I, j in 1:J
        JuMP.add_to_expression!(y, 1.0 * abs(j - i), x[i, j])
    end
end

function run()
    I, J = 5000, 100
    model = Model()
    x = @variable(model, [1:I, 1:J], Bin)
    @show I * J

    @time minipmed2(I, J, model, x)
    @time minipmed2(I, J, model, x)
    @time minipmed2(I, J, model, x)
    @time minipmed2(I, J, model, x)
    @time minipmed2(I, J, model, x)
end

run()

Stubbing out OrderedDict with an old-style double-vector AffExpr restores performance.
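
For reference, the old-style layout the stub reintroduced looks roughly like this (a sketch; the struct name and field names are illustrative, not the exact 0.18 definitions):

using JuMP

# JuMP 0.18-style affine expression: parallel vectors, duplicate variables allowed.
struct TwoVectorAffExpr
    vars::Vector{VariableRef}  # variables, possibly repeated
    coeffs::Vector{Float64}    # matching coefficients
    constant::Float64
end

# Appending a term is just two push! calls, with no hashing or probing:
# push!(expr.vars, x); push!(expr.coeffs, c)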

IainNZ (Collaborator) commented Nov 10, 2018

... and actually, re-running that code with the original OrderedDict is free (!). It's all in setobjective!

mlubin (Member) commented Nov 10, 2018

I tracked the issue down to moi_function:

using BenchmarkTools
using JuMP

I, J = 5000, 100
@show I * J
model = Model()
x = @variable(model, [1:I, 1:J], Bin)
y = 0 * x[1, 1]
for i in 1:I, j in 1:J
    JuMP.add_to_expression!(y, 1.0 * abs(j - i), x[i, j])
end

@btime JuMP.moi_function($y)

yields

  21.988 ms (2000020 allocations: 70.04 MiB)
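
moi_function converts the GenericAffExpr into an MOI.ScalarAffineFunction, so the cost above scales with the 500k terms. A rough sketch of the shape of that conversion (an assumption about its structure, not the actual source):

using JuMP
const MOI = JuMP.MOI

# Each (variable => coefficient) entry of the OrderedDict becomes one
# MOI.ScalarAffineTerm in a flat vector.
function to_moi(expr::GenericAffExpr{Float64,VariableRef})
    terms = [MOI.ScalarAffineTerm(coef, JuMP.index(var)) for (var, coef) in expr.terms]
    return MOI.ScalarAffineFunction(terms, expr.constant)
end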

mlubin (Member) commented Nov 11, 2018

mlubin (Member) commented Nov 13, 2018

Before (#1403 (comment)):

0.656219 seconds (5.00 M allocations: 568.758 MiB, 37.93% gc time)

After jump-dev/MathOptInterface.jl#567, jump-dev/MathOptInterface.jl#568, and #1604:

0.383531 seconds (2.00 M allocations: 119.749 MiB, 35.66% gc time)

mlubin (Member) commented Dec 1, 2018

The most egregious performance regressions have been addressed as far as I'm aware, so I'm removing the 0.19 milestone. The remaining planned performance improvements are discussed at:

We'll want to revisit the comparison with 0.18 after that.

mlubin modified the milestones: 0.19, 1.0 Dec 1, 2018
ExpandingMan (Contributor) commented:

Note that there are still some pretty severe linear algebra regressions:

using JuMP, Random
using BenchmarkTools

Random.seed!(999)

const N = 100

randmat() = Int[rand() > 0.8 ? 1 : 0 for i in 1:N, j in 1:N]

m = Model()

@variable(m, x[1:N, 1:N])

A = randmat()

on 0.18:

julia> @btime x*A;
  133.674 ms (3050021 allocations: 231.10 MiB)

julia> @btime A*x*A;
  2.653 s (10120042 allocations: 5.82 GiB)

on current master:

julia> @btime x*A;
  397.424 ms (8210029 allocations: 666.43 MiB)

julia> @btime A*x*A;
  24.011 s (14600058 allocations: 9.22 GiB)

@blegat
Copy link
Member

blegat commented Feb 15, 2019

The issue is
https://github.com/JuliaOpt/JuMP.jl/blob/13cf483622dd7adbdc200bf923914fd2fee55fc3/src/operators.jl#L416
which creates a temporary expression before merging it into the other one.
I am working on a fix.
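
A sketch of that fix direction (an assumption based on the comment, not the actual patch): accumulate each entry of the product in place with add_to_expression! instead of materializing `A[i, j] * x[j]` as a temporary expression.

using JuMP

# Hypothetical matrix-vector product that mutates each result entry in place.
function matvec(A::Matrix{Int}, x::Vector{VariableRef})
    m, n = size(A)
    result = [zero(AffExpr) for _ in 1:m]
    for j in 1:n, i in 1:m
        # mutates result[i]; avoids allocating the temporary `A[i, j] * x[j]`
        JuMP.add_to_expression!(result[i], Float64(A[i, j]), x[j])
    end
    return result
end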

odow (Member) commented Oct 20, 2021

At this point I'm going to close this issue. No one has linked from another issue since October 2019, and no one has posted a follow-up comment complaining.

We can use #2735 and #42 (a 2-digit issue!) to track ongoing performance. Ideally, we'll set up something that makes it easy to benchmark JuMP, and then keep attacking that. We can also open issues if specific performance problems are identified.
