enable the SLP Vectorizer optimization pass by default #26594

Merged: 1 commit, May 6, 2018


KristofferC
Member

@KristofferC commented Mar 23, 2018

The justification for this is that it seems to have a pretty much negligible impact on compilation time but gives serious performance improvements for linear algebra and other operations on static arrays (#26398 (comment)).
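For readers unfamiliar with the pass: the SLP (superword-level parallelism) vectorizer looks for groups of isomorphic scalar operations in straight-line code, such as the unrolled element-wise code generated for StaticArrays operations, and replaces each group with a single SIMD instruction. The toy Python model below sketches only that packing idea, under loudly simplifying assumptions: the `Stmt` representation and `slp_pack` are hypothetical, every statement is assumed independent of the others, and there is no cost model. LLVM's real SLP vectorizer works on LLVM IR, checks dependences, and vectorizes only when its cost model predicts a win.

```python
from collections import namedtuple

# Toy IR (hypothetical, not LLVM IR): one scalar statement "dst = a <op> b",
# where dst, a and b are (array_name, index) pairs.
Stmt = namedtuple("Stmt", ["op", "dst", "a", "b"])


def next_element(x, y):
    """True if y addresses the element directly after x in the same array."""
    return x[0] == y[0] and y[1] == x[1] + 1


def extends_pack(last, s):
    """s can join a pack ending in `last`: same opcode (isomorphic) and
    every operand is the neighbouring array element (contiguous lanes)."""
    return (last.op == s.op
            and next_element(last.dst, s.dst)
            and next_element(last.a, s.a)
            and next_element(last.b, s.b))


def slp_pack(stmts, width=4):
    """Greedily group statements into packs of at most `width` lanes.

    Assumes all statements are independent; a real SLP vectorizer must
    also check dependences and consult a cost model before packing."""
    packs = []
    # Sorting by (opcode, destination) puts candidate lanes next to each other.
    for st in sorted(stmts, key=lambda t: (t.op, t.dst)):
        if packs and len(packs[-1]) < width and extends_pack(packs[-1][-1], st):
            packs[-1].append(st)
        else:
            packs.append([st])
    return packs


if __name__ == "__main__":
    # c = a + b for a 4-element static vector, fully unrolled (in shuffled
    # order), plus a 2-element element-wise product d = a .* b.
    stmts = [Stmt("+", ("c", i), ("a", i), ("b", i)) for i in (2, 0, 3, 1)]
    stmts += [Stmt("*", ("d", i), ("a", i), ("b", i)) for i in (0, 1)]
    for pack in slp_pack(stmts):
        print(f"{len(pack)}-wide {pack[0].op} into {pack[0].dst[0]}")
```

Here the shuffled `+` statements on `c` collapse into one 4-lane pack and the `*` statements on `d` into a 2-lane pack; a statement whose operands break contiguity would stay scalar.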

SLP Disabled:

Sysimg build time: User time (seconds): 323.47

Running tests for StaticArrays

@testinf      |    3      3
  3.388018 seconds (4.51 M allocations: 303.673 MiB, 4.26% gc time)
SVector       |   53     53
  1.610420 seconds (946.33 k allocations: 54.435 MiB, 1.03% gc time)
MVector       |   52     52
  0.899716 seconds (366.94 k allocations: 20.813 MiB, 0.69% gc time)
SMatrix       |   68     68
  2.104827 seconds (1.26 M allocations: 71.847 MiB, 0.78% gc time)
MMatrix       |   71     71
  1.987582 seconds (754.25 k allocations: 43.105 MiB, 0.68% gc time)
SArray        |   92     92
  6.809774 seconds (4.14 M allocations: 226.083 MiB, 0.88% gc time)
MArray        |  101    101
  5.667526 seconds (2.80 M allocations: 148.517 MiB, 0.76% gc time)
FieldVector   |   27     27
  1.362807 seconds (693.47 k allocations: 40.276 MiB, 1.09% gc time)
Scalar        |    8      8
  3.657003 seconds (1.80 M allocations: 103.858 MiB, 2.87% gc time)
SUnitRange    |   10     10
  0.139713 seconds (30.73 k allocations: 1.809 MiB)
SizedArray    |   49       1     50
  2.333507 seconds (1.05 M allocations: 60.101 MiB, 1.03% gc time)
SDiagonal     |   71     71
 13.601758 seconds (13.72 M allocations: 720.386 MiB, 2.94% gc time)
Custom types  |    2      2
  0.068887 seconds (9.19 k allocations: 583.006 KiB)
Core definitions and constructors |   57     57
  1.817816 seconds (446.50 k allocations: 27.372 MiB, 0.46% gc time)
AbstractArray interface |   54     54
  3.961401 seconds (1.79 M allocations: 100.007 MiB, 1.16% gc time)
Indexing      |   73     73
  6.983536 seconds (3.72 M allocations: 211.556 MiB, 2.49% gc time)
Map, reduce, mapreduce, broadcast |   67     67
  9.380575 seconds (8.80 M allocations: 481.281 MiB, 1.86% gc time)
Array math    |  121    121
  3.541558 seconds (2.94 M allocations: 166.468 MiB, 1.89% gc time)
Broadcast sizes |   30     30
Broadcast     |   77      12     89
  8.707391 seconds (4.76 M allocations: 274.434 MiB, 1.17% gc time)
Linear algebra |   86     86
  7.220394 seconds (3.60 M allocations: 205.664 MiB, 1.42% gc time)
Matrix multiplication |   61       1     62
 28.005206 seconds (32.25 M allocations: 1.397 GiB, 4.13% gc time)

SLP Enabled:

Sysimg build time: User time (seconds): 329.58

Running tests for StaticArrays

@testinf      |    3      3
  3.524660 seconds (4.51 M allocations: 304.837 MiB, 5.88% gc time)
SVector       |   53     53
  1.594506 seconds (947.30 k allocations: 54.376 MiB, 1.05% gc time)
MVector       |   52     52
  0.907068 seconds (367.29 k allocations: 20.782 MiB, 0.68% gc time)
SMatrix       |   68     68
  2.124502 seconds (1.26 M allocations: 71.770 MiB, 0.81% gc time)
MMatrix       |   71     71
  1.683213 seconds (754.89 k allocations: 43.175 MiB, 0.74% gc time)
SArray        |   92     92
  6.771967 seconds (4.15 M allocations: 225.854 MiB, 0.91% gc time)
MArray        |  101    101
  5.707790 seconds (2.80 M allocations: 148.675 MiB, 0.78% gc time)
FieldVector   |   27     27
  1.229272 seconds (694.42 k allocations: 40.199 MiB, 1.04% gc time)
Scalar        |    8      8
  3.683220 seconds (1.80 M allocations: 104.034 MiB, 2.84% gc time)
SUnitRange    |   10     10
  0.119789 seconds (30.79 k allocations: 1.793 MiB)
SizedArray    |   49       1     50
  2.187806 seconds (1.05 M allocations: 60.016 MiB, 0.97% gc time)
SDiagonal     |   71     71
 13.556212 seconds (13.73 M allocations: 720.349 MiB, 2.92% gc time)
Custom types  |    2      2
  0.064020 seconds (9.20 k allocations: 583.443 KiB)
Core definitions and constructors |   57     57
  1.797976 seconds (446.71 k allocations: 27.369 MiB, 0.52% gc time)
AbstractArray interface |   54     54
  3.770388 seconds (1.79 M allocations: 99.841 MiB, 1.12% gc time)
Indexing      |   73     73
  7.115772 seconds (3.73 M allocations: 211.120 MiB, 2.52% gc time)
Map, reduce, mapreduce, broadcast |   67     67
 10.392894 seconds (8.81 M allocations: 482.237 MiB, 1.76% gc time)
Array math    |  121    121
  3.884809 seconds (2.94 M allocations: 166.417 MiB, 1.86% gc time)
Broadcast sizes |   30     30
Broadcast     |   77      12     89
  9.460607 seconds (4.76 M allocations: 273.940 MiB, 1.15% gc time)
Linear algebra |   86     86
  7.504202 seconds (3.61 M allocations: 205.617 MiB, 1.40% gc time)
Matrix multiplication |   61       1     62
 28.755902 seconds (32.22 M allocations: 1.396 GiB, 4.07% gc time)
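One rough way to read the two runs above is to express each test set's time relative to the SLP-disabled run. The Python sketch below does that with the numbers copied from the output; `TIMINGS`, `relative_change` and `classify` are hypothetical names, and the 5% noise band is an arbitrary illustrative choice. Since each configuration was run once and the times are dominated by compilation, differences of a few percent in either direction are best treated as noise.

```python
# Sysimg build "User time (seconds)" for each configuration, from above.
SYSIMG_USER_TIME = {"off": 323.47, "on": 329.58}

# Per-test-set seconds as (SLP disabled, SLP enabled), copied from the two
# runs above. "Broadcast sizes" printed no timing of its own, so it is omitted.
TIMINGS = {
    "@testinf": (3.388018, 3.524660),
    "SVector": (1.610420, 1.594506),
    "MVector": (0.899716, 0.907068),
    "SMatrix": (2.104827, 2.124502),
    "MMatrix": (1.987582, 1.683213),
    "SArray": (6.809774, 6.771967),
    "MArray": (5.667526, 5.707790),
    "FieldVector": (1.362807, 1.229272),
    "Scalar": (3.657003, 3.683220),
    "SUnitRange": (0.139713, 0.119789),
    "SizedArray": (2.333507, 2.187806),
    "SDiagonal": (13.601758, 13.556212),
    "Custom types": (0.068887, 0.064020),
    "Core definitions and constructors": (1.817816, 1.797976),
    "AbstractArray interface": (3.961401, 3.770388),
    "Indexing": (6.983536, 7.115772),
    "Map, reduce, mapreduce, broadcast": (9.380575, 10.392894),
    "Array math": (3.541558, 3.884809),
    "Broadcast": (8.707391, 9.460607),
    "Linear algebra": (7.220394, 7.504202),
    "Matrix multiplication": (28.005206, 28.755902),
}


def relative_change(before, after):
    """Fractional change; positive means the SLP-enabled run took longer."""
    return (after - before) / before


def classify(before, after, tolerance=0.05):
    """Label a change, treating anything within +/- tolerance as noise.
    The 5% default is an illustrative assumption."""
    r = relative_change(before, after)
    if r > tolerance:
        return "slower"
    if r < -tolerance:
        return "faster"
    return "within noise"


if __name__ == "__main__":
    off, on = SYSIMG_USER_TIME["off"], SYSIMG_USER_TIME["on"]
    print(f"{'sysimg build':35s} {relative_change(off, on):+7.1%}")
    for name, (t_off, t_on) in TIMINGS.items():
        print(f"{name:35s} {relative_change(t_off, t_on):+7.1%}  {classify(t_off, t_on)}")
    total_off = sum(t_off for t_off, _ in TIMINGS.values())
    total_on = sum(t_on for _, t_on in TIMINGS.values())
    print(f"{'all test sets':35s} {relative_change(total_off, total_on):+7.1%}")
```

With these numbers the sysimg build comes out about 1.9% (6.11 s of user time) slower with SLP enabled, while individual test sets move in both directions.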

@KristofferC added the performance (Must go faster) and triage (This should be discussed on a triage call) labels Mar 23, 2018
@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@KristofferC
Member Author

Meant to run this:

@nanosoldier runbenchmarks(ALL, vs = "@f0087141b79736beec7b5a2ee946d5c4ec257167")

@iamed2
Contributor

iamed2 commented Mar 23, 2018

Those test runs don't seem to show noticeable performance improvements. Is this working properly?

@KristofferC
Member Author

The test run is supposed to show compilation time.

@JeffBezanson removed the triage (This should be discussed on a triage call) label Mar 23, 2018
@iamed2
Contributor

iamed2 commented Mar 23, 2018

Oh that makes sense, sorry.

@KristofferC
Member Author

@ararslan any idea about Nanosoldier?

@ararslan
Member

Ugh, yeah. It's hitting JuliaWeb/HTTP.jl#220 again. I'll restart the server.

@ararslan
Member

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC
Member Author

Any ideas about nanosoldier @ararslan?

@ararslan
Member

Same error, but it looks like other requests have gotten through, so I'll just try again. GitHub.jl needs to be updated to work with the changes in HTTP.jl, which is why the error keeps happening.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC
Member Author

Perhaps needs a restart @ararslan?

@ararslan
Member

I restarted the server, dunno if it will help but worth a try.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC
Member Author

I ran the benchmarks locally and got https://gist.github.com/KristofferC/c220cb87cda1f77654ed3f89edd5ec60 (filtering out the scalar results because they seemed like noise).

My computer was not completely idle while running them, though, so I'm not sure how reliable those results are. What I was mostly interested in were the tuple linear algebra benchmarks, which all seem to have gotten a significant boost. It would be nice to get a real Nanosoldier run, though.

@ararslan
Member

Other Nanosoldier runs have been getting through but for whatever reason this particular PR seems to hit the error from HTTP every time.

@andyferris
Member

I definitely think we should do this, though I've been confused over time about what happens when this is off (e.g. when I don't use -O3), since I can still see xmm and ymm registers and so on being used. Are there other parts of LLVM or the Julia compiler that would likely make that happen?

@ararslan
Member

I modified the logging level in the server, so if this fails, hopefully we'll have a better understanding of why. Sorry for the noise here, Kristoffer, and thanks for your patience.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@ararslan
Member

I had to modify the server's local clone of HTTP to fix an UndefVarError. I'll submit a PR for that to HTTP, but in the meantime:

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@ararslan
Member

This PR hit the same error again, and unfortunately Nanosoldier is now down until further notice for on-site work.

@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs="@c12922eeea1afb59d05477698d408e4ff54ff7f1")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@ararslan
Member

Glad a Nanosoldier run was able to go through finally. I wouldn't put too much weight on those results though, since the benchmarks hadn't been retuned before running.

@ararslan
Member

Benchmarks have been retuned.

@nanosoldier runbenchmarks(ALL, vs="@c12922eeea1afb59d05477698d408e4ff54ff7f1")

@KristofferC
Member Author

KristofferC commented Mar 30, 2018

Maybe I should rebase this? I don't think it is running on the merge commit.

@ararslan
Member

It should always be running on the merge commit unless there are branch conflicts.

@KristofferC
Member Author

Okay, but I'm quite sure it doesn't, because e.g. the memory regressions from #26435 (comment) also show up here, which means that this commit includes that PR but the commit we compared against does not.

@ararslan
Member

Oh sorry you're right, I don't think it checks out the merge commit with master if it's comparing against another specific commit. Then yes, it would be good to rebase this.

@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":vc/llvm6")

let's try this

@ararslan
Member

Nanosoldier seems to be consistently hitting the IOError again.

@KristofferC
Member Author

JuliaWeb/GitHub.jl#108 maybe :/

@ararslan
Member

Trying something out, testing in production. #yolo.

@nanosoldier runbenchmarks(ALL, vs=":vc/llvm6")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@ararslan
Member

He lives!

@KristofferC changed the base branch from vc/llvm6 to master April 18, 2018 12:29
@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vchuravy
Member

vchuravy commented May 6, 2018

rebase onto master, and then LGTM

@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vchuravy
Member

vchuravy commented May 6, 2018

CI failures are all network failures.

Labels
performance Must go faster
7 participants