
Benchmark against 1.0.0 for potential 1.1.0 release #30218

Closed

Conversation

KristofferC
Member

@nanosoldier runbenchmarks(ALL, vs = ":release-1.0")

Not sure if we should also run against 1.0.0?

@KristofferC KristofferC added the DO NOT MERGE Do not merge this PR! label Nov 30, 2018
@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC
Member Author

KristofferC commented Dec 3, 2018

Stuff to look at:


ID | time ratio | memory ratio
["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumcartesian_view\", \"1:100000\")"] | 535.94 (50%) ❌ | 1.00 (1%)

["array", "index", "(\"sumeach_view\", \"100000:-1:1\")"] | 58277.14 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumeach_view\", \"1:100000\")"] | 642.67 (50%) ❌ | 1.00 (1%)

["array", "index", "(\"sumlinear_view\", \"100000:-1:1\")"] | 61051.76 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumlinear_view\", \"1:100000\")"] | 567.00 (50%) ❌ | 1.00 (1%)


["array", "setindex!", "(\"setindex!\", 1)"] | 2.24 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 2)"] | 2.17 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 3)"] | 2.15 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 4)"] | 2.19 (5%) ❌ | 1.00 (1%)

Fixed by #30248


["micro", "printfd"] | 2.07 (5%) ❌ | 1.72 (1%) ❌

["misc", "iterators", "zip(1:1, 1:1, 1:1, 1:1)"] | 3.09 (5%) ❌ | 2.43 (1%) ❌

@martinholters ?

@martinholters
Member

I won't have time to look into the zip thing this week, but please ping me next week if needed.

@StefanKarpinski
Member

@mbauman: can you take a look at the array indexing regressions?

@KristofferC
Member Author

These are most likely cases where LLVM previously did the math and computed the answer without needing to loop, whereas now it perhaps can't see through the view abstraction.

@KristofferC
Member Author

KristofferC commented Dec 4, 2018

Regarding

["micro", "printfd"] | 2.07 (5%) ❌ | 1.72 (1%) ❌

this was caused by #29907

Before reverting

  1.410 ms (10 allocations: 672 bytes)

After reverting:

717.883 μs (10 allocations: 672 bytes)

I thought task-local storage might be too slow for this, but fortunately the slowdown is only about 5-6%.


cc @JeffBezanson

@KristofferC
Member Author

KristofferC commented Dec 4, 2018

Regarding

["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)

and co.: on 1.0.2 the compiler can indeed do the arithmetic for, e.g.,

function perf_sumcartesian_view(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = view(A, I)  # scalar index I gives a 0-dimensional view
        s += val[]        # read the scalar back out of the view
    end
    return s
end

A = 1:100000000

while on master, it generates beautifully vectorized code, but obviously, working hard doesn't beat being smart.
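For anyone who wants to reproduce the comparison, a minimal sketch using Base's standard reflection tooling (the kernel is the benchmark function quoted above, condensed):

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Condensed benchmark kernel.
function perf_sumcartesian_view(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        s += view(A, I)[]
    end
    return s
end

# Inspect the generated IR: on 1.0.2 LLVM folds the loop into a
# closed-form expression, while on master it emits a vectorized loop.
@code_llvm perf_sumcartesian_view(1:100_000_000)
```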

@mbauman
Member

mbauman commented Dec 4, 2018

I can kick off a bisect for that one.

@KristofferC
Member Author

I am doing that right now :)

@KristofferC
Member Author

KristofferC commented Dec 4, 2018

Regarding

["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)

Regression introduced in #29895 cc @mfsch

@mbauman
Member

mbauman commented Dec 4, 2018

Regression introduced in #29895 (no Nanosoldier run)

Dang, that is quite the surprise but my bisect agrees. No Nanosoldier run because, well, I don't have a mental model for how such a change would regress anything. Further, I don't think we have any BaseBenchmarks for scalar views, which is what that change was targeting. No need to CC the author (a first-time committer); @mfsch you should not worry about your PR or feel obligated to address this. Heck, I'm not sure how to address it — it's a great change that I want to keep.

Are we going over some magical number of methods? Or a type complexity heuristic?

@chethega
Contributor

chethega commented Dec 4, 2018

I don't really get why the bisected PR impacts that, but FYI the fast version uses an O(1) summation algorithm (explicit formula) instead of vectorized code.

It is pretty impressive that LLVM sometimes replaces reductions over integer ranges with explicit formulas. But I don't think that is a realistic case to worry about: people should never rely on compiler optimizations for complexity class; in this case O(1) vs O(N) for computing sum(1:N) == (N*(N+1)) >> 1.
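The identity in question, as a quick sanity check (plain Julia, not specific to this PR):

```julia
# Summing a contiguous integer range has the closed form N*(N+1)/2,
# which is what LLVM occasionally recovers from a summation loop.
closed_form(N) = (N * (N + 1)) >> 1

N = 100_000
@assert sum(1:N) == closed_form(N) == 5_000_050_000
```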

@KristofferC
Copy link
Member Author

KristofferC commented Dec 4, 2018

People should never rely on compiler optimizations for complexity class. In this case O(1) vs O(N) for computing sum(1:N) == ( N*(N+1) )>>1.

While true, the regression here means that LLVM understands less about our SubArrays, which can likely have an effect in contexts other than just turning an O(n) computation into O(1).

@mbauman
Member

mbauman commented Dec 4, 2018

Yeah, these indexing benchmarks have always sat on the knife's edge of O(1)-ization, but it's been a good stress test to ensure that we have the complicated indexing machinery as fast as we can make it and expressed in a manner that LLVM likes.

These regressions actually are all scalar views — I was wrong about not having benchmarks for this case. That PR shifts which methods get defined for 0-d views: the method SubArray implements shifts from being v[] to being v[1]… but the benchmarks call v[], so now our fallback indexing machinery has to insert that 1 for us, and SubArray has to do an addition it didn't need to do before. I wonder if full Cartesian indexing on "Fast" SubArrays via re-indexing would be faster than (or equivalent to) doing the linearization up-front. Or perhaps we have something that's not quite optimal in the fallbacks.
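A small sketch of the scalar-view shapes under discussion (this only demonstrates the two call forms, not the method tables themselves):

```julia
A = collect(1:10)
v = view(A, 3)    # a scalar index produces a 0-dimensional SubArray

# Both spellings retrieve the same element; the PR changed which one
# SubArray defines directly and which goes through the fallback
# indexing machinery that supplies the implicit trailing 1.
@assert ndims(v) == 0
@assert v[] == v[1] == 3
```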

@martinholters
Member

martinholters commented Dec 10, 2018

["misc", "iterators", "zip(1:1, 1:1, 1:1, 1:1)"] | 3.09 (5%) ❌ | 2.43 (1%) ❌

Bisected to 1324ceb (#28284). I'll try to figure out what the problem is there, but can't promise anything...

EDIT: See #30331.

@JeffBezanson
Member

I'll work on the printf regression. I believe the problem is that printf re-fetches the task-local buffer many times, and it should instead be saved in a local variable.
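As a hedged sketch of that fix (the helper name and loop are illustrative, not the actual printf internals):

```julia
# Illustrative task-local buffer accessor, stored in task_local_storage.
get_buf() = get!(() -> IOBuffer(), task_local_storage(), :printf_buf)

# Before: re-fetches the task-local buffer on every use.
function emit_slow(items)
    for x in items
        print(get_buf(), x)
    end
end

# After: fetch the buffer once and keep it in a local variable.
function emit_fast(items)
    buf = get_buf()
    for x in items
        print(buf, x)
    end
end
```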

JeffBezanson added a commit that referenced this pull request Dec 12, 2018
mostly fixes the regression identified in #30218
@KristofferC KristofferC changed the base branch from master to backport-1.1.0 December 13, 2018 00:14
KristofferC pushed a commit that referenced this pull request Dec 13, 2018
mostly fixes the regression identified in #30218

(cherry picked from commit e836937)
@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":release-1.0")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC
Member Author

Ok, printfd is only at an 18% regression now, and the one remaining regression is from the scalar-view optimization which, from what I understand, we will accept. In that case, benchmarking looks good.

@ararslan ararslan deleted the KristofferC-patch-7 branch December 17, 2018 20:54
@DilumAluthge DilumAluthge removed the DO NOT MERGE Do not merge this PR! label Jun 18, 2021