Benchmark against 1.0.0 for potential 1.1.0 release #30218
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Stuff to look at:
Fixed by #30248
|
I won't have time to look into the |
@mbauman: can you take a look at the array indexing regressions? |
They are most likely a case where LLVM previously did the math and computed the answer without needing to loop, while now it perhaps can't see through the view abstraction. |
Regarding
this was caused by #29907 Before reverting
After reverting:
|
Regarding
and co., on 1.0.2 the compiler can indeed do the arithmetic for e.g.
function perf_sumcartesian_view(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = view(A, I)
        s += val[]
    end
    return s
end
A = 1:100000000
while on master, it generates beautifully vectorized code, but obviously, working hard doesn't beat being smart. |
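One rough way to check which behaviour a given build shows is to time the same function at two very different range lengths. This is only a sketch: `closed_form_sum` is a hypothetical reference helper that is not part of the benchmark suite, and the timings are a coarse signal rather than a proper benchmark.

```julia
# Scalar-view summation, copied from the benchmark quoted in this thread.
function perf_sumcartesian_view(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = view(A, I)   # zero-dimensional view wrapping a single element
        s += val[]
    end
    return s
end

# Hypothetical reference (not in the benchmark suite): closed form for sum(1:n).
closed_form_sum(n::Integer) = n * (n + 1) ÷ 2

# Correctness check on a small input.
println(perf_sumcartesian_view(1:1000) == closed_form_sum(1000))

# Warm up so that compilation time is excluded from the timings below.
perf_sumcartesian_view(1:10)

# If the loop has been folded into arithmetic, both timings should be of
# similar magnitude; if the loop survives, the second scales with the length.
t_small = @elapsed perf_sumcartesian_view(1:10^4)
t_large = @elapsed perf_sumcartesian_view(1:10^8)
println("n = 10^4: ", t_small, " s")
println("n = 10^8: ", t_large, " s")
```

Looking at the output of `@code_llvm perf_sumcartesian_view(1:10)` (from `InteractiveUtils`) is a more direct way to see whether a loop remains in the generated code.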
I can kick off a bisect for that one. |
I am doing that right now :) |
Dang, that is quite the surprise but my bisect agrees. No Nanosoldier run because, well, I don't have a mental model for how such a change would regress anything. Are we going over some magical number of methods? Or a type complexity heuristic? |
I don't really get why the bisected PR impacts that, but FYI the fast version uses an It is pretty impressive that LLVM sometimes replaces reductions over integer ranges by explicit formulas. But I don't think that is a realistic case to worry about: people should never rely on compiler optimizations for complexity class. In this case |
While true, the regression here means that LLVM understands less about our SubArrays, which is likely to have an effect in other contexts than just changing an O(n) to O(1). |
Yeah, these indexing benchmarks have always sat on the knife's edge of These regressions actually are all scalar views — I was wrong about not having benchmarks for this case. That PR shifts which methods get defined for 0-d views: the method |
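For readers unfamiliar with the term, a "scalar view" here is a view built from scalar indices only, which yields a zero-dimensional `SubArray`. The snippet below is a small illustration with a made-up array `A`; it does not reproduce the method changes from the bisected PR.

```julia
# Made-up 3x4 matrix, filled column-major with 1:12.
A = reshape(collect(1:12), 3, 4)

# Indexing a view with only scalar indices gives a zero-dimensional SubArray.
v = view(A, 2, 3)
println(v isa SubArray)
println(ndims(v))
println(size(v) == ())

# A 0-d view is read with empty brackets, as the benchmark does with `val[]`.
println(v[] == A[2, 3])

# A CartesianIndex is also a scalar index, so this is a 0-d view as well.
w = view(A, CartesianIndex(2, 3))
println(w[] == v[])
```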
I'll work on the printf regression. I believe the problem is that printf re-fetches the task-local buffer many times, and it should instead be saved in a local variable. |
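The general pattern described here, hoisting a task-local lookup out of a hot loop, can be sketched as follows. This is not the actual Printf code: `get_scratch`, `write_digits_refetch`, `write_digits_hoisted`, and the `:scratch_buffer` key are all hypothetical names chosen for illustration.

```julia
# Hypothetical helper: fetch (or lazily create) a per-task scratch buffer.
get_scratch() = get!(() -> IOBuffer(), task_local_storage(), :scratch_buffer)::IOBuffer

# Re-fetches the buffer on every iteration, paying for a lookup each time.
function write_digits_refetch(n)
    for i in 1:n
        print(get_scratch(), i % 10)
    end
    return String(take!(get_scratch()))
end

# Fetches the buffer once and keeps it in a local variable.
function write_digits_hoisted(n)
    buf = get_scratch()
    for i in 1:n
        print(buf, i % 10)
    end
    return String(take!(buf))
end

# Both variants produce the same string; only the number of lookups differs.
println(write_digits_refetch(12) == write_digits_hoisted(12))
```

The two functions are behaviourally equivalent, so the hoisted form is a safe default whenever the buffer cannot change identity during the loop.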
mostly fixes the regression identified in #30218
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Ok, |
@nanosoldier runbenchmarks(ALL, vs = ":release-1.0")
Not sure if we should also run against 1.0.0?