Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining arithmetic can produce slowdowns #13350

Closed
timholy opened this issue Sep 28, 2015 · 11 comments
Closed

Inlining arithmetic can produce slowdowns #13350

timholy opened this issue Sep 28, 2015 · 11 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@timholy
Copy link
Member

timholy commented Sep 28, 2015

I've got an application where I'm seeing a big performance difference between

cm*vm + c*v + cp*vp

and

(cm*vm + c*v) + cp*vp

This is related to previous issues #6193 (comment) #5011 #7075 (comment) #10278. However, they're all closed, so I thought it would be better to start fresh.

This demo relies on ForwardDiff.jl, and uses the code in this gist.

Timing results:

julia> include("perf_parens.jl")
Warmup @time
  0.000001 seconds (3 allocations: 144 bytes)
Without parentheses:
  0.350804 seconds (16.00 M allocations: 488.282 MB, 19.45% gc time)
With parentheses:
  0.085496 seconds (15 allocations: 624 bytes)

Considerably more discussion can be found here.

Anything to do here? Or just tweak Interpolations.jl (which is where those functions get @generated) to insert the parentheses?

@jrevels
Copy link
Member

jrevels commented Sep 28, 2015

To add a little context, GradientNumber is the number type with forcibly inlined arithmetic. With that in mind, I think the most useful comparisons here might be found by calling the below code (after loading the gist linked by @timholy above):

G = ForwardDiff.GradientNumber{1,Float64,Tuple{Float64}}
gdx, gdy = rand(G), rand(G)

@code_llvm mygetindex_slow(A, gdx, gdy)
@code_llvm mygetindex_fast(A, gdx, gdy)

In Tim's actual gist, ForwardDiff uses G = ForwardDiff.GradientNumber{2,Float64,Tuple{Float64, Float64}}, but the single-component case shown above is more minimal for the sake of demonstrating the performance differences.

@tkelman tkelman added the performance Must go faster label Sep 28, 2015
@timholy
Copy link
Member Author

timholy commented Sep 29, 2015

Actually, @jrevels, since this affects even the 1d case, we can make this even easier. It really is just the two arithmetic operations at the top. Would you be able to put together a demonstration in "raw" julia, without using ForwardDiff at all? Might help debug this.

@jrevels
Copy link
Member

jrevels commented Sep 29, 2015

Would you be able to put together a demonstration in "raw" julia, without using ForwardDiff at all?

I wasn't able to reproduce this when the involved number types are Base number types; or do you mean an example with a simpler wrapper type than GradientNumber, on which some inlined arithmetic functions are defined?

simonster added a commit that referenced this issue Sep 29, 2015
These were broken by #11274 but no one noticed. Fixes #13350.
@timholy
Copy link
Member Author

timholy commented Sep 29, 2015

Yes, or just a version for ForwardDiff itself that strips the operations down to the essentials. If it can be reduced to ~20 loc or something, that might make it a lot easier for folks. (I frequently do that when I report bugs, but in this case you understand ForwardDiff far better than I.)

@timholy
Copy link
Member Author

timholy commented Sep 29, 2015

Ah, but see #13355!

@JeffBezanson JeffBezanson added the regression Regression in behavior compared to a previous version label Sep 29, 2015
simonster added a commit that referenced this issue Sep 30, 2015
These were broken by #11274 but no one noticed. Fixes #13350.

(cherry picked from commit 7dfcf70)
ref #13355
simonster added a commit that referenced this issue Oct 22, 2015
These were broken by #11274 but no one noticed. Fixes #13350.
simonster added a commit that referenced this issue Oct 22, 2015
These were broken by #11274 but no one noticed. Fixes #13350.
simonster added a commit that referenced this issue Oct 22, 2015
These were broken by #11274 but no one noticed. Fixes #13350.
@JeffBezanson
Copy link
Member

I am now seeing much worse performance here, and it does not seem to be fixed by #13355.

@pao
Copy link
Member

pao commented Oct 23, 2015

Reopening based on @JeffBezanson's comment from yesterday--this got autoclosed by #13355.

@pao pao reopened this Oct 23, 2015
bjarthur pushed a commit to bjarthur/julia that referenced this issue Oct 27, 2015
These were broken by JuliaLang#11274 but no one noticed. Fixes JuliaLang#13350.
@jrevels jrevels added the potential benchmark Could make a good benchmark in BaseBenchmarks label Nov 13, 2015
@yuyichao
Copy link
Contributor

Is this a dup of #12219 ?

@simonster
Copy link
Member

The original issue was kind of the opposite of #12219: we had type information, but we weren't inlining aggressively enough to avoid calling * with a suboptimal calling convention. It was basically the same issue as #5011, and my fix was to fix the heuristic that @JeffBezanson introduced in 1316f81, which had been broken by inference refactoring in #11274. But I think there was a (different?) regression on master, which I haven't had time to take a look at.

@simonster
Copy link
Member

The new regression that @JeffBezanson saw may be #14294.

@vtjnash vtjnash closed this as completed Feb 27, 2016
@vtjnash
Copy link
Member

vtjnash commented Feb 27, 2016

confirmed the new issue was #14294, not the original report here

@KristofferC KristofferC removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Oct 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

9 participants