Thanks for the bug report! If I understand your proposed changes correctly, you have:
1. manually unrolled the loops for the ufunc; and
2. specialized the code for the case where we reduce, i.e., where os=0, by first binning the results and then reducing them in a second step.
The speed improvements are certainly impressive! I would conjecture that this is less because of the "unrolling" itself and more because you are reducing the data dependency in the sum (so the compiler and CPU can parallelize the whole thing). We could certainly add that special case!
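To make the data-dependency conjecture concrete, here is a minimal sketch in C. It is not xprec's actual code: the `dd` struct, the helpers `two_sum`, `fast_two_sum`, `dd_add_d`, and `dd_add`, the functions `sum_serial` and `sum_binned`, and the choice of four bins are all assumptions made for illustration. The serial version carries one accumulator through every iteration, so each addition must wait for the previous one; the binned version keeps independent accumulators and reduces them in a second step.

```c
#include <stddef.h>

/* Hypothetical double-double value: the unevaluated sum hi + lo. */
typedef struct { double hi, lo; } dd;

/* Error-free sum of two doubles (Knuth's two-sum).
 * Note: this relies on strict IEEE semantics; flags such as
 * -ffast-math may let the compiler simplify the error term away. */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return (dd){s, e};
}

/* Renormalization step; exact when |a| >= |b| or a == 0. */
static dd fast_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return (dd){s, e};
}

/* Add a plain double to a double-double. */
static dd dd_add_d(dd a, double b) {
    dd s = two_sum(a.hi, b);
    s.lo += a.lo;
    return fast_two_sum(s.hi, s.lo);
}

/* Add two double-doubles (the simple, "sloppy" variant). */
static dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return fast_two_sum(s.hi, s.lo);
}

/* One accumulator: every iteration depends on the previous result. */
static dd sum_serial(const double *x, size_t n) {
    dd acc = {0.0, 0.0};
    for (size_t i = 0; i < n; ++i)
        acc = dd_add_d(acc, x[i]);
    return acc;
}

/* Four independent bins: the four additions in the loop body do not
 * depend on each other, so they can overlap in the pipeline. */
#define NBINS 4
static dd sum_binned(const double *x, size_t n) {
    dd bins[NBINS] = {{0.0, 0.0}};
    size_t i = 0;
    for (; i + NBINS <= n; i += NBINS)
        for (size_t k = 0; k < NBINS; ++k)
            bins[k] = dd_add_d(bins[k], x[i + k]);
    for (; i < n; ++i)                 /* leftover elements */
        bins[0] = dd_add_d(bins[0], x[i]);
    dd total = bins[0];                /* second step: reduce the bins */
    for (size_t k = 1; k < NBINS; ++k)
        total = dd_add(total, bins[k]);
    return total;
}
```

Because the binned version changes the order of the additions, its result may differ from the serial one in the last bits of `lo`. Whether that is acceptable, and how many bins pay off, would have to be checked against the real ufunc loop.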
Are you compiling this with some extra flags (say, -march=native)?
Configuration: xprec-1.4.2 on Intel macOS Sonoma
Benchmark: ndarray[ddouble] + ndarray[float64]; ndarray[ddouble] += ndarray[float64]; ndarray[float64] as ddouble
The profiler shows u_added() as a problem.
As an experiment to evaluate possible performance improvements, I made the following changes:
Preliminary assessment of the change
What do you think about this problem and possible changes?