Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mod(i::Integer, const_power_of_2) optimization #34990

Open
goretkin opened this issue Mar 4, 2020 · 3 comments
Open

mod(i::Integer, const_power_of_2) optimization #34990

goretkin opened this issue Mar 4, 2020 · 3 comments
Labels
performance Must go faster

Comments

@goretkin
Copy link
Contributor

goretkin commented Mar 4, 2020

I guess (and did a very quick search) LLVM does not have a mod intrinsic, otherwise I would expect it to do this optimization.

mod(i::Integer, 2^n) could be rewritten as something like (0b1<<n) & i (I think I meant (0b1 << (n+1)) - 1). I don't know how to benchmark small functions but here's a wonky benchmark:

f(i) = sin(mod(i, 4))

mod4(i) = 0b11 & i
g(i) = sin(mod4(i))

function facc(n)
    a = 0.0
    for i = 1:n
        a += f(i)
    end
    return a
end

function gacc(n)
    a = 0.0
    for i = 1:n
        a += g(i)
    end
    return a
end
julia> BenchmarkTools.@benchmark facc(10)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     100.668 ns (0.00% GC)
  median time:      100.920 ns (0.00% GC)
  mean time:        111.223 ns (0.00% GC)
  maximum time:     488.910 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     943

julia> BenchmarkTools.@benchmark gacc(10)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     67.257 ns (0.00% GC)
  median time:      67.278 ns (0.00% GC)
  mean time:        76.017 ns (0.00% GC)
  maximum time:     298.556 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     977
@goretkin
Copy link
Contributor Author

Perhaps I should use rem instead of mod to get this optimization (#26022 (comment))

@kimikage
Copy link
Contributor

kimikage commented Apr 27, 2020

I think this already has been improved.
I misunderstood that you mentioned the problem with the division instruction.

Probably using Unsigned instead of Signed will improve the native code.

f2(i) = sin(mod(unsigned(i), 4))

However, the change can cause other problems(e.g. inlining). In conclusion, this benchmarks don't seem to be able to isolate the problems.

The result of the following may be the same as g(i). Its ugliness is essentially unrelated to "const_power_of_2".

@inline f3(i) = sin(signed(mod(unsigned(i), 4)))

@kimikage
Copy link
Contributor

Let's sort this out.

I believe the main cause of OP stall is the failure of inlining. The mod produces slightly less efficient code than the bitwise and, but it does not matter compared to the slow sin.

The problem is that mod is supposed to use division instruction during the cost estimation for inlining. Of course, finally LLVM backend doesn't generate the divide instruction (on x86 systems).

The cost estimation is a difficult problem on modern CPUs. The case of OP should be resolvable with @inline and I don't think it's essential. I think we need the tips for non-developers about inlining. I have a plan to implement the hints feature in CodeCosts.jl package, but I haven't started yet.

@brenhinkeller brenhinkeller added the performance Must go faster label Nov 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants