`mod(i::Integer, const_power_of_2)` optimization #34990

goretkin · 2020-03-04T04:26:27Z

I guess (and did a very quick search) LLVM does not have a mod intrinsic, otherwise I would expect it to do this optimization.

mod(i::Integer, 2^n) could be rewritten as something like ~~(0b1<<n) & i~~ (I think I meant (0b1 << (n+1)) - 1). I don't know how to benchmark small functions but here's a wonky benchmark:

f(i) = sin(mod(i, 4))

mod4(i) = 0b11 & i
g(i) = sin(mod4(i))

function facc(n)
    a = 0.0
    for i = 1:n
        a += f(i)
    end
    return a
end

function gacc(n)
    a = 0.0
    for i = 1:n
        a += g(i)
    end
    return a
end

julia> BenchmarkTools.@benchmark facc(10)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     100.668 ns (0.00% GC)
  median time:      100.920 ns (0.00% GC)
  mean time:        111.223 ns (0.00% GC)
  maximum time:     488.910 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     943

julia> BenchmarkTools.@benchmark gacc(10)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     67.257 ns (0.00% GC)
  median time:      67.278 ns (0.00% GC)
  mean time:        76.017 ns (0.00% GC)
  maximum time:     298.556 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     977

The text was updated successfully, but these errors were encountered:

goretkin · 2020-04-26T18:07:39Z

Perhaps I should use rem instead of mod to get this optimization (#26022 (comment))

kimikage · 2020-04-27T09:05:50Z

~~I think this already has been improved.~~
I misunderstood that you mentioned the problem with the division instruction.

Probably using Unsigned instead of Signed will improve the native code.

f2(i) = sin(mod(unsigned(i), 4))

However, the change can cause other problems(e.g. inlining). In conclusion, this benchmarks don't seem to be able to isolate the problems.

The result of the following may be the same as g(i). Its ugliness is essentially unrelated to "const_power_of_2".

@inline f3(i) = sin(signed(mod(unsigned(i), 4)))

kimikage · 2020-04-28T23:59:12Z

Let's sort this out.

I believe the main cause of OP stall is the failure of inlining. The mod produces slightly less efficient code than the bitwise and, but it does not matter compared to the slow sin.

The problem is that mod is supposed to use division instruction during the cost estimation for inlining. Of course, finally LLVM backend doesn't generate the divide instruction (on x86 systems).

The cost estimation is a difficult problem on modern CPUs. The case of OP should be resolvable with @inline and I don't think it's essential. I think we need the tips for non-developers about inlining. I have a plan to implement the hints feature in CodeCosts.jl package, but I haven't started yet.

brenhinkeller added the performance Must go faster label Nov 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`mod(i::Integer, const_power_of_2)` optimization #34990

`mod(i::Integer, const_power_of_2)` optimization #34990

goretkin commented Mar 4, 2020 •

edited

Loading

goretkin commented Apr 26, 2020

kimikage commented Apr 27, 2020 •

edited

Loading

kimikage commented Apr 28, 2020

mod(i::Integer, const_power_of_2) optimization #34990

mod(i::Integer, const_power_of_2) optimization #34990

Comments

goretkin commented Mar 4, 2020 • edited Loading

goretkin commented Apr 26, 2020

kimikage commented Apr 27, 2020 • edited Loading

kimikage commented Apr 28, 2020

`mod(i::Integer, const_power_of_2)` optimization #34990

`mod(i::Integer, const_power_of_2)` optimization #34990

goretkin commented Mar 4, 2020 •

edited

Loading

kimikage commented Apr 27, 2020 •

edited

Loading