Slow simple 2D copy kernel with Metal backend #464

LaurentPlagne · 2024-02-27T12:57:40Z

Hi,

I am trying KernelAbstractions (KA) for the first time, and I am puzzled by the performance I get from a simple kernel that copies one 2D Float32 matrix into another (I know I could copy them as vectors instead):

using Metal
using KernelAbstractions
using Random
using BenchmarkTools

@kernel function copy2D_kernel!(b, a)
    i, j = @index(Global, NTuple)
    @inbounds b[i, j] = a[i, j]
end

function copy2D!(b, a)
    backend = get_backend(a)
    groupsize = KernelAbstractions.isgpu(backend) ? 256 : 1024
    kernel! = copy2D_kernel!(backend, groupsize)
    kernel!(b, a, ndrange=size(a))
end

function go()

    res = 2^14
    # creating initial cpu arrays
    a_cpu = rand(Float32, res, res)
    b_cpu = zeros(Float32, res, res)
    @info("size of a,b (GB) :",2sizeof(a_cpu)/(1.e9))

    # creating initial gpu arrays
    a = MtlArray(a_cpu)
    b = MtlArray(b_cpu)

    backend = get_backend(a)
    gpu_elapsed = @belapsed begin
        copy2D!($b,$a)
        KernelAbstractions.synchronize($backend)
    end

    # CPU reference: same direction as the GPU copy (a into b)
    cpu_elapsed = @belapsed $b_cpu .= $a_cpu

    # effective bandwidth in GB/s: one read plus one write of res*res elements of type T
    bandwidth_GBs(res,t,T) = sizeof(T)*res*res*2/(t*1.e9)
    @info(cpu_elapsed,bandwidth_GBs(res,cpu_elapsed,Float32))
    @info(gpu_elapsed,bandwidth_GBs(res,gpu_elapsed,Float32))

    nothing
end

On my MacBook Pro (M1 Max), the plain CPU copy is about twice as fast as the KA GPU one:

┌ Info: size of a,b (GB) :
└ (2 * sizeof(a_cpu)) / 1.0e9 = 2.147483648
┌ Info: 0.022282291
└ bandwidth_GBs(res, cpu_elapsed, Float32) = 96.37625000050488
┌ Info: 0.047214875
└ bandwidth_GBs(res, gpu_elapsed, Float32) = 45.48320096156137
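As a sanity check on these figures, the arithmetic behind the log can be redone by hand. The sketch below only reuses the timings printed above and the same bandwidth formula as in `go()`; it runs on the CPU and needs no packages. The variable `bytes_moved` is introduced here purely for illustration.

```julia
# Effective bandwidth in GB/s: one read plus one write of a res x res matrix of T.
bandwidth_GBs(res, t, T) = 2 * sizeof(T) * res^2 / (t * 1e9)

res = 2^14
bytes_moved = 2 * sizeof(Float32) * res^2   # bytes read from a plus bytes written to b

cpu_elapsed = 0.022282291                   # seconds, copied from the log above
gpu_elapsed = 0.047214875                   # seconds, copied from the log above

println("GB moved     : ", bytes_moved / 1e9)
println("CPU GB/s     : ", bandwidth_GBs(res, cpu_elapsed, Float32))
println("GPU GB/s     : ", bandwidth_GBs(res, gpu_elapsed, Float32))
println("GPU/CPU time : ", gpu_elapsed / cpu_elapsed)
```

The GPU-to-CPU time ratio comes out at roughly 2.1, which is the "twice as fast" gap described above.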

Any hints?

Laurent

bjarthur · 2024-06-07T13:09:35Z

How do your benchmarks vary with groupsize and res? Are there regions in that space for which the GPU is faster?
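One way to explore that space is a small sweep over both parameters. The following is only a sketch, not something taken from the thread: the helper names `copy_bandwidth` and `sweep` and the default resolutions and group sizes are hypothetical, and it runs on the KernelAbstractions `CPU()` backend so that it works without a GPU. Passing `MetalBackend()` (after `using Metal`) should exercise the same path on Apple hardware, at whatever resolutions fit in memory.

```julia
using KernelAbstractions
using BenchmarkTools

@kernel function copy2D_kernel!(b, a)
    i, j = @index(Global, NTuple)
    @inbounds b[i, j] = a[i, j]
end

# Time one (res, groupsize) configuration and return the effective bandwidth in GB/s.
# Hypothetical helper, not part of KernelAbstractions.
function copy_bandwidth(backend, res, groupsize; T=Float32)
    a = KernelAbstractions.ones(backend, T, res, res)
    b = KernelAbstractions.zeros(backend, T, res, res)
    kernel! = copy2D_kernel!(backend, groupsize)
    t = @belapsed begin
        $kernel!($b, $a, ndrange=size($a))
        KernelAbstractions.synchronize($backend)
    end
    return 2 * sizeof(T) * res^2 / (t * 1e9)   # one read plus one write
end

# Print one line per configuration; the defaults are illustrative only.
function sweep(backend; resolutions=(2^10, 2^11), groupsizes=(128, 256, 512, 1024))
    for res in resolutions, gs in groupsizes
        bw = copy_bandwidth(backend, res, gs)
        println("res = ", res, "  groupsize = ", gs, "  GB/s = ", round(bw, digits=1))
    end
end

sweep(CPU())   # assumption: swap in MetalBackend() to run on the GPU
```

Each configuration allocates its own pair of arrays, so the largest resolution in the sweep determines the peak memory needed.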

LaurentPlagne · 2024-06-08T21:23:52Z

The results look rather stable for res in {2^15, 2^16} and groupsize in {126, 256, 512, 1024}.
