I am trying KernelAbstractions (KA) for the first time, and I am puzzled by the performance I get from a simple kernel that copies one 2D Float32 matrix into another (I know I could copy them as vectors):
using Metal
using KernelAbstractions
using Random
using BenchmarkTools

@kernel function copy2D_kernel!(b, a)
    i, j = @index(Global, NTuple)
    @inbounds b[i, j] = a[i, j]
end

function copy2D!(b, a)
    backend = get_backend(a)
    groupsize = KernelAbstractions.isgpu(backend) ? 256 : 1024
    kernel! = copy2D_kernel!(backend, groupsize)
    kernel!(b, a, ndrange=size(a))
end

function go()
    res = 2^14
    # creating initial cpu arrays
    a_cpu = rand(Float32, res, res)
    b_cpu = zeros(Float32, res, res)
    @info("size of a,b (GB) :", 2sizeof(a_cpu)/(1.e9))
    # creating initial gpu arrays
    a = MtlArray(a_cpu)
    b = MtlArray(b_cpu)
    backend = get_backend(a)
    gpu_elapsed = @belapsed begin
        copy2D!($b, $a)
        KernelAbstractions.synchronize($backend)
    end
    cpu_elapsed = @belapsed $a_cpu .= $b_cpu
    bandwidth_GBs(res, t, T) = sizeof(T)*res*res*2/(t*1.e9)
    @info(cpu_elapsed, bandwidth_GBs(res, cpu_elapsed, Float32))
    @info(gpu_elapsed, bandwidth_GBs(res, gpu_elapsed, Float32))
    nothing
end
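For reference, the vector-style copy mentioned above could be sketched as below. It is not part of the benchmark. The names `copy1D_kernel!` and `copy1D!` are hypothetical, the group size of 256 is an arbitrary assumption, and the example uses plain CPU arrays so that it runs without Metal; the same call should presumably work on `MtlArray`s.

```julia
using KernelAbstractions

# Hypothetical 1D variant: one work-item per element, addressed by linear index.
@kernel function copy1D_kernel!(b, a)
    I = @index(Global, Linear)
    @inbounds b[I] = a[I]
end

# Hypothetical launcher; the group size of 256 is an assumption, not a tuned value.
function copy1D!(b, a)
    backend = get_backend(a)
    kernel! = copy1D_kernel!(backend, 256)
    kernel!(b, a, ndrange=length(a))
    KernelAbstractions.synchronize(backend)
    return b
end

# Small CPU-backed example so the sketch runs without a GPU.
a = rand(Float32, 64, 32)
b = zeros(Float32, 64, 32)
copy1D!(b, a)
```

Linear indexing walks the matrix in column-major order, so the 2D shape does not matter to this kernel.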
On an M1 Max MacBook Pro, the plain CPU copy comes out about twice as fast as the KA GPU one:
┌ Info: size of a,b (GB) :
└ (2 * sizeof(a_cpu)) / 1.0e9 = 2.147483648
┌ Info: 0.022282291
└ bandwidth_GBs(res, cpu_elapsed, Float32) = 96.37625000050488
┌ Info: 0.047214875
└ bandwidth_GBs(res, gpu_elapsed, Float32) = 45.48320096156137
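As a sanity check on these figures, the bandwidth arithmetic can be redone by hand from the timings alone. This is a minimal sketch that counts one read plus one write of a 2^14 x 2^14 Float32 matrix, as `bandwidth_GBs` does in `go()`; the two-argument `bandwidth_GBs` here is a simplified stand-in, and the timings are copied from the log above.

```julia
# Bytes moved by one copy: one read plus one write of a res x res Float32 matrix.
res = 2^14
bytes_moved = sizeof(Float32) * res * res * 2   # 2_147_483_648 bytes, i.e. 2 GiB

# Simplified stand-in for the helper in go(): GB/s from bytes and seconds.
bandwidth_GBs(bytes, t) = bytes / (t * 1.0e9)

# Timings copied from the log output above (seconds).
cpu_elapsed = 0.022282291
gpu_elapsed = 0.047214875

println("CPU GB/s: ", bandwidth_GBs(bytes_moved, cpu_elapsed))
println("GPU GB/s: ", bandwidth_GBs(bytes_moved, gpu_elapsed))
println("GPU/CPU time ratio: ", gpu_elapsed / cpu_elapsed)
```

The time ratio is a little over 2, which matches the "twice as fast" observation.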
Any hints?
Laurent