CUDA array scalar getindex error #25

btwied · 2020-06-16T21:02:06Z

The following:

using TensorCast, CUDA; CUDA.allowscalar(false)
C = cu(ones(10,2))
L = cu(ones(10,3))
@reduce D[m,a] := sum(p) C[p,a] + L[p,m]

fails with the error "scalar getindex is disallowed". But using @cast as an intermediate step, or re-ordering the indices, both work fine:

@cast T[p,m,a] := C[p,a] + L[p,m]
D = reshape(sum(T, dims=1), (3,2))

or

C = cu(ones(2,10))
L = cu(ones(3,10))
@reduce D[m,a] := sum(p) C[a,p] + L[m,p]

both produce

3×2 CuArray{Float32,2,CuArray{Float32,3,Nothing}}:
 20.0  20.0
 20.0  20.0
 20.0  20.0

Question was initially raised here: TensorCast & CUDA

mcabbott · 2020-06-17T08:00:09Z

It's pretty odd that this fails while the example of #10 (comment) does not. Both use orient to reshape a transposed matrix. But doing this twice seems to cause problems:

reshape(C,1,2,10) .+ reshape(L', 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L, 3,1,10) # ok
reshape(C',1,2,10) .+ reshape(L', 3,1,10) # ERROR: scalar getindex is disallowed

reshape(C',1,2,10) |> typeof
# Base.ReshapedArray{Float32,3,LinearAlgebra.Adjoint{Float32,CuArray{Float32,2,Nothing}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}

This seems the same on CuArrays v1.7.2, CuArrays v2.2.1, and CUDA v0.1.0.
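A minimal sketch of one way to sidestep the doubly wrapped type, assuming it is acceptable to copy the data: materialise the transpose with permutedims before reshaping, so that neither broadcast argument is a ReshapedArray around an Adjoint. Plain CPU arrays stand in for the CuArrays here so the snippet runs without a GPU; the sizes follow the original report, and swapping in cu(...) is an assumption about how it would be used rather than something tested in this thread.

```julia
using LinearAlgebra  # provides the Adjoint wrapper type

# CPU stand-ins for the arrays in the report (10×2 and 10×3).
# With CUDA.jl loaded, these would be cu(ones(10, 2)) and cu(ones(10, 3)).
C = ones(Float32, 10, 2)
L = ones(Float32, 10, 3)

# Lazy route: reshaping an adjoint gives a ReshapedArray wrapping an Adjoint.
lazy = reshape(C', 1, 2, 10)
@show lazy isa Base.ReshapedArray

# Eager route: permutedims copies into a new dense array,
# so the reshape that follows needs no lazy wrapper.
Ct = reshape(permutedims(C), 1, 2, 10)
Lt = reshape(permutedims(L), 3, 1, 10)
@show typeof(Ct)

# Same reduction as the @reduce call: sum over p, now the third dimension.
D = dropdims(sum(Ct .+ Lt, dims=3), dims=3)
@show size(D)
```

The cost is one extra copy per transposed argument, which is the trade-off against keeping everything lazy.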

mcabbott · 2020-06-17T10:20:56Z

Here is one way to work around this, forcing the broadcast to be a CUDA one:

trick = cu(fill(false))
@reduce D[m,a] := sum(p) C[p,a] + L[p,m] + trick
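The reason the extra term is harmless can be checked on the CPU: a zero-dimensional array holding false broadcasts against any shape, and adding false changes neither the values nor the element type. The sketch below uses a plain Array in place of cu(fill(false)); that a zero-dimensional CuArray steers the whole broadcast onto the GPU is taken from the comment above, not shown here.

```julia
A = ones(Float32, 3, 2)

# Zero-dimensional Bool array, the CPU analogue of cu(fill(false)).
trick = fill(false)
@show ndims(trick)

# Broadcasting against a 0-dimensional array leaves the shape unchanged,
# and Float32 + Bool promotes to Float32.
B = A .+ trick
@show B == A
@show eltype(B)
```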

mcabbott · 2022-09-03T19:12:31Z

I think this can be closed as fixed by #31. Current behaviour is:

julia> @reduce D[m,a] := sum(p) C[p,a] + L[p,m]
3×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 20.0  20.0
 20.0  20.0
 20.0  20.0

julia> @pretty @reduce D[m,a] := sum(p) C[p,a] + L[p,m]
begin
    @boundscheck ndims(C) == 2 || throw(ArgumentError("expected a 2-tensor C[p, a]"))
    @boundscheck axes(C, 1) == axes(L, 1) || throw(DimensionMismatch("range of index p must agree"))
    @boundscheck ndims(L) == 2 || throw(ArgumentError("expected a 2-tensor L[p, m]"))
    local fox = transmute(C, Val((nothing, 2, 1)))
    local tiger = transmute(L, Val((2, nothing, 1)))
    D = dropdims(sum(@__dot__(fox + tiger), dims = 3), dims = 3)
end

mcabbott mentioned this issue Jun 17, 2020

Broadcasting and reshaped, transposed, CuArrays JuliaGPU/CUDA.jl#228

Open

mcabbott added the "bug" label (Something isn't working) Jun 17, 2020

mcabbott mentioned this issue Oct 13, 2020

scalar getindex when shared index changes order #28

Closed

mcabbott mentioned this issue Feb 4, 2021

Use TransmuteDims #31

Merged

mcabbott added the "gpu" label (anything involving a CuArray or similar) Sep 3, 2022

mcabbott closed this as completed Sep 3, 2022

mcabbott mentioned this issue Sep 3, 2022

Slices and CuArrays #21

Open
