
fix to allow for converting to Int #273

Merged: 1 commit, Nov 1, 2021

Conversation


@leios leios commented Oct 22, 2021

A bit lost with this PR, but it is an attempt at a fix for #254 and #265.

I posted the error messages on those issues and have a few lines here that I've played around with. I am ultimately trying to replace the failing InexactError construction with a working version.
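
For context, a base-Julia sketch (no KernelAbstractions or Cassette involved) showing that InexactError's constructor takes three arguments — the one-argument call in the stack trace below is what has no matching method:

```julia
# InexactError's only constructor takes three arguments: the name of the
# failing function, the target type, and the offending value (the same
# signature listed under "Closest candidates" in the trace below).
err = InexactError(:convert, Int32, typemax(Int64))

# A lossless conversion succeeds...
x = convert(Int32, Int64(5))

# ...while an out-of-range one throws that same three-argument error.
caught = try
    convert(Int32, typemax(Int64))
catch e
    e
end
```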


leios commented Oct 23, 2021

I didn't mention it, but here is my test script:

using Test
using CUDA
using KernelAbstractions
using CUDAKernels

@kernel function f_test_kernel!(a,b)
    tid = @index(Global, Linear)
    convert(Int32,tid)
end

function f_test!(a, b; numcores = 4, numthreads = 256)

    if isa(a, Array)
        kernel! = f_test_kernel!(CPU(), numcores)
    else
        kernel! = f_test_kernel!(CUDADevice(), numthreads)
    end

    kernel!(a, b, ndrange=size(a))
end

function main()
    a = Int64.(rand(1:128,10, 10))
    b = zeros(Int64, 10, 10)

    event = f_test!(a, b)
    wait(event)

    if has_cuda_gpu()
        d_a = CuArray(a)
        d_b = similar(d_a)

        event = f_test!(d_a, d_b)
        wait(event)

    end
end

main()

I had it wrapped in a test set when testing shared memory, but now just have it wrapped in main() to see if I can at least get it to run before making sure the solution works with everyone else's tests (and my own).

The error is a nested task error, MethodError: no method matching InexactError(::Int64):

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] wait
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:65 [inlined]
 [3] wait (repeats 2 times)
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:29 [inlined]
 [4] main()
   @ Main ~/projects/simuleios/histograms/mwe2.jl:27
 [5] top-level scope
   @ ~/projects/simuleios/histograms/mwe2.jl:43
 [6] include(fname::String)
   @ Base.MainInclude ./client.jl:451
 [7] top-level scope
   @ REPL[1]:1

    nested task error: MethodError: no method matching InexactError(::Int64)
    Closest candidates are:
      InexactError(::Symbol, ::Any, ::Any) at ~/builds/julia/usr/share/julia/base/boot.jl:318
    Stacktrace:
      [1] overdub
        @ ~/projects/KernelAbstractions.jl/src/compiler.jl:55 [inlined]
      [2] oneunit(::Type{Int64})
        @ ./number.jl:358 [inlined]
      [3] overdub
        @ ./number.jl:358 [inlined]
      [4] first(::Base.OneTo{Int64})
        @ ./range.jl:804 [inlined]
      [5] overdub
        @ ./range.jl:804 [inlined]
      [6] map(::typeof(first), ::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}})
        @ ./tuple.jl:221 [inlined]
      [7] overdub
        @ ./tuple.jl:221 [inlined]
      [8] first(::CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
        @ ./multidimensional.jl:464 [inlined]
      [9] overdub
        @ ./multidimensional.jl:464 [inlined]
     [10] iterate(::CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
        @ ./multidimensional.jl:405 [inlined]
     [11] overdub
        @ ./multidimensional.jl:405 [inlined]
     [12] overdub
        @ ~/projects/KernelAbstractions.jl/src/macros.jl:263 [inlined]
     [13] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Int64}, Matrix{Int64}}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:157
     [14] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Int64}, Matrix{Int64}}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:130
     [15] (::KernelAbstractions.var"#37#38"{Nothing, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, Tuple{Int64, Int64}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, Tuple{Matrix{Int64}, Matrix{Int64}}, KernelAbstractions.NDIteration.DynamicCheck}})()
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:22
in expression starting at /home/leios/projects/simuleios/histograms/mwe2.jl:43


leios commented Oct 23, 2021

As another important note: changing the InexactError to a print, via

@inline Cassette.overdub(::$Ctx, ::typeof(InexactError), args...) = println(args)

for some reason only works on 2 threads before erroring that we are multiplying 2 print statements (println returns nothing, and those nothings end up being multiplied).

I think the error above is because args... is grabbing the arguments for the toInt32(...) call instead of the throw_inexacterror(...) call.
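
A plain base-Julia illustration of that failure mode — println returns nothing, and multiplying the resulting nothings reproduces the MethodError seen in the trace below:

```julia
# println returns nothing, so any overdub that swaps a computation for a
# print leaks `nothing` into downstream arithmetic.
r1 = println("first")   # r1 === nothing
r2 = println("second")  # r2 === nothing

# prod over a tuple of nothings multiplies them, which is the
# *(::Nothing, ::Nothing) MethodError from the stack trace.
err = try
    prod((r1, r2))
catch e
    e
end
```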

Here is the error for when we swap it out with a print:

julia> include("mwe2.jl")
[ Info: Precompiling KernelAbstractions [63c18a36-062a-441e-b654-da1e3ab1ce7c]
[ Info: Precompiling CUDAKernels [72cfdca4-0801-4ab0-bf6a-d52aa10adc57]
(4,)
(1,)
ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] wait
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:65 [inlined]
 [3] wait (repeats 2 times)
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:29 [inlined]
 [4] main()
   @ Main ~/projects/simuleios/histograms/mwe2.jl:27
 [5] top-level scope
   @ ~/projects/simuleios/histograms/mwe2.jl:43
 [6] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [7] top-level scope
   @ REPL[1]:1

    nested task error: MethodError: no method matching *(::Nothing, ::Nothing)
    Closest candidates are:
      *(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
      *(::ChainRulesCore.Tangent, ::Any) at /home/leios/.julia/packages/ChainRulesCore/Y1Mee/src/tangent_arithmetic.jl:152
      *(::SpecialFunctions.SimplePoly, ::Any) at /home/leios/.julia/packages/SpecialFunctions/6MVgC/src/expint.jl:8
      ...
    Stacktrace:
      [1] call
        @ ~/.julia/packages/Cassette/1lyEM/src/context.jl:456 [inlined]
      [2] fallback
        @ ~/.julia/packages/Cassette/1lyEM/src/context.jl:454 [inlined]
      [3] _overdub_fallback(::Any, ::Vararg{Any, N} where N)
        @ ~/.julia/packages/Cassette/1lyEM/src/overdub.jl:586 [inlined]
      [4] overdub
        @ ~/.julia/packages/Cassette/1lyEM/src/overdub.jl:586 [inlined]
      [5] prod(::Tuple{Nothing, Nothing})
        @ ./tuple.jl:480 [inlined]
      [6] overdub
        @ ./tuple.jl:480 [inlined]
      [7] length(::CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
        @ ./multidimensional.jl:427 [inlined]
      [8] overdub
        @ ./multidimensional.jl:427 [inlined]
      [9] cpu_f_test_kernel!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, ::Matrix{Int64}, ::Matrix{Int64})
        @ ./none:0 [inlined]
     [10] overdub
        @ ./none:0 [inlined]
     [11] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Int64}, Matrix{Int64}}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:157
     [12] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Int64}, Matrix{Int64}}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:130
     [13] (::KernelAbstractions.var"#33#34"{Nothing, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, Tuple{Int64, Int64}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, Tuple{Matrix{Int64}, Matrix{Int64}}, KernelAbstractions.NDIteration.DynamicCheck}})()
        @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:22
in expression starting at /home/leios/projects/simuleios/histograms/mwe2.jl:43

Since the failure is a nested task error, I did try this on the latest Julia version from GitHub to make sure that it wasn't somehow related to this issue: #232

@leios leios marked this pull request as ready for review October 23, 2021 19:49

leios commented Oct 23, 2021

I checked the examples and believe this also fixes #254 and #265

@vchuravy (Member)

Great! Can you also add some tests for #254 and #265?

@vchuravy (Member)

bors try

bors bot added a commit that referenced this pull request Oct 23, 2021

bors bot commented Oct 23, 2021

try

Build failed:


leios commented Oct 25, 2021

I pushed 2 commits here. I tried a bunch of different ways of iterating through the types, such as the one shown in the previous commit, but also just sending a bunch of types in via CuArray and using tid in parallel. Honestly, none of them worked on the GPU, so I just manually unrolled everything.

I don't think it's worth messing with this code to make it that pretty, but if you want to try to get the loops to work, feel free.

Also: should we have excluded this from ROC? I have not been able to test this on AMD, but I would guess it works there as well.

test/convert.jl Outdated
@@ -0,0 +1,35 @@
using CUDA, CUDAKernels
Member

Shouldn't need these using lines as part of the test-suite

Contributor Author

yup, I will fix that. Are you ok with the unrolled test? If so, I'll squash now as well.

Member

The tests are a bit odd since you will always convert to eltype(B).

Contributor Author

Yeah, I mean, the output doesn't really matter too much. We don't have to output anything, but can instead just run Float64() on all of them.

Then there wouldn't be an @test, though.

I'll rewrite everything so each row in B will be another one of the unrolled tests. Give me a few minutes...


leios commented Oct 25, 2021

I re-wrote the tests so that B now has a different row for each integer conversion. We could have a bunch of other arrays for different floating types, but those don't have the inexact errors, so converting to any floating type is fine.

My broadcasting magic is a bit weak here. I was able to do 3 tests on the GPU without the for loop, but it did not work on the CPU, so we now have 30 tests.
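
The committed test file isn't shown in this thread; as a rough plain-Julia stand-in for the idea (the type list, array names, and sizes here are assumptions, not the actual test/convert.jl):

```julia
# Plain-Julia stand-in for the unrolled conversion test: one row of B
# per target integer type.
int_types = (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64)

A = Int64.(rand(1:100, 8))   # values small enough to fit every type above
B = zeros(Int64, length(int_types), length(A))

for (row, T) in enumerate(int_types)
    for (col, x) in enumerate(A)
        # Round-trip through convert(T, x); an InexactError here would be
        # the same failure mode the kernel version exercises.
        B[row, col] = Int64(convert(T, x))
    end
end
```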


leios commented Nov 1, 2021

By the way, I am happy to squash the commits here if you are happy with the new test


vchuravy commented Nov 1, 2021

LGTM


leios commented Nov 1, 2021

Ok, squashed

@vchuravy vchuravy merged commit 9cfadee into JuliaGPU:master Nov 1, 2021