
[WIP] Adapt Field to run on GPU #746

Closed
wants to merge 5 commits into from

Conversation

@glwagner (Member) commented May 2, 2020

This PR attempts to use adapt_structure for Oceananigans.Fields so that they can be used as arguments in kernels on the GPU.
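For context, the general shape of the approach: Adapt.jl lets a wrapper type declare how it should be translated before a kernel launch. Below is a minimal sketch assuming a simplified, hypothetical `Field` type with only `data` and `grid` members; it is not the actual Oceananigans definition.

```julia
using Adapt

# Hypothetical, stripped-down Field wrapper; real Oceananigans Fields
# carry more state than this.
struct Field{A, G}
    data :: A   # e.g. an OffsetArray, possibly backed by a CuArray
    grid :: G
end

# Tell Adapt how to rebuild a Field for the device: adapt each member,
# so the wrapped array becomes its GPU-compatible counterpart.
Adapt.adapt_structure(to, f::Field) = Field(adapt(to, f.data), adapt(to, f.grid))
```

With a method like this in place, the kernel launch machinery can traverse a Field itself, instead of requiring the caller to unwrap the underlying array by hand.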

After fixing a few related issues, attempts at compilation on the GPU fail with the error

CUDA error: a PTX JIT compilation failed (code 218, ERROR_INVALID_PTX)
  ptxas application ptx input, line 6381; error   : Entry function 'ptxcall_calculate_Gu__66' uses too much parameter space (0x16c8 bytes, 0x1100 max).
  ptxas fatal   : Ptx assembly aborted due to errors
  Stacktrace:
   [1] CUDAdrv.CuModule(::String, ::Dict{CUDAdrv.CUjit_option_enum,Any}) at /data5/glwagner/.julia/packages/CUDAdrv/mCr0O/src/module.jl:41
   [2] macro expansion at /data5/glwagner/.julia/packages/CUDAnative/wdJjC/src/execution.jl:423 [inlined]
   [3] #cufunction#195(::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction),

I don't have too much hope that I can solve this (the burden on the compiler is too great?), but I'm opening this PR as a way to record what I've done.

Resolves #722

@glwagner glwagner marked this pull request as draft May 2, 2020 02:11
@glwagner glwagner added abstractions 🎨 Whatever that means cleanup 🧹 Paying off technical debt help wanted 🦮 plz halp (guide dog provided) labels May 17, 2020
@vchuravy (Collaborator) commented Jul 1, 2020

What is an easy way to reproduce the above failure? cc: @maleadt

@maleadt (Collaborator) commented Jul 2, 2020

We could make it so that passing Ref(arg) actually passes by reference, and doesn't do the by-value conversion.

@glwagner (Member, Author) commented Jul 2, 2020

What does

Entry function 'ptxcall_calculate_Gu__66' uses too much parameter space (0x16c8 bytes, 0x1100 max)

mean?

@maleadt (Collaborator) commented Jul 3, 2020

CUDA uses a special buffer, the parameter space, to put kernel arguments in. This buffer is only about 4K in size, and has special semantics that benefit performance (it is read-only, so threads can read from it without synchronizing, etc.). Although arguments in Julia are normally passed by reference, i.e. by putting pointers in that space, when invoking kernels we change the calling convention and pass by value, so that loading e.g. the size or pointer of an array doesn't synchronize threads. That works great until you pass a large (number of) arguments, as you apparently do.
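Decoding the hexadecimal figures in the error message above makes the overflow concrete; this is plain arithmetic, shown at the Julia REPL:

```julia
julia> Int(0x16c8)   # parameter bytes the kernel's arguments require
5832

julia> Int(0x1100)   # parameter space available to this kernel (~4K)
4352
```

So the flattened argument list overshoots the limit by roughly 1.4 KB.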

@glwagner (Member, Author) commented Jul 8, 2020

Thanks @maleadt, that's very helpful!

In this PR, we haven't directly changed any kernel function signatures. However, this PR does pass more complicated objects into kernels (a wrapper around an OffsetArray called a "Field", rather than the OffsetArray itself). The primary changes in this PR are thus 1. not extracting the underlying OffsetArray from a Field before launch, and 2. writing an adapt_structure method for Fields. I suppose the translation performed by adapt_structure increases the number of arguments to the function ptxcall_calculate_Gu__66?

The changes made in this PR are not strictly necessary --- they are a convenience. If manually unwrapping Fields (the method we previously used) is necessitated by CUDA limitations, I think we can live with that. If I understand this issue correctly, we are facing a basic trade-off between (compiler?) performance and the use of convenient but complicated abstraction objects?
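To make the trade-off concrete, here is a hypothetical before/after of a launch site; `launch!`, `calculate_Gu!`, and the argument names are illustrative, not the actual Oceananigans API:

```julia
# Previous approach: unwrap each Field by hand so the kernel receives
# only the bare arrays, keeping the kernel's parameter struct small.
launch!(arch, calculate_Gu!, Gu.data, grid, u.data, v.data, w.data)

# This PR: pass Fields directly and rely on adapt_structure to translate
# them at launch time. More convenient, but the converted objects carry
# extra members, inflating the kernel's parameter-space usage.
launch!(arch, calculate_Gu!, Gu, grid, u, v, w)
```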

@maleadt (Collaborator) commented Jul 8, 2020

If I understand this issue correctly, we are facing a basic trade-off between (compiler?) performance and the use of convenient but complicated abstraction objects?

It's a hardware limitation, really. The compiler could anticipate it, though, e.g. by not passing very large objects by value, or by providing an escape hatch (like the Ref suggestion in JuliaGPU/CUDA.jl#267). You can experiment with this yourself by changing which arguments get tagged byval in https://github.com/JuliaGPU/GPUCompiler.jl/blob/master/src/irgen.jl#L607, and changing the logic that packs arguments in https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cudadrv/execution.jl#L8-L37 accordingly (to pass a pointer to a pointer instead of a pointer to a value).

@glwagner (Member, Author) commented Nov 5, 2020

Superseded by #1057

@glwagner glwagner closed this Nov 5, 2020
@glwagner glwagner deleted the glw/adapt-field branch June 3, 2021 22:52
@glwagner glwagner restored the glw/adapt-field branch June 3, 2021 22:52
@glwagner glwagner deleted the glw/adapt-field branch June 3, 2021 22:52
Successfully merging this pull request may close these issues.

Possible elegant solution for compiling kernels with fields as arguments