
Defines many identity's to avoid recursion when compiling AbstractOperations #1595

Merged: 17 commits into master from glw/identity-closure, Apr 17, 2021

Conversation

@glwagner (Member)

This might resolve #1241 ...

cc @ali-ramadhan @tomchor

@tomchor (Collaborator)

tomchor commented Apr 16, 2021

> This might resolve #1241 ...

Awesome! Hopefully.

If I may ask, what's the rationale here? Sorry, but I didn't quite understand the changes...

@glwagner (Member, Author)

> This might resolve #1241 ...
>
> Awesome! Hopefully.
>
> If I may ask, what's the rationale here? Sorry, but I didn't quite understand the changes...

More or less, the problem is that we can't pipe a computation recursively through identity. This solution gets rid of the original, global definition of identity and instead uses a closure to implement identity. I think this means that we define a new identity function for every AbstractOperation (every time interpolation_operator is called). I'm hoping this solves what we thought was the problem: that the compiler encounters the function identity multiple times during the recursive evaluation of an operator.
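To make that concrete, here is a minimal sketch of the closure idea (illustrative only, not the exact Oceananigans source; interpolation_operator's real signature and logic differ):

# Sketch: return a "fresh" anonymous identity function whenever no
# interpolation is needed, hoping each AbstractOperation gets its own copy.
# (As discovered below, Julia gives every evaluation of an anonymous function
# the same type and, with nothing captured, the same singleton object, so the
# copies are not actually distinct.)
function interpolation_operator(from_location, to_location)
    if from_location == to_location
        return (i, j, k, grid, F::Function, args...) -> F(i, j, k, grid, args...)
    else
        # ... otherwise return the appropriate interpolation function ...
    end
end

Here's some of the discussion on #1241 that tries to explain: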

On problem 1: meeting with @vchuravy we think there is a "recursive call cycle with levels of indirection". In other words, calling getindex on a BinaryOperation:

@inline Base.getindex(β::BinaryOperation, i, j, k) = β.▶op(i, j, k, β.grid, β.op, β.▶a, β.▶b, β.a, β.b)

calls β.▶op = identity:

@inline identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...)

which calls op:

@inline $op(i, j, k, grid::AbstractGrid, ▶a, ▶b, a, b) =
@inbounds $op(▶a(i, j, k, grid, a), ▶b(i, j, k, grid, b))

which may invoke another call to either ▶a=identity or ▶b=identity (which might subsequently go back to getindex...). Due to some aspect of this process, the compiler throws up its hands with something like "you wouldn't want unbounded recursion in your compiler, right?"

A possible solution, which is also a hilarious hack, is to define multiple identity functions and use an internal counter to cycle through these different yet identical copies of the function when constructing BinaryOperations. Then we wouldn't be recursing over identity (from the compiler's point of view), since we'd be calling identity1, identity2, etc. If we also need different flavors of BinaryOperation, we can do that too...
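In code, that proposal might look something like this sketch (the identityN names and the bound of 30 are illustrative, not the exact implementation):

# Stamp out 30 identical copies of the identity interpolator with @eval,
# named identity1, identity2, ..., identity30.
for n in 1:30
    identity_n = Symbol(:identity, n)
    @eval @inline $identity_n(i, j, k, grid, F::TF, args...) where TF <: Function =
        F(i, j, k, grid, args...)
end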

@glwagner (Member, Author)

All of that said, I'm not sure it works; trying to figure that out. The fact that the tests pass is good (at least the changes didn't break anything).

@glwagner (Member, Author)

Close but no cigar I think because I was wrong about how closures work:

using Oceananigans
using Oceananigans.AbstractOperations
using Oceananigans.Fields

grid = RegularRectilinearGrid(size=(1, 1, 1), extent=(1, 1, 1)) 

model = IncompressibleModel(architecture=GPU(), grid=grid)

u, v, w = model.velocities

op = u + v - w

but

julia> op.a.▶op === op.▶op
true

in other words, the identity interpolation for op is the same function used for its child op.a.
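(A standalone illustration of the underlying Julia behavior: an anonymous function that captures nothing is a singleton, so every call hands back the ===-identical object.)

julia> f() = x -> x + 1;

julia> f() === f()
true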

I think we are close though... might just need some @eval magic...

@glwagner (Member, Author)

glwagner commented Apr 16, 2021

Things like this work now...

julia> op = u + v - w

julia> compute!(ComputedField(op))

There's a lot of identity* now...

julia> op = u + v - w
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    - at (Face, Center, Center) via identity6
    ├── + at (Face, Center, Center) via identity3
    │   ├── Field located at (Face, Center, Center)
    │   └── Field located at (Center, Face, Center)
    └── Field located at (Center, Center, Face)

@tomchor (Collaborator)

tomchor commented Apr 16, 2021

Thanks for the explanation. I'm kinda lost (especially with the last commit haha) but paying attention and crossing my fingers 👍

@glwagner (Member, Author)

If you have specific questions I would love to answer them!

@glwagner glwagner changed the title from "Uses a closure to define identity interpolation" to "Defines many identity's to avoid recursion when compiling AbstractOperations" on Apr 16, 2021
@glwagner glwagner added the "bug 🐞 Even a perfect program still has bugs" label on Apr 16, 2021
@glwagner (Member, Author)

glwagner commented Apr 17, 2021

It looks like this PR fixes some issues with complex AbstractOperations, but it does not allow us to use AveragedField on the GPU.

I think a possible avenue to explore might be to Adapt an AveragedField by wrapping the underlying, Adapted data in Base.Broadcast.Broadcasted, rather than attempting to adapt AveragedField (with its custom getindex, which is the crucial part) directly for the GPU. We know that broadcasting with singleton dimensions already works on the GPU, and it's possible we might borrow some of that machinery. The key function we might want to get a hold of is _broadcast_getindex:

https://github.com/JuliaLang/julia/blob/e467661f080a1b14ca1a9cf6681a8c713a3ae20c/base/broadcast.jl#L572-L630

@tomchor (Collaborator)

tomchor commented Apr 17, 2021

> It looks like this PR fixes some issues with complex AbstractOperations, but it does not allow us to use AveragedField on the GPU.

Thanks, @glwagner, that's awesome news! Just for clarity, what issues does it fix? Can we now calculate ComputedFields with arbitrary complexity? Or just increased complexity?

@glwagner (Member, Author)

glwagner commented Apr 17, 2021

This PR defines multiple, identical functions called identityN, where N is an integer between 1 and 30. The identity function is one of our interpolation functions: it specifies "no interpolation", i.e., evaluate the field at i, j, k without averaging.

The interpolation operator is selected by a function interpolation_operator(to_location, from_location). When to_location and from_location are identical, this function previously returned identity --- no interpolation. Now, rather than returning the sole function identity, it increments a counter to select a different identity function each time it's called. The counter cycles between 1 and 30.
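A hedged sketch of that selection logic (simplified, and assuming the identityN functions sketched earlier have been defined; the actual source may structure this differently):

# Collect the 30 generated identity functions and cycle through them with a
# global counter, so consecutive calls return different (but identical) functions.
const identities = Tuple(getfield(@__MODULE__, Symbol(:identity, n)) for n in 1:30)
const identity_counter = Ref(0)

function interpolation_operator(to_location, from_location)
    if to_location == from_location
        identity_counter[] += 1
        return identities[mod1(identity_counter[], 30)]
    else
        # ... return the interpolator from from_location to to_location ...
    end
end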

This solves a specific problem that we speculated was plaguing abstract operations on #1241, using the hack proposed there. Specifically, abstract operations that previously failed to compile due to a recursive call to identity now compile, because we use different identity functions. The compiler doesn't complain and compiles these objects. This includes operators like u + v - w, as demonstrated in my example.

This hack doesn't allow us to execute arbitrarily complex abstract operations on the GPU. I don't think we can guarantee execution of arbitrary code in general. In this case, there are other issues that the compiler might encounter that are not related to recursive calls to identity. We identified two additional issues on #1241 (comment).

There may be other problems that we haven't uncovered.

An important additional case that doesn't work right now is operations that have an embedded AveragedField. I think this is some kind of type inference issue. For Field on the GPU we "throw away" the wrapper and expose the underlying OffsetArray to GPU kernels, so compilation of functions of Field is "no more difficult" than compilation of functions with OffsetArray. This idealization is successful because indexing into the underlying field.data is identical to indexing into the field itself, and because we don't require field locations inside the kernel (we build expression trees for AbstractOperations on the CPU, prior to launching the kernel).
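As a rough sketch of what "throwing away the wrapper" means (hedged: the actual Oceananigans definition may differ):

using Adapt

# Hypothetical: adapting a Field for the GPU discards the wrapper and ships
# only the underlying OffsetArray of data to the device.
Adapt.adapt_structure(to, field::Field) = Adapt.adapt(to, field.data)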

But this idealization doesn't hold for AveragedField or any ReducedField. In particular, abstract operations index into these objects at all i, j, k. However, they don't vary in one or more of those directions; the indexing operation needs to be "collapsed" so that reduced indices are translated correctly. Thus when we adapt AveragedField for the GPU, we hold onto the wrapper:

Adapt.adapt_structure(to, averaged_field::AveragedField{X, Y, Z}) where {X, Y, Z} =
    AveragedField{X, Y, Z}(Adapt.adapt(to, averaged_field.data), nothing,
                           nothing, averaged_field.dims, nothing, nothing)

Peeking at the broadcasting code used by julia Base gives a hint. Broadcasting has to solve the same problem: we have to be able to compute between arrays of size (Nx, Ny, 1) and (Nx, Ny, Nz), for example. In this case, the indices of the first array are "extruded" into the third dimension. There are some shenanigans in Base.Broadcast that look like they are solving a type instability problem (which would doom GPU compilation for us if it were occurring). So we might be able to learn from / borrow code from Base.Broadcast. All speculation from a naive julia programmer...
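For instance, a quick standalone illustration of that extrusion behavior in plain Julia (array sizes are arbitrary):

# Broadcasting "extrudes" the singleton third dimension: any index k is mapped
# back to 1 when indexing the size-(2, 2, 1) array.
a = reshape([1.0, 2.0, 3.0, 4.0], 2, 2, 1)
b = ones(2, 2, 3)
bc = Base.Broadcast.instantiate(Base.Broadcast.broadcasted(+, a, b))
bc[CartesianIndex(2, 1, 3)] == a[2, 1, 1] + b[2, 1, 3]  # true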

@tomchor (Collaborator)

tomchor commented Apr 17, 2021

Alright, thanks, that makes a lot of sense! Very nice explanation.

So, if I understand correctly, in practical terms the result of this PR is that some abstract operations that didn't compile before (the ones where recursive calls to identity were a problem and that don't have averaged fields embedded) now compile and can be used. Right? That's a nice improvement!

@glwagner (Member, Author)

> Alright, thanks, that makes a lot of sense! Very nice explanation.
>
> So, if I understand correctly, in practical terms the result of this PR is that some abstract operations that didn't compile before (the ones where recursive calls to identity were a problem and that don't have averaged fields embedded) now compile and can be used. Right? That's a nice improvement!

Yes, I think so. I didn't test many, but I did confirm that u - v + w will compile (where it did not previously).

The error we were previously receiving was a "dynamic function invocation" error. This is often a type inference problem: if the julia compiler cannot infer types properly, then the resulting julia code cannot be translated into CUDA, so the kernel still contains "dynamic julia functions".

This is the same error we get when trying to compile operations containing AveragedField. But apparently the compilation issues for those kernels are different and not resolved by this PR sadly. I think there is a very specific issue associated with AveragedField.

We received other independent errors from seemingly more complicated operations such as "device kernel image is invalid", and "entry function uses too much parameter space". I think solving these might require contributions / modifications to CUDA.jl.

@glwagner glwagner merged commit e293068 into master Apr 17, 2021
@glwagner glwagner deleted the glw/identity-closure branch April 17, 2021 18:32
glwagner added a commit that referenced this pull request Apr 21, 2021
Release notes:

* Tests and fixes for `FFTBasedPoissonSolver` for topologies with `Flat` dimensions (#1560)
* Improved `AbstractOperations` that are much more likely to compile on the GPU, with better "location inference" for `BinaryOperation` (#1595, #1599)
@glwagner glwagner mentioned this pull request Apr 21, 2021
glwagner added a commit that referenced this pull request Apr 21, 2021
Successfully merging this pull request may close these issues.

Complex AbstractOperations cannot be computed on GPU