
Defines many identity's to avoid recursion when compiling AbstractOperations #1595

Merged: 17 commits into master from glw/identity-closure, Apr 17, 2021

Conversation

@glwagner (Member)

This might resolve #1241 ...

cc @ali-ramadhan @tomchor

@tomchor (Collaborator)

tomchor commented Apr 16, 2021

> This might resolve #1241 ...

Awesome! Hopefully.

If I may ask, what's the rationale here? Sorry, but I didn't quite understand the changes...

@glwagner (Member, Author)

> This might resolve #1241 ...
>
> Awesome! Hopefully.
>
> If I may ask, what's the rationale here? Sorry, but I didn't quite understand the changes...

More or less, the problem is that we can't pipe a computation recursively through identity. This solution gets rid of the original, global definition of identity and instead uses a closure to implement identity. I think this means that we define a new identity function for every AbstractOperation (every time interpolation_operator is called). I'm hoping this solves what we thought was the problem: that the compiler encounters the function identity multiple times during the recursive evaluation of an operator.
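To make that concrete, here is a minimal sketch of the closure idea (illustrative only, not the exact Oceananigans source; interpolation_operator's real signature and logic differ):

# Sketch: return a "fresh" anonymous identity function whenever no
# interpolation is needed, hoping each AbstractOperation gets its own copy.
# (As discovered below, Julia gives every evaluation of an anonymous function
# the same type and, with nothing captured, the same singleton object, so the
# copies are not actually distinct.)
function interpolation_operator(from_location, to_location)
    if from_location == to_location
        return (i, j, k, grid, F::Function, args...) -> F(i, j, k, grid, args...)
    else
        # ... otherwise return the appropriate interpolation function ...
    end
end

Here's some of the discussion on #1241 that tries to explain: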

On problem 1: meeting with @vchuravy we think there is a "recursive call cycle with levels of indirection". In other words, calling getindex on a BinaryOperation:

@inline Base.getindex(β::BinaryOperation, i, j, k) = β.▶op(i, j, k, β.grid, β.op, β.▶a, β.▶b, β.a, β.b)

calls β.▶op = identity:

@inline identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...)

which calls op:

@inline $op(i, j, k, grid::AbstractGrid, ▶a, ▶b, a, b) =
@inbounds $op(▶a(i, j, k, grid, a), ▶b(i, j, k, grid, b))

which may invoke another call to either ▶a=identity or ▶b=identity (which might subsequently go back to getindex...). Due to some aspect of this process, the compiler throws up its hands with something like "you wouldn't want unbounded recursion in your compiler, right?"

A possible solution, which is also a hilarious hack, is to define multiple identity functions and use an internal counter to cycle through these different yet identical copies of the function when constructing BinaryOperations. Then we wouldn't be recursing over identity (from the compiler's point of view), since we'd be calling identity1, identity2, etc. If we also need different flavors of BinaryOperation, we can do that too...
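In code, that proposal might look something like this sketch (the identityN names and the bound of 30 are illustrative, not the exact implementation):

# Stamp out 30 identical copies of the identity interpolator with @eval,
# named identity1, identity2, ..., identity30.
for n in 1:30
    identity_n = Symbol(:identity, n)
    @eval @inline $identity_n(i, j, k, grid, F::TF, args...) where TF <: Function =
        F(i, j, k, grid, args...)
end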

@glwagner (Member, Author)

All of that said, I'm not sure it works; trying to figure that out. The fact that the tests pass is good (at least the changes didn't break anything).

@glwagner (Member, Author)

Close but no cigar I think because I was wrong about how closures work:

using Oceananigans
using Oceananigans.AbstractOperations
using Oceananigans.Fields

grid = RegularRectilinearGrid(size=(1, 1, 1), extent=(1, 1, 1)) 

model = IncompressibleModel(architecture=GPU(), grid=grid)

u, v, w = model.velocities

op = u + v - w

but

julia> op.a.▶op === op.▶op
true

in other words, the identity interpolation for op is the same function used for its child op.a.
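(A standalone illustration of the underlying Julia behavior: an anonymous function that captures nothing is a singleton, so every call hands back the ===-identical object.)

julia> f() = x -> x + 1;

julia> f() === f()
true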

I think we are close though... might just need some @eval magic...

@glwagner (Member, Author)

glwagner commented Apr 16, 2021

Things like this work now...

julia> op = u + v - w

julia> compute!(ComputedField(op))

There's a lot of identity* now...

julia> op = u + v - w
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│   └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree: 
    - at (Face, Center, Center) via identity6
    ├── + at (Face, Center, Center) via identity3
    │   ├── Field located at (Face, Center, Center)
    │   └── Field located at (Center, Face, Center)
    └── Field located at (Center, Center, Face)

@tomchor (Collaborator)

tomchor commented Apr 16, 2021

Thanks for the explanation. I'm kinda lost (especially with the last commit haha) but paying attention and crossing my fingers 👍

@glwagner (Member, Author)

If you have specific questions I would love to answer them!

@glwagner glwagner changed the title from "Uses a closure to define identity interpolation" to "Defines many identity's to avoid recursion when compiling AbstractOperations" on Apr 16, 2021
@glwagner glwagner added the "bug 🐞 Even a perfect program still has bugs" label on Apr 16, 2021
@glwagner (Member, Author)

glwagner commented Apr 17, 2021

It looks like this PR fixes some issues with complex AbstractOperations, but it does not allow us to use AveragedField on the GPU.

I think a possible avenue to explore might be to Adapt an AveragedField by wrapping the underlying, Adapted data in Base.Broadcast.Broadcasted, rather than attempting to adapt AveragedField (with its custom getindex, which is the crucial part) directly for the GPU. We know that broadcasting with singleton dimensions already works on the GPU, and it's possible we might borrow some of that machinery. The key function we might want to get a hold of is _broadcast_getindex:

https://github.com/JuliaLang/julia/blob/e467661f080a1b14ca1a9cf6681a8c713a3ae20c/base/broadcast.jl#L572-L630

@tomchor (Collaborator)

tomchor commented Apr 17, 2021

> It looks like this PR fixes some issues with complex AbstractOperations, but it does not allow us to use AveragedField on the GPU.

Thanks, @glwagner, that's awesome news! Just for clarity, what issues does it fix? Can we now calculate ComputedFields with arbitrary complexity? Or just increased complexity?

@glwagner (Member, Author)

glwagner commented Apr 17, 2021

This PR defines multiple, identical functions called identityN, where N is an integer between 1 and 30. The identity function is one of our interpolation functions: it specifies "no interpolation", i.e., evaluate the field at i, j, k without averaging.

The interpolation operator is selected by a function interpolation_operator(to_location, from_location). When to_location and from_location are identical, this function previously returned identity --- no interpolation. Now, rather than returning the sole function identity, it increments a counter to select a different identity function each time it's called. The counter cycles between 1 and 30.
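A hedged sketch of that selection logic (simplified, and assuming the identityN functions sketched earlier have been defined; the actual source may structure this differently):

# Collect the 30 generated identity functions and cycle through them with a
# global counter, so consecutive calls return different (but identical) functions.
const identities = Tuple(getfield(@__MODULE__, Symbol(:identity, n)) for n in 1:30)
const identity_counter = Ref(0)

function interpolation_operator(to_location, from_location)
    if to_location == from_location
        identity_counter[] += 1
        return identities[mod1(identity_counter[], 30)]
    else
        # ... return the interpolator from from_location to to_location ...
    end
end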

This solves a specific problem that we speculated was plaguing abstract operations on #1241, using the hack proposed there. Specifically, abstract operations that previously failed to compile due to a recursive call to identity now compile, because we use different identity functions. The compiler doesn't complain and compiles these objects. This includes operators like u + v - w, as demonstrated in my example.

This hack doesn't allow us to execute arbitrarily complex abstract operations on the GPU. I don't think we can guarantee execution of arbitrary code in general. In this case, there are other issues that the compiler might encounter that are not related to recursive calls to identity. We identified two additional issues on #1241 (comment).

There may be other problems that we haven't uncovered.

An important additional case that doesn't work right now is operations that have an embedded AveragedField. I think this is some kind of type inference issue. For Field on the GPU we "throw away" the wrapper and expose the underlying OffsetArray to GPU kernels, so compilation of functions of Field is "no more difficult" than compilation of functions with OffsetArray. This idealization is successful because indexing into the underlying field.data is identical to indexing into the field itself, and because we don't require field locations inside the kernel (we build expression trees for AbstractOperations on the CPU, prior to launching the kernel).
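As a rough sketch of what "throwing away the wrapper" means (hedged: the actual Oceananigans definition may differ):

using Adapt

# Hypothetical: adapting a Field for the GPU discards the wrapper and ships
# only the underlying OffsetArray of data to the device.
Adapt.adapt_structure(to, field::Field) = Adapt.adapt(to, field.data)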

But this idealization doesn't hold for AveragedField or any ReducedField. In particular, abstract operations index into these objects at all i, j, k. However, they don't vary in one or more of those directions; the indexing operation needs to be "collapsed" so that reduced indices are translated correctly. Thus when we adapt AveragedField for the GPU, we hold onto the wrapper:

Adapt.adapt_structure(to, averaged_field::AveragedField{X, Y, Z}) where {X, Y, Z} =
    AveragedField{X, Y, Z}(Adapt.adapt(to, averaged_field.data), nothing,
                           nothing, averaged_field.dims, nothing, nothing)

Peeking at the broadcasting code used by julia Base gives a hint. Broadcasting has to solve the same problem: we have to be able to compute between arrays of size (Nx, Ny, 1) and (Nx, Ny, Nz), for example. In this case, the indices of the first array are "extruded" into the third dimension. There are some shenanigans in Base.Broadcast that look like they are solving a type instability problem (which would doom GPU compilation for us if it were occurring). So we might be able to learn from / borrow code from Base.Broadcast. All speculation from a naive julia programmer...
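For instance, a quick standalone illustration of that extrusion behavior in plain Julia (array sizes are arbitrary):

# Broadcasting "extrudes" the singleton third dimension: any index k is mapped
# back to 1 when indexing the size-(2, 2, 1) array.
a = reshape([1.0, 2.0, 3.0, 4.0], 2, 2, 1)
b = ones(2, 2, 3)
bc = Base.Broadcast.instantiate(Base.Broadcast.broadcasted(+, a, b))
bc[CartesianIndex(2, 1, 3)] == a[2, 1, 1] + b[2, 1, 3]  # true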

@tomchor (Collaborator)

tomchor commented Apr 17, 2021

Alright, thanks, that makes a lot of sense! Very nice explanation.

So, if I understand correctly, in practical terms the result of this PR is that some abstract operations that didn't compile before (the ones where recursive calls to identity were a problem and that don't have averaged fields embedded) now compile and can be used. Right? That's a nice improvement!

@glwagner (Member, Author)

> Alright, thanks, that makes a lot of sense! Very nice explanation.
>
> So, if I understand correctly, in practical terms the result of this PR is that some abstract operations that didn't compile before (the ones where recursive calls to identity were a problem and that don't have averaged fields embedded) now compile and can be used. Right? That's a nice improvement!

Yes, I think so. I didn't test many, but I did confirm that u - v + w will compile (where it did not previously).

The error we were previously receiving was a "dynamic function invocation" error. This is often a type inference problem: if the julia compiler cannot infer types properly, then the resulting julia code cannot be translated into CUDA, so the kernel still contains "dynamic julia functions".

This is the same error we get when trying to compile operations containing AveragedField. But apparently the compilation issues for those kernels are different and not resolved by this PR sadly. I think there is a very specific issue associated with AveragedField.

We received other independent errors from seemingly more complicated operations such as "device kernel image is invalid", and "entry function uses too much parameter space". I think solving these might require contributions / modifications to CUDA.jl.

@glwagner glwagner merged commit e293068 into master Apr 17, 2021
@glwagner glwagner deleted the glw/identity-closure branch April 17, 2021 18:32
glwagner added a commit that referenced this pull request Apr 21, 2021
Release notes:

* Tests and fixes for `FFTBasedPoissonSolver` for topologies with `Flat` dimensions (#1560)
* Improved `AbstractOperations` that are much more likely to compile on the GPU, with better "location inference" for `BinaryOperation` (#1595, #1599)
@glwagner glwagner mentioned this pull request Apr 21, 2021
glwagner added a commit that referenced this pull request Apr 21, 2021
Successfully merging this pull request may close these issues.

Complex AbstractOperations cannot be computed on GPU