This repository has been archived by the owner on Mar 1, 2023. It is now read-only.

Replace copy_stack_field_down! with broadcasts of reshaped MPIStateArrays #1301

Merged: 2 commits into CliMA:master from bla/wz0_broadcast (Jul 1, 2020)

Conversation

@blallen (Contributor) commented Jun 29, 2020

Description

After #1071 is merged, we no longer need a dedicated kernel to copy the vertical velocity at the top of the ocean (wz0) down the entire stack for use in the source term for η. We can also use this PR to discuss how we want the basic_grid_info function to work.
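
As a rough illustration of what that replacement looks like, here is a minimal sketch with plain Julia `Array`s and made-up names and sizes (not the PR's exact code):

```julia
# Minimal sketch of the broadcast pattern that replaces the copy-down kernel.
# All names and sizes here are illustrative; the real code works on the
# model's MPIStateArray storage.
Nq, Nqk = 4, 4             # horizontal / vertical nodes per element
nelemv, nelemh = 8, 16     # vertical / horizontal element counts

w   = rand(Nq^2 * Nqk, nelemv * nelemh)    # stand-in for the w field
wz0 = zeros(Nq^2 * Nqk, nelemv * nelemh)   # stand-in for the wz0 field

boxy_w   = reshape(w,   Nq^2, Nqk, nelemv, nelemh)
boxy_wz0 = reshape(wz0, Nq^2, Nqk, nelemv, nelemh)

# w(z = 0): the top vertical node of the top vertical element, with the
# vertical dimensions kept as size-1 so broadcasting can expand them
flat_wz0 = @view boxy_w[:, end:end, end:end, :]

# one broadcast copies the surface value down every level of every stack
boxy_wz0 .= flat_wz0
```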

I have

  • Written and run all necessary tests with CLIMA by including tests/runtests.jl
  • Followed all necessary style guidelines and run julia .dev/climaformat.jl .
  • Updated the documentation to reflect changes from this PR.

For review by CLIMA developers

  • There are no open pull requests for this already
  • CLIMA developers with relevant expertise have been assigned to review this submission
  • The code conforms to the style guidelines and has consistent naming conventions. julia .dev/format.jl has been run in a separate commit.
  • This code does what it is technically intended to do (all numerics make sense physically and/or computationally)

@blallen blallen added Defensive programming Makes code safer Ocean 🌊 "How inappropriate to call this planet Earth when it is quite clearly Ocean" labels Jun 29, 2020
@blallen blallen self-assigned this Jun 29, 2020
@blallen (Contributor, Author) commented Jun 29, 2020

@jkozdon if this works on the GPU, should I just remove copy_stack_field_down! entirely?

@blallen (Contributor, Author) commented Jun 29, 2020

bors try

bors bot added a commit that referenced this pull request Jun 29, 2020
bors bot commented Jun 29, 2020

try

Build failed:

@vchuravy vchuravy changed the base branch from master to jrs/MPI_wrapper June 29, 2020 17:04
@vchuravy vchuravy changed the base branch from jrs/MPI_wrapper to master June 29, 2020 17:04
@blallen (Contributor, Author) commented Jun 29, 2020

@leios can you check why the ocean tests failed on the GPU on the slurm CI? This is a practical use case and it seems to not work.

@jkozdon left a review comment:

No real opinion.

@jkozdon commented Jun 29, 2020

> @leios can you check why the ocean tests failed on the GPU on the slurm CI? This is a practical use case and it seems to not work.

If you cannot get it working with reshapes of the underlying MPIStateArray, you can also just do it with the data directly.

@leios (Contributor) commented Jun 29, 2020

Yeah, so looking at this further, the types for the broadcast that are failing are:

```
1. Base.ReshapedArray{Float64,5,SubArray{Float64,3,CUDA.CuArray{Float64,3,CUDA.CuArray{Float64,3,Nothing}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64},Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}

2. Base.ReshapedArray{Float64,5,SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,3,CUDA.CuArray{Float64,3,CUDA.CuArray{Float64,3,Nothing}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64},Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}
```

PR #1071 allows for broadcasting of reshaped MPIStateArrays, but this might be more closely related to an issue with Adapt itself.

@leios leios closed this Jun 29, 2020
@leios leios reopened this Jun 29, 2020
@jkozdon commented Jun 29, 2020

> Yeah, so looking at this further, the types for the broadcast that are failing are:

Right. Just saw that. This has nothing to do with #1071, since it's broadcasting the data based on the vars views of the data.

> but this might be more closely related to an issue with Adapt itself

This seems likely.

Comment on lines 635 to 645

```julia
boxy_w = reshape(A.w, Nq^2, Nqk, 1, nelemv, nelemh)
flat_w = @view boxy_w[:, end, :, end, :]
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh)
boxy_wz0 = reshape(A.wz0, Nq^2, Nqk, 1, nelemv, nelemh)
boxy_wz0 .= flat_wz0
```
Review reply (Contributor, Author):

Would adding `.data` after `A.w` and `A.wz0` solve the issue? The split-explicit code we developed with @jm-c does this quite a lot.

Review reply (Contributor):

It's worth a shot. If that doesn't work, we can export the MPIStateArray functions we created to find `.data` of adapted MPIStateArrays.

@jkozdon commented Jun 29, 2020

> but this might be more closely related to an issue with Adapt itself

Nothing to do with CLIMA or MPIStateArrays.

```julia
using CUDA;
CUDA.allowscalar(false);
Nq = 5;
Nqk = 5;
nelemv = 10;
nelemh = 100;
nelemh_ghost = 10;
nstate = 5;
Q = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
A = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
w = view(Q, 3, :, :);
wz0 = view(A, 1, :, :);
boxy_w = reshape(w, Nq^2, Nqk, 1, nelemv, nelemh);
flat_w = @view boxy_w[:, end, :, end, :];
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh);
boxy_wz0 = reshape(wz0, Nq^2, Nqk, 1, nelemv, nelemh);
boxy_wz0 .= flat_wz0;
```

will produce:

```
julia> boxy_wz0 .= flat_wz0;
ERROR: scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /home/jekozdon/.julia/packages/GPUArrays/X4SqE/src/host/indexing.jl:41
 [3] getindex at /home/jekozdon/.julia/packages/GPUArrays/X4SqE/src/host/indexing.jl:96 [inlined]
 [4] getindex at ./subarray.jl:245 [inlined]
 [5] _unsafe_getindex_rs at ./reshapedarray.jl:249 [inlined]
 [6] _unsafe_getindex at ./reshapedarray.jl:246 [inlined]
 [7] getindex at ./reshapedarray.jl:234 [inlined]
 [8] getindex at ./subarray.jl:236 [inlined]
 [9] _unsafe_getindex_rs(::SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false}, ::Tuple{Int64,Int64,Int64}) at ./reshapedarray.jl:249
 [10] _unsafe_getindex(::Base.ReshapedArray{Float64,5,SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64) at ./reshapedarray.jl:246
 [11] getindex at ./reshapedarray.jl:234 [inlined]
 [12] _getindex at ./abstractarray.jl:1020 [inlined]
 [13] getindex at ./abstractarray.jl:980 [inlined]
 [14] _broadcast_getindex at ./broadcast.jl:597 [inlined]
 [15] _getindex at ./broadcast.jl:628 [inlined]
 [16] _broadcast_getindex at ./broadcast.jl:603 [inlined]
 [17] getindex at ./broadcast.jl:564 [inlined]
 [18] macro expansion at ./broadcast.jl:910 [inlined]
 [19] macro expansion at ./simdloop.jl:77 [inlined]
 [20] copyto! at ./broadcast.jl:909 [inlined]
 [21] copyto! at ./broadcast.jl:864 [inlined]
 [22] materialize!(::Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{5},Nothing,typeof(identity),Tuple{Base.ReshapedArray{Float64,5,SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}}}) at ./broadcast.jl:823
 [23] top-level scope at REPL[69]:1
```

@leios (Contributor) commented Jun 29, 2020

Yeah, then we have to mess around with Adapt to get this to work... unless there is a way to simplify the types (specifically `flat_w`)? There was a similar situation I ran into where we couldn't broadcast from a view to a non-view array.

@jkozdon commented Jun 29, 2020

If you reshape before taking the views, this would work:

```julia
using CUDA;
CUDA.allowscalar(false);
Nq = 5;
Nqk = 5;
nelemv = 10;
nelemh = 100;
nelemh_ghost = 10;
nstate = 5;
A = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
B = reshape(A, nstate, Nq^2, Nqk, nelemv, nelemh);
w = view(B, 3, :, :, :, :);
wz0 = view(B, 1, :, 1:1, 1:1, :);
w .= wz0;
```

To do this, you need to access the data, then reshape and pull out the indices you want.
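
A hedged, self-contained sketch of that recipe follows. The sizes are made up, `index_w` / `index_wz0` are hypothetical state indices (e.g. what `varsindex` would give), ghost elements are ignored, and the `[dof, state, elements]` layout clarified further down is assumed; this is not the PR's exact code:

```julia
using CUDA
CUDA.allowscalar(false)

Nq, Nqk, nelemv, nelemh, nstate = 5, 5, 10, 100, 5
index_w, index_wz0 = 3, 1            # hypothetical state indices
data = CUDA.zeros(Float64, Nq * Nq * Nqk, nstate, nelemv * nelemh)  # stands in for A.data

# Reshape the raw array first (reshape of a CuArray is still a CuArray),
# then take views, so the broadcast never sees a ReshapedArray-of-SubArray.
boxy = reshape(data, Nq^2, Nqk, nstate, nelemv, nelemh)
boxy_wz0 = view(boxy, :, :, index_wz0, :, :)                  # whole wz0 field
flat_w   = view(boxy, :, Nqk:Nqk, index_w, nelemv:nelemv, :)  # w at z = 0
boxy_wz0 .= flat_w                   # copy the surface value down every stack
```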

@blallen (Contributor, Author) commented Jun 29, 2020

@jkozdon we ended up using `varsindex` and `.data` to get things to work, see d8e0d92. It's not elegant, but it's functional.

Comment on lines 640 to 644

```julia
# project w(z=0) down the stack
boxy_w = reshape(A.data[:,index_w,:], Nq^2, Nqk, 1, nelemv, nelemh)
flat_w = @view boxy_w[:, end, :, end, :]
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh)
boxy_wz0 = reshape(A.data[:,index_wz0,:], Nq^2, Nqk, 1, nelemv, nelemh)
```
Review reply:

I don't think this will work: you are creating an array slice, not a view, so you are not modifying the original array but a newly created array.
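
A small CPU-side illustration of that difference (hypothetical arrays; on the GPU the wrapper types matter as well, as discussed above):

```julia
A = zeros(4, 3)

# Plain indexing allocates a copy, so the broadcast writes into the copy
# and A is left untouched.
B = reshape(A[:, 2], 2, 2)   # A[:, 2] is a new array
B .= 1.0
A[:, 2]                      # still all zeros

# A view keeps a window into A, so the write lands in the original array.
C = reshape(view(A, :, 2), 2, 2)
C .= 1.0
A[:, 2]                      # now all ones
```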

Review reply:

Also, you haven't accounted for the ghost elements, so I don't think it will work in parallel.

Review reply:

Possibly it would work in parallel once you fix the views with:

```julia
Nq, Nqk, _, _, nelemv, nelemh, _, _ = basic_grid_info(dg)
```

above, though you will be copying down the ghost elements too.

Review reply (Contributor):

Ah, I think you are right. This worked on my GPU, but we need to think of a more general solution.

Review reply:

I think you need to reshape, then do the views (like I did here: #1301 (comment)).

The reshape returns a new CuArray (pointing to the same data), which is why you avoid the double-indirection problem.
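
A hedged illustration of that distinction (the exact type names depend on the CUDA.jl version):

```julia
using CUDA

A = CUDA.zeros(Float64, 4, 6)

# reshape of a CuArray returns another CuArray pointing at the same buffer,
# so views taken afterwards are plain SubArray{...,CuArray} and broadcast on
# the GPU.
typeof(reshape(A, 2, 12))             # CuArray{Float64, 2, ...}

# reshape of a view wraps it in Base.ReshapedArray{...,SubArray{...,CuArray}},
# the doubly wrapped type that fell back to scalar indexing above.
typeof(reshape(view(A, 1, :), 2, 3))  # Base.ReshapedArray{Float64, 2, SubArray{...}, ...}
```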

Review reply (Contributor, Author):

Looks good to me lurking - is there an extra `,1,` after `num_aux_state`, or is that needed for some more magic?

Ah, it is an extra `,1,`! That's what my parenthetical was about.

Review reply (Contributor, Author):

o.o I think @jkozdon is editing it; the indices just jumped.

Review replies:

This (original code) doesn't work, not even on a CPU, so we need to make some kind of further change for sure.

Couple of typos for sure. Updated my suggestion.

@jkozdon can you refresh me on the `MPIStateArray.data` layout? I thought it was ["GL points in element", "states", "elements"], but that doesn't match up with your views (or even the first reshape; shouldn't the 1 be a :?).

The data ordering is `[dof, state, elements]`.
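
In code terms (the state index here is made up for illustration):

```julia
# Q.data has shape (num_dofs_per_element, num_states, num_elements);
# one field is a no-copy view along the second (state) dimension.
index_w = 3                        # hypothetical state index for w
w = view(Q.data, :, index_w, :)    # (dof, elem) window into the same storage
```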

Review reply (Contributor, Author):

Just pushed a commit based on that fix; it worked on the CPU, about to try it on the GPU.

Review reply (Contributor, Author):

Hmm, it didn't succeed on the GPU due to the reshape dimensions not matching `array.data`: https://gist.github.com/climabot/4a0358efe092e8b8015870c489a0e2df#file-out_10192381

Any ideas @jkozdon? It works fine on the CPU.

@blallen (Contributor, Author) commented Jun 29, 2020

bors try

bors bot added a commit that referenced this pull request Jun 29, 2020
bors bot commented Jun 29, 2020

try

Build failed:

@blallen (Contributor, Author) commented Jun 30, 2020

bors try

bors bot added a commit that referenced this pull request Jun 30, 2020
bors bot commented Jun 30, 2020

try

Build failed:

@blallen (Contributor, Author) commented Jun 30, 2020

> try
>
> Build failed:

Does this mean the slurm CI succeeded, or that it wasn't tried at all?

@leios (Contributor) commented Jun 30, 2020

I think it is still trying those

@blallen (Contributor, Author) commented Jun 30, 2020

> @jkozdon if this works on the GPU, should I just remove copy_stack_field_down! entirely?

@jkozdon thoughts? since this seems to be working finally

@blallen (Contributor, Author) commented Jun 30, 2020

bors try

bors bot added a commit that referenced this pull request Jun 30, 2020
@jkozdon commented Jun 30, 2020

> @jkozdon thoughts? since this seems to be working finally

Fine by me to remove it. I believe that you are the only ones using this functionality anyway.

blallen and others added 2 commits June 30, 2020 16:00
modifying HBM to no longer use vars

beautification

make basic_grid_info return NamedTuple

use views of `.data` instead of reshapes of slices

account for ghost elements

clean up and doc fix

Co-authored-by: leios <jrs.schloss@gmail.com>
Co-authored-by: Jeremy E Kozdon <jekozdon@nps.edu>
bors bot commented Jun 30, 2020

try

Build succeeded:

@blallen (Contributor, Author) commented Jun 30, 2020

> @jkozdon thoughts? since this seems to be working finally
>
> Fine by me to remove it. I believe that you are the only ones using this functionality anyway.

okay, I went ahead and removed them in a separate commit.

@blallen (Contributor, Author) commented Jun 30, 2020

bors r+

@jkozdon commented Jun 30, 2020

> okay, I went ahead and removed them in a separate commit.

Two commits. Love it!

@blallen (Contributor, Author) commented Jun 30, 2020

> okay, I went ahead and removed them in a separate commit.
>
> Two commits. Love it!

Haha, I know that both commits will pass CI :) But we had a conversation at MIT earlier today about this, and sometimes multiple commits are justified for clarity. Also, I think we said we could have the auto-bug-checker thing run only on bors commits, right?

bors bot added a commit that referenced this pull request Jun 30, 2020
1301: Replace `copy_stack_field_down!` with broadcasts of reshaped MPIStateArrays r=blallen a=blallen


Co-authored-by: Brandon Allen <ballen@mit.edu>
@jkozdon commented Jun 30, 2020 via email

bors bot commented Jun 30, 2020

Build failed:

@charleskawczynski (Member) commented

bors r+

bors bot commented Jul 1, 2020

@bors bors bot merged commit 78fd04b into CliMA:master Jul 1, 2020
@blallen blallen deleted the bla/wz0_broadcast branch July 23, 2020 19:21