-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redefines equality (==) for VerticallyStretchedRectilinearGrids #2019
Conversation
field of the two elements. (The default behavior is using `===`, which makes it so that | ||
`grid1==deepcopy(grid1)` for `VerticallyStretchedRectilinearGrid`s. | ||
""" | ||
function Base.:(==)(grid1::VerticallyStretchedRectilinearGrid, grid2::VerticallyStretchedRectilinearGrid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about other types of grids? Curvilinear grids for example? Would they show the same "problem" as the VerticallyStretchedRectilinearGrid? Why don't we redefine
Base.:(==)to work for
AbstactGrid`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea for other fields, so I wanted to be conservative for now. This seems to be related to array comparisons and, since vertically stretched grids needs an array, this error gets thrown. Do other grids have arrays in their struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap. Other grids have it.
No need to be conservative. Let's fix it for all grids!
Would the ==
as above fail for RegularRectilinearGrids
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the == as above fail for RegularRectilinearGrids?
I think that would work for all grids without change actually. But I haven't tested it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
==
works (now) for RegularRectilinearGrid
because it is isbits
I think. We may not need a special method for this grid. The same goes for RegularLatitudeLongitudeGrid
.
For any grid that contains arrays, we need to define our own numeric equality operator ==
.
It fixes it for CPUs, but I get an error for GPU architecture. It's likely that we need to allow scalar operations due to |
Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test would be nice to come along with this PR. I can help out with that if you want.
Hm... Whatever we do should work for both CPUs and GPUs. @glwagner and I were thinking about equality between grids at some point. Probably (I'm not sure -- you can double check that), the fix in this PR fails for GPUs, because when in that case when the checkpointer tries to compare the two grids, the one saved on disk has |
Thanks! I'll take you up on that if you don't mind. Given that you said that other grids also have arrays, it might be best to expand checkpoint testing for all grids? |
Yeap. But first we need to sort the GPU/CPU issue. For example, two grids that are identical but the arrays of one live on CPU but the other on GPU should they be considered equal? I believe in that case we should add methods so that: julia> grid1 == grid2
true
julia> grid1 === grid2
false |
But I would suggest we postpone this until after #1998 so that we don't do double work. Also, by putting |
I don't know if I agree that waiting is best though. Seems like #1998 is still a draft so it might take a while, and a bug is a bug. I'm gonna have to fix this anyway to move on with my research since I need to pickup my simulations and I'd like to use vertically stretched grids. Also, I think we can define a method that returns |
@tomchor, you can always do pkg> add Oceananigans#tc/pickup and continue research if that does the job. I'm just a bit reluctant to apply a quick fix that works only for one case only to have to do the same work again for the lat/lon grid for example, etc. I'm very keen to solve this bug before #1998 -- postponing after #1998 was just a suggestion. But the solution should be one that covers all grids and all architectures. Not something that works only for one grid and leaves the bug for some others. Perhaps let's see what others have to chip in on this? |
Oh yeah, I didn't mean to half-fix it. I do believe it should be a "proper" fix. I just think that adding architecture to grids (#1998) isn't gonna change what the fix is. I'll have more info tomorrow after I actually try to do this for GPUs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return false if the grids are on a different topology.
It does. And currently it also returns |
Regarding the comment here
I'm not sure what you mean by "restoring" a grid with GPU architecture. This last commit was enough for me to pickup my GPU-architecture simulation with a VerticallyStretchedGrid. For example, the following mwe works in this PR: using Oceananigans
using CUDA
grid1 = VerticallyStretchedRectilinearGrid(size=(2, 2, 2),
architecture=GPU(),
x=(0, 1), y=(0, 1), z_faces=k -> k,
halo=(3,3,3),
)
grid2 = VerticallyStretchedRectilinearGrid(size=(2, 2, 2),
architecture=GPU(),
x=(0, 1), y=(0, 1), z_faces=k -> k,
halo=(3,3,3),
)
if CUDA.@allowscalar grid1!=grid2
throw(error)
end
model = NonhydrostaticModel(grid=grid1,
architecture=GPU(),
)
progress(sim) = @info "Iteration: $(sim.model.clock.iteration), time: $(round(Int, sim.model.clock.time))"
simulation = Simulation(model, Δt=1, stop_time=10, progress=progress)
simulation.output_writers[:chk_writer] = Checkpointer(model;
dir=".",
prefix = "mwe",
schedule = TimeInterval(2),
force = false,
cleanup = true,
)
run!(simulation)
simulation.stop_time = 20
run!(simulation, pickup=true) |
So, everything that I've tested so far leads me to believe this is working with vertically stretched grids for now. I've tested a few examples with both CPU and GPU architecture and they've picked up successfully. (Although it's possible I'm missing something...) However, since @navidcy and @glwagner have reservations about this fix (regarding #1998) I won't expand it to all abstract grids for now. I think more discussions need to be done. I'll leave this PR open though since I need this fix to complete my research, if that's okay. |
@@ -194,7 +194,7 @@ function set!(model, filepath::AbstractString) | |||
|
|||
# Validate the grid | |||
checkpointed_grid = file["grid"] | |||
model.grid == checkpointed_grid || | |||
CUDA.@allowscalar model.grid == checkpointed_grid || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to allow scalar operations specifically where they are needed? Since we don't need scalar operations for all equality comparisons, we may want to check within ==(grid1, grid2)
where applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that first but it failed for some reason. Couldn't really figure out why, but I agree we should strive to use scalar op only when necessary.
Can we just compare 1) the topology and 2) all of the nodes? I think that would be sufficient to determine equality. I'm worried if we don't explicitly check the topology that we might run into some edge cases where the nodes / spacings are the same but the topologies are different (these would possibly be pathological with 0 halo in a periodic / flat direction, but still good to explicitly check I think...) When comparing numeric equality for nodes, we should add I think we should add a test that equality works as expected, as well. We should also add a test for checkpointing on stretched grids. I think its ok if we don't support a numeric equality for other grids --- we can build them up one at a time. They are specific to each grid. The main downside here is that we need to refactor our grid implementation to have a single Because of this: julia> using CUDA
julia> a = rand(2)
2-element Vector{Float64}:
0.8207604162394306
0.3815099688071648
julia> b = CuArray(a)
2-element CuArray{Float64, 1}:
0.8207604162394306
0.3815099688071648
julia> CUDA.@allowscalar a == b
true I think that this will work if the checkpointed grid is deserialized onto the CPU, even though |
It'd be great to get this PR through. I don't think you need the PR open for your research though... ? You can use any code you like for that. We should keep this PR open if we are planning to finish it and merge it into |
True, I guess I just need the branch/code. I intend to merge this though. It's just that apparently #1998 needs to be finished first, according to @navidcy at least, so I thought I'd wait for that. Are you suggesting we move on this one before other PRs and just for vertically stretched grids? |
I don't think we need #1998 first. I think we may need different == for different grid types, but I'm not totally sure. |
Nah... Well, I may have initially thought that #1998 was important to finish first, but I take it back :) |
@tomchor you should be able to user v0.63.3 now. When you make sure that everything works for you then delete this branch. |
Closes #2018