Should we store `architecture` in `grid`? #1825

glwagner · 2021-07-05T21:32:24Z

In the general case of a pre-computed grid or mesh, the architecture on which the grid is stored has be known. Thus in general, the architecture of a grid must be provided.

The few special cases to this rule are grids with metrics that are either constant or can be computed on the fly, like RegularRectilinearGrid. Up to recently RegularRectilinearGrid was the only option, so we hadn't the motivation to ponder whether architecture belonged as a model property, field property, or grid property...

Putting architecture in grid would mean that we don't have to specify it in model constructors. Since grid is so central we wouldn't really need architecture as a property anywhere else. For example, we wouldn't need to store architecture in every.single.field object, as we do now. Alleviating this boilerplate makes the change feel especially right...

The main downside is a loss of flexibility for those grids that are truly architecture-independent, like RegularRectilinearGrid and RegularLatitudeLongitudeGrid. On the other hand, we have to specify architecture somewhere and rarely helpful to mix CPU and GPU computations using the same grid.

The text was updated successfully, but these errors were encountered:

francispoulin · 2021-07-05T21:43:18Z

Thanks for the nice summary @glwagner .

I vote for storing architecture in grid.

navidcy · 2021-07-07T04:25:54Z

architecture lives in the Grid in FourierFlows.jl (if that's an argument.. haha)

navidcy · 2021-07-07T04:31:02Z

Also, float_type is part of the grid and all field defined on the grid could inherit that, right?
(perhaps this is already the case?)

glwagner · 2021-07-07T13:33:40Z

eltype(grid) propagates everywhere because of defaults like

Oceananigans.jl/src/Grids/new_data.jl

Line 67 in f7408dc

new_data(arch, grid, loc) = new_data(eltype(grid), arch, grid, loc)

grid seems like a good place for users to specify architecture and the floating point type. I think the API is better if important things like that only need to be specified once.

glwagner · 2021-07-09T12:40:24Z

@ali-ramadhan would be good to get your thoughts before doing anything. Does this help / hurt memory distributed models?

francispoulin · 2021-07-09T12:44:07Z

Out of curiosity, when we create a PR, is it possible to find out how long the tests take compared to before? It seems like it might be useful when making a change like this, as well as pretty much anything else.

glwagner · 2021-07-09T12:59:51Z

The amount of time it takes each step to run is reported by buildkite:

francispoulin · 2021-07-09T14:14:27Z

Thanks. And I guess you can go to the latest merge to find out how long it used to take? It seems like showing the difference might be helpful, in case there is a big difference but that is probably wishing for the stars.

glwagner · 2021-07-09T14:33:53Z

Right, you can reference prior PRs to compare old build times to a new build time.

Custom reports might be possible, I don't know! Could be worth checking out the buildkite documentation. If you want something written into the PR I think you have to build a bot or something. Also possible, probably not "hard" once you know what to do.

navidcy · 2021-07-09T20:14:58Z

But since all tests run on a computer at MIT the time it takes also depends on how many tests are running (eg how many prs were just open) and other random factors.

glwagner · 2021-07-09T20:25:57Z

Right, there are some caveats on how you interpret those times. They aren't a benchmark.

francispoulin · 2021-07-09T21:36:39Z

I guess bennchmarks are the best way to tell and we should run those often enough to see whether anything has changed singificantly. Thanks.

glwagner · 2021-07-15T16:18:51Z

Looking at the code for DistributedIncompressibleModel:

Oceananigans.jl/src/Distributed/distributed_incompressible_model.jl

Lines 8 to 36 in 6e39d3f

    
           function DistributedIncompressibleModel(; architecture, grid, model_kwargs...) 
        
               i, j, k = architecture.local_index 
        
               Rx, Ry, Rz = architecture.ranks 
        
               my_connectivity = architecture.connectivity 
        
               Nx, Ny, Nz = size(grid) 
        
               Lx, Ly, Lz = length(grid) 
        
               # Pull out endpoints for full grid. 
        
               xL, xR = grid.xF[1], grid.xF[Nx+1] 
        
               yL, yR = grid.yF[1], grid.yF[Ny+1] 
        
               zL, zR = grid.zF[1], grid.zF[Nz+1] 
        
               # Make sure we can put an integer number of grid points in each rank. 
        
               # Will generalize in the future. 
        
               # TODO: Check that we have enough grid points on each rank to fit the halos! 
        
               @assert isinteger(Nx / Rx) 
        
               @assert isinteger(Ny / Ry) 
        
               @assert isinteger(Nz / Rz) 
        
               nx, ny, nz = Nx÷Rx, Ny÷Ry, Nz÷Rz 
        
               lx, ly, lz = Lx/Rx, Ly/Ry, Lz/Rz 
        
               x₁, x₂ = xL + (i-1)*lx, xL + i*lx 
        
               y₁, y₂ = yL + (j-1)*ly, yL + j*ly 
        
               z₁, z₂ = zL + (k-1)*lz, zL + k*lz 
        
               # FIXME? local grid might have different topology! 
        
               my_grid = RegularRectilinearGrid(topology=topology(grid), size=(nx, ny, nz), x=(x₁, x₂), y=(y₁, y₂), z=(z₁, z₂), halo=halo_size(grid))

suggests to me that it's important to include architecture when constructing grid, even for grids that are device independent. Specifically, our architecture object combines information about the memory layout and the device that's used for computation (CPU or GPU). RegularRectilinearGrid is device independent, but, apparently, is in practice not architecture in dependent (because we need to restrict ourselves to knowledge of just a "local" grid for local computations).

One complication is that, apparently, the pressure solver requires the global grid:

Oceananigans.jl/src/Distributed/distributed_incompressible_model.jl

Lines 40 to 41 in 6e39d3f

    
           pressure_solver = haskey(model_kwargs, :pressure_solver) ? Dict(model_kwargs)[:pressure_solver] : 
        
                                                                      DistributedFFTBasedPoissonSolver(architecture, grid, my_grid)

We'll have to look into that in more detail to understand what needs to be done for that, and for other solvers like the PreconditionedConjugateGradientSolver.

glwagner · 2021-07-15T16:25:28Z

@ali-ramadhan can we "reconstruct" the full grid parameters needed to build the DistributedFFTPoissonSolver, rather than passing full_grid as an argument to the constructor?

Oceananigans.jl/src/Distributed/distributed_fft_based_poisson_solver.jl

Lines 13 to 31 in 6e39d3f

    
           function DistributedFFTBasedPoissonSolver(arch, full_grid, local_grid) 
        
               topo = (TX, TY, TZ) = topology(full_grid) 
        
               λx = poisson_eigenvalues(full_grid.Nx, full_grid.Lx, 1, TX()) 
        
               λy = poisson_eigenvalues(full_grid.Ny, full_grid.Ly, 2, TY()) 
        
               λz = poisson_eigenvalues(full_grid.Nz, full_grid.Lz, 3, TZ()) 
        
               I, J, K = arch.local_index 
        
               λx = λx[(J-1)*local_grid.Ny+1:J*local_grid.Ny, :, :] 
        
               eigenvalues = (; λx, λy, λz) 
        
               transform = PencilFFTs.Transforms.FFT!() 
        
               proc_dims = (arch.ranks[2], arch.ranks[3]) 
        
               plan = PencilFFTs.PencilFFTPlan(size(full_grid), transform, proc_dims, MPI.COMM_WORLD) 
        
               storage = PencilFFTs.allocate_input(plan) 
        
               return DistributedFFTBasedPoissonSolver(plan, full_grid, local_grid, eigenvalues, storage) 
        
           end

I think we just need an Allgather to infer the global number of grid points and global grid extent from each process-local grid.

DistributedShallowWaterModel doesn't use the global grid at all except to build the process-local grids.

glwagner · 2021-07-15T16:36:42Z

As for the API, it might simplify matters to have architecture as a positional argument, so that we dispatch on it. So we might end up with syntax that looks something like

grid = RegularRectilinearGrid(CPU(), float_type=Float32, size=(1, 1, 1), x=(0, 1), y=(1, 2), z=(-3, 0))

We can also add an architecture kwarg if others think that's better, eg

grid = RegularRectilinearGrid(architecture=CPU(), float_type=Float32, size=(1, 1, 1), x=(0, 1), y=(1, 2), z=(-3, 0))

We will have to translate the architecture kwarg to a positional arg (so that we can dispatch on it for the purpose of Oceananigans.Distributed). This adds some boilerplate for all grids, but isn't ultimately a huge concern.

@christophernhill might have some useful comments.

francispoulin · 2021-07-15T17:06:47Z

Dispatching over architecture sounds like a good idea to me, but I don't pretend to understand all the details of the two approaches.

glwagner added cleanup 🧹 Paying off technical debt user interface/experience 💻 labels Jul 5, 2021

This was referenced Jul 18, 2021

Boussinesq model vs incompressible model #1620

Closed

NonhydrostaticModel creates an extra empty element in diffusivities list when a tuple is passed to as closure #1878

Closed

tomchor mentioned this issue Jul 23, 2021

Roadmap to version 1.0 #1234

Closed

8 tasks

glwagner added the 🚨 high priority 🚨 label Sep 16, 2021

glwagner mentioned this issue Oct 5, 2021

Grids, meet architecture #1998

Closed

navidcy mentioned this issue Oct 17, 2021

Redefines equality (==) for VerticallyStretchedRectilinearGrids #2019

Merged

navidcy mentioned this issue Nov 22, 2021

from Architectures to Grids to Models #2078

Merged

simone-silvestri closed this as completed in #2078 Dec 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we store `architecture` in `grid`? #1825

Should we store `architecture` in `grid`? #1825

glwagner commented Jul 5, 2021

francispoulin commented Jul 5, 2021

navidcy commented Jul 7, 2021

navidcy commented Jul 7, 2021

glwagner commented Jul 7, 2021 •

edited

Loading

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 9, 2021

navidcy commented Jul 9, 2021

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 15, 2021

glwagner commented Jul 15, 2021

glwagner commented Jul 15, 2021 •

edited

Loading

francispoulin commented Jul 15, 2021

Should we store architecture in grid? #1825

Should we store architecture in grid? #1825

Comments

glwagner commented Jul 5, 2021

francispoulin commented Jul 5, 2021

navidcy commented Jul 7, 2021

navidcy commented Jul 7, 2021

glwagner commented Jul 7, 2021 • edited Loading

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 9, 2021

navidcy commented Jul 9, 2021

glwagner commented Jul 9, 2021

francispoulin commented Jul 9, 2021

glwagner commented Jul 15, 2021

glwagner commented Jul 15, 2021

glwagner commented Jul 15, 2021 • edited Loading

francispoulin commented Jul 15, 2021

Should we store `architecture` in `grid`? #1825

Should we store `architecture` in `grid`? #1825

glwagner commented Jul 7, 2021 •

edited

Loading

glwagner commented Jul 15, 2021 •

edited

Loading