Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regions don't work with sliced Datasets #115

Open
johnomotani opened this issue Mar 28, 2020 · 19 comments
Open

Regions don't work with sliced Datasets #115

johnomotani opened this issue Mar 28, 2020 · 19 comments

Comments

@johnomotani
Copy link
Collaborator

The regions introduced in #107 are based on slicing the Dataset with global indices. The indices stored in Region objects will not be correct if the Dataset is sliced for some reason, e.g. ds = ds.isel(x=slice(20, None).

One way to resolve this might be to add global index coordinates, e.g. for a dataset with dimensions {'x', 'theta', 'zeta'} have some coordinates {'x_ind', 'theta_ind', 'zeta_ind'} that give global integer indices. Then we could use this in Region.getSlices() (which might need an extra Dataset or DataArray argument to get the coordinates from) to get offset. E.g. if the passed-in DataArray has x_ind that does not start from 0, can subtract the da[x_ind][0] from Region.xinner_ind, and check that Region.xinner_ind is within the bounds of x_ind.

@TomNicholas
Copy link
Collaborator

If the regions are specified along the dims 'x' and 'y', can we not just promote 'x' and 'y' to be 1D coordinates? Then their values would be preserved upon slicing.

The only disadvantage I can think of that approach would be if you wanted x=0 to be within the domain, and x=-1 to be in the left-hand boundary cells, then you would end up with an x coord which had a value -1 where it's corresponding dim was index 0...

@johnomotani
Copy link
Collaborator Author

Yes, naming the coordinates x, y and z would make sense. I think I was worried that that would get confusing when the dimensions are sometimes renamed and sometimes not, e.g. if you rename dimension y to theta, what happens to the coordinate y? Maybe it just needs to be created after the dimensions are renamed...

It would be nice to have the indices count just 'real' grid cells, but in practice I would start the coordinates from 0 in the boundary cells. Otherwise it's implied that the coordinate y should not count any boundary cells, which would make it double-valued in the upper target boundary cells (if there are any).

@TomNicholas
Copy link
Collaborator

I think I was worried that that would get confusing when the dimensions are sometimes renamed

I think this is an argument for not renaming dimensions at all. There isn't a penalty to carrying around multiple coordinates, so I think theta should be added as a new coordinate, without replacing y. (I realise that might be different from what I've advised before but if there's a clear need for x and y to be coordinates then I think we should always keep them).

There might be one or two functions in xarray which should accept 1D non-dimension coordinates but currently would refuse to, but the neatest solution to that would just be to generalise upstream (I've submitted at least one PR doing that before).

@johnomotani
Copy link
Collaborator Author

I guess the biggest one would be: can you isel(theta=42) when theta is a coordinate but not a dimension?

@TomNicholas
Copy link
Collaborator

I guess the biggest one would be: can you isel(theta=42) when theta is a coordinate but not a dimension?

No you can't. That makes some sense though - theta doesn't have a value of 42 anywhere, it only takes values between 0 and 2pi in any tokamak. To select the 42nd index you conceptually should be selecting along a dimension index, not a coordinate, with isel(y=42). (You could imagine changing .sel() so that it dispatched to .isel() if passed a dimension, but I expect that would lead to lots of hard-to-debug mistakes in user code).

You should be able to do sel(theta=42), but unfortunately you currently can't. This was something I've been meaning to add because it seems pretty crucial!

@johnomotani
Copy link
Collaborator Author

dimensions with the same names as coordinates seem a bit confusing in general... especially since BOUT++ uses x/y/z both as coordinates (with spacing dx/dy/dz) and often (in the code) as indices. dx, dy and dz are available in the Dataset, so setting y to be a grid-cell-number coordinate is arguably just as confusing as isel(theta=42) is at the moment... Not sure what a good naming convention is though... (i,j,k) is a bit likely to have name clashes. (ix, iy, iz), (jx, jy, jz), (ix, jy, kz) are possible but a bit obscure. If we put some naming suggestions as individual comments, we can vote on them with reactions. Feel free to add more if you think of them! ...

@johnomotani
Copy link
Collaborator Author

Question: what should we name the dimensions and grid-cell-index coordinates in xBOUT?

@johnomotani
Copy link
Collaborator Author

psi, theta, zeta

@johnomotani
Copy link
Collaborator Author

x, y, z

@johnomotani
Copy link
Collaborator Author

psi_ind, theta_ind, zeta_ind

@johnomotani
Copy link
Collaborator Author

x_ind, y_ind, z_ind

@johnomotani
Copy link
Collaborator Author

i, j, k

@johnomotani
Copy link
Collaborator Author

ix, iy, iz

@johnomotani
Copy link
Collaborator Author

jx, jy, jz

@johnomotani
Copy link
Collaborator Author

ix, jy, kz

@ZedThree
Copy link
Member

I'd be happy with any variation on x/y/z_ind, i/j/k, ix/iy/iz or jz/jy/jz as long as it's documented up front :) I prefer those options as they're a little bit clearer that they're index-space, or at least likely not real space.

Can I veto any variation that uses psi/theta/zeta though? That heavily implies a particular coordinate system.

kz makes me think "wavenumber", so -1 for that option 😄

@johnomotani
Copy link
Collaborator Author

@ZedThree sorry, was writing the above a bit too late... The 'psi/theta/zeta' options would depend on the geometry chosen by the user. At the moment if you pass geometry='toroidal' to open_boutdataset(), when it calls apply_geometry() it goes through to add_toroidal_geometry_coords(), which renames the dimensions as x/theta/zeta (x because psi_poloidal is not necessarily a 1d coordinate due to possibly different dx in PFR core). Other geometry implementations can do the same (e.g. s-alpha uses 'r' for the radial coordinate). So psi/theta/zeta should really be 'geometry-choice-dependent, and possibly user-defined, for example psi/theta/zeta', and similarly 'psi_ind/theta_ind/zeta_ind'.

@ZedThree
Copy link
Member

Ah! In that case, definitely the psi_ind version!

@johnomotani
Copy link
Collaborator Author

johnomotani commented Jan 19, 2021

Just came back to look at this issue again. It's what prevents plotting selections of the poloidal plane from working.

The problems discussed above I think are now mostly resolved, since we added bout_tdim, bout_xdim, bout_ydim and bout_zdim to the metadata to keep track of the dimension names.

Fixing this now needs a bit of extra work to:

  1. add global grid-index coordinates (which would be more convenient to use with ixseps1, jyseps1_2, etc.). For example if ds.metadata["bout_ydim"] == "theta" we could add a coordinate called theta_global_ind.
  2. update the methods in region.py to use those global coordinates i.e. to replace current things like (this isn't actual code, just to illustrate changing isel to sel)
    xcoord = ds.metadata["bout_xdim"]
    result = ds.isel({xcoord: slice(lower, upper)})
    
    to something like
    xcoord = f"{ds.metadata['bout_xdim']}_global_ind"
    result = ds.sel({xcoord=slice(lower, upper)})
    
    [and we'll have to be careful because isel excludes the end-point in the normal Python way, but sel includes the end-point (I guess because it works with arbitrary labels, not just integers and if you had a coordinate that was ("yellow", "green", "blue") it's weird to need to do .sel(color=slice("yellow", "blue") to get just "yellow" and "green".]
  3. Might also need some special handling for the case when a region has no points left in it after a Dataset is sliced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants