Concatenate using global indexes #3

Open
TomNicholas opened this issue Dec 8, 2018 · 2 comments
Labels
enhancement (New feature or request); requires BOUT changes (May require changes to BOUT++ upstream to implement); requires xarray changes (May require changes to xarray upstream to implement)

Comments

@TomNicholas
Collaborator

TomNicholas commented Dec 8, 2018

If BOUT is changed upstream to write out the global grid point indexes to each file, then xBOUT can use them to infer the order in which the datasets should be concatenated.

This would also require implementing an infer_order_from_coords option in xarray.auto_combine(), as discussed here.

This would move even more of the concatenation logic upstream to xarray: we would no longer need to create a nested list-of-lists, and could just pass the unordered glob of all dump files.

On the other hand, this might make the open_mfdataset() call considerably slower, as it will have to read the coordinate variables before it can order the datasets (related discussion here).
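As a rough illustration of the idea, here is a minimal sketch of ordering tiles by their global indexes. It is pure Python and does not use xarray; the function name `infer_order_from_global_indexes` and the `(x_start, y_start)` tuples are hypothetical stand-ins for whatever index variables BOUT might write out in future, not an existing API.

```python
def infer_order_from_global_indexes(tiles):
    """Arrange unordered tiles into a nested [x][y] list.

    `tiles` maps each filename to the (x_start, y_start) global index of
    its first grid point. This layout is an assumption for illustration;
    BOUT does not currently write such indexes.
    """
    # Distinct starting indexes along each dimension, in ascending order.
    x_starts = sorted({x for x, _ in tiles.values()})
    y_starts = sorted({y for _, y in tiles.values()})

    # Reverse lookup from position to filename.
    by_position = {position: name for name, position in tiles.items()}

    # Raises KeyError if the tiles do not form a complete rectangular grid.
    return [[by_position[(x, y)] for y in y_starts] for x in x_starts]


# The files can be supplied in any order, as an unordered glob would give.
unordered = {
    "BOUT.dmp.3.nc": (4, 8),
    "BOUT.dmp.0.nc": (0, 0),
    "BOUT.dmp.2.nc": (0, 8),
    "BOUT.dmp.1.nc": (4, 0),
}
ordered = infer_order_from_global_indexes(unordered)
```

The point of the sketch is that the filenames play no role in the ordering; only the stored indexes do.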

@TomNicholas TomNicholas added the enhancement, requires BOUT changes, and requires xarray changes labels Dec 8, 2018
@d7919
Member

d7919 commented Dec 8, 2018

What logic is currently used to decide how to merge the files?

Is it possible to use xBOUT to load just a single dump file? Sometimes it can be helpful to explore a single processor's data rather than the entire dataset (e.g. for debugging).

@TomNicholas
Collaborator Author

TomNicholas commented Dec 9, 2018

> What logic is currently used to decide how to merge the files?

Currently it reads the processor splitting used (nxpe and nype) from the first file supplied, and uses that together with the names of the files to order them into a nested list structure, which specifies the order in which they should be merged. It exploits the fact that, for BOUT, we know the relationship between the processor number (and hence the filename) and its position in the domain. This nested list is potentially 3 lists deep (time, x and y), and is passed into the (modified) xarray.open_mfdataset() here.
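A minimal sketch of that filename-based ordering, for the x and y dimensions only, might look like the following. This is not the xBOUT implementation: the helper name `nested_file_order` is hypothetical, and the mapping `rank = ype * nxpe + xpe` between processor number and position is an assumption about the BOUT layout that should be checked against the BOUT++ source.

```python
import re


def nested_file_order(filenames, nxpe, nype):
    """Order dump files into a nested [x][y] list using their rank numbers.

    Assumes names ending in '.<rank>.nc' and that processor `rank` sits at
    x-index `rank % nxpe` and y-index `rank // nxpe` (an assumed layout).
    """
    by_rank = {}
    for name in filenames:
        match = re.search(r"\.(\d+)\.nc$", name)
        if match is None:
            raise ValueError(f"cannot find a processor number in {name!r}")
        by_rank[int(match.group(1))] = name

    # Every processor must have contributed exactly one file.
    if sorted(by_rank) != list(range(nxpe * nype)):
        raise ValueError("file list does not match nxpe * nype processors")

    # Outer list runs over x, inner list over y.
    return [
        [by_rank[ype * nxpe + xpe] for ype in range(nype)]
        for xpe in range(nxpe)
    ]


# Six files from a 3 x 2 processor grid, deliberately shuffled.
files = [f"BOUT.dmp.{rank}.nc" for rank in (4, 0, 5, 2, 1, 3)]
nested = nested_file_order(files, nxpe=3, nype=2)
```

A restarted run would add an outer time level around this list, which is omitted here to keep the sketch short.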

If you want to understand the full reasoning of why I chose to do it like this I would recommend reading my issue about multidimensional concatenation in xarray, and all the discussion on the pull request that xBOUT relies on. We will have to conform to whichever solution is ultimately selected there, but at the moment it looks like we'll end up having a choice between the current method and global indexes instead.

> Is it possible to use xBOUT to load just a single dump file?

So I intended this to be possible by just specifying that particular filepath exactly instead of a glob, e.g.

ds = open_boutdataset('BOUT.dmp.1.nc')

But this doesn't actually work, which is a bug (I'll make an issue for it). I should add a unit test for this specifically. Even once I fix this bug, it will still trim off the ghost cells until #19 is implemented.

You can still use bare xarray to read a single dump file by using

ds = xarray.open_dataset('BOUT.dmp.1.nc')

though.
