Compatibility of BOUT output data format with xarray/xBOUT #1641

Open
TomNicholas opened this issue Mar 14, 2019 · 0 comments

TomNicholas commented Mar 14, 2019

This is a continuation of the discussion that arose from trying to load SD1D data into xBOUT. I'm copying and pasting a summary of the problem here for reference (@dschwoerer):

It turns out that it's possible to write data into different dump files which cannot be easily combined using xarray, so xBOUT cannot generally open any set of BOUT++ data.

Do we really need to support writing different variables to different files at all? It makes the problem of concatenation considerably more complicated (I will explain why in a sec), and doesn't seem to give much benefit over just defining the variable globally. The cost of communicating the scalar will never be a limiting factor in performance scaling. If you want to print one number or string per process to a file for debugging purposes then you can always put it in the attributes, which will then basically be ignored on concatenation.
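As a rough illustration of the attributes suggestion, here is a minimal sketch using in-memory datasets in place of real dump files. The variable name `n`, the attribute name `debug_rank`, and the array sizes are all made up for the example. Which attribute value survives concatenation depends on the xarray version and its `combine_attrs` setting, so the sketch only relies on the per-process value not becoming a variable that has to be combined.

```python
import numpy as np
import xarray as xr


def fake_dump(rank):
    """In-memory stand-in for one process's dump file (hypothetical layout)."""
    return xr.Dataset(
        {"n": ("y", np.full(4, float(rank)))},
        attrs={"debug_rank": rank},  # per-process value kept out of the variables
    )


# The differing attribute does not get in the way of joining along 'y'.
combined = xr.concat([fake_dump(0), fake_dump(1)], dim="y")
print(combined.sizes["y"])
```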

Also it seems to me that there is no physically-meaningful variable which is only defined on one process. You either have global quantities or spatially-varying ones; it doesn't make physical sense to have quantities which are only defined on part of the domain. Essentially, flux_ion should either be considered a global scalar and communicated, or the ion flux should be calculated (or at least written as a zero or NaN) for all grid cells, so that flux_ion = flux_ion(y) and you select the value at y=wall if that's what you're interested in. @bendudson what do you think about that?

Why different variables in different files is so inconvenient for xBOUT

Firstly, I am assuming that we eventually want a replacement for boutdata.collect which uses a function from the xarray top-level API, without having to do anything particularly complex or special-cased for BOUT++, and without using lower-level private xarray functions either. (Guard cells would be trimmed using the preprocess argument to xarray.open_mfdataset.) We want this because our new collection routine would then benefit from all the optimization, parallelization, tests and bug fixing that the rest of the xarray community provides, and the behaviour we want in xarray will be explicitly supported indefinitely.
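To make the guard-cell remark concrete, here is a minimal sketch of what such a `preprocess` function could look like. The guard-cell count `MYG`, the variable name `n`, and the file pattern in the comment are assumptions for illustration, not what xBOUT actually does.

```python
import numpy as np
import xarray as xr

MYG = 2  # assumed number of guard cells at each end of 'y'


def trim_guards(ds, myg=MYG):
    """Drop `myg` guard cells from both ends of the 'y' dimension."""
    if myg == 0:
        return ds
    return ds.isel(y=slice(myg, -myg))


# In-memory stand-in for a single dump file: 4 real cells plus 2 guards each side.
one_file = xr.Dataset({"n": (("t", "y"), np.zeros((3, 8)))})
trimmed = trim_guards(one_file)
print(trimmed.sizes["y"])

# With real files the same function would be passed to open_mfdataset, e.g.
# xr.open_mfdataset("BOUT.dmp.*.nc", preprocess=trim_guards, ...)
# where the remaining arguments depend on the xarray version in use.
```

Because `preprocess` is applied to each file before anything is combined, the trimming stays independent of how the files are later joined.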

That means the output of BOUT++ has to conform to some restrictions. To begin with, it has to conform to the netCDF data model, because xarray is trying to combine all the dump files into a single netCDF dataset. That was the original problem with flux_ion: it violates the netCDF data model to have the same scalar variable taking different values in different files. You either have to have a globally-consistent scalar or a flux_ion=flux_ion('y'). This was what was fixed by #4.
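The scalar conflict can be reproduced with two small in-memory datasets. This is only a sketch: the values, the array sizes, and the assumption that the wall sits at the last y cell of the second process are invented for the example.

```python
import numpy as np
import xarray as xr

# Same scalar variable, different value on each process.
proc0 = xr.Dataset({"flux_ion": ((), 0.0)})
proc1 = xr.Dataset({"flux_ion": ((), 1.5)})

try:
    xr.merge([proc0, proc1], compat="no_conflicts")
    scalar_conflict = False
except xr.MergeError:
    # A single dataset cannot hold two values for one scalar.
    scalar_conflict = True

# Writing the flux as a function of y on every process removes the conflict.
flux0 = np.zeros(4)
flux1 = np.zeros(4)
flux1[-1] = 1.5  # wall assumed to be the last y cell of the second process
proc0_y = xr.Dataset({"flux_ion": ("y", flux0)})
proc1_y = xr.Dataset({"flux_ion": ("y", flux1)})
combined = xr.concat([proc0_y, proc1_y], dim="y")
print(scalar_conflict, combined["flux_ion"].values)
```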

What I didn't realise until now was that the exact xarray API creates other restrictions on the output format of BOUT++ as well. In particular the planned API for xarray prevents you combining along a dimension which has some variables present in some datasets but not in others. Here it was decided that the two public-facing multidimensional combining functions should be xarray.auto_combine and xarray.manual_combine. The idea was that auto_combine could deal with any situation as long as there were global dimension coordinates to use to arrange the datasets, whereas manual_combine would use a series of successive xarray.concat and xarray.merge operations.
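The difference between the two approaches can be sketched with a 2x2 grid of tiles. Note the naming: to the best of my knowledge the functions discussed above were released in xarray as `combine_by_coords` (coordinate-based) and `combine_nested` (explicit layout), so those are the names used here. The `tile` helper and the variable `n` are hypothetical.

```python
import numpy as np
import xarray as xr


def tile(x0, y0, with_coords):
    """One 2x2 tile of a 4x4 field; each value encodes its global position."""
    x = np.arange(x0, x0 + 2)
    y = np.arange(y0, y0 + 2)
    data = 10.0 * x[:, None] + y[None, :]
    coords = {"x": x, "y": y} if with_coords else {}
    return xr.Dataset({"n": (("x", "y"), data)}, coords=coords)


# No global coordinates: the caller has to spell out the layout as a nested list
# (outer list along 'x', inner lists along 'y').
nested = [
    [tile(0, 0, False), tile(0, 2, False)],
    [tile(2, 0, False), tile(2, 2, False)],
]
by_position = xr.combine_nested(nested, concat_dim=["x", "y"])

# Global dimension coordinates: the arrangement is inferred, so order is irrelevant.
shuffled = [tile(2, 2, True), tile(0, 0, True), tile(2, 0, True), tile(0, 2, True)]
by_coords = xr.combine_by_coords(shuffled)

print(np.array_equal(by_position["n"].values, by_coords["n"].values))
```

The explicit-layout route is the one that needs every tile to hold the same variables, since it is built from plain concatenations along each dimension in turn.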

Unfortunately BOUT++ does not have global coordinates so we can't use auto_combine, and if we allow writing out BoutReals into one processor but not others then we can't use manual_combine for those cases either. The reason we can't use manual_combine is that the current situation with flux_ion can't be joined with a concat alone or a merge alone along 'y'. That means we either have to:

  1. Not support writing out variables to one dump file and not others in BOUT++ so that we can always use manual_combine like xBOUT currently does (my preferred solution).

  2. Add global dimension coordinates to BOUT++ so that we can use auto_combine. This obviously wouldn't be backwards-compatible though.

  3. Try to persuade the xarray devs to change the planned API to something which supports having this type of output. They might, but I don't think they will, because the choice of auto_combine and manual_combine has already been discussed a fair amount, and I think it makes a lot of sense, but I will ask.
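Option 1 amounts to requiring that every dump file holds the same set of variables. A minimal sketch of how that could be checked, and how a missing variable could be padded with NaN so that a plain concatenation along 'y' works, is given below. Both helper functions and the variable names `n` and `flux_ion` are hypothetical, and none of this is existing xBOUT or BOUT++ code.

```python
import numpy as np
import xarray as xr


def variables_missing_somewhere(datasets):
    """Names of data variables that are absent from at least one dataset."""
    names = [set(ds.data_vars) for ds in datasets]
    return set.union(*names) - set.intersection(*names)


def pad_missing(ds, name, like="n"):
    """Add `name` as an all-NaN variable shaped like `like`, if it is absent."""
    if name in ds.data_vars:
        return ds
    return ds.assign({name: xr.full_like(ds[like], np.nan)})


# Two stand-in dump files: only the second one wrote flux_ion.
proc0 = xr.Dataset({"n": ("y", np.ones(4))})
proc1 = xr.Dataset({"n": ("y", np.ones(4)), "flux_ion": ("y", np.full(4, 1.5))})

missing = variables_missing_somewhere([proc0, proc1])
padded = [pad_missing(ds, "flux_ion") for ds in (proc0, proc1)]
combined = xr.concat(padded, dim="y")
print(missing, combined["flux_ion"].values)
```

Doing the padding on the BOUT++ side, by writing the variable on every process, would keep the reader free of this kind of special case.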
