It turns out that it's possible to write data into different dump files in a way that cannot easily be combined using xarray, so xBOUT cannot open an arbitrary set of BOUT++ dump files.
Do we really need to support writing different variables to different files at all? It makes the problem of concatenation considerably more complicated (I will explain why in a sec), and doesn't seem to give much benefit over just defining the variable globally. The cost of communicating the scalar will never be a limiting factor in performance scaling. If you want to print one number or string per process to a file for debugging purposes then you can always put it in the attributes, which will then basically be ignored on concatenation.
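To illustrate the point about attributes, here is a minimal sketch using in-memory stand-ins for two dump files. The helper `fake_dump`, the attribute name `debug_rank`, and the variable `n` are all made up for the example, and `combine_attrs="override"` is passed explicitly so that the attributes of the first dataset are the ones kept.

```python
import numpy as np
import xarray as xr


def fake_dump(rank, ny_local=4):
    """Build a stand-in for one process's dump file (hypothetical layout)."""
    start = rank * ny_local
    ds = xr.Dataset({"n": ("y", np.arange(start, start + ny_local, dtype=float))})
    # Per-process debugging value stored as an attribute, not as a variable.
    ds.attrs["debug_rank"] = rank
    return ds


dumps = [fake_dump(rank) for rank in range(2)]

# The attributes differ between the two datasets, but concat does not compare
# them: with combine_attrs="override" it copies those of the first dataset.
combined = xr.concat(dumps, dim="y", combine_attrs="override")
```

The differing per-process values do not get in the way of the concatenation; only the first one survives in the combined dataset, which is fine for debugging output.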
Also it seems to me that there is no physically-meaningful variable which is only defined on one process. You either have global quantities or spatially-varying ones; it doesn't make physical sense to have quantities which are only defined on part of the domain. Essentially, flux_ion should either be considered to be a global scalar and communicated, or the ion flux should be calculated (or at least written as a zero or NaN) for all grid cells, so that flux_ion = flux_ion(y), and you select the value at y=wall if that's what you're interested in. @bendudson what do you think about that?
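A rough sketch of the second option, assuming the wall sits at the last y cell of the last process and using made-up sizes and values (`local_flux`, `ny_local` and `wall_value` are hypothetical names, not anything in BOUT++ or SD1D):

```python
import numpy as np
import xarray as xr

ny_local = 4  # hypothetical number of y cells per process


def local_flux(rank, nproc, wall_value):
    """Every process writes flux_ion(y); only the wall cell holds a real value."""
    flux = np.full(ny_local, np.nan)
    if rank == nproc - 1:  # assumption: the wall is the last cell of the last process
        flux[-1] = wall_value
    return xr.Dataset({"flux_ion": ("y", flux)})


nproc = 3
dumps = [local_flux(rank, nproc, wall_value=2.5e21) for rank in range(nproc)]

# Because every dataset has the same variable with the same dimension,
# a plain concatenation along y is enough.
combined = xr.concat(dumps, dim="y")

# Select the value at the wall afterwards.
flux_at_wall = float(combined["flux_ion"].isel(y=-1))
n_defined = int(combined["flux_ion"].notnull().sum())
```

The cost is a few NaNs per file; the benefit is that every dump file has the same set of variables.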
Why different variables in different files is so inconvenient for xBOUT
Firstly, I am assuming that we eventually want a replacement for boutdata.collect which uses a function from the xarray top-level API, without having to do anything particularly complex or special-case for BOUT++, and not using lower-level private xarray functions either. (Guard cells would be trimmed using the preprocess argument to xarray.open_mfdataset.) We want this because then our new collection routine would benefit from all the optimization, parallelization, tests and bugfixing etc that all the rest of the xarray community provide, and the behaviour we want in xarray will be explicitly supported indefinitely.
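As a sketch of what trimming guard cells through `preprocess` could look like: `trim_guards`, the guard-cell count `myg=2`, the file pattern, and the in-memory `fake_dump` stand-ins are assumptions for illustration. `open_dumps` shows how the function might be handed to `xarray.open_mfdataset`, but it is not called here because there are no files to open.

```python
from functools import partial

import numpy as np
import xarray as xr


def trim_guards(ds, myg=2):
    """Drop `myg` guard cells from each end of the 'y' dimension."""
    ny = ds.sizes["y"]
    return ds.isel(y=slice(myg, ny - myg))


def open_dumps(pattern="BOUT.dmp.*.nc", myg=2):
    """Not called here: how trim_guards might be passed to open_mfdataset."""
    return xr.open_mfdataset(
        pattern,
        preprocess=partial(trim_guards, myg=myg),
        combine="nested",
        concat_dim="y",
    )


def fake_dump(rank, ny_interior=2, myg=2):
    """In-memory stand-in for a dump file: interior cells plus guards on each side."""
    start = rank * ny_interior - myg
    values = np.arange(start, start + ny_interior + 2 * myg, dtype=float)
    return xr.Dataset({"n": ("y", values)})


# Apply the same preprocessing by hand, then concatenate along y.
trimmed = [trim_guards(fake_dump(rank)) for rank in range(2)]
combined = xr.concat(trimmed, dim="y")
```

Here the overlapping guard cells of neighbouring processes are removed before concatenation, so each global cell appears exactly once in `combined`.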
That means the output of BOUT++ has to conform to some restrictions. Firstly it has to conform to the netCDF data model, because xarray is trying to combine all the dump files into a single netCDF dataset. That was the original problem with flux_ion: it violates the netCDF data model to have the same scalar variable in different files with different values. You need either a globally-consistent scalar or flux_ion=flux_ion('y'). This is what was fixed by #4.
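The conflict can be reproduced in a few lines. This is only a sketch with invented values; `compat="no_conflicts"` is passed explicitly so the behaviour does not depend on the default of the installed xarray version.

```python
import xarray as xr

# Two stand-ins for dump files that hold the same scalar with different values.
ds0 = xr.Dataset({"flux_ion": xr.DataArray(0.0)})
ds1 = xr.Dataset({"flux_ion": xr.DataArray(3.2e21)})

try:
    xr.merge([ds0, ds1], compat="no_conflicts")
    conflict_detected = False
except xr.MergeError:
    # One dataset cannot hold two different values for one scalar variable.
    conflict_detected = True

# Writing the quantity along y instead gives each value its own place.
ds0_y = xr.Dataset({"flux_ion": ("y", [0.0])})
ds1_y = xr.Dataset({"flux_ion": ("y", [3.2e21])})
along_y = xr.concat([ds0_y, ds1_y], dim="y")
```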
What I didn't realise until now was that the exact xarray API creates other restrictions on the output format of BOUT++ as well. In particular, the planned API for xarray prevents you from combining along a dimension when some variables are present in some datasets but not in others. In the xarray discussion it was decided that the two public-facing multidimensional combining functions should be xarray.auto_combine and xarray.manual_combine. The idea was that auto_combine could deal with any situation as long as there were global dimension coordinates to use to arrange the datasets, whereas manual_combine would use a series of successive xarray.concat and xarray.merge operations.
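For reference, as far as I know these two functions were eventually released in xarray under the names `combine_nested` (the `manual_combine` idea) and `combine_by_coords` (the `auto_combine` idea). The sketch below uses those released names on made-up data; `tile`, `tile_with_coords` and the variable `n` are hypothetical.

```python
import numpy as np
import xarray as xr

full = np.arange(24.0).reshape(4, 6)  # made-up global array on an (x, y) grid


def tile(i, j):
    """One 2x3 piece of the global array, with no coordinates (like a dump file)."""
    return xr.Dataset({"n": (("x", "y"), full[2 * i:2 * i + 2, 3 * j:3 * j + 3])})


# Manual style: the nesting of the list tells xarray where each piece belongs.
grid = [[tile(i, j) for j in range(2)] for i in range(2)]
nested = xr.combine_nested(grid, concat_dim=["x", "y"])


def tile_with_coords(j):
    """A 1D piece that carries a global y coordinate."""
    return xr.Dataset(
        {"n": ("y", full[0, 3 * j:3 * j + 3])},
        coords={"y": np.arange(3 * j, 3 * j + 3)},
    )


# Automatic style: the order is inferred from the global coordinate values,
# so the input list can be in any order.
by_coords = xr.combine_by_coords([tile_with_coords(1), tile_with_coords(0)])
```

The second call only works because each piece knows its global position, which is exactly what BOUT++ dump files currently lack.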
Unfortunately BOUT++ does not write global dimension coordinates, so we can't use auto_combine, and if we allow a BoutReal to be written to one processor's dump file but not the others then we can't use manual_combine for those cases either. The reason we can't use manual_combine is that the current flux_ion output can't be joined along 'y' with a concat alone or a merge alone. That means we have to do one of the following:
Not support writing out variables to one dump file and not others in BOUT++ so that we can always use manual_combine like xBOUT currently does (my preferred solution).
Add global dimension coordinates to BOUT++ so that we can use auto_combine. This obviously wouldn't be backwards-compatible though.
Try to persuade the xarray devs to change the planned API to something which supports this type of output. They might, but I don't think they will, because the choice of auto_combine and manual_combine has already been discussed a fair amount and I think it makes a lot of sense. I will ask anyway.
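Whichever option is chosen, it may help to detect the problematic case up front. A small sketch of such a check follows; the helper name `partially_present_variables` and the example datasets are invented for illustration and are not part of xBOUT.

```python
import numpy as np
import xarray as xr


def partially_present_variables(datasets):
    """Return the names of data variables found in some datasets but not all."""
    names = [set(ds.data_vars) for ds in datasets]
    everywhere = set.intersection(*names)
    anywhere = set.union(*names)
    return sorted(anywhere - everywhere)


# Stand-ins for two dump files: only the second one wrote flux_ion.
ds0 = xr.Dataset({"n": ("y", np.zeros(4))})
ds1 = xr.Dataset({"n": ("y", np.ones(4)), "flux_ion": xr.DataArray(3.2e21)})
problem_vars = partially_present_variables([ds0, ds1])

# With the first option, every file writes the same set of variables.
ds0_fixed = ds0.assign(flux_ion=("y", np.full(4, np.nan)))
ds1_fixed = ds1.drop_vars("flux_ion").assign(
    flux_ion=("y", np.array([np.nan, np.nan, np.nan, 3.2e21]))
)
fixed_vars = partially_present_variables([ds0_fixed, ds1_fixed])
```

A check like this could give a clear error message naming the offending variables instead of a failure somewhere inside the combine step.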
This is a continuation of the discussion that arose from trying to load SD1D data into xBOUT. The summary of the problem above is copied here for reference (@dschwoerer).