Parallel interpolation #108
Conversation
Need to re-chunk grid after using xr.concat to re-join the grid Dataset with upper boundary points removed. Otherwise 'y' is re-chunked into at least two parts.
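The re-chunking issue described above can be sketched with a minimal xarray/dask example (hypothetical variable and sizes, not the actual xBOUT code):

```python
import numpy as np
import xarray as xr

# A toy Dataset with a single chunk along 'y'
ds = xr.Dataset({"n": ("y", np.arange(8.0))}).chunk({"y": 8})

# Slice into two pieces (e.g. interior and boundary) and re-join them;
# xr.concat keeps each piece as its own chunk, splitting 'y' into (6, 2)
inner = ds.isel(y=slice(0, 6))
edge = ds.isel(y=slice(6, None))
rejoined = xr.concat([inner, edge], dim="y")

# Re-chunk afterwards to restore a single chunk along 'y'
rejoined = rejoined.chunk({"y": 8})
print(rejoined["n"].data.chunks)  # ((8,),)
```

Without the final `chunk()` call, downstream operations would see 'y' split into at least two chunks.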
If set to false, does not store the results in the Dataset, in order to save memory.
Making zShift a coordinate means that the Dataset is no longer required for the interpolation, so move the method to the BoutDataArray.
Provides a workaround for when the boundary cells were not saved in the data file.
Use coordinate ranges stored in Region instead.
Instead, will provide functionality to save the high-resolution variables into a new Dataset.
Distribute the output points of the high-resolution field like a standard BOUT++ cell-centred variable.
Increase jyseps*, ny, ny_inner, MYSUB to reflect the new resolution.
This method only actually does one thing, and is not required to be called before parallel interpolation, so rename to be clearer, and make 'n' argument non-optional.
By converting the DataArrays in each region into Datasets, can combine with xarray.combine_by_coords. Then is natural to return a Dataset, making it more straightforward to merge the results of calling this method for several variables into a single Dataset.
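A minimal sketch of the combine_by_coords pattern described above (hypothetical variable names and coordinate values):

```python
import numpy as np
import xarray as xr

# Two "regions" covering different y-ranges of the same variable.
# Converting each DataArray to a Dataset lets combine_by_coords join them.
region1 = xr.DataArray(
    np.arange(4.0), coords={"y": np.arange(4)}, name="n"
).to_dataset()
region2 = xr.DataArray(
    np.arange(4.0, 8.0), coords={"y": np.arange(4, 8)}, name="n"
).to_dataset()

# combine_by_coords orders and joins the pieces using their coordinates
combined = xr.combine_by_coords([region1, region2])
print(combined["n"].values)  # [0. 1. 2. 3. 4. 5. 6. 7.]
```

Returning a Dataset also means results for several variables can be merged naturally.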
Information in regions is not correct for the high-res variable, and needs to be recalculated later.
In BoutDataArray.highParallelRes(), copy the attrs from the first part of the variable to the combined Dataset.
After interpolating to higher parallel resolution, a Dataset has the correct coordinates, but no regions. This commit makes add_toroidal_geometry_coords and add_s_alpha_geometry_coords skip adding coordinates if the coordinates are already present, so that the functions can be applied again to interpolated Datasets. At the moment, the only thing this does is to re-create the regions.
Need to slice hthe with 'y' instead of 'theta' if it was read from grid file.
Adding 'dy' as a coordinate allows it to be assembled correctly when DataArrays are combined with combine_by_coords, which is much more straightforward than recalculating it from the y-coordinate. When initialising a Dataset from the interpolated variables, will demote 'dy' to a variable again.
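The promote-combine-demote pattern for 'dy' might look like this sketch (hypothetical sizes and values):

```python
import numpy as np
import xarray as xr

# 'dy' stored as a coordinate, so it travels with each piece when combining
part1 = xr.Dataset(
    {"n": ("y", np.zeros(3))},
    coords={"y": np.arange(3), "dy": ("y", np.full(3, 0.1))},
)
part2 = xr.Dataset(
    {"n": ("y", np.ones(3))},
    coords={"y": np.arange(3, 6), "dy": ("y", np.full(3, 0.2))},
)

combined = xr.combine_by_coords([part1, part2])

# Demote 'dy' back to an ordinary data variable in the combined Dataset
combined = combined.reset_coords("dy")
print(combined["dy"].values)
```

`reset_coords` is the standard xarray way to turn a coordinate back into a plain variable.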
Add method BoutDataset.getHighParallelResVars() that takes a list of variables, and returns a new BoutDataset containing those variables with an increased parallel resolution. The new Dataset is a fully valid BoutDataset, so all plotting methods, etc. work.
xarray.testing also provides an assert_allclose function, so it is clearer to be explicit about which module the function belongs to.
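For example, qualifying the call avoids any ambiguity with `numpy.testing.assert_allclose`:

```python
import numpy as np
import xarray as xr
import xarray.testing

a = xr.DataArray(np.array([1.0, 2.0]))
b = xr.DataArray(np.array([1.0, 2.0 + 1e-12]))

# Explicit module makes it clear this compares DataArrays, not bare arrays
xarray.testing.assert_allclose(a, b)

# A genuinely different array should fail the comparison
try:
    xarray.testing.assert_allclose(a, xr.DataArray(np.array([1.0, 3.0])))
    mismatched_ok = True
except AssertionError:
    mismatched_ok = False
```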
Attributes like 'direction_y' only make sense for a particular DataArray, not the whole Dataset.
Some coordinates corresponding to x (calculated from the index), y (calculated from dy) and z (calculated from ZMIN and ZMAX) can always be created, although they might be named differently. So create them in the top-level apply_geometry() function, not the registered functions for particular geometries.
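A rough sketch of building such generic coordinates (assumed names, sizes and metadata values; not the actual apply_geometry() code):

```python
import numpy as np
import xarray as xr

nx, ny, nz = 4, 6, 8
ZMIN, ZMAX = 0.0, 1.0  # assumed metadata values, in units of 2*pi
ds = xr.Dataset({"n": (("x", "y", "z"), np.zeros((nx, ny, nz)))})
dy = np.full(ny, 0.5)  # assumed uniform grid spacing

xcoord = np.arange(nx, dtype=float)            # from the index
ycoord = np.cumsum(dy) - dy / 2.0              # cell centres from dy
zcoord = 2 * np.pi * (ZMIN + np.arange(nz) * (ZMAX - ZMIN) / nz)

ds = ds.assign_coords(x=xcoord, y=ycoord, z=zcoord)
```

The registered per-geometry functions can then rename these generic coordinates as needed.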
Previously were off by half a grid-cell.
Needed to pass checks in toFieldAligned().
For interpolation, where there is a physical boundary, want the limit of the coordinate (that is stored in the region) to be the global coordinate value at the boundary, not at the grid edge (which was what was stored previously).
Conflicts: xbout/boutdataset.py xbout/geometries.py xbout/tests/test_boutdataset.py
Issue with merging attrs has been fixed in xarray-0.16.0, so can remove workaround, as well as fixing problem with inconsistent regions with new default compat="no_conflicts" for xarray's combine_by_coords().
The result to be returned is updated_ds, checking ds meant always adding a new xcoord to updated_ds, even if it was already added by add_geometry_coords().
Ensure 'metadata', 'options', 'regions' and 'geometry' attributes are always added to all coordinates. Ensures consistency between original and saved-and-reloaded Datasets, allowing some workarounds in tests to be removed.
Merge conflicts are fixed now, and some related handling has been updated.
Adding attrs to the 'ycoord' coordinate in d062fa9 made interpolate_parallel() very slow. Don't understand why, but adding 'da = da.compute()' before the interpolation restores the speed.
That was a very strange issue. Tests failed on d062fa9 because they timed out; it turns out that adding attrs to the y-coordinate caused the slow-down.
xarray-0.16.0 is required now, older versions will fail the tests.
xarray requires a version of dask that is less than 6 months old, so xarray-0.16.0 requires dask-2.10.
Codecov Report
@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
+ Coverage   70.83%   71.30%   +0.46%
==========================================
  Files          14       14
  Lines        1519     1697     +178
  Branches      306      359      +53
==========================================
+ Hits         1076     1210     +134
- Misses        353      382      +29
- Partials       90      105      +15
Continue to review full report at Codecov.
Removing attrs from y-coordinate means we do not need to call da.compute(), which would load the entire result into memory. It is better not to, as the result may be sliced or processed somehow later and we don't want to force loading in case the variable is too large to fit in memory.
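The point about laziness can be illustrated with a small dask-backed example (hypothetical data):

```python
import numpy as np
import xarray as xr

# A chunked (lazy) DataArray: no data is loaded until .compute()/.values
da = xr.DataArray(np.arange(1000.0), dims="t").chunk({"t": 100})

result = da * 2  # still lazy: builds a task graph, loads nothing

# Slicing before computing means only the needed chunk is ever loaded
subset = result.isel(t=slice(0, 10)).compute()
print(subset.values[:3])  # [0. 2. 4.]
```

Calling `da.compute()` up front would instead materialise the whole array in memory.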
This was intended to be moved from add_toroidal_geometry_coords() into apply_geometry(), but ended up being added back into add_toroidal_geometry_coords() in a merge.
The coordinates of a DataArray that has been interpolated will have attrs that are not consistent with the new DataArray. This commit updates _update_metadata_increased_resolution() to also replace the attrs of the DataArray's coords with the attrs of the new DataArray.
As @TomNicholas noticed when discussing the performance regression issue today - @TomNicholas, I'd like to put this up as an FYI issue on xarray. Lines 314 to 320 in 66b286a should re-introduce the slow-down, if anyone has time to take a look.
Thanks for this report @johnomotani. This behaviour is definitely something I would like to understand, and ideally flag up with an issue on xarray, but for that I think it would need a reproducible example. With this I can at least come back to it later. Do you want me to review this so you can merge it and move on?
That would be great! 👍
Staggered grid cases are not implemented yet; these would need to use zShift_CELL_XLOW or zShift_CELL_YLOW (which may or may not be present in the Dataset, depending on the PhysicsModel).
Not all dump files (especially older ones) have cell_location attrs written, so if none is present, assume it's OK to do toFieldAligned and fromFieldAligned with the cell-centre zShift since we cannot check.
The index-value coordinates are now added for dimensions without coordinates after the geometry is applied, so no 'x' coordinate has been created to drop.
Commits 43fe97b to 585330f (compare)
cell_location attribute only makes sense for a DataArray not a whole Dataset, so remove in to_dataset() method.
Seems to be required at the moment to avoid an import error in the minimum-package-versions test.
Tests pass now! Any more review comments before I merge?
Provides methods BoutDataArray.getHighRes() to get a version of a variable interpolated in the parallel direction to increase the poloidal resolution, or BoutDataSet.getHighResVars(['var1', 'var2', ...]) to get a Dataset with high-resolution versions of a list of variables. An example from TORPEX simulations - before interpolation, and after.

Also adds a feature to the tests - can use @pytest.mark.long to mark a test as long, in which case it is skipped by default. Long tests are run if the --long argument is passed to pytest, and the Travis tests do run the long tests. Includes #107; this PR will be only 1,384 additions and 363 deletions after that is merged.
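The long-test marker follows the standard pytest pattern for skip-unless-flag options; a sketch of what the conftest.py hooks might look like (not necessarily the exact implementation in this PR):

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    # Register the --long command-line flag
    parser.addoption("--long", action="store_true", default=False,
                     help="run tests marked as long")

def pytest_collection_modifyitems(config, items):
    # If --long was given, run everything as collected
    if config.getoption("--long"):
        return
    # Otherwise, attach a skip marker to every test marked @pytest.mark.long
    skip_long = pytest.mark.skip(reason="needs --long option to run")
    for item in items:
        if "long" in item.keywords:
            item.add_marker(skip_long)
```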