Parallel interpolation #108

Merged
merged 101 commits into from
Aug 29, 2020
Commits (101)
1277532
Chunk grid file after removing upper boundary points
johnomotani Mar 15, 2020
bd0e88d
Methods for parallel interpolation of variables
johnomotani Mar 15, 2020
f21d3d6
Caching argument for getHighParallelResRegion
johnomotani Mar 16, 2020
f600aaa
Cache high-res variables in Region, not in Dataset
johnomotani Mar 16, 2020
f84d919
Move highParallelResRegion to BoutDataArray from BoutDataset
johnomotani Mar 16, 2020
ae50094
When interpolating, extrapolate at the boundaries if necessary
johnomotani Mar 16, 2020
51e04d2
Remove uses of dx and dy in highParallelResRegion
johnomotani Mar 16, 2020
2e63880
Method to create high resolutions of a variable for all regions
johnomotani Mar 16, 2020
9213bff
Remove caching of high-resolution variables
johnomotani Mar 16, 2020
845e2eb
Update definition of y_fine used for parallel interpolation
johnomotani Mar 16, 2020
5ab0ecf
Update metadata of high-resolution variable
johnomotani Mar 16, 2020
a74f325
Rename setupParallelInterp to resetParallelInterpFactor
johnomotani Mar 17, 2020
8ff5036
BoutDataArray.highParallelRes returns a Dataset
johnomotani Mar 17, 2020
2bb80f3
Update default to n=8 in highParallelResRegion
johnomotani Mar 17, 2020
0f1c64a
Remove regions attr when creating high-res interpolated variable
johnomotani Mar 17, 2020
67a2939
Workaround for combine_by_coords not keeping attrs
johnomotani Mar 17, 2020
9f3a3d9
Allow geometries to be re-applied to a Dataset
johnomotani Mar 17, 2020
15dfe81
Fix applying s-alpha geometry
johnomotani Mar 18, 2020
4b4999a
Add dy to interpolated DataArray as a coordinate
johnomotani Mar 17, 2020
72865d0
Create new Dataset of high-parallel resolution variables
johnomotani Mar 17, 2020
ac7e29a
Only select toroidal_points if variable has z-dimension
johnomotani Mar 17, 2020
dbd2862
Test for _update_metadata_increased_resolution()
johnomotani Mar 20, 2020
06fdb91
Test for BoutDataSet.resetParallelInterpFactor()
johnomotani Mar 20, 2020
373c43b
import numpy.testing as npt instead of importing assert_allclose
johnomotani Mar 20, 2020
8d10dde
Tests for BoutDataArray.highParallelResRegion()
johnomotani Mar 20, 2020
5ac595f
Drop attrs that do not belong in Dataset in BoutDataArray.to_dataset()
johnomotani Mar 20, 2020
d8f8d54
Define global 1d coordinates in apply_geometry()
johnomotani Mar 20, 2020
ba2b5b4
Fix the coordinate calculations in Region.__init__()
johnomotani Mar 20, 2020
c804396
Add 'direction_y' attrs to test data
johnomotani Mar 20, 2020
05138df
Fix region coordinate limits at boundaries
johnomotani Mar 20, 2020
a180248
Set up connections in Region constructor
johnomotani Mar 20, 2020
c155d70
Don't drop 'x' when creating 'r' for s-alpha geometry
johnomotani Mar 20, 2020
9fdec41
Include 'dy' in all inputs for tests
johnomotani Mar 20, 2020
e66e441
Test for BoutDataArray.highParallelResRegion() in single-null topology
johnomotani Mar 21, 2020
e736fe3
Test for highParallelResRegion with different enhancement factors
johnomotani Mar 21, 2020
05fd4cd
Test for BoutDataArray.highParallelRes()
johnomotani Mar 21, 2020
456d210
Test for BoutDataset.getHghParallelResVars()
johnomotani Mar 21, 2020
fda57ea
Remove region attribute from result in BoutDataArray.highParallelRes()
johnomotani Mar 21, 2020
c69f81e
Fix merge of Datasets in BoutDataset.getHighParallelResVars()
johnomotani Mar 21, 2020
8487f3f
Reduce grid sizes to speed up parallel interpolation tests
johnomotani Mar 21, 2020
73fe457
Test toroidal_points argument for highParallelRes, highParallelResRegion
johnomotani Mar 21, 2020
f772530
Fix for when toroidal_points does not divide nz exactly
johnomotani Mar 21, 2020
ccf418e
Add pytest.mark.long to disable long tests by default
johnomotani Mar 21, 2020
6ff08f8
Return DataArray by default from BoutDataArray.highParallelRes()
johnomotani Mar 21, 2020
bf8a039
Merge branch 'rfft' into parallel-interpolation
johnomotani Mar 21, 2020
236402c
Mark most permutations of to/fromFieldAligned tests as 'long'
johnomotani Mar 21, 2020
6781890
Merge branch 'rfft' into parallel-interpolation
johnomotani Mar 21, 2020
dc2f982
PEP8 fixes
johnomotani Mar 21, 2020
8db119d
Move Region creation out into a dict of functions
johnomotani Mar 23, 2020
ffb5bd5
Just use dict, not OrderedDict, in region.py
johnomotani Mar 23, 2020
8830487
Use more f-strings
johnomotani Mar 23, 2020
2d04255
Merge branch 'regions' into parallel-interpolation
johnomotani Mar 23, 2020
4aab6fe
Fix merge of 'regions' branch
johnomotani Mar 23, 2020
bd7a824
Rename highParallelResRegion and highParallelRes to interpolate_parallel
johnomotani Mar 23, 2020
f9cdbda
Rename resetParallelInterpFactor to set_parallel_interpolation_factor
johnomotani Mar 23, 2020
c5686c9
Better test for whether ds has a 'geometry' attribute
johnomotani Mar 23, 2020
f3a652a
Rename getHighParallelResVars() to interpolate_parallel()
johnomotani Mar 23, 2020
d7bbe42
Merge branch 'regions' into parallel-interpolation
johnomotani Mar 23, 2020
71f0304
Remove accidentally-committed print statements
johnomotani Mar 23, 2020
0b65e1b
More f-strings
johnomotani Mar 23, 2020
55b7fb9
Fix merge of test_toFieldAligned from regions
johnomotani Mar 24, 2020
8c2ff03
Add Ellipsis option for 'variables' arg BoutDataset.interpolate_parallel
johnomotani Mar 24, 2020
d06e38f
Extra test for interpolate_parallel
johnomotani Mar 24, 2020
50af753
Fix update of jyseps to higher resolution
johnomotani Mar 24, 2020
df25fc0
Add t_array as the t-coordinate in apply_geometry()
johnomotani Mar 24, 2020
ac41588
PEP8 fixes
johnomotani Mar 24, 2020
8e989b5
Merge branch 'regions' into parallel-interpolation
johnomotani Mar 25, 2020
6a7a748
Update to incorporate variable and argument name changes from regions
johnomotani Mar 24, 2020
fafe7ae
Fix merge of regions
johnomotani Mar 26, 2020
1d3e5c7
Refactor fine_interpolation_factor as a @property
johnomotani Apr 4, 2020
c81a143
Small tidy-up
johnomotani Apr 4, 2020
e8ac8c6
Still need to use .values() to get DataArrays instead of names
johnomotani Apr 4, 2020
37d1220
Fix use of fine_interpolation_factor in tests
johnomotani Apr 9, 2020
acad60d
Always check if dy is in variables in BoutDataset.interpolate_parallel
johnomotani Apr 20, 2020
a316fb2
Simplify BoutDataset.interpolate_parallel()
johnomotani Apr 20, 2020
c74b941
Convert tuple to list if passed to BoutDataset.interpolate_parallel()
johnomotani Apr 20, 2020
2e49738
Check dimensions of variables to interpolate
johnomotani Apr 20, 2020
5f300ea
Add test of ... arg: TestBoutDatasetMethods.test_interpolate_parallel
johnomotani Apr 20, 2020
33735d3
PEP8 fixes
johnomotani Apr 20, 2020
c95f194
Assign DataArray to Dataset member instead of using merge()
johnomotani Apr 21, 2020
2acb61d
Avoid errors if regions were not created
johnomotani Apr 21, 2020
1e4d47a
Better support for generic x and y in poloidal plots
johnomotani Apr 21, 2020
10fda87
Merge branch 'master' into parallel-interpolation
johnomotani Apr 23, 2020
46e30d2
Merge branch 'master' into parallel-interpolation
johnomotani Jul 29, 2020
b642545
Require xarray-0.16.0, fix merging attrs in interpolate_parallel()
johnomotani Jul 29, 2020
f313bc1
Check for xcoord in updated_ds not ds
johnomotani Jul 29, 2020
d062fa9
More consistent attrs on coordinates
johnomotani Jul 29, 2020
56a597c
Fix performance regression in BoutDataArray.interpolate_parallel()
johnomotani Jul 30, 2020
ed27def
Update Travis config for minimum versions to xarray-0.16.0
johnomotani Jul 30, 2020
dfab775
Update minimum dask to minimum version supported by xarray-0.16.0
johnomotani Jul 30, 2020
ccf1017
Better fix, BoutDataArray.interpolate_parallel() performance regression
johnomotani Jul 30, 2020
ff1bc30
Clean up merge - remove duplicated code adding 1d coordinates
johnomotani Jul 30, 2020
66b286a
Update coord attrs in _update_metadata_increased_resolution()
johnomotani Jul 30, 2020
629aacf
Add checks for cell_location in toFieldAligned and fromFieldAligned
johnomotani Aug 1, 2020
66e6db9
Add staggered zShift variables as coordinates if they exist
johnomotani Aug 1, 2020
9494f2e
set_coords() raises ValueError not KeyError when variable not present
johnomotani Aug 1, 2020
324179a
Allow toFieldAligned and fromFieldAligned when no cell_location attr
johnomotani Aug 16, 2020
af22f10
Add cell_location attr to variables in unit test Datasets
johnomotani Aug 16, 2020
585330f
Remove drop('x') from add_s_alpha_geometry_coords()
johnomotani Aug 16, 2020
31debc0
Drop cell_location attr from Dataset in BoutDataArray.to_dataset()
johnomotani Aug 17, 2020
8e3e14a
Add fsspec to PIP_PACKAGES in minimum versions Travis test
johnomotani Aug 17, 2020
4 changes: 2 additions & 2 deletions .travis.yml
@@ -4,12 +4,12 @@ python:
- "3.7"
env:
- PIP_PACKAGES="setuptools pip pytest pytest-cov coverage codecov boutdata xarray!=0.14.0 numpy>=1.16.0"
- PIP_PACKAGES="setuptools pip pytest pytest-cov coverage codecov boutdata xarray==0.13.0 dask==1.0.0 numpy==1.16.0 natsort==5.5.0 matplotlib==3.1.1 animatplot==0.4.1 netcdf4==1.4.2 Pillow==6.1.0" # test with oldest supported version of packages. Note, using numpy==1.16.0 as a workaround for some weird fails on Travis, in principle we should work with numpy>=1.13.3.
- PIP_PACKAGES="setuptools pip pytest pytest-cov coverage codecov boutdata xarray==0.16.0 dask==2.10.0 numpy==1.16.0 natsort==5.5.0 matplotlib==3.1.1 animatplot==0.4.1 netcdf4==1.4.2 Pillow==6.1.0 fsspec" # test with oldest supported version of packages. Note, using numpy==1.16.0 as a workaround for some weird fails on Travis, in principle we should work with numpy>=1.13.3. We should not need to install fsspec explicitly, but at the moment are getting import errors in the tests due to fsspec not being present - should remove in future, probably when dask version is increased.
install:
- pip install --upgrade ${PIP_PACKAGES}
- pip install -r requirements.txt
- pip install -e .
script:
- pytest -v --cov
- pytest -v --long --cov
after_success:
- codecov
17 changes: 17 additions & 0 deletions conftest.py
@@ -0,0 +1,17 @@
import pytest


# Add command line option '--long' for pytest, to be used to enable long tests
def pytest_addoption(parser):
parser.addoption("--long", action="store_true", default=False,
help="enable tests marked as 'long'")


def pytest_collection_modifyitems(config, items):
if not config.getoption("--long"):
# --long not given in cli: skip long tests
print("\n skipping long tests, pass '--long' to enable")
skip_long = pytest.mark.skip(reason="need --long option to run")
for item in items:
if "long" in item.keywords:
item.add_marker(skip_long)
3 changes: 3 additions & 0 deletions pytest.ini
@@ -2,3 +2,6 @@
filterwarnings =
ignore:No geometry type found, no coordinates will be added:UserWarning
ignore:deallocating CachingFileManager.*, but file is not already closed. This may indicate a bug\.:RuntimeWarning

markers =
long: long test, or one of many permutations (disabled by default)
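For reference, here is a minimal sketch (not part of this PR's diff) of how the new `long` marker and `--long` option fit together: a slow test is decorated with `pytest.mark.long` and is then skipped unless pytest is invoked with the flag, e.g. `pytest -v --long --cov` as in the updated Travis script. The test name below is hypothetical.

```python
import pytest


@pytest.mark.long  # marker registered in pytest.ini; skipped unless pytest is run with --long
def test_many_permutations():  # hypothetical test name for illustration
    # an expensive check that only runs when --long is passed on the command line
    assert sum(range(1000)) == 499500
```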
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,5 +1,5 @@
xarray >= 0.13.0
dask[array] >= 1.0.0
xarray >= 0.16.0
dask[array] >= 2.10.0
natsort >= 5.5.0
matplotlib >= 3.1.1
animatplot >= 0.4.1
4 changes: 2 additions & 2 deletions setup.py
@@ -33,8 +33,8 @@
license="Apache",
python_requires='>=3.6',
install_requires=[
'xarray>=v0.13.0',
'dask[array]>=1.0.0',
'xarray>=0.16.0',
'dask[array]>=2.10.0',
'natsort>=5.5.0',
'matplotlib>=3.1.1',
'animatplot>=0.4.1',
198 changes: 196 additions & 2 deletions xbout/boutdataarray.py
@@ -13,6 +13,7 @@
from .plotting.utils import _create_norm
from .region import (Region, _concat_inner_guards, _concat_outer_guards,
_concat_lower_guards, _concat_upper_guards)
from .utils import _update_metadata_increased_resolution


@register_dataarray_accessor('bout')
@@ -48,12 +49,20 @@ def __str__(self):
def to_dataset(self):
"""
Convert a DataArray to a Dataset, copying the attributes from the DataArray to
the Dataset.
the Dataset, and dropping attributes that only make sense for a DataArray
"""
da = self.data
ds = da.to_dataset()

ds.attrs = da.attrs
ds.attrs = deepcopy(da.attrs)

def dropIfExists(ds, name):
if name in ds.attrs:
del ds.attrs[name]

dropIfExists(ds, 'direction_y')
dropIfExists(ds, 'direction_z')
dropIfExists(ds, 'cell_location')

return ds

@@ -117,6 +126,14 @@ def toFieldAligned(self):
if self.data.direction_y != "Standard":
raise ValueError("Cannot shift a " + self.direction_y + " type field to "
+ "field-aligned coordinates")
if hasattr(self.data, "cell_location") and not (
self.data.cell_location == "CELL_CENTRE"
or self.data.cell_location == "CELL_ZLOW"
):
raise ValueError(
f"toFieldAligned does not support staggered grids yet, but "
f"location is {self.data.cell_location}."
)
result = self._shiftZ(self.data['zShift'])
result["direction_y"] = "Aligned"
return result
@@ -129,10 +146,28 @@ def fromFieldAligned(self):
if self.data.direction_y != "Aligned":
raise ValueError("Cannot shift a " + self.direction_y + " type field to "
+ "field-aligned coordinates")
if hasattr(self.data, "cell_location") and not (
self.data.cell_location == "CELL_CENTRE"
or self.data.cell_location == "CELL_ZLOW"
):
raise ValueError(
f"fromFieldAligned does not support staggered grids yet, but "
f"location is {self.data.cell_location}."
)
result = self._shiftZ(-self.data['zShift'])
result["direction_y"] = "Standard"
return result

@property
def regions(self):
if "regions" not in self.data.attrs:
raise ValueError(
"Called a method requiring regions, but these have not been created. "
"Please set the 'geometry' option when calling open_boutdataset() to "
"create regions."
)
return self.data.attrs["regions"]

def from_region(self, name, with_guards=None):
"""
Get a logically-rectangular section of data from a certain region.
@@ -180,6 +215,165 @@ def from_region(self, name, with_guards=None):

return da

@property
def fine_interpolation_factor(self):
"""
The default factor to increase resolution when doing parallel interpolation
"""
return self.data.metadata['fine_interpolation_factor']

@fine_interpolation_factor.setter
def fine_interpolation_factor(self, n):
"""
Set the default factor to increase resolution when doing parallel interpolation.

Parameters
-----------
n : int
Factor to increase parallel resolution by
"""
self.data.metadata['fine_interpolation_factor'] = n

def interpolate_parallel(self, region=None, *, n=None, toroidal_points=None,
method='cubic', return_dataset=False):
"""
Interpolate in the parallel direction to get a higher resolution version of the
variable.

Parameters
----------
region : str, optional
By default, return a result with all regions interpolated separately and then
combined. If an explicit region argument is passed, then return the variable
from only that region.
n : int, optional
The factor to increase the resolution by. Defaults to the value set by
BoutDataset.setupParallelInterp(), or 10 if that has not been called.
toroidal_points : int or sequence of int, optional
If int, number of toroidal points to output, applies a stride to toroidal
direction to save memory usage. If sequence of int, the indexes of toroidal
points for the output.
method : str, optional
The interpolation method to use. Options from xarray.DataArray.interp(),
currently: linear, nearest, zero, slinear, quadratic, cubic. Default is
'cubic'.
return_dataset : bool, optional
If this is set to True, return a Dataset containing this variable as a member
(by default returns a DataArray). Only used when region=None.

Returns
-------
A new DataArray containing a high-resolution version of the variable. (If
return_dataset=True, instead returns a Dataset containing the DataArray.)
"""

if region is None:
# Call the single-region version of this method for each region, and combine
# the results together
parts = [
self.interpolate_parallel(region, n=n, toroidal_points=toroidal_points,
method=method).bout.to_dataset()
for region in self.data.regions]

# 'region' is not the same for all parts, and should not exist in the result,
# so delete before merging
for part in parts:
if 'region' in part.attrs:
del part.attrs['region']
if 'region' in part[self.data.name].attrs:
del part[self.data.name].attrs['region']

result = xr.combine_by_coords(parts)

if return_dataset:
return result
else:
# Extract the DataArray to return
return result[self.data.name]

# Select a particular 'region' and interpolate to higher parallel resolution
da = self.data
region = da.regions[region]
tcoord = da.metadata['bout_tdim']
xcoord = da.metadata['bout_xdim']
ycoord = da.metadata['bout_ydim']
zcoord = da.metadata['bout_zdim']

if zcoord in da.dims and da.direction_y != 'Aligned':
aligned_input = False
da = da.bout.toFieldAligned()
else:
aligned_input = True

if n is None:
n = self.fine_interpolation_factor

da = da.bout.from_region(region.name, with_guards={xcoord: 0, ycoord: 2})
da = da.chunk({ycoord: None})

ny_fine = n*region.ny
dy = (region.yupper - region.ylower)/ny_fine

myg = da.metadata['MYG']
if da.metadata['keep_yboundaries'] and region.connection_lower_y is None:
ybndry_lower = myg
else:
ybndry_lower = 0
if da.metadata['keep_yboundaries'] and region.connection_upper_y is None:
ybndry_upper = myg
else:
ybndry_upper = 0

y_fine = np.linspace(region.ylower - (ybndry_lower - 0.5)*dy,
region.yupper + (ybndry_upper - 0.5)*dy,
ny_fine + ybndry_lower + ybndry_upper)

# This prevents da.interp() from being very slow.
# Apparently large attrs (i.e. regions) on a coordinate which is passed as an
# argument to dask.array.map_blocks() slow things down, maybe because coordinates
# are numpy arrays, not dask arrays?
# Slow-down was introduced in d062fa9e75c02fbfdd46e5d1104b9b12f034448f when
# _add_attrs_to_var(updated_ds, ycoord) was added in geometries.py
da[ycoord].attrs = {}

da = da.interp({ycoord: y_fine.data}, assume_sorted=True, method=method,
kwargs={'fill_value': 'extrapolate'})

da = _update_metadata_increased_resolution(da, n)

# Add dy to da as a coordinate. This will only be temporary, once we have
# combined the regions together, we will demote dy to a regular variable
dy_array = xr.DataArray(np.full([da.sizes[xcoord], da.sizes[ycoord]], dy),
dims=[xcoord, ycoord])
# need a view of da with only x- and y-dimensions, unfortunately no neat way to
# do this with isel
da_2d = da
if tcoord in da.sizes:
da_2d = da_2d.isel(**{tcoord: 0}, drop=True)
if zcoord in da.sizes:
da_2d = da_2d.isel(**{zcoord: 0}, drop=True)
dy_array = da_2d.copy(data=dy_array)
da = da.assign_coords(dy=dy_array)

# Remove regions which have incorrect information for the high-resolution grid.
# New regions will be generated when creating a new Dataset in
# BoutDataset.getHighParallelResVars
del da.attrs['regions']

if not aligned_input:
# Want output in non-aligned coordinates
da = da.bout.fromFieldAligned()

if toroidal_points is not None and zcoord in da.sizes:
if isinstance(toroidal_points, int):
nz = len(da[zcoord])
zstride = (nz + toroidal_points - 1)//toroidal_points
da = da.isel(**{zcoord: slice(None, None, zstride)})
else:
da = da.isel(**{zcoord: toroidal_points})

return da

def animate2D(self, animate_over='t', x=None, y=None, animate=True, fps=10,
save_as=None, ax=None, poloidal_plot=False, logscale=None, **kwargs):
"""
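As a usage sketch (not part of the diff), based on the docstrings of `interpolate_parallel` and `fine_interpolation_factor` above: after opening a Dataset with a geometry (so that regions are created), a variable can be interpolated to higher parallel resolution through the `bout` accessor. The file pattern, geometry and variable name below are hypothetical.

```python
from xbout import open_boutdataset

# Hypothetical file pattern, geometry and variable name. A geometry must be
# applied so that regions exist, as required by the `regions` property above.
ds = open_boutdataset("BOUT.dmp.*.nc", geometry="toroidal")
n = ds["n"]

# Optionally store a default refinement factor in the variable's metadata
n.bout.fine_interpolation_factor = 8

# Interpolate every region and combine the results (a DataArray by default).
# toroidal_points=16 keeps a strided subset of z-points to save memory.
n_fine = n.bout.interpolate_parallel(n=8, toroidal_points=16, method="cubic")
```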