
Initialize empty or full DataArray #3159

Merged
31 commits merged on Aug 26, 2019
Changes shown are from the first 27 of the 31 commits.

Commits
4f6311e
TST: add test for DataArray init with a single value
griverat Jul 24, 2019
1aa3ff3
ENH: add empty and full DataArray initialization
griverat Jul 24, 2019
db2cb28
Update whats-new
griverat Jul 24, 2019
4cecfe9
Remove ValueError test
griverat Jul 24, 2019
a538dab
Add function to verify and fill array according to coordinates
griverat Jul 24, 2019
4c45b7b
Use item in numpy array to compare with None
griverat Jul 24, 2019
b0f2d4e
ENH: add empty and full DataArray initialization
griverat Jul 24, 2019
2fa6e48
Add function to verify and fill array according to coordinates
griverat Jul 24, 2019
c481560
Handle coords being a list of tuples
griverat Aug 5, 2019
28b7336
Use .shape to identify scalar arrays
griverat Aug 5, 2019
59a632f
Better handling of dims
griverat Aug 5, 2019
8c165fd
Remove conditionals over shape value
griverat Aug 5, 2019
f337550
Ignore 0d arrays
griverat Aug 9, 2019
9a57dc5
Fill array with NaN when no data given
griverat Aug 9, 2019
70786c1
Add more tests
griverat Aug 9, 2019
3f95767
Merge commit 'f172c673' into add-init-val-darray
griverat Aug 9, 2019
8c97a46
black
griverat Aug 9, 2019
dec7622
black2
griverat Aug 9, 2019
8c5aaf3
Merge commit 'd089df38' into add-init-val-darray
griverat Aug 9, 2019
347ec33
Merge branch 'master' into add-init-val-darray
griverat Aug 9, 2019
e3127a3
Type check for ExplicitlyIndexed objects
griverat Aug 9, 2019
38858c6
Change parameter name
griverat Aug 9, 2019
4be0607
Remove Optional
griverat Aug 9, 2019
9bb3530
Merge branch 'master' into add-init-val-darray
griverat Aug 16, 2019
68ff54a
Remove abbreviation
griverat Aug 16, 2019
7407757
Use as_variable
griverat Aug 17, 2019
4e95cc3
Pass tuples explicitly to coords in test
griverat Aug 17, 2019
4df28db
Tests for 0d
griverat Aug 21, 2019
8ea1c47
Move ExplicitlyIndexed check into is_scalar
griverat Aug 26, 2019
95c88ab
Merge branch 'master' into add-init-val-darray
max-sixty Aug 26, 2019
2c0c634
Update utils.py
max-sixty Aug 26, 2019
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -60,6 +60,9 @@ Enhancements
 - In :py:meth:`~xarray.Dataset.to_zarr`, passing ``mode`` is not mandatory if
   ``append_dim`` is set, as it will automatically be set to ``'a'`` internally.
   By `David Brochart <https://github.com/davidbrochart>`_.
+- Added the ability to initialize an empty or full DataArray
+  with a single value. (:issue:`277`)
+  By `Gerardo Rivera <http://github.com/dangomelon>`_.
 
 Bug fixes
 ~~~~~~~~~
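A minimal usage sketch of the feature this entry describes, assuming the behaviour added in this PR (it mirrors the new `test_init_value` test further down):

```python
import numpy as np
import xarray as xr

# A single scalar is broadcast over the given coordinates -> shape (3, 4), filled with 3.
full = xr.DataArray(3, dims=["x", "y"], coords=[range(3), range(4)])

# With no data at all, the array is created from the coordinates and filled with NaN -> shape (10, 2).
empty = xr.DataArray(coords=[("x", np.arange(10)), ("y", ["a", "b"])])
```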
25 changes: 24 additions & 1 deletion xarray/core/dataarray.py
@@ -158,6 +158,28 @@ def _infer_coords_and_dims(
     return new_coords, dims
 
 
+def _check_data_shape(data, coords, dims):
+    if data is dtypes.NA:
+        data = np.nan
+    if (
+        coords is not None
+        and utils.is_scalar(data, include_0d=False)
+        and not isinstance(data, indexing.ExplicitlyIndexed)
+    ):
+        if utils.is_dict_like(coords):
+            if dims is None:
+                return data
+            else:
+                data_shape = tuple(
+                    as_variable(coords[k], k).size if k in coords.keys() else 1
+                    for k in dims
+                )
+        else:
+            data_shape = tuple(as_variable(coord, "foo").size for coord in coords)
+        data = np.full(data_shape, data)
+    return data
+
+
 class _LocIndexer:
     def __init__(self, data_array: "DataArray"):
         self.data_array = data_array
@@ -234,7 +256,7 @@ class DataArray(AbstractArray, DataWithCoords):

     def __init__(
         self,
-        data: Any,
+        data: Any = dtypes.NA,
         coords: Union[Sequence[Tuple], Mapping[Hashable, Any], None] = None,
         dims: Union[Hashable, Sequence[Hashable], None] = None,
         name: Hashable = None,
@@ -323,6 +345,7 @@ def __init__(
         if encoding is None:
             encoding = getattr(data, "encoding", None)
 
+        data = _check_data_shape(data, coords, dims)
Review thread on the line above (the scalar vs. 0-d behaviour it discusses is sketched just after this file's diff):

Member:
I wonder if we should move this logic above as_compatible_data, which would let us distinguish between scalar values like float/int (which don't have an inherent shape) vs 0-dimensional NumPy arrays (which do have an array shape already).

For example:

  • xarray.DataArray(0.5, coords=[('x', np.arange(3)), ('y', ['a', 'b'])]) -> duplicate the scalar to make an array of shape (3, 2)
  • xarray.DataArray(np.array(1.0), coords=[('x', np.arange(3)), ('y', ['a', 'b'])]) -> error, shapes do not match

Contributor Author (griverat):
If I understand correctly, the second example you provided shouldn't work, since np.array(1.0) is a 0-dimensional NumPy array with shape () and DataArray expects it to have a (3, 2) shape, right? The current behavior is to duplicate the value as if it were xarray.DataArray(1.0, coords=[('x', np.arange(3)), ('y', ['a', 'b'])]), which I thought was the desired feature. I am currently pushing a commit that makes this work, since I hadn't considered the case of coords being a list of tuples (although all tests passed).

Regarding the position of _check_data_shape, I placed it after as_compatible_data since the latter returns an ndarray containing the value passed to it, scalar or None, on which I can check the shape.

         data = as_compatible_data(data)
         coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
         variable = Variable(dims, data, attrs, encoding, fastpath=True)
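A minimal sketch of the scalar vs. 0-dimensional distinction discussed in the review thread above, assuming the branch as it stands here (is_scalar is called with include_0d=False, so 0-d arrays are passed through unchanged); the exact error message is not shown in the diff, so the second case is only expected to fail the usual shape/coordinate consistency check:

```python
import numpy as np
import xarray as xr

coords = [("x", np.arange(3)), ("y", ["a", "b"])]

# A plain Python scalar has no inherent shape, so _check_data_shape
# broadcasts it against the coordinates: the result has shape (3, 2).
broadcast = xr.DataArray(0.5, coords=coords)

# A 0-d NumPy array already has a shape, namely (), so it is left alone
# and the coordinate/dimension inference is expected to raise.
try:
    xr.DataArray(np.array(1.0), coords=coords)
except ValueError as err:
    print("shape mismatch:", err)
```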
6 changes: 4 additions & 2 deletions xarray/core/utils.py
@@ -272,13 +272,15 @@ def either_dict_or_kwargs(
     return cast(Mapping[Hashable, T], kw_kwargs)
 
 
-def is_scalar(value: Any) -> bool:
+def is_scalar(value: Any, include_0d: bool = True) -> bool:
     """Whether to treat a value as a scalar.
 
     Any non-iterable, string, or 0-D array
     """
+    if include_0d:
+        include_0d = getattr(value, "ndim", None) == 0
     return (
-        getattr(value, "ndim", None) == 0
+        include_0d
         or isinstance(value, (str, bytes))
         or not (
             isinstance(value, (Iterable,) + dask_array_type)
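For reference, a short sketch of how the new include_0d flag changes the result for 0-d arrays; is_scalar lives in the internal xarray.core.utils module modified above, so this is illustration only, not public API:

```python
import numpy as np
from xarray.core.utils import is_scalar  # internal helper changed in this diff

print(is_scalar(1.5))                               # True: plain Python scalar
print(is_scalar("abc"))                             # True: strings count as scalars
print(is_scalar(np.array(1.5)))                     # True: 0-d arrays included by default
print(is_scalar(np.array(1.5), include_0d=False))   # False: 0-d arrays excluded
print(is_scalar([1, 2, 3]))                         # False: a list is iterable, not scalar
```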
21 changes: 21 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -1506,6 +1506,27 @@ def test_rename(self):
         renamed_kwargs = self.dv.x.rename(x="z").rename("z")
         assert_identical(renamed, renamed_kwargs)
 
+    def test_init_value(self):
+        expected = DataArray(
+            np.full((3, 4), 3), dims=["x", "y"], coords=[range(3), range(4)]
+        )
+        actual = DataArray(3, dims=["x", "y"], coords=[range(3), range(4)])
+        assert_identical(expected, actual)
+
+        expected = DataArray(
+            np.full((1, 10, 2), 0),
+            dims=["w", "x", "y"],
+            coords={"x": np.arange(10), "y": ["north", "south"]},
+        )
+        actual = DataArray(0, dims=expected.dims, coords=expected.coords)
+        assert_identical(expected, actual)
+
+        expected = DataArray(
+            np.full((10, 2), np.nan), coords=[("x", np.arange(10)), ("y", ["a", "b"])]
+        )
+        actual = DataArray(coords=[("x", np.arange(10)), ("y", ["a", "b"])])
+        assert_identical(expected, actual)
+
     def test_swap_dims(self):
         array = DataArray(np.random.randn(3), {"y": ("x", list("abc"))}, "x")
         expected = DataArray(array.values, {"y": list("abc")}, dims="y")