Feature/weighted (#2922)
* weighted for DataArray

* remove some commented code

* pep8 and faulty import tests

* add weighted sum, replace 0s in sum_of_wgt

* weighted: overhaul tests

* weighted: pep8

* weighted: pep8 lines

* weighted update docs

* weighted: fix typo

* weighted: pep8

* undo changes to avoid merge conflict

* add weighted to dataarray again

* remove super

* overhaul core/weighted.py

* add DatasetWeighted class

* _maybe_get_all_dims return sorted tuple

* work on: test_weighted

* black and flake8

* Apply suggestions from code review (docs)

* restructure interim

* restructure classes

* update weighted.py

* black

* use map; add keep_attrs

* implement expected_weighted; update tests

* add whats new

* undo changes to whats-new

* F811: noqa where?

* api.rst

* add to computation

* small updates

* add example to gallery

* typo

* another typo

* correct docstring in core/common.py

* typos

* adjust review

* clean tests

* add test nonequal coords

* comment on use of dot

* fix erroneous merge

* update tests

* move example to notebook

* move whats-new entry to 15.1

* some doc updates

* dot to own function

* simplify some tests

* Doc updates

* very minor changes.

* fix & add references

* doc: return 0/NaN on 0 weights

* Update xarray/core/common.py

Co-authored-by: dcherian <deepak@cherian.net>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
3 people authored Mar 19, 2020
1 parent 65a5bff commit df614b9
Showing 10 changed files with 922 additions and 1 deletion.
18 changes: 18 additions & 0 deletions doc/api.rst
@@ -165,6 +165,7 @@ Computation
Dataset.groupby_bins
Dataset.rolling
Dataset.rolling_exp
Dataset.weighted
Dataset.coarsen
Dataset.resample
Dataset.diff
@@ -340,6 +341,7 @@ Computation
DataArray.groupby_bins
DataArray.rolling
DataArray.rolling_exp
DataArray.weighted
DataArray.coarsen
DataArray.dt
DataArray.resample
@@ -577,6 +579,22 @@ Rolling objects
core.rolling.DatasetRolling.reduce
core.rolling_exp.RollingExp

Weighted objects
================

.. autosummary::
:toctree: generated/

core.weighted.DataArrayWeighted
core.weighted.DataArrayWeighted.mean
core.weighted.DataArrayWeighted.sum
core.weighted.DataArrayWeighted.sum_of_weights
core.weighted.DatasetWeighted
core.weighted.DatasetWeighted.mean
core.weighted.DatasetWeighted.sum
core.weighted.DatasetWeighted.sum_of_weights
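
A minimal, illustrative sketch of how these objects are obtained (they are not
instantiated directly, but returned by :py:meth:`DataArray.weighted` and
:py:meth:`Dataset.weighted`; the values below are made up):

.. code-block:: python

    import xarray as xr

    da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
    weights = xr.DataArray([0.5, 0.25, 0.25], dims="x")

    weighted = da.weighted(weights)  # core.weighted.DataArrayWeighted
    weighted.mean()                  # (0.5*1 + 0.25*2 + 0.25*3) / 1.0 = 1.75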


Coarsen objects
===============

86 changes: 85 additions & 1 deletion doc/computation.rst
@@ -1,3 +1,5 @@
.. currentmodule:: xarray

.. _comput:

###########
@@ -241,12 +243,94 @@ You can also use ``construct`` to compute a weighted rolling sum:
To avoid this, use ``skipna=False`` as in the above example.


.. _comput.weighted:

Weighted array reductions
=========================

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` array reduction methods. They currently
support weighted ``sum`` and weighted ``mean``.

.. ipython:: python

    coords = dict(month=('month', [1, 2, 3]))
    prec = xr.DataArray([1.1, 1.0, 0.9], dims=('month', ), coords=coords)
    weights = xr.DataArray([31, 28, 31], dims=('month', ), coords=coords)

Create a weighted object:

.. ipython:: python

    weighted_prec = prec.weighted(weights)
    weighted_prec

Calculate the weighted sum:

.. ipython:: python

    weighted_prec.sum()

Calculate the weighted mean:

.. ipython:: python

    weighted_prec.mean(dim="month")

The weighted sum corresponds to:

.. ipython:: python

    weighted_sum = (prec * weights).sum()
    weighted_sum

and the weighted mean to:

.. ipython:: python

    weighted_mean = weighted_sum / weights.sum()
    weighted_mean

However, the functions also take missing values in the data into account:

.. ipython:: python

    data = xr.DataArray([np.NaN, 2, 4])
    weights = xr.DataArray([8, 1, 1])
    data.weighted(weights).mean()

Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result
in 0.6.
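
To see where the difference comes from, the masking can be written out by hand
(an illustrative sketch reusing ``data`` and ``weights`` from above, not the
actual implementation):

.. code-block:: python

    # naive: the NaN is skipped in the numerator, but its weight (8)
    # still inflates the denominator
    (data * weights).sum() / weights.sum()  # -> 0.6

    # count only the weights where data is not NaN
    masked_weights = weights.where(data.notnull())
    (data * weights).sum() / masked_weights.sum()  # -> 3.0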


If the weights add up to 0, ``sum`` returns 0:

.. ipython:: python

    data = xr.DataArray([1.0, 1.0])
    weights = xr.DataArray([-1.0, 1.0])
    data.weighted(weights).sum()

and ``mean`` returns ``NaN``:

.. ipython:: python

    data.weighted(weights).mean()

.. note::
  ``weights`` must be a :py:class:`DataArray` and cannot contain missing values.
  Missing values can be replaced manually by ``weights.fillna(0)``.
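
For example, weights containing missing values can be sanitized before use
(an illustrative sketch with made-up values, relying on the ``xr`` and ``np``
imports from the examples above):

.. code-block:: python

    data = xr.DataArray([1.0, 2.0, 3.0])
    weights = xr.DataArray([0.25, np.NaN, 0.75])

    data.weighted(weights.fillna(0)).mean()  # (0.25*1 + 0*2 + 0.75*3) / 1.0 = 2.5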

.. _comput.coarsen:

Coarsen large arrays
====================

``DataArray`` and ``Dataset`` objects include a
:py:class:`DataArray` and :py:class:`Dataset` objects include a
:py:meth:`~xarray.DataArray.coarsen` and :py:meth:`~xarray.Dataset.coarsen`
methods. This supports the block aggregation along multiple dimensions,

1 change: 1 addition & 0 deletions doc/examples.rst
@@ -6,6 +6,7 @@ Examples

examples/weather-data
examples/monthly-means
examples/area_weighted_temperature
examples/multidimensional-coords
examples/visualization_gallery
examples/ROMS_ocean_model
226 changes: 226 additions & 0 deletions doc/examples/area_weighted_temperature.ipynb
@@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Compare-weighted-and-unweighted-mean-temperature\" data-toc-modified-id=\"Compare-weighted-and-unweighted-mean-temperature-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Compare weighted and unweighted mean temperature</a></span><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#Data\" data-toc-modified-id=\"Data-1.0.1\"><span class=\"toc-item-num\">1.0.1&nbsp;&nbsp;</span>Data</a></span></li><li><span><a href=\"#Creating-weights\" data-toc-modified-id=\"Creating-weights-1.0.2\"><span class=\"toc-item-num\">1.0.2&nbsp;&nbsp;</span>Creating weights</a></span></li><li><span><a href=\"#Weighted-mean\" data-toc-modified-id=\"Weighted-mean-1.0.3\"><span class=\"toc-item-num\">1.0.3&nbsp;&nbsp;</span>Weighted mean</a></span></li><li><span><a href=\"#Plot:-comparison-with-unweighted-mean\" data-toc-modified-id=\"Plot:-comparison-with-unweighted-mean-1.0.4\"><span class=\"toc-item-num\">1.0.4&nbsp;&nbsp;</span>Plot: comparison with unweighted mean</a></span></li></ul></li></ul></li></ul></div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Compare weighted and unweighted mean temperature\n",
"\n",
"\n",
"Author: [Mathias Hauser](https://github.com/mathause/)\n",
"\n",
"\n",
"We use the `air_temperature` example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/ longitude grid, thus the gridcell area decreases towards the pole. For this grid we can use the cosine of the latitude as proxy for the grid cell area.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:57.222351Z",
"start_time": "2020-03-17T14:43:56.147541Z"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import cartopy.crs as ccrs\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"\n",
"Load the data, convert to celsius, and resample to daily values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:57.831734Z",
"start_time": "2020-03-17T14:43:57.651845Z"
}
},
"outputs": [],
"source": [
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"\n",
"# to celsius\n",
"air = ds.air - 273.15\n",
"\n",
"# resample from 6-hourly to daily values\n",
"air = air.resample(time=\"D\").mean()\n",
"\n",
"air"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the first timestep:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:59.887120Z",
"start_time": "2020-03-17T14:43:59.582894Z"
}
},
"outputs": [],
"source": [
"projection = ccrs.LambertConformal(central_longitude=-95, central_latitude=45)\n",
"\n",
"f, ax = plt.subplots(subplot_kw=dict(projection=projection))\n",
"\n",
"air.isel(time=0).plot(transform=ccrs.PlateCarree(), cbar_kwargs=dict(shrink=0.7))\n",
"ax.coastlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating weights\n",
"\n",
"For a for a rectangular grid the cosine of the latitude is proportional to the grid cell area."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:18.777092Z",
"start_time": "2020-03-17T14:44:18.736587Z"
}
},
"outputs": [],
"source": [
"weights = np.cos(np.deg2rad(air.lat))\n",
"weights.name = \"weights\"\n",
"weights"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weighted mean"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:52.607120Z",
"start_time": "2020-03-17T14:44:52.564674Z"
}
},
"outputs": [],
"source": [
"air_weighted = air.weighted(weights)\n",
"air_weighted"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:54.334279Z",
"start_time": "2020-03-17T14:44:54.280022Z"
}
},
"outputs": [],
"source": [
"weighted_mean = air_weighted.mean((\"lon\", \"lat\"))\n",
"weighted_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot: comparison with unweighted mean\n",
"\n",
"Note how the weighted mean temperature is higher than the unweighted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:45:08.877307Z",
"start_time": "2020-03-17T14:45:08.673383Z"
}
},
"outputs": [],
"source": [
"weighted_mean.plot(label=\"weighted\")\n",
"air.mean((\"lon\", \"lat\")).plot(label=\"unweighted\")\n",
"\n",
"plt.legend()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -25,6 +25,9 @@ Breaking changes
New Features
~~~~~~~~~~~~

- Weighted array reductions are now supported via the new :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` methods. See :ref:`comput.weighted`. (:issue:`422`, :pull:`2922`).
By `Mathias Hauser <https://github.com/mathause>`_.
- Added support for :py:class:`pandas.DatetimeIndex`-style rounding of
``cftime.datetime`` objects directly via a :py:class:`CFTimeIndex` or via the
:py:class:`~core.accessor_dt.DatetimeAccessor`.