Feature/weighted (#2922)
* weighted for DataArray

* remove some commented code

* pep8 and faulty import tests

* add weighted sum, replace 0s in sum_of_wgt

* weighted: overhaul tests

* weighted: pep8

* weighted: pep8 lines

* weighted update docs

* weighted: fix typo

* weighted: pep8

* undo changes to avoid merge conflict

* add weighted to dataarray again

* remove super

* overhaul core/weighted.py

* add DatasetWeighted class

* _maybe_get_all_dims return sorted tuple

* work on: test_weighted

* black and flake8

* Apply suggestions from code review (docs)

* restructure interim

* restructure classes

* update weighted.py

* black

* use map; add keep_attrs

* implement expected_weighted; update tests

* add whats new

* undo changes to whats-new

* F811: noqa where?

* api.rst

* add to computation

* small updates

* add example to gallery

* typo

* another typo

* correct docstring in core/common.py

* typos

* adjust review

* clean tests

* add test nonequal coords

* comment on use of dot

* fix erroneous merge

* update tests

* move example to notebook

* move whats-new entry to 15.1

* some doc updates

* dot to own function

* simplify some tests

* Doc updates

* very minor changes.

* fix & add references

* doc: return 0/NaN on 0 weights

* Update xarray/core/common.py

Co-authored-by: dcherian <deepak@cherian.net>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
3 people authored Mar 19, 2020
1 parent 65a5bff commit df614b9
Showing 10 changed files with 922 additions and 1 deletion.
18 changes: 18 additions & 0 deletions doc/api.rst
@@ -165,6 +165,7 @@ Computation
Dataset.groupby_bins
Dataset.rolling
Dataset.rolling_exp
Dataset.weighted
Dataset.coarsen
Dataset.resample
Dataset.diff
@@ -340,6 +341,7 @@ Computation
DataArray.groupby_bins
DataArray.rolling
DataArray.rolling_exp
DataArray.weighted
DataArray.coarsen
DataArray.dt
DataArray.resample
@@ -577,6 +579,22 @@ Rolling objects
core.rolling.DatasetRolling.reduce
core.rolling_exp.RollingExp

Weighted objects
================

.. autosummary::
:toctree: generated/

core.weighted.DataArrayWeighted
core.weighted.DataArrayWeighted.mean
core.weighted.DataArrayWeighted.sum
core.weighted.DataArrayWeighted.sum_of_weights
core.weighted.DatasetWeighted
core.weighted.DatasetWeighted.mean
core.weighted.DatasetWeighted.sum
core.weighted.DatasetWeighted.sum_of_weights
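
A minimal, illustrative sketch of how these objects are obtained (they are not
instantiated directly, but returned by :py:meth:`DataArray.weighted` and
:py:meth:`Dataset.weighted`; the values below are made up):

.. code-block:: python

    import xarray as xr

    da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
    weights = xr.DataArray([0.5, 0.25, 0.25], dims="x")

    weighted = da.weighted(weights)  # core.weighted.DataArrayWeighted
    weighted.mean()                  # (0.5*1 + 0.25*2 + 0.25*3) / 1.0 = 1.75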


Coarsen objects
===============

86 changes: 85 additions & 1 deletion doc/computation.rst
@@ -1,3 +1,5 @@
.. currentmodule:: xarray

.. _comput:

###########
@@ -241,12 +243,94 @@ You can also use ``construct`` to compute a weighted rolling sum:
To avoid this, use ``skipna=False`` as in the above example.


.. _comput.weighted:

Weighted array reductions
=========================

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` array reduction methods. They currently
support weighted ``sum`` and weighted ``mean``.

.. ipython:: python

    coords = dict(month=('month', [1, 2, 3]))
    prec = xr.DataArray([1.1, 1.0, 0.9], dims=('month', ), coords=coords)
    weights = xr.DataArray([31, 28, 31], dims=('month', ), coords=coords)

Create a weighted object:

.. ipython:: python

    weighted_prec = prec.weighted(weights)
    weighted_prec

Calculate the weighted sum:

.. ipython:: python

    weighted_prec.sum()

Calculate the weighted mean:

.. ipython:: python

    weighted_prec.mean(dim="month")

The weighted sum corresponds to:

.. ipython:: python

    weighted_sum = (prec * weights).sum()
    weighted_sum

and the weighted mean to:

.. ipython:: python

    weighted_mean = weighted_sum / weights.sum()
    weighted_mean

However, the functions also take missing values in the data into account:

.. ipython:: python

    data = xr.DataArray([np.NaN, 2, 4])
    weights = xr.DataArray([8, 1, 1])
    data.weighted(weights).mean()

Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result
in 0.6.
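
To see where the difference comes from, the masking can be written out by hand
(an illustrative sketch reusing ``data`` and ``weights`` from above, not the
actual implementation):

.. code-block:: python

    # naive: the NaN is skipped in the numerator, but its weight (8)
    # still inflates the denominator
    (data * weights).sum() / weights.sum()  # -> 0.6

    # count only the weights where data is not NaN
    masked_weights = weights.where(data.notnull())
    (data * weights).sum() / masked_weights.sum()  # -> 3.0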


If the weights add up to 0, ``sum`` returns 0:

.. ipython:: python

    data = xr.DataArray([1.0, 1.0])
    weights = xr.DataArray([-1.0, 1.0])
    data.weighted(weights).sum()

and ``mean`` returns ``NaN``:

.. ipython:: python

    data.weighted(weights).mean()

.. note::
  ``weights`` must be a :py:class:`DataArray` and cannot contain missing values.
  Missing values can be replaced manually by ``weights.fillna(0)``.
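
For example, weights containing missing values can be sanitized before use
(an illustrative sketch with made-up values, relying on the ``xr`` and ``np``
imports from the examples above):

.. code-block:: python

    data = xr.DataArray([1.0, 2.0, 3.0])
    weights = xr.DataArray([0.25, np.NaN, 0.75])

    data.weighted(weights.fillna(0)).mean()  # (0.25*1 + 0*2 + 0.75*3) / 1.0 = 2.5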

.. _comput.coarsen:

Coarsen large arrays
====================

``DataArray`` and ``Dataset`` objects include a
:py:class:`DataArray` and :py:class:`Dataset` objects include a
:py:meth:`~xarray.DataArray.coarsen` and :py:meth:`~xarray.Dataset.coarsen`
methods. This supports the block aggregation along multiple dimensions,

1 change: 1 addition & 0 deletions doc/examples.rst
@@ -6,6 +6,7 @@ Examples

examples/weather-data
examples/monthly-means
examples/area_weighted_temperature
examples/multidimensional-coords
examples/visualization_gallery
examples/ROMS_ocean_model
226 changes: 226 additions & 0 deletions doc/examples/area_weighted_temperature.ipynb
@@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Compare-weighted-and-unweighted-mean-temperature\" data-toc-modified-id=\"Compare-weighted-and-unweighted-mean-temperature-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Compare weighted and unweighted mean temperature</a></span><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#Data\" data-toc-modified-id=\"Data-1.0.1\"><span class=\"toc-item-num\">1.0.1&nbsp;&nbsp;</span>Data</a></span></li><li><span><a href=\"#Creating-weights\" data-toc-modified-id=\"Creating-weights-1.0.2\"><span class=\"toc-item-num\">1.0.2&nbsp;&nbsp;</span>Creating weights</a></span></li><li><span><a href=\"#Weighted-mean\" data-toc-modified-id=\"Weighted-mean-1.0.3\"><span class=\"toc-item-num\">1.0.3&nbsp;&nbsp;</span>Weighted mean</a></span></li><li><span><a href=\"#Plot:-comparison-with-unweighted-mean\" data-toc-modified-id=\"Plot:-comparison-with-unweighted-mean-1.0.4\"><span class=\"toc-item-num\">1.0.4&nbsp;&nbsp;</span>Plot: comparison with unweighted mean</a></span></li></ul></li></ul></li></ul></div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Compare weighted and unweighted mean temperature\n",
"\n",
"\n",
"Author: [Mathias Hauser](https://github.com/mathause/)\n",
"\n",
"\n",
"We use the `air_temperature` example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/ longitude grid, thus the gridcell area decreases towards the pole. For this grid we can use the cosine of the latitude as proxy for the grid cell area.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:57.222351Z",
"start_time": "2020-03-17T14:43:56.147541Z"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import cartopy.crs as ccrs\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"\n",
"Load the data, convert to celsius, and resample to daily values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:57.831734Z",
"start_time": "2020-03-17T14:43:57.651845Z"
}
},
"outputs": [],
"source": [
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"\n",
"# to celsius\n",
"air = ds.air - 273.15\n",
"\n",
"# resample from 6-hourly to daily values\n",
"air = air.resample(time=\"D\").mean()\n",
"\n",
"air"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the first timestep:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:43:59.887120Z",
"start_time": "2020-03-17T14:43:59.582894Z"
}
},
"outputs": [],
"source": [
"projection = ccrs.LambertConformal(central_longitude=-95, central_latitude=45)\n",
"\n",
"f, ax = plt.subplots(subplot_kw=dict(projection=projection))\n",
"\n",
"air.isel(time=0).plot(transform=ccrs.PlateCarree(), cbar_kwargs=dict(shrink=0.7))\n",
"ax.coastlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating weights\n",
"\n",
"For a for a rectangular grid the cosine of the latitude is proportional to the grid cell area."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:18.777092Z",
"start_time": "2020-03-17T14:44:18.736587Z"
}
},
"outputs": [],
"source": [
"weights = np.cos(np.deg2rad(air.lat))\n",
"weights.name = \"weights\"\n",
"weights"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weighted mean"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:52.607120Z",
"start_time": "2020-03-17T14:44:52.564674Z"
}
},
"outputs": [],
"source": [
"air_weighted = air.weighted(weights)\n",
"air_weighted"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:44:54.334279Z",
"start_time": "2020-03-17T14:44:54.280022Z"
}
},
"outputs": [],
"source": [
"weighted_mean = air_weighted.mean((\"lon\", \"lat\"))\n",
"weighted_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot: comparison with unweighted mean\n",
"\n",
"Note how the weighted mean temperature is higher than the unweighted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2020-03-17T14:45:08.877307Z",
"start_time": "2020-03-17T14:45:08.673383Z"
}
},
"outputs": [],
"source": [
"weighted_mean.plot(label=\"weighted\")\n",
"air.mean((\"lon\", \"lat\")).plot(label=\"unweighted\")\n",
"\n",
"plt.legend()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 4
}
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -25,6 +25,9 @@ Breaking changes
New Features
~~~~~~~~~~~~

- Weighted array reductions are now supported via the new :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` methods. See :ref:`comput.weighted`. (:issue:`422`, :pull:`2922`).
By `Mathias Hauser <https://github.com/mathause>`_.
- Added support for :py:class:`pandas.DatetimeIndex`-style rounding of
``cftime.datetime`` objects directly via a :py:class:`CFTimeIndex` or via the
:py:class:`~core.accessor_dt.DatetimeAccessor`.