Seriously reconsider the pint implementation for units conversion/computations #1687

guidocioni started this conversation in Ideas
Replies: 0 comments
I've been a 'happy' user of MetPy for years (since the early days of 0.7, I think), mainly to postprocess model data in an operational setup where speed and reliability really matter (you can find some examples here: https://github.com/guidocioni/icon_globe).
I think it is a fantastic library, given the many interpolation/computation functions and the unit conversions. However, I cannot wrap my head around some of the arbitrary choices made in the implementation of unit arrays with pint. Let's face it: unit arrays are quite a cool idea, but they are a pain to use and have many unexpected consequences. First of all, printing a unit array in a Jupyter notebook makes the whole browser unresponsive, probably because the printing is not optimized for large arrays as it is in xarray.
Second, unit arrays are not correctly interpreted by some functions in xarray or numpy: sometimes the cast to a plain array happens, sometimes it doesn't, so I always have to make sure I convert the results of MetPy computations to a DataArray or numpy array in my workflow to avoid problems. This is extremely inefficient!
A more practical example: I want to compute equivalent potential temperature and add it back to the original dataset as a DataArray. Right now (I'm still on 0.12, since 1.0 introduces many changes that are not compatible with my pipeline) this is what I have to do:
As you can see, I have to manually create the DataArray from the output of the computation in order to merge it back into the original dataset.
To make matters worse, some MetPy computation functions accept only unit arrays and others only DataArrays, while the result is almost always a unit array, at least in the old 0.12 version.
The new MetPy release (1.0) attempts to fix these inconsistencies by rewriting all the computation functions to accept xarray DataArrays or Datasets as input. However, the output of most computation functions is now a unit array wrapped in a DataArray, which again makes things even more complicated to interpret. I understand there is now a built-in method to go back to a normal DataArray (.dequantify()), but I believe the implementation strategy for unit arrays should be revised.

Here is my humble proposal: if a dataset is parsed with metpy.parse_cf(), it always has a units attribute, which is already used in the computations today. If that is the case, why not have the function return a DataArray directly, with the magnitude component of the unit array as the data and the unit stored in the DataArray's 'units' attribute? I think this would also be the most CF-compliant approach, and it would be compatible with all the other functions that accept DataArrays as input. If the input is a unit array, the output should instead be a unit array. Either way, the unit-array conversion should happen in the background and not be exposed to the user when working only with xarray components.

Again, this is only my personal experience after using the library for so long. I admire the work that has been put into writing it, but I felt like contributing to the discussion.