Seriously reconsider the pint implementation for units conversion/computations #1687

guidocioni started this conversation in Ideas
Replies: 0 comments
I've been a 'happy' user of MetPy for years (since the early days of 0.7, I think), mainly to postprocess model data in an operational setup where speed and reliability really matter (you can find some examples here: https://github.com/guidocioni/icon_globe).
I think it is a fantastic library, given the many interpolation/computation functions and the unit conversions. However, I cannot wrap my head around some of the arbitrary choices made in the implementation of unit arrays with pint. Let's face it: unit arrays are quite a cool idea, but they are a pain to use and have many unexpected consequences. First of all, printing a unit array in a Jupyter notebook makes the whole browser unresponsive, probably because the printing is not optimized for large arrays as it is in xarray.
Second, unit arrays are not correctly interpreted by some functions in xarray or numpy: sometimes the cast to a plain array happens, sometimes it doesn't, so I always have to make sure I convert the results of MetPy computations to a DataArray or numpy array in my workflow to avoid problems. This is extremely inefficient!
A more practical example: I want to compute equivalent potential temperature and add it back to the original dataset as a DataArray. Right now (I'm still on 0.12, since 1.0 introduces many changes that are not compatible with my pipeline) this is what I have to do:
As you can see, I have to manually create the DataArray from the output of the computation in order to merge it back into the original dataset.
To make matters worse, some MetPy computation functions accept only unit arrays and others only DataArrays, while the result is almost always a unit array, at least in the old 0.12 version.
The new MetPy release (1.0) attempts to fix these inconsistencies by rewriting all the computation functions to accept xarray DataArrays or Datasets as input. However, the output of most computation functions is now a unit array wrapped in a DataArray, which again makes things even more complicated to interpret. I understand there is now a built-in method to go back to a normal DataArray (.dequantify()), but I believe the implementation strategy for unit arrays should be revised.

Here is my humble proposal: if a dataset is parsed with metpy.parse_cf(), it always has a units attribute, which is already used in the computations today. If that is the case, why not have the function return a DataArray directly, with the magnitude component of the unit array as the data and the unit stored in the DataArray's 'units' attribute? I think this would also be the most CF-compliant approach, and it would be compatible with all the other functions that accept DataArrays as input. If the input is a unit array, the output should instead be a unit array. Either way, the unit-array conversion should happen in the background and not be exposed to the user when working only with xarray components.

Again, this is only my personal experience after using the library for so long. I admire the work that has been put into writing it, but I felt like contributing to the discussion.