-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a function to enrich STM using data from another dataset #66
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sarah, thanks for the nice implementation! It works well in general.
I have two comments regarding adding more checks on the incoming data. See above.
These comments come from a test I made with KNMI data. I also attached the notebook and the small example dataset here.
stmtools/stm.py
Outdated
|
||
# do selection | ||
indexers = {coord: ds[coord] for coord in list(datapoints.coords.keys())} | ||
selections = datapoints.sel(indexers, method="nearest") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a test using sel
to get nearest time location, with datetime64[ns]
format. This works under two conditions:
- For
time
dimension, in a specific combination:method=nearest
+datetime64[ns]
format, Xarray requires the time coordinates ofds
to be monotonic increasing or decreasing. - For
lat
andlon
dimension, the duplicated values should not exists in the corresponding coordinates.
Can we add check in the beginning of enrich_from_dataset
to enforce this?
An example check for time:
# This should be True
np.all(np.diff(ds['time'].values) > 0) or np.all(np.diff(ds['time'].values) < 0)
An example check for lat and lon:
# Following should be True
np.unique(ds['lat'].values).shape == ds['lat'].values.shape
np.unique(ds['lon'].values).shape == ds['lon'].values.shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. It seems that xarray does not enforce cf conventions. I added two util functions to do the checks and call them in stem _enrich_from_points_block
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that with KDTree, these two conditions are not needed. So I removed the checks from _enrich_from_points_block
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests for these conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rogerkuou to check if the points are unique, the test np.unique(ds['lat'].values).shape == ds['lat'].values.shape
is not enough because it only checks the duplicates in one dimension here lat
. However, for example, points can be located on one line.
Instead, we need a test if there are cases where (lat, lon, time)
are duplicated. Functions like xarray.Dataset.drop_duplicates
and pandas.DataFrame.duplicated
can be used to write a test. But these functions only work on dim
and not coords
. In our cases, lat
and lon
are coords
and space
is the dim
. So we might need to use unstack
which leads to memory problems.
As discussed, scipy KDTree
works if coordinates (lat, lon, time)
are duplicated and the values of variables e.g. temperature are the same too, I added a test for this. If the values of variables are not the same for duplicated coordinates, MacOs and linux behave differently to pick up a value related to the nearest neighbor. For this extreme case, I suggest creating a new issue and fixing it in another PR.
Co-authored-by: Ou Ku <o.ku@esciencecenter.nl>
thanks for the review. It is ready for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SarahAlidoost, thanks for the nice implementation!
I doced the follow-up issues in #75 and #76
I will merge it now.
closes #63
This PR implements two functions:
see #69 on why I changed
_io.py
in this PR.