Investigate CDAT versus xcdat spatial average differences #166
Comments
@lee1043 - In the metrics package for NAFM - do you load data differently when computing metrics for NAFM (e.g., tas = f('tas', longitude=(310, 309)) or something like this)? The longitude domain goes from 310°E to 60°E (wraps around the prime meridian). I suspect that when I call CDAT it is averaging between 60°E and 310°E (rather than wrapping around the prime meridian). |
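To make the wrap-around question concrete, here is a minimal sketch of selecting a longitude domain that crosses the prime meridian. It is plain Python rather than the CDAT or xcdat API; the helper name `in_lon_domain` and the sample longitudes are hypothetical, and the bounds are assumed to be given in degrees east within [0, 360].

```python
def in_lon_domain(lon, west, east):
    """Return True if lon lies in the domain running eastward from west to east.

    Assumes west and east are in degrees east within [0, 360].
    """
    lon = lon % 360.0
    if west <= east:
        # Ordinary domain, e.g. 60E to 310E
        return west <= lon <= east
    # Domain crosses the prime meridian, e.g. 310E to 60E
    return lon >= west or lon <= east


lons = [0.0, 30.0, 60.0, 90.0, 180.0, 300.0, 310.0, 350.0]

# NAFM-style domain: 310E eastward to 60E (wraps around 0E)
nafm = [lon for lon in lons if in_lon_domain(lon, 310.0, 60.0)]
# Mirrored domain: 60E eastward to 310E
mirrored = [lon for lon in lons if in_lon_domain(lon, 60.0, 310.0)]

print(nafm)      # [0.0, 30.0, 60.0, 310.0, 350.0]
print(mirrored)  # [60.0, 90.0, 180.0, 300.0, 310.0]
```

The two selections share only the boundary longitudes, so averaging over the mirrored domain would be expected to give a very different regional mean.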
@pochedls I think you've just found a bug in the domain definition of the PMP's Monsoon Wang metric. I haven't had a chance to work with this specific metric, so I had not noticed the issue. The PMP's driver for the Monsoon Wang metric just opens the model file with a regular cdms2 open. As you suspected, the extracted subdomain for NAFM (North African Monsoon) was 60E to 310E (mirrored), instead of 310E to 60E. I am raising this as a PMP issue so it can be fixed. On the other hand, although it is not expected, I think it would be nice if XCDAT handled this smartly, if there is an easy way to do so. |
I've tried to ensure that |
Hi @pochedls, any progress on this issue? |
Thank you for the prompt - I haven't dug deeper, but I will devote some time to this along with the two other PR reviews. It looks like the majority of the differences are not xcdat bugs, but some of the differences I do not yet understand. |
I found the source of another difference in spatial averages: the latitude bounds. To summarize what I show below, the latitude bounds by xcdat match the
We can compare to the bounds in the original file via
All of the methods agree on the
If I drop the xcdat bounds ( Of the most recent 2373 xcdat-CDAT comparisons, there were 117 instances where the two libraries had a spatial average difference greater than 10⁻⁶ (for an individual time step). Some of these differences were larger than 10⁻³ (including the file outlined above). By generating the latitude bounds in xcdat instead of using the bounds in the file, 33 (of 117) differences become smaller than 10⁻⁶. This leaves 84 comparisons where the differences are still larger than 10⁻⁶. Can anyone see a reason not to use the latitude bounds in the file? |
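For readers following along, a minimal sketch of why the choice of latitude bounds matters. It assumes the usual area weighting by the difference of the sine of the latitude bounds and generates bounds as midpoints between cell centers, capped at the poles. The grid, the values, and the "file" bounds are made up for illustration, and the helper names are hypothetical rather than the xcdat or CDAT implementation.

```python
import math


def midpoint_bounds(lats):
    """Generate cell bounds halfway between adjacent centers, capped at the poles."""
    edges = [(a + b) / 2.0 for a, b in zip(lats[:-1], lats[1:])]
    first = lats[0] - (edges[0] - lats[0])
    last = lats[-1] + (lats[-1] - edges[-1])
    edges = [first] + edges + [last]
    edges = [max(-90.0, min(90.0, e)) for e in edges]
    return list(zip(edges[:-1], edges[1:]))


def lat_weights(bounds):
    """Area weights proportional to sin(upper bound) - sin(lower bound)."""
    return [abs(math.sin(math.radians(hi)) - math.sin(math.radians(lo)))
            for lo, hi in bounds]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up 5-point latitude grid and zonal-mean values (illustration only)
lats = [-90.0, -45.0, 0.0, 45.0, 90.0]
values = [250.0, 280.0, 300.0, 280.0, 250.0]

generated = midpoint_bounds(lats)
# Hypothetical bounds as they might be stored in a file, with wider polar cells
file_bounds = [(-90.0, -60.0), (-60.0, -22.5), (-22.5, 22.5),
               (22.5, 60.0), (60.0, 90.0)]

mean_generated = weighted_mean(values, lat_weights(generated))
mean_file = weighted_mean(values, lat_weights(file_bounds))
print(mean_generated, mean_file, mean_generated - mean_file)
```

Both sets of bounds cover the full sphere, yet the weighted means differ because the polar cells receive different weights. The real files have far finer grids, so the effect there would be much smaller than in this coarse example.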
It appears that some of the differences may be in the averager (or maybe the precision of the underlying data?). get spatial averages
Check if I get the same thing using
Problem remains. I'll try to calculate the value myself using the xcdat generated weighting matrix (which I believe is the same as the CDAT generated weight matrix):
The answer is different (and the difference is smaller). Same story using
@tomvothecoder - Do you have ideas about what may be going on? It seems like xcdat is slightly different from CDAT and my manual calculation. The data is stored as a float – maybe that is playing a role? |
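One way to probe the float question is to compute the same weighted average once entirely in single precision and once in double precision. The sketch below uses synthetic data on a hypothetical 2° grid with NumPy only; it does not reproduce what xcdat or CDAT do internally, it only shows how the accumulation dtype can be isolated as a variable.

```python
import numpy as np

# Synthetic "tas"-like field stored as float32 (values are made up)
rng = np.random.default_rng(0)
data = rng.normal(288.0, 15.0, size=(90, 180)).astype(np.float32)

# Latitude weights from the sine of the cell edges, broadcast over longitude
lat_edges = np.linspace(-90.0, 90.0, 91)
w_lat = np.diff(np.sin(np.radians(lat_edges)))
weights = np.broadcast_to(w_lat[:, None], data.shape)

# Everything kept in float32
w32 = weights.astype(np.float32)
avg32 = (data * w32).sum(dtype=np.float32) / w32.sum(dtype=np.float32)

# Data promoted to float64 before weighting and summing
avg64 = (data.astype(np.float64) * weights).sum() / weights.sum()

print(avg32, avg64, abs(float(avg32) - avg64))
```

If the printed difference is comparable to the xcdat-CDAT discrepancy, dtype handling would be a plausible contributor; if it is much smaller, the cause is more likely elsewhere.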
It's always a good idea to have some sanity checks running >=-90.0, <=90.0, >=-180., <=360., as the garbage that I've seen in some of the (particularly early) CMIP data is always astounding - same with obs data too. The double lon values (same lon value to precision, but different values in the matrix) are another CMIP data quirk favourite of mine |
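A minimal sketch of such sanity checks in plain Python, using the ranges quoted above. The function name `coordinate_problems` and the rounding precision used to detect duplicate longitudes are assumptions for illustration.

```python
def coordinate_problems(lats, lons, decimals=6):
    """Return a list of human-readable problems found in the coordinates."""
    problems = []
    if any(lat < -90.0 or lat > 90.0 for lat in lats):
        problems.append("latitude outside [-90, 90]")
    if any(lon < -180.0 or lon > 360.0 for lon in lons):
        problems.append("longitude outside [-180, 360]")
    # Longitudes that agree to the chosen precision count as duplicates
    rounded = [round(lon, decimals) for lon in lons]
    if len(set(rounded)) != len(rounded):
        problems.append("duplicate longitude values")
    return problems


print(coordinate_problems([-90.0, 0.0, 90.0], [0.0, 120.0, 240.0]))
print(coordinate_problems([-91.0, 0.0], [0.0, 0.0000001, 400.0]))
```

The first call returns an empty list; the second reports all three problems.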
As a to do, I should check the time series differences using numpy.allclose(). |
This overloaded version may also be useful https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html - there are some really useful functions that have appeared in the more recent numpy releases |
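A short sketch of how the two NumPy functions could be used for this comparison. The two time series are made-up numbers standing in for CDAT and xcdat spatial averages, and the 10⁻⁶ absolute tolerance mirrors the threshold used earlier in this thread.

```python
import numpy as np

# Made-up spatial-average time series (illustration only)
cdat_series = np.array([287.1234567, 287.2345678, 287.3456789])
xcdat_series = cdat_series + np.array([0.0, 5e-7, 2e-3])

tol = 1e-6

# Single True/False answer for the whole series
same = np.allclose(xcdat_series, cdat_series, rtol=0.0, atol=tol)
print(same)  # False

# Indices of the time steps that exceed the tolerance
abs_diff = np.abs(xcdat_series - cdat_series)
bad = np.flatnonzero(abs_diff > tol)
print(bad.tolist())  # [2]

# assert_allclose raises with a summary of the mismatch instead of returning False
try:
    np.testing.assert_allclose(xcdat_series, cdat_series, rtol=0.0, atol=tol)
except AssertionError as err:
    print(str(err).strip().splitlines()[0])
```

Setting `rtol=0.0` makes the check purely absolute; with the default relative tolerance, differences of this size on values near 287 K could otherwise pass unnoticed.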
Reporting back on this...I looped over 18585 file/domain combinations.
For the 288 files:
Overall, I don't think there are issues with xcdat's spatial averaging functionality. @tomvothecoder - should I break some of this into separate issues or just keep updating this thread until I understand each problem? |
Nice @pochedls! It might be useful to add non-atmospheric grids as your problem counts will jump markedly |
@pochedls nice work! I've recently been looping over models with E3SM Diags, and found that ICON-ESM-LR somehow submitted data on unstructured grids, so it is expected that neither xarray nor cdms2 could open its five ensembles. |
The spatial averaging functionality was designed to work with rectilinear grids.
Thanks @chengzhuzhang - yeah - I think most of the problems I list above are due to specific model problems. |
Great work @pochedls! I think we can continue updating this thread until we have enough information to warrant new GH issues. |
@tomvothecoder - I'm slowly whittling down this list. I wanted to see if you could look at one of these to see if it is a simple problem or warrants a ticket. When I call
For some reason, cf_xarray isn't generating an axes list. This seems to be a one-off problem with this file, though it isn't immediately obvious why (I don't see a problem with the metadata).
|
In that dataset, the |
We can probably improve the |
You also have a Thinking about this a little, it is basically what I had in mind with #200, so more than happy to engage in the case that rather than just solving this spatial averaging issue, it's done in a more complete (all coords) way. As an FYI, I have been polling out full CMIP6 archive, and have been collecting a bunch of weird edge cases that have barfed with |
Thanks @durack1. Currently, XCDAT's spatial averaging only operates on rectilinear grids. If we extend spatial averaging to support other axes/dimensions, we will make sure to update the Notes in the
You also mentioned #200 being related to #183. If it makes sense, we'll need to figure out a way to handle the different axes that might share the same attributes (e.g.,
It might be a good idea to open up a separate issue and list the errors thrown by Also, I plan on making bounds generation optional in #220. |
No problem, when you get to this, happy to engage. Currently, I have 27 edge cases to follow up on, and I am yet to ascertain whether this is related to |
@tomvothecoder: This comment is updated with an explanation for each difference. Based on today's conversation, I don't think this small subset of issues can generically be addressed in xcdat. I'm closing this issue. |
In order to continue to validate the spatial averaging code in xcdat, I looped over CMIP5/6 tas historical files (110 models) and 20+ domains here. 227 (227/2486 = 9%) combinations had a CDAT-xcdat difference larger than 10⁻⁶ here. 110 were specific to the NAFM region. This drops to 140 (and 110 for NAFM) if I look at differences larger than 10⁻³.
Two models produced errors for xcdat (but not CDAT): NorCPM1 and MCM-UA-1-0. In the case of NorCPM1, it is related to Issue #162. In the case of MCM-UA-1-0, the problem is likely related to #163. There were no cases where CDAT throws an error and xcdat does not. Both packages produce an error for ICON-ESM-LR, because tas has a shape of [1980, 20480] (non-rectilinear grid).
This ticket is intended to document my findings in investigating these differences.