Investigate CDAT versus xcdat spatial average differences #166

Closed
pochedls opened this issue Nov 19, 2021 · 22 comments
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@pochedls
Collaborator

pochedls commented Nov 19, 2021

In order to continue to validate the spatial averaging code in xcdat, I looped over CMIP5/6 tas historical files (110 models) and 20+ domains here.

227 (227/2486 = 9%) combinations had a CDAT-xcdat difference larger than 10⁻⁶ here. 110 were specific to the NAFM region. This drops to 140 (and 110 for NAFM) if I look at differences larger than 10⁻³.

Two models produced errors for xcdat (but not CDAT): NorCPM1 and MCM-UA-1-0. In the case of NorCPM, it is related to Issue #162. In the case of MCM-UA-1-0, the problem is likely related to #163. There were no cases where CDAT throws an error and xcdat does not. Both packages produce an error for ICON-ESM-LR, because tas has a shape of [1980, 20480] (non-rectilinear grid).

This ticket is intended to document my findings in investigating these differences.

@pochedls
Collaborator Author

@lee1043 - In the metrics package, do you load data differently when computing metrics for NAFM (e.g., tas = f('tas', longitude=(310, 309)) or something like this)? The longitude domain goes from 310°E to 60°E (wrapping around the prime meridian). I suspect when I am calling CDAT it is averaging between 60°E and 310°E (rather than wrapping around the prime meridian).

@lee1043
Collaborator

lee1043 commented Nov 20, 2021

@pochedls I think you've just found a bug in the PMP's Monsoon Wang metric's domain definition. I haven't had a chance to work with this specific metric, so I had not noticed the issue. In the PMP's driver for the Monsoon Wang metric, it just opens the model file with a regular cdms2 open, i.e., f = cdms2.open(modelFile).

As you suspected, the extracted subdomain for NAFM (North African Monsoon) was 60°E to 310°E (i.e., mirrored), instead of 310°E to 60°E.

[attached figure: NAFM subdomain]

I am bringing this issue to the PMP repo for a fix. On the other hand, although it is not expected usage, I think it would be nice if XCDAT handled this smartly if there is an easy way to do so.

@tomvothecoder tomvothecoder added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Nov 22, 2021
@pochedls
Collaborator Author

I am bringing this issue to the PMP repo for a fix. On the other hand, although it is not expected usage, I think it would be nice if XCDAT handled this smartly if there is an easy way to do so.

I've tried to ensure that xcdat handles situations where the bounds cross the prime meridian. You need to specify the bounds as (lower, upper). If you specify a bound of (310, 60), it knows that you are wrapping around the prime meridian (i.e., 310 - 360 and 0 - 60), whereas (60, 310) would not span the prime meridian (it includes only 60 - 310).
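A minimal sketch of the two cases (the latitude range is illustrative, and I am assuming spatial_avg() accepts lat_bounds/lon_bounds keyword arguments):

import xcdat

ds = xcdat.open_dataset(fn)  # fn: path to any rectilinear tas file (placeholder)

# (310, 60) wraps the prime meridian: it covers 310-360 plus 0-60
nafm = ds.xcdat.spatial_avg(lat_bounds=(0, 40), lon_bounds=(310, 60)).tas

# (60, 310) does not wrap: it covers only 60-310
not_nafm = ds.xcdat.spatial_avg(lat_bounds=(0, 40), lon_bounds=(60, 310)).tas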

@tomvothecoder
Collaborator

Hi @pochedls, any progress on this issue?

@pochedls
Collaborator Author

pochedls commented Jan 5, 2022

Hi @pochedls, any progress on this issue?

Thank you for the prompt - I haven't dug deeper, but I will devote some time to this along with the two other PR reviews. It looks like the majority of the differences are not xcdat bugs, but there are some differences I do not yet understand.

@pochedls
Collaborator Author

pochedls commented Jan 20, 2022

I found the source of another difference in spatial averages: the latitude bounds. To summarize what I show below, the latitude bounds used by xcdat match the lat_bnds array in the original file. If I drop the xcdat bounds (ds.drop_vars(['lat_bnds'])) and regenerate them (ds = ds.bounds.add_missing_bounds()), I match the bounds in CDAT. This resolves 33 of the 117 instances where CDAT-versus-xcdat spatial averaging yields differences larger than 10⁻⁶.

fn = '/p/user_pub/xclim/CMIP5/CMIP/historical/atmos/mon/tas/CMIP5.CMIP.historical.BNU.BNU-ESM.r1i1p1.mon.tas.atmos.glb-z1-gu.1.0000000.0.xml'
f = cdms2.open(fn)
tas = f('tas', time=slice(0, 1))
f.close()
lat_bnds = tas.getLatitude().getBounds()
print(lat_bnds)

array([[-89.24743652, -86.48016358],
[-86.48016358, -83.70471955],
[-83.70471955, -80.91925812],
[-80.91925812, -78.13125229],
[-78.13125229, -75.34220887],
[-75.34220887, -72.55263519],
...

fn = '/p/css03/cmip5_css02/data/cmip5/output1/BNU/BNU-ESM/historical/mon/atmos/Amon/r1i1p1/tas/1/tas_Amon_BNU-ESM_historical_r1i1p1_185001-200512.nc'
ds = xcdat.open_dataset(fn)
lat_bnds = ds.bounds.get_bounds('lat')
ds.close()
print(lat_bnds)

array([[-87.8638 , -86.480164],
[-86.480164, -83.70472 ],
[-83.70472 , -80.919258],
[-80.919258, -78.131256],
[-78.131256, -75.342209],
[-75.342209, -72.552635],
...

We can compare to the bounds in the original file via ncdump -v lat_bnds tas_Amon_BNU-ESM_historical_r1i1p1_185001-200512.nc

lat_bnds =
-87.8638000488281, -86.4801635742188,
-86.4801635742188, -83.704719543457,
-83.704719543457, -80.9192581176758,
-80.9192581176758, -78.1312561035156,
-78.1312561035156, -75.3422088623047,
-75.3422088623047, -72.5526351928711,
...

All of the methods agree on the lat values themselves:

lat = -87.8638000488281, -85.0965270996094, -82.3129119873047,
-79.5256042480469, -76.7369003295898, -73.9475173950195,
-71.1577529907227, -68.3677597045898, -65.5776062011719,
-62.787353515625, -59.9970207214355, -57.2066307067871,
-54.4161987304688, -51.625732421875, -48.8352394104004,
...

If I drop the xcdat bounds (ds.drop_vars(['lat_bnds'])) and regenerate them (ds = ds.bounds.add_missing_bounds()), I match the bounds in CDAT:
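Putting those two steps together (a minimal sketch using the same BNU-ESM file as above):

import xcdat

fn = '/p/css03/cmip5_css02/data/cmip5/output1/BNU/BNU-ESM/historical/mon/atmos/Amon/r1i1p1/tas/1/tas_Amon_BNU-ESM_historical_r1i1p1_185001-200512.nc'
ds = xcdat.open_dataset(fn)

# drop the lat_bnds delivered with the file and regenerate them
ds = ds.drop_vars(['lat_bnds'])
ds = ds.bounds.add_missing_bounds()

# the regenerated bounds now match the CDAT-generated bounds shown above
print(ds.bounds.get_bounds('lat'))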

Of the most recent 2373 xcdat-CDAT comparisons, there were 117 instances where the two libraries had a spatial average difference greater than 10⁻⁶ (for an individual time step). Some of these cases were larger than 10⁻³ (including the file outlined above). By generating the latitude bounds in xcdat instead of using the bounds in the file, 33 (of 117) differences become smaller than 10⁻⁶. This leaves 84 comparisons where the differences are still larger than 10⁻⁶.

Can anyone see a reason not to use the latitude bounds in the file?

@pochedls
Collaborator Author

pochedls commented Jan 20, 2022

It appears that some of the differences may be in the averager (or maybe the precision of the underlying data?).

get spatial averages

import xcdat
import cdms2
import cdutil
import numpy as np

fn = '/p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM5A2-INCA/historical/r1i1p1f1/Amon/tas/gr/v20200729/tas_Amon_IPSL-CM5A2-INCA_historical_r1i1p1f1_gr_185001-201412.nc'

# calculate CDAT spatial average

f = cdms2.open(fn)
tas = f('tas', time=slice(0, 1))
f.close()
tas = tas[0]
tas_cdat_weighted = cdutil.averager(tas, axis='xy', weights='weighted')

# calculate xcdat spatial average
ds = xcdat.open_dataset(fn)
tas_xcdat_weighted = ds.xcdat.spatial_avg().tas.isel(time=0).load().data

# there is a global average difference
print('cdat value', tas_cdat_weighted)
print('xcdat value', tas_xcdat_weighted)
print('difference', tas_xcdat_weighted - tas_cdat_weighted)

cdat value 284.1609407146412
xcdat value 284.1607
difference -0.0002351482349354228

Check if I get the same thing using _get_weights and _averager:

# I can get the same thing manually getting the weights
# and calling the averager
weights = ds.spatial._get_weights(axis=['lat', 'lon'], lat_bounds=[-90, 90], lon_bounds=[0, 360])
weighted_averager_with__averager = ds.spatial._averager(ds.tas[0], axis=['lat', 'lon'], weights=weights).load().data

print('xcdat _averager call', weighted_averager_with__averager)
print('difference', weighted_averager_with__averager - tas_cdat_weighted)

xcdat _averager call 284.16067729983047
difference -0.00026341481071767703

The problem remains. I'll try to calculate the value myself using the xcdat-generated weighting matrix (which I believe is the same as the CDAT-generated weight matrix):

# try manual calculation with xcdat-generated weights
weighted_average = (np.sum(ds.tas[0] * weights) / np.sum(weights)).data
print('steve manual calculation', weighted_average)
print('difference', weighted_average - tas_cdat_weighted)

steve manual calculation 284.16095
difference 8.992390064577194e-06

The answer is different (and the difference is smaller).

Same story using np.average:

# Try numpy weighted averager
weighted_averager_with_numpy = np.average(ds.tas[0], weights=weights)

print('numpy weighted average', weighted_averager_with_numpy)
print('difference', weighted_averager_with_numpy - tas_cdat_weighted)

numpy weighted average 284.16095
difference 8.992390064577194e-06

@tomvothecoder - Do you have ideas about what may be going on? It seems like xcdat is slightly different from both CDAT and my manual calculation. The data is stored as a float - maybe that is playing a role?
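For reference, a minimal numpy sketch (not the xcdat code path; the field and weights are synthetic) showing that float32 versus float64 accumulation alone can shift a weighted mean by roughly this amount:

import numpy as np

rng = np.random.default_rng(0)
lat = np.linspace(-89.5, 89.5, 180)
w64 = np.repeat(np.cos(np.deg2rad(lat))[:, None], 360, axis=1)  # simple area weights
w32 = w64.astype(np.float32)
tas32 = (280 + 10 * rng.random((180, 360))).astype(np.float32)  # stand-in float32 field

# accumulate entirely in float32 versus entirely in float64
mean32 = (tas32 * w32).sum(dtype=np.float32) / w32.sum(dtype=np.float32)
mean64 = (tas32.astype(np.float64) * w64).sum() / w64.sum()
print(mean64 - mean32)  # typically a nonzero difference of order 1e-5 to 1e-4 at ~285 K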

@durack1
Collaborator

durack1 commented Jan 21, 2022

Can anyone see a reason not to use the latitude bounds in the file?

It's always a good idea to have some sanity checks running (e.g., latitude >= -90.0 and <= 90.0, longitude >= -180.0 and <= 360.0), as the garbage that I've seen in some of the (particularly early) CMIP data is always astounding - same with obs data too. Duplicated lon values (the same lon value to printed precision, but slightly different values in the array) are another favourite CMIP data quirk of mine.
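A minimal sketch of such a sanity check with plain xarray/numpy (it assumes CMIP-style lat_bnds/lon_bnds variable names):

import numpy as np
import xarray as xr

ds = xr.open_dataset(fn)  # fn: path to the file being checked (placeholder)
lat_b = ds['lat_bnds'].values
lon_b = ds['lon_bnds'].values

assert np.all((lat_b >= -90.0) & (lat_b <= 90.0)), 'latitude bounds out of range'
assert np.all((lon_b >= -180.0) & (lon_b <= 360.0)), 'longitude bounds out of range'
# duplicated lon values (a known CMIP quirk): values that repeat once rounded to printed precision
assert np.unique(np.round(ds['lon'].values, 6)).size == ds['lon'].size, 'duplicate lon values'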

@pochedls
Collaborator Author

As a to-do, I should check the time series differences using numpy.allclose().

@durack1
Collaborator

durack1 commented Feb 17, 2022

This overloaded version may also be useful: https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html - there are some really useful functions that have appeared in the more recent numpy releases.
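A minimal sketch of both checks (the arrays and tolerance are illustrative placeholders):

import numpy as np

# placeholders standing in for the CDAT and xcdat spatial-average time series
cdat_ts = np.array([284.16094, 284.90112, 285.33240])
xcdat_ts = np.array([284.16095, 284.90110, 285.33241])

print(np.allclose(xcdat_ts, cdat_ts, atol=1e-3))           # quick True/False check
np.testing.assert_allclose(xcdat_ts, cdat_ts, atol=1e-3)   # raises with a mismatch report if it fails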

@pochedls
Collaborator Author

pochedls commented Mar 30, 2022

Reporting back on this...I looped over 18585 file/domain combinations.

  • 288 files had a problem (either didn't load or a spatial average wasn't computed for CDAT/xcdat/both). This left us with 18297 files to compare.
  • 18243 / 18297 (99.7%) had CDAT/xcdat spatial average time series within tolerance (within ~0.003 for each time step).
  • If I dropped the bounds and manually computed the bounds, 18243/18243 (100%) comparisons were within tolerance.
  • xcdat was faster 9684 times (53%) and CDAT was faster 8613 times (47%), but the speeds were strongly correlated (r = 0.82).

For the 288 files:

  • The following directories did not load with xcdat.open_mfdataset()

    • /p/css03/esgf_publish/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r13i1p1/v20131231/tas/
    • /p/css03/esgf_publish/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r14i1p1/v20130315/tas/
    • /p/css03/cmip5_css01/data/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r7i1p1/v20120502/tas/
    • /p/css03/esgf_publish/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r3i1p1f1/Amon/tas/gr/v20200730/
    • /p/css03/scratch/cmip6/CMIP/NCC/NorCPM1/historical/r10i1p1f1/Amon/tas/gn/v20190914/
    • Update: The first three of these have repeated time coordinates. The fourth has latitude values that are not the same in each file (three of its files have bounds that are off by order 10⁻⁶). The fifth is covered in Issue #162 (conflicting values for variable 'lat_bnds').
  • xcdat failed to produce spatial averages for the following (all 21 domains):

    • /p/css03/cmip5_css02/data/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r1i1p1/tas/1/
    • /p/css03/esgf_publish/CMIP6/CMIP/UA/MCM-UA-1-0/historical/r1i1p1f1/Amon/tas/gn/v20190731/
    • /p/css03/esgf_publish/CMIP6/CMIP/UA/MCM-UA-1-0/historical/r1i1p1f2/Amon/tas/gn/v20190731/
    • Update: The MCM-UA-1-0 files have overlapping grid cells ([-1.875 to 1.875] conflicts with [354.375 to 360.])
    • Update: The EC-EARTH file needs further scrutiny. I am getting the following error: KeyError: "The data variable 'tas' is missing the '['lat', 'lon']' dimension, which is required for spatial averaging."
  • Both CDAT and xcdat failed to average the following (all 21 domains):

    • /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r1i1p1f1/Amon/tas/gn/v20210215/
    • /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r2i1p1f1/Amon/tas/gn/v20210215/
    • /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r3i1p1f1/Amon/tas/gn/v20210215/
    • /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r4i1p1f1/Amon/tas/gn/v20210215/
    • /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/ICON-ESM-LR/historical/r5i1p1f1/Amon/tas/gn/v20210215/
    • Update: These files are on a curvilinear grid (we don't support this for spatial averaging).
  • The following yielded different results in CDAT and xcdat for a subset of domains (Sahel, GoG, SAmo):

    • /p/css03/esgf_publish/CMIP6/CMIP/NUIST/NESM3/historical/r1i1p1f1/Amon/tas/gn/v20190630/
    • /p/css03/esgf_publish/CMIP6/CMIP/NUIST/NESM3/historical/r2i1p1f1/Amon/tas/gn/v20190628/
    • /p/css03/esgf_publish/CMIP6/CMIP/NUIST/NESM3/historical/r3i1p1f1/Amon/tas/gn/v20190630/
    • /p/css03/esgf_publish/CMIP6/CMIP/NUIST/NESM3/historical/r4i1p1f1/Amon/tas/gn/v20190629/
    • /p/css03/esgf_publish/CMIP6/CMIP/NUIST/NESM3/historical/r5i1p1f1/Amon/tas/gn/v20190628/
    • Update: I believe this is a CDAT problem. When it sub-selects the Sahel region (with longitude bounds of -10 to 10) it appears to be omitting a little weight for the grid cell centered at 350.625, with bounds of 348.75 to 350.625 (I don't know why; a quick overlap check is sketched just below).
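To illustrate the Sahel case just mentioned, a quick manual overlap check (a sketch; the cell and domain edges come from the bullet above):

# fraction of the 348.75-350.625 grid cell that falls inside the Sahel longitude
# domain of -10 to 10, i.e., 350-360 plus 0-10 (only the western half can overlap)
cell_lo, cell_hi = 348.75, 350.625
dom_lo, dom_hi = 350.0, 360.0
overlap = max(0.0, min(cell_hi, dom_hi) - max(cell_lo, dom_lo))
print(overlap, overlap / (cell_hi - cell_lo))  # 0.625 degrees, i.e., one third of the cell should carry weight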

Overall, I don't think there are issues with xcdat's spatial averaging functionality. @tomvothecoder - should I break some of this into separate issues or just keep updating this thread until I understand each problem?

@durack1
Collaborator

durack1 commented Mar 30, 2022

Nice @pochedls!

It might be useful to add non-atmospheric grids as your problem counts will jump markedly

@chengzhuzhang
Collaborator

chengzhuzhang commented Mar 30, 2022

@pochedls nice work! I've recently been looping over models with E3SM Diags, and found that ICON-ESM-LR somehow submitted data on unstructured grids, so it is expected that neither xarray nor cdms2 could open its five ensembles.

@pochedls
Collaborator Author

Nice @pochedls!

It might be useful to add non-atmospheric grids as your problem counts will jump markedly

The spatial averaging functionality was designed to work with rectilinear grids.

@pochedls nice work! I've recently been looping over models with E3SM Diags, and found that ICON-ESM-LR somehow submitted data on unstructured grids, so it is expected that neither xarray nor cdms2 could open its five ensembles.

Thanks @chengzhuzhang - yeah - I think most of the problems I list above are due to specific model problems.
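For reference, a rough way to pre-screen files for a rectilinear grid (a heuristic sketch, not an xcdat API; it assumes rectilinear data exposes separate 1-D lat and lon dimensions, whereas unstructured output like ICON-ESM-LR stores both along a single cell dimension):

import xarray as xr

ds = xr.open_dataset(fn)  # fn: path to a candidate file (placeholder)
looks_rectilinear = (
    'lat' in ds['tas'].dims and 'lon' in ds['tas'].dims
    and ds['lat'].ndim == 1 and ds['lon'].ndim == 1
)
print(looks_rectilinear)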

@tomvothecoder
Collaborator

Overall, I don't think there are issues with xcdat's spatial averaging functionality. @tomvothecoder - should I break some of this into separate issues or just keep updating this thread until I understand each problem?

Great work @pochedls! I think we can continue updating this thread until we have enough information to warrant new GH issues.

@pochedls
Collaborator Author

@tomvothecoder - I'm slowly whittling down this list. I wanted to see if you could look at one of these to see if it is a simple problem or warrants a ticket. When I call spatial_avg on the following file, I get:

KeyError: "The data variable 'tas' is missing the '['lat', 'lon']' dimension, which is required for spatial averaging."

For some reason, cf_xarray isn't generating an axes list. This seems to be a one-off problem with this file, though it isn't immediately obvious why (I don't see a problem with the metadata).

ds = xcdat.open_mfdataset('/p/css03/cmip5_css02/data/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r1i1p1/tas/1/*.nc')
ds.tas.cf.axes

{}

@tomvothecoder
Collaborator

KeyError: "The data variable 'tas' is missing the '['lat', 'lon']' dimension, which is required for spatial averaging."

For some reason, cf_xarray isn't generating an axes list. This seems to be a one-off problem with this file, though it isn't immediately obvious why (I don't see a problem with the metadata).

In that dataset, the axis attribute might not be set for those dimensions so cf_xarray isn't listing them under ds.tas.cf.axes. I'll take a look and let you know if this is true.

@tomvothecoder
Collaborator

tomvothecoder commented Apr 14, 2022

ds = xcdat.open_mfdataset('/p/css03/cmip5_css02/data/cmip5/output1/ICHEC/EC-EARTH/historical/mon/atmos/Amon/r1i1p1/tas/1/*.nc')

# Check the dataset's CF metadata with cf_xarray -- CF Axes is n/a
# -----------------------------------------------------------------
ds.cf

Coordinates:
- CF Axes:   X, Y, Z, T: n/a

- CF Coordinates: * longitude: ['lon']
                  * latitude: ['lat']
                    vertical, time: n/a

- Cell Measures:   area, volume: n/a

- Standard Names: * latitude: ['lat']
                  * longitude: ['lon']

- Bounds:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   air_temperature: ['tas']

- Bounds:   lat: ['lat_bnds']
            latitude: ['lat_bnds']
            lon: ['lon_bnds']
            longitude: ['lon_bnds']
            time: ['time_bnds']

# Check the attributes for "lat" and "lon" -- confirmed no "axis" attr set
# -----------------------------------------------------------------
ds.lat.attrs

{'long_name': 'latitude',
 'units': 'degrees_north',
 'standard_name': 'latitude',
 'bounds': 'lat_bnds'}

ds.lon.attrs

{'long_name': 'longitude',
 'units': 'degrees_east',
 'standard_name': 'longitude',
 'bounds': 'lon_bnds'}

# Set "axis" attribute
# -----------------------------------------------------------------
ds.lat.attrs["axis"] = "Y"
ds.lon.attrs["axis"] = "X"

# Check the dataset with cf_xarray again -- the axes is now found
# -----------------------------------------------------------------
ds.cf

Coordinates:
- CF Axes: * X: ['lon']
           * Y: ['lat']
             Z, T: n/a

- CF Coordinates: * longitude: ['lon']
                  * latitude: ['lat']
                    vertical, time: n/a

- Cell Measures:   area, volume: n/a

- Standard Names: * latitude: ['lat']
                  * longitude: ['lon']

- Bounds:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   air_temperature: ['tas']

- Bounds:   X: ['lon_bnds']
            Y: ['lat_bnds']
            lat: ['lat_bnds']
            latitude: ['lat_bnds']
            lon: ['lon_bnds']
            longitude: ['lon_bnds']
            time: ['time_bnds']

We can probably improve the KeyError so that it says something like:
KeyError: "Could not find an X and/or Y axis for spatial averaging. Make sure the data variable 'tas' has X and Y axis coordinates and the 'axis' attribute is set for both."

@durack1
Collaborator

durack1 commented Apr 14, 2022

You also have a T coord, so in addition to the KeyError "Could not find an X and/or Y axis for spatial averaging", you might want to add T (and Z).

Thinking about this a little, it is basically what I had in mind with #200, so I'm more than happy to engage if, rather than just solving this spatial averaging issue, it's done in a more complete (all coords) way. As an FYI, I have been polling our full CMIP6 archive, and have been collecting a bunch of weird edge cases that have barfed on xcdat/xr reads, after which I default to cdms2 before skipping if that fails. In many instances where xr has failed, cdms2 works.

@tomvothecoder
Collaborator

You also have a T coord, so in addition to the KeyError "Could not find an X and/or Y axis for spatial averaging", you might want to add T (and Z).

Thanks @durack1. Currently, XCDAT's spatial averaging only operates on rectilinear grids. If we extend spatial averaging to support other axes/dimensions, we will make sure to update the KeyError.

Notes in the _get_weights() method:
https://github.com/XCDAT/xcdat/blob/c01b7d298a181e4385ec3aba3d265007a172cafe/xcdat/spatial.py#L317-L322

Thinking about this a little, it is basically what I had in mind with #200, so I'm more than happy to engage if, rather than just solving this spatial averaging issue, it's done in a more complete (all coords) way.

You also mentioned #200 being related to #183. If it makes sense, we'll need to figure out a way to handle the different axes that might share the same attributes (e.g., zlon and lon). I plan on looking into these issues soon.

As an FYI, I have been polling our full CMIP6 archive, and have been collecting a bunch of weird edge cases that have barfed on xcdat/xr reads, after which I default to cdms2 before skipping if that fails. In many instances where xr has failed, cdms2 works.

It might be a good idea to open up a separate issue and list the errors thrown by xcdat/xr. For example, at the moment, xcdat.open_dataset()/open_mfdataset() automatically attempts to generate bounds for lat/lon/time for the Dataset. Maybe in some cases, cf_xarray isn't finding these axes because their attributes aren't CF compliant or they conflict?

Also, I plan on making bounds generation optional in #220.

@durack1
Collaborator

durack1 commented Apr 19, 2022

I plan on looking into these issues soon.

No problem - when you get to this, happy to engage. Currently, I have 27 edge cases to follow up on, and I have yet to ascertain whether these are related to xcdat (a la #215) or to an underlying xr.*open* issue. My process should be complete in about a week, so I can coalesce that information when required.

@pochedls
Collaborator Author

@tomvothecoder: This comment is updated with an explanation for each difference. Based on today's conversation, I don't think this small subset of issues can generically be addressed in xcdat. I'm closing this issue.
