[Bug]: open_dataset error with "Y" axis bounds #215

Closed
durack1 opened this issue Apr 1, 2022 · 5 comments · Fixed by #262
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@durack1
Collaborator

durack1 commented Apr 1, 2022

What happened?

A poorly formed (non-CF-compliant) netCDF4 file fails to open with xcdat.open_dataset. The log output below captures this edge case, which seems to be related to the bounds.py functionality.

As an FYI, this appears to have 2 CF-compliance errors, and a warning:

CHECKING NetCDF FILE: /tmp/15373.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.7
Using Standard Name Table Version 79 (2022-03-19T15:25:54Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)


------------------
Checking variable: lat
------------------
ERROR: (3.3): Invalid standard_name: Latitude
ERROR: (3.3): Invalid standard_name modifier: axis
WARN: (3.1): units attribute should be present

------------------
Checking variable: plev
------------------

------------------
Checking variable: time
------------------

------------------
Checking variable: time_bounds
------------------

------------------
Checking variable: o3
------------------
INFO: attribute history is being used in a non-standard way

ERRORS detected: 2
WARNINGS given: 1
INFORMATION messages: 1
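
For readers unfamiliar with the checker messages: CF standard names are case-sensitive, so `Latitude` is rejected where `latitude` would pass, and a latitude coordinate is expected to carry a `units` attribute such as `degrees_north`. A minimal sketch of that kind of check on a plain attribute dictionary follows. The function name `check_latitude_attrs` and the example attributes are hypothetical; they are not taken from the file or from the CF checker's code.

```python
# Unit strings that CF accepts for a latitude coordinate.
CF_LATITUDE_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}


def check_latitude_attrs(attrs):
    """Return a list of problems found in a latitude coordinate's attributes.

    `attrs` is a plain dict standing in for the variable's attributes.
    """
    problems = []
    # Standard names are case-sensitive: "Latitude" is not "latitude".
    if attrs.get("standard_name") != "latitude":
        problems.append("standard_name should be 'latitude'")
    if "units" not in attrs:
        problems.append("units attribute should be present")
    elif attrs["units"] not in CF_LATITUDE_UNITS:
        problems.append("units should be a degrees_north variant")
    return problems


# Illustrative attributes only (assumed, not read from the file).
bad_attrs = {"standard_name": "Latitude"}
good_attrs = {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"}

print(check_latitude_attrs(bad_attrs))
print(check_latitude_attrs(good_attrs))
```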

What did you expect to happen?

The file would open with a warning instead of raising an error.

Minimal Complete Verifiable Example

(xcdat02cdms315spy) bash-4.2$ python
Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat
>>> xcdat.__version__
'0.2.0'
>>> from xcdat import open_dataset
>>> f = "/p/css03/esgf_publish/CMIP6/PMIP/IPSL/IPSL-CM6A-LR/midPliocene-eoi400/r1i1p1f1/AERmonZ/o3/grz/v20190118/o3_AERmonZ_IPSL-CM6A-LR_midPliocene-eoi400_r1i1p1f1_grz_185001-204912.nc"
>>> xH = open_dataset(f)
Traceback (most recent call last):
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 142, in get_bounds
    bounds = self._dataset.cf.get_bounds(axis)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/cf_xarray/accessor.py", line 2023, in get_bounds
    raise KeyError(f"No results found for {key!r}.")
KeyError: "No results found for 'Y'."

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 104, in add_missing_bounds
    self.get_bounds(axis)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 144, in get_bounds
    raise KeyError(f"{axis} bounds were not found, they must be added.")
KeyError: 'Y bounds were not found, they must be added.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 142, in get_bounds
    bounds = self._dataset.cf.get_bounds(axis)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/cf_xarray/accessor.py", line 2023, in get_bounds
    raise KeyError(f"No results found for {key!r}.")
KeyError: "No results found for 'Y'."

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 172, in add_bounds
    self.get_bounds(axis)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 144, in get_bounds
    raise KeyError(f"{axis} bounds were not found, they must be added.")
KeyError: 'Y bounds were not found, they must be added.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/dataset.py", line 113, in open_dataset
    ds = ds.bounds.add_missing_bounds()
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 107, in add_missing_bounds
    self._dataset = self.add_bounds(axis)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 177, in add_bounds
    dataset = self._add_bounds(axis, width)
  File "~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/xcdat/bounds.py", line 240, in _add_bounds
    and "degree" in da_coord.attrs["units"]
KeyError: 'units'
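
The final frame fails because `da_coord.attrs["units"]` is indexed directly and the `lat` variable has no `units` attribute. One way to tolerate that is to look the attribute up with `dict.get` and warn when it is missing. The sketch below shows the idea on a plain dict standing in for `da_coord.attrs`; the helper name `is_degree_coord` is hypothetical, and this is not necessarily how the fix in xcdat was implemented.

```python
import warnings


def is_degree_coord(attrs, name):
    """Return True if the coordinate's units mention degrees.

    `attrs` stands in for `da_coord.attrs`. A missing `units` attribute
    produces a warning instead of a KeyError.
    """
    units = attrs.get("units")
    if units is None:
        warnings.warn(
            f"Coordinate {name!r} has no 'units' attribute; "
            "assuming it is not in degrees."
        )
        return False
    return "degree" in str(units)


# Illustrative attributes only (assumed, not read from the file).
print(is_degree_coord({"standard_name": "Latitude"}, "lat"))  # warns
print(is_degree_coord({"units": "degrees_north"}, "lat"))
```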

Relevant log output

See above

Anything else we need to know?

Nope

Environment

xr.show_versions()
~/anaconda3/envs/xcdat02cdms315spy/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:39:04) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.42.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.3
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.03.0
distributed: 2022.3.0
matplotlib: installed
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.0.4
conda: None
pytest: None
IPython: 7.32.0
sphinx: 4.5.0

@durack1 durack1 added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Apr 1, 2022
@pochedls
Collaborator

In working on #244, I am using observational datasets, which have lots of quirks. I keep hitting this issue. I don't know if this is a bug (presumably these datasets do not have complete metadata), but if a dataset has a dimension called "lat" or "lon" I want xcdat to figure out that this corresponds to the X/Y dimension.

fn = "/p/user_pub/climate_work/pochedley1/surface/gistemp1200_GHCNv4_ERSSTv5.nc"
ds = xcdat.open_dataset(fn)
dsa = ds.spatial.average(data_var="tempanomaly")

KeyError: "A 'X' axis dimension was not found in the dataset. Make sure the dataset has 'X' axis coordinates and the coordinates' 'axis' attribute is set to 'X'."

But the dataset does have a longitude axis:

ds.lon

<xarray.DataArray 'lon' (lon: 180)>
array([-179., -177., -175., ...,  175.,  177.,  179.], dtype=float32)
Coordinates:
  * lon      (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    bounds:         lon_bnds
@tomvothecoder - should this be considered a bug or is this behaving properly (and we should have a feature request to infer the X/Y attributes)?

@tomvothecoder
Collaborator

Hi @pochedls, our current implementation of spatial averaging requires the CF-compliant axis attribute to be set on the desired coordinate variable so that cf_xarray can perform the mapping.

In your case, ds.lon must have the attribute axis="X":

    <xarray.DataArray 'lon' (lon: 180)>
    array([-179., -177., -175., ...,  175.,  177.,  179.], dtype=float32)
    Coordinates:
      * lon      (lon) float32 -179.0 -177.0 -175.0 -173.0 ... 175.0 177.0 179.0
    Attributes:
        axis:           X    <--- THIS IS REQUIRED
        standard_name:  longitude
        long_name:      Longitude
        units:          degrees_east
        bounds:         lon_bnds

cf_xarray also supports interpretation of CF-compliant standard_name attributes, but we haven't extended our APIs to support it yet.

I am exploring a way for xcdat to interpret and return the dimension coordinates if either axis or standard_name is set in #260. More info here: https://cf-xarray.readthedocs.io/en/latest/coord_axes.html
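
To make the idea of falling back from `axis` to `standard_name` (and, as requested above, to the variable name) concrete, here is a rough sketch on plain dicts. The function `infer_axis` and both lookup tables are hypothetical; this is neither cf_xarray's nor xcdat's implementation, only an illustration of the lookup order.

```python
from typing import Optional

# Hypothetical lookup tables for illustration.
STANDARD_NAME_TO_AXIS = {"longitude": "X", "latitude": "Y", "time": "T"}
NAME_TO_AXIS = {
    "lon": "X", "longitude": "X",
    "lat": "Y", "latitude": "Y",
    "time": "T",
}


def infer_axis(name: str, attrs: dict) -> Optional[str]:
    """Guess the CF axis of a coordinate.

    Order of preference: the `axis` attribute, then `standard_name`,
    then the variable name itself. Returns None if nothing matches.
    """
    axis = attrs.get("axis")
    if axis in {"X", "Y", "Z", "T"}:
        return axis
    standard_name = attrs.get("standard_name")
    if standard_name in STANDARD_NAME_TO_AXIS:
        return STANDARD_NAME_TO_AXIS[standard_name]
    return NAME_TO_AXIS.get(name.lower())


# Attributes copied from the `ds.lon` repr above (no `axis` attribute).
lon_attrs = {
    "standard_name": "longitude",
    "long_name": "Longitude",
    "units": "degrees_east",
    "bounds": "lon_bnds",
}
print(infer_axis("lon", lon_attrs))
```

With a lookup like this, the `lon` coordinate above would resolve to the X axis through its `standard_name` even though `axis` is not set.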

@pochedls
Collaborator

I'm revisiting the code snippet from this comment, which I thought was addressed in 0.3.0 (here), but I am getting the same error...it is totally possible I am making a mistake somewhere, but conda list gives me xcdat 0.3.0.

@pochedls pochedls reopened this Jun 28, 2022
@tomvothecoder
Collaborator

tomvothecoder commented Jun 28, 2022

@pochedls I tested the code snippet from this comment using the latest main and v0.3.0. Neither reproduced the error in this GH issue.

One possibility is that you're executing the code using the dev env (xcdat_dev) and an outdated version of main that doesn't have the commit that fixes this issue (assuming you're in the repo directory of xcdat), rather than the environment that has v0.3.0.

I'll try to help debug in our upcoming meeting.

@pochedls
Collaborator

@tomvothecoder is correct and explained that although I was using the new environment installed via conda (xcdat 0.3.0), my import xcdat command was prioritizing the local xcdat files (over the conda version of xcdat), which were not up-to-date. This was confirmed with:

In[1]: xcdat.__file__

Out[1]: '/home/pochedley1/code/xcdat/xcdat/__init__.py'
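
A slightly more general version of this check compares where a module was imported from with the version recorded for the installed distribution. When Python picks up a local checkout first, the file path points into the checkout while the distribution metadata still reports the conda or pip version. The helper name `describe_import` is hypothetical, and the stdlib module `json` is used only as a stand-in so the sketch runs anywhere; substitute `"xcdat"` in practice.

```python
import importlib
from importlib import metadata


def describe_import(module_name):
    """Report where a module was loaded from and which version is installed."""
    module = importlib.import_module(module_name)
    try:
        # Version recorded by the installer (conda/pip), if any.
        dist_version = metadata.version(module_name)
    except metadata.PackageNotFoundError:
        dist_version = None
    return {
        "file": getattr(module, "__file__", None),
        "module_version": getattr(module, "__version__", None),
        "dist_version": dist_version,
    }


# `json` is a stand-in; use "xcdat" to check the real package.
info = describe_import("json")
for key, value in info.items():
    print(f"{key}: {value}")
```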
