Error reading files with pattern-defying CF Conventions value format #295

Open
sadielbartholomew opened this issue May 17, 2024 · 0 comments · May be fixed by #296
Labels: bug (Something isn't working); netCDF read (Relating to reading netCDF datasets)

Comments

@sadielbartholomew
Member

I recently encountered a file (while investigating spammy warnings, which brought me to Unidata/cftime#328; see the file attached in the opening comment there as an example) which has the Conventions global attribute value :Conventions = "CF-1.6/CF-1.7" (checked via ncdump -h). This compound form isn't standard, and cfdm can't read the file because it processes the version naively, taking whatever follows the first "CF-" match, if found, and then errors on the result:

>>> import cfdm
/home/slb93/git-repos/cfdm/cfdm/read_write/netcdf/netcdfread.py:1028: SyntaxWarning: invalid escape sequence '\s'
  all_conventions = re.split(",\s*", Conventions)
>>> cfdm.read("~/Downloads/subset.nc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slb93/git-repos/cfdm/cfdm/read_write/read.py", line 328, in read
    fields = netcdf.read(
             ^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1056, in read
    g["file_version"] = Version(file_version)
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/miniconda3/envs/cf-env-312/lib/python3.12/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '1.6/CF-1.7'
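For illustration, here is a minimal sketch of how the invalid string in the traceback can arise. The `re.split` call mirrors the line quoted in the SyntaxWarning, written as a raw string so that `\s` no longer triggers the warning. The prefix-stripping step is only an assumed stand-in for the naive logic, not the actual cfdm code, and `naive_versions` is a hypothetical name.

```python
import re

conventions = "CF-1.6/CF-1.7"

# A raw string avoids the "invalid escape sequence '\s'" SyntaxWarning.
all_conventions = re.split(r",\s*", conventions)

# Assumed stand-in for the naive step: keep whatever follows the first "CF-".
naive_versions = [
    entry.replace("CF-", "", 1)
    for entry in all_conventions
    if entry.startswith("CF-")
]

print(naive_versions)  # ['1.6/CF-1.7']
```

The value contains no comma, so it survives the split as a single entry, and stripping the leading "CF-" leaves exactly the string that `Version` rejects.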

Whilst a weird value such as 'CF-1.6/CF-1.7' is not a CF-compliant value for that attribute, and we can't be expected to account for every oddity that data may possess, IMO it shouldn't mean such files can't be read in at all. I looked at the logic of the 'Conventions' property processing and concluded that it isn't very robust. It should be improved so that weird edge cases don't error; instead, any non-standard and therefore ambiguous value such as this should be ignored. The files can then be read in, but the CF version is considered ambiguous and so gets set by our default logic for an unknown or unset version.

PR to follow, which makes the processing of versions in the Conventions attribute more robust through regular expressions.
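As a rough sketch of the kind of approach meant here (not the code in the PR), stricter matching could accept only entries of the exact form "CF-<major>.<minor>" and fall back to a default otherwise. The names `CF_ENTRY` and `parse_cf_version`, the `default` parameter, and the choice to treat multiple CF entries as ambiguous are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: an entry must be exactly "CF-<major>.<minor>".
CF_ENTRY = re.compile(r"CF-(\d+\.\d+)")


def parse_cf_version(conventions, default=None):
    """Return the CF version string, or `default` if none is unambiguous."""
    # Split the attribute value on commas and/or whitespace.
    entries = [e for e in re.split(r"[,\s]+", conventions.strip()) if e]

    versions = []
    for entry in entries:
        match = CF_ENTRY.fullmatch(entry)
        if match:
            versions.append(match.group(1))

    # Zero or several CF entries are treated as ambiguous (an assumption).
    if len(versions) == 1:
        return versions[0]
    return default


print(parse_cf_version("CF-1.8, UGRID-1.0"))  # 1.8
print(parse_cf_version("CF-1.6/CF-1.7"))      # None
```

Using `fullmatch` means a compound entry such as "CF-1.6/CF-1.7" is rejected as a whole rather than partially parsed, so the caller can apply its default version logic instead of raising.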

@sadielbartholomew sadielbartholomew added the bug Something isn't working label May 17, 2024
@sadielbartholomew sadielbartholomew self-assigned this May 17, 2024
@sadielbartholomew sadielbartholomew added the netCDF read Relating to reading netCDF datasets label May 22, 2024