
Identify root cause of netcdf-c errors in logs #13

Open
JimCircadian opened this issue Jan 18, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@JimCircadian
Member

Struggling to narrow in on the cause of this, but using conda to manage the environment is creating HDF5 library incompatibilities. This might be a result of stored data and its preparation, as the BAS and JASMIN environments are identical dependency-wise (last time I checked), but the errors are really disruptive in the logging.

HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 4:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
    major: Attribute
    minor: Object not found

Definitely keen to see whether this is encountered by those running their own environments and pipelines from scratch. icenet_data commands run from previous environments are the likely culprit, and these warnings aren't anything more than an annoyance.
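To tell whether a single attribute accounts for all the noise, the missing attribute names can be pulled out of the logs with a small stdlib-only helper. This is a hypothetical sketch (the function name and regex are mine), assuming the traces follow the HDF5-DIAG format shown above:

```python
import re

# Matches the final frame of an HDF5-DIAG trace, e.g.
#   ... can't locate attribute: '_QuantizeBitRoundNumberOfSignificantBits'
ATTR_RE = re.compile(r"can't locate attribute: '([^']+)'")

def missing_attributes(log_text: str) -> set:
    """Return the set of attribute names HDF5 failed to locate."""
    return set(ATTR_RE.findall(log_text))
```

Running this across all logs should quickly show whether it is only the quantization attribute that is ever missing, or a wider class of metadata.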

@JimCircadian
Member Author

Easy to clear out with a bit of shell-fu, for readable logs, but that doesn't get away from the underlying problem 😛

find logs/ \( -name '*.log' -o -name '*.out' \) \
  -print -exec sed -ri -e '
    /^HDF5-DIAG/ d
    /^  \#[0-9]{3}/ d
    /^    (major|minor)\:/ d
    /^thread [0-9]+/ d' {} \;
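The same cleanup can be sketched in Python (stdlib only), mirroring the sed patterns above; the `logs/` path and function names are illustrative:

```python
import re
from pathlib import Path

# Mirrors the sed script: drop HDF5-DIAG banner lines, numbered stack
# frames, major/minor detail lines, and stray thread lines.
NOISE = re.compile(
    r"^(HDF5-DIAG"
    r"|  #[0-9]{3}"
    r"|    (major|minor):"
    r"|thread [0-9]+)"
)

def clean_log(text: str) -> str:
    """Return log text with HDF5 diagnostic noise removed."""
    return "\n".join(l for l in text.splitlines() if not NOISE.match(l))

def clean_logs(root: str = "logs/") -> None:
    # In-place rewrite of *.log and *.out files, as in the find/sed one-liner.
    for path in Path(root).rglob("*"):
        if path.suffix in {".log", ".out"}:
            path.write_text(clean_log(path.read_text()) + "\n")
```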

@JimCircadian
Member Author

This actually happened with an icenet_data_sic download, so it's not necessarily something resulting from data stored by previous environments. More testing will definitely be required.

@tom-andersson
Collaborator

FYI, if you ever start getting more nefarious errors, ensure you don't have parallel=True in xr.open_mfdataset!

@JimCircadian
Member Author

JimCircadian commented Feb 2, 2023

There are occasions when parallel=True works very well @tom-andersson, especially when reading. This is a known issue with divergent library dependencies; it arises from not managing everything with conda. Thanks for the heads-up though!
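For reference, the call in question, a sketch assuming xarray and dask are installed and the glob path exists (both are illustrative here, not from this repo):

```python
import xarray as xr

# parallel=False (the default) opens each file serially; parallel=True hands
# the per-file opens to dask workers, which is where mixed HDF5 builds can
# start misbehaving.
ds = xr.open_mfdataset("data/*.nc", combine="by_coords", parallel=False)
```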

@tom-andersson
Collaborator

That's interesting @JimCircadian, thanks for sharing. I had weeks of stochastic NetCDF/HDF errors and managed to narrow it down to the use of parallel=True. I am indeed using both conda and pip for the environment in question. Anyway, I digress...

@JimCircadian
Member Author

JimCircadian commented Feb 21, 2023

Found that the attribute in question relates to compression changes in the underlying netcdf-c library, which is great to know. I wouldn't be surprised if the error arises because we're spanning multiple datasets created with different library versions via open_mfdataset, but without understanding the interactions between open_mfdataset and the build in question causing the issue (see below) I won't be able to narrow down whether this is a data problem or a library interoperability issue.

There is talk that this is why we should be relying on conda for everything, but that feels like a cop-out to me. Massive, static conda requirements set in stone were really troublesome when we started the library redevelopment. This can be solved, and hopefully even improved, within the xarray <--> netcdf-c dependency chain once I understand what's happening. I like this sort of thing, so assigning this to myself again.

Build in question (4.9.0 actually contains the PR referenced)...

libnetcdf                 4.8.1           nompi_h261ec11_106    conda-forge

The good thing is that it's now relatively clear this doesn't have any manifest effect on the data, which I doubted it would anyway.
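If chasing the dependency angle, one option would be forcing a newer build from conda-forge; this command is illustrative only (the version spec is an assumption based on 4.9.0 containing the referenced PR, not a tested pin):

conda install -c conda-forge "libnetcdf>=4.9.0"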

@JimCircadian JimCircadian self-assigned this Feb 21, 2023
@JimCircadian JimCircadian changed the title Clear up HDF5 errors / rerun data saving on BAS Identify root cause of netcdf-c errors in logs Feb 21, 2023
@JimCircadian JimCircadian added the bug Something isn't working label Feb 21, 2023
@JimCircadian JimCircadian removed their assignment Mar 1, 2023
@JimCircadian
Member Author

Interestingly, whilst looking at iris processing, the HDF error becomes apparent:

cube *= 9.5       # Produces no HDF5 warnings; the operation wipes the metadata
cube.data *= 9.5  # Produces the HDF5 warnings; metadata is not wiped
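A toy analogy of why those two lines diverge (this is NOT iris internals, just a hypothetical sketch of the lazy-data semantics: whole-cube arithmetic returns a fresh object, while accessing .data realizes the deferred payload, which is where the file read, and hence the HDF5 diagnostics, would occur):

```python
class ToyCube:
    """Toy stand-in for a lazily-loaded cube; names are hypothetical."""

    def __init__(self, loader, attributes):
        self._loader = loader          # deferred read, e.g. from HDF5/netCDF
        self._data = None
        self.attributes = attributes   # metadata such as quantization attrs

    @property
    def data(self):
        # Accessing .data realizes the lazy payload -- the point at which
        # the underlying file is actually read.
        if self._data is None:
            self._data = self._loader()
        return self._data

    def __imul__(self, factor):
        # Whole-cube arithmetic stays lazy (no read yet) but, in this toy,
        # returns a fresh cube without metadata -- mirroring the observed
        # attribute wiping.
        return ToyCube(lambda: [v * factor for v in self.data],
                       attributes={})
```

Under this model, `cube *= 9.5` never touches the file (no warnings, metadata gone), while `cube.data *= 9.5` forces the read (warnings, metadata intact).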
