PrePARE UnicodeDecodeError when scanning dummy NO_ATTRIBUTES.nc netCDF file #643

atmodatcode · 2022-01-14T19:49:38Z

Hello,
while exploring how the PrePARE checker works, I tried it out on various files.
It works perfectly on a ps_3hr_MPI-ESM1-2-HR_historical_r6i1p1f1_gn_201007120000-201007122100.nc file, but it raises a strange UnicodeDecodeError when scanning the dummy netCDF file https://raw.githubusercontent.com/AtMoDat/demo_data/main/NO_ATTRIBUTES.nc.

PrePARE.py --variable ps NO_ATTRIBUTES.nc --table-path /pool/data/CMIP6/cmip6-cmor-tables/Tables/CMIP6_3hr.json
Traceback (most recent call last):
  File "/mnt/lustre01/work/bm0021/conda-envs/quality-assurance/lib/python3.9/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 1022, in <module>
    main()
  File "/mnt/lustre01/work/bm0021/conda-envs/quality-assurance/lib/python3.9/site-packages/cmip6_cv/PrePARE/PrePARE.py", line 935, in main
    log_text = f.read()
  File "/mnt/lustre01/work/bm0021/conda-envs/quality-assurance/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 18771: invalid start byte

The dummy file has no metadata, but is otherwise a valid netCDF file.
For example, this is the output when scanning it with the latest CEDA CF Checker:

cfchecks NO_ATTRIBUTES.nc
CHECKING NetCDF FILE: NO_ATTRIBUTES.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 78 (2021-09-21T11:55:06Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)
[...]
ERRORS detected: 0
WARNINGS given: 9
INFORMATION messages: 2

Thanks and best regards

matthew-mizielinski · 2022-01-18T09:27:40Z

Hi @atmodatcode, the file you are working with looks to be a long way from what we would expect to be passed in.

If you give the file a name that carries some of the expected information, even just ps_3hr.nc rather than NO_ATTRIBUTES.nc, and run PrePARE --table-path <location> ps_3hr.nc, you'll get past the failure.

The code itself could probably handle this if the two open(logfile, 'r') calls in PrePARE.py (lines 934 and 958) were modified to include the encoding='utf8', errors='ignore' arguments -- I'm guessing that the C code is outputting some characters that upset python3.
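
A minimal sketch of that suggestion, assuming nothing about the rest of PrePARE.py: the helper name read_log_lenient and the sample log line are hypothetical, and only the open() arguments come from the comment above. It writes a log containing a stray 0x9f byte and reads it back without raising.

```python
import os
import tempfile


def read_log_lenient(path):
    """Read a log file as UTF-8, silently dropping undecodable bytes.

    Hypothetical helper; it only illustrates the open() arguments
    suggested above and is not the actual PrePARE code.
    """
    with open(path, "r", encoding="utf8", errors="ignore") as f:
        return f.read()


# Build a throwaway log with an invalid UTF-8 start byte (0x9f),
# similar in spirit to the byte reported in the traceback.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(b'Your experiment_id "\x9f" defined in your input file\n')
    log_path = tmp.name

try:
    log_text = read_log_lenient(log_path)
finally:
    os.remove(log_path)

print(log_text)
```

Note that errors='ignore' hides the bad byte rather than explaining it; errors='replace' would keep a visible marker in the log text instead, which may be preferable when debugging.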

mauzey1 · 2022-01-21T22:56:47Z

I have discovered that an error message was being generated from a string that was never initialized, because the experiment_id attribute was missing from the dataset. This introduced an invalid character into the error message, which caused PrePARE to crash when it tried to read the log file.

cmor/Src/cmor_CV.c

Lines 1035 to 1038 in df7fc34

    
    snprintf(msg, CMOR_MAX_STRING,
             "Your experiment_id \"%s\" defined in your input file\n! "
             "could not be found in your Control Vocabulary file.(%s)\n! ",
             szExperiment_ID, CV_Filename);

I have added some code to check if the attribute exists before proceeding to code that requires the attribute's value. I will create a pull request to merge it.
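
The actual fix lives in the C source, but the same guard can be illustrated in Python as a rough sketch. The function name missing_attributes and the plain dict standing in for a file's global attributes are assumptions for illustration only, not part of CMOR or PrePARE.

```python
def missing_attributes(global_attrs, required=("experiment_id",)):
    """Return the required attribute names absent from global_attrs.

    Hypothetical helper: global_attrs is a plain dict standing in for
    the global attributes of a netCDF file.
    """
    return [name for name in required if name not in global_attrs]


# A file like NO_ATTRIBUTES.nc has no global attributes at all.
attrs = {}
missing = missing_attributes(attrs)
if missing:
    # Stop before any code that would format a message from the
    # attribute's (nonexistent) value.
    print("Missing global attributes: " + ", ".join(missing))
```

Checking for presence first means no message is ever built from a value that was never set.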

taylor13 · 2022-01-21T23:05:55Z

I wonder if there are similar cases to this, since the original coder may not have anticipated handling such an egregiously out-of-conformance file. I realize this only occurred because experiment_id was missing, so maybe there are no other instances of embedded dependencies like this.

durack1 · 2022-01-22T07:15:13Z

@taylor13 agreed, it would be great if we could catch such an error and stop - at the very least what @matthew-mizielinski suggests and what @mauzey1 has implemented is a step in the right direction

mauzey1 mentioned this issue Jan 21, 2022

Check if experiment_id attribute is present in current dataset when being processed by PrePARE #644

Merged

mauzey1 closed this as completed in #644 Jan 24, 2022

mauzey1 mentioned this issue Aug 18, 2022

CMOR 3.7.0 #668

Merged
