Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMOR segfaults with mip cmor tables and CMIP6Plus CV.json #718

Closed
matthew-mizielinski opened this issue Nov 3, 2023 · 14 comments
Closed

Comments

@matthew-mizielinski
Copy link

In the attached zip there is an example where using the CMIP6Plus CVs or mip cmor tables causes a segfault in CMOR in the call to cmor.load_table (python api)

Note that I've had to amend the dimension field in the mip-cmor-tables to give something that CMOR v3.7.2 can read.

I think there must be an issue in the CVs file, but the segfault prevents me testing this further

minimal_example.zip

@matthew-mizielinski
Copy link
Author

We've worked out what is wrong in the minimal example being used here -- credit to @piotr-florek-mohc for debugging this

The CMIP6Plus_CV.json contains null entries for some parent experiment ids -- I think this is the cause of the segfaults as CMOR may be attempting to do string operations on None which will almost certainly fail.

Once this is corrected then CMOR behaves itself again. One option might be to prescan every JSON file read and fail if a null or None value is found anywhere.

The other fixes to get the above to work are; correction to mip table id in the table_id section of the CVs file and an update to the table_id in the header of the mip table itself (to include the MIP_ prefix.

@durack1
Copy link
Contributor

durack1 commented Nov 3, 2023

Just noting the associated issue PCMDI/mip-cmor-tables#27

@mauzey1
Copy link
Collaborator

mauzey1 commented Nov 8, 2023

@matthew-mizielinski @durack1 Since the issues occurring with the example can be fixed within the MIP tables, should we close this issue?

@durack1
Copy link
Contributor

durack1 commented Nov 8, 2023

@mauzey1 thanks for circling. As it's a segfault, it would be great to catch this issue and ensure that such a poorly defined use case doesn't segfault.

We also wanted to make sure that the structure of the json input files made sense, and then CMOR could read this more sensible format - in the case of a list type vs space-separated string type.

So it would be good to make some tweaks to CMOR to deal with the segfault, and separately deal with the json type formats as well

@mauzey1
Copy link
Collaborator

mauzey1 commented Nov 8, 2023

How should we handle null in the tables? Throw an error? Ignore them? Treat as empty strings?

@durack1
Copy link
Contributor

durack1 commented Nov 8, 2023

In a situation where poorly constructed tables/jsons inputs are provided, it would be great to note the problem and throw a traceback, rather than a segfault - if CMOR doesn't know about it, then we shouldn't have to deal with it.

@matthew-mizielinski may have a subtly different take which I would like to hear

@matthew-mizielinski
Copy link
Author

@mauzey1, @durack1

the output from CMOR when it segfaults is pretty unhelpful, and it is pretty easy to sink a lot of time trying to work out where the issue is. For example, I think the example I attached implies that the tables are at fault, in that the failing call is load_table(), so I spent a fair amount of time with the tables before Piotr identified the issue.

I don't think there is a valid situation where a null value should appear in any of the input files used by CMOR (feel free to correct me if I've missed something), so we could have a sanity checking function that walks the dictionary immediately after loading the tables/CVs and raises an error that a null has been found.

Note that I don't think that this is a critical thing to fix and release immediately. We'll get something into any CVs/MIP tables code to prevent null's appearing, but given we've identified this issue it would be good to cover it off.

@mauzey1
Copy link
Collaborator

mauzey1 commented Nov 17, 2023

I was looking at where the segfault was happening in CMOR and found it within the cmor_CV_set_att function in cmor_CV.c. It happens when it tries to read a null value from an array as a string.

cmor/Src/cmor_CV.c

Lines 74 to 89 in 047fd2c

} else if (json_object_is_type(joValue, json_type_array)) {
int i;
pArray = json_object_get_array(joValue);
length = array_list_length(pArray);
CV->aszValue = (char **) malloc(length * sizeof(char**));
for(i=0; i < length; i++) {
CV->aszValue[i] = (char *) malloc(sizeof(char) * CMOR_MAX_STRING);
}
json_object *joItem;
CV->anElements = length;
for (k = 0; k < length; k++) {
joItem = (json_object *) array_list_get_idx(pArray, k);
strcpy(CV->aszValue[k], json_object_get_string(joItem));
}
CV->type = CV_stringarray;

However, I noticed in this function that there is a case where it will ignore a null value found inside a JSON object.

cmor/Src/cmor_CV.c

Lines 33 to 35 in 047fd2c

if (json_object_is_type(joValue, json_type_null)) {
printf("Will not save NULL JSON type from CV.json\n");

I'm guessing there have been cases where null was found in tables, but just wasn't anticipated to be found inside arrays.

Either way, I agree that there shouldn't be a valid case for having null values in tables. I will go ahead with the idea of having a check for every JSON table that throws an error at the first instance of a null value.

@durack1
Copy link
Contributor

durack1 commented Nov 17, 2023

@mauzey1 perfect, I think such a check would make the software more robust, particularly as we're anticipating increased use over time with people text editing json files in the worst cases - if it has a problem reading the file, it should point out the issue to whatever granularity is relatively easily possible, and exit, throwing the error and alerting a user to where to look to fix it

@durack1
Copy link
Contributor

durack1 commented Nov 21, 2023

The additional tweak is to the format of the dimensions, modeling_realm attributes, from a string to a json list type

CMIP6

        "tas": {
            "frequency": "mon", 
            "modeling_realm": "atmos", 
            "standard_name": "air_temperature", 
            "units": "K", 
            "cell_methods": "area: time: mean", 
            "cell_measures": "area: areacella", 
            "long_name": "Near-Surface Air Temperature", 
            "comment": "near-surface (usually, 2 meter) air temperature", 
            "dimensions": "longitude latitude time height2m", 
            "out_name": "tas", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }, 

https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_Amon.json#L1151-L1168

CMIP6Plus

    "tas": {
      "cell_measures": "area: areacella",
      "cell_methods": "area: time: mean",
      "comment": "near-surface (usually, 2 meter) air temperature",
      "dimensions": [
        "longitude",
        "latitude",
        "time",
        "height2m"
      ],
      "frequency": "mon",
      "long_name": "Near-Surface Air Temperature",
      "modeling_realm": [
        "atmos"
      ],
      "ok_max_mean_abs": 295.0,
      "ok_min_mean_abs": 255.0,
      "out_name": "tas",
      "positive": "",
      "standard_name": "air_temperature",
      "type": "real",
      "units": "K",
      "valid_max": 350.0,
      "valid_min": 170.0
    },

https://github.com/PCMDI/mip-cmor-tables/blob/c46c240e448ad17c2cfec31fcf639aea1f51d1f0/Tables/MIP_APmon.json#L3077-L3101

@wolfiex ping

@taylor13
Copy link
Collaborator

I vaguely recall that a json list type doesn't care about order, but for dimensions and modeling_realm we definitely do care. Is this a problem?

@mauzey1
Copy link
Collaborator

mauzey1 commented Jan 8, 2024

Getting back to this issue, I have so far made some changes for finding null values in JSON tables and throwing an error message when found. Are there other invalid values that we should look for in tables?

When I run CMOR with the mip-cmor-tables, I get a lot of warning messages for the attributes provenance, validation, and branded_variable_name. We could extract ok_max_mean_abs, ok_min_mean_abs, valid_max, and valid_min out of validation, however I'm not sure how to useprovenance and branded_variable_name. Should CMOR just ignore them?

@durack1
Copy link
Contributor

durack1 commented Jan 8, 2024

Thanks @mauzey1 we need to get onto finalizing this format early this year, so @wolfiex did you want to update us on 2024 status so we can plot?

@matthew-mizielinski
Copy link
Author

Some of these might be best as their own issues, but the small tweaks we've discussed are;

  • remove product field from output - currently defaults to model-output with the value at the top of a table overwriting this
  • avoid further_info_url automatic creation as this isn't going to be used beyond CMIP6 for now
  • (possibly) remove short_description from output -- it relies on mip_era from tables (now removed) and this field will do something strange when the CORDEX tables are picked up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants