Implement multiple ensemble style syntax used for datasets in fx variable descriptions for pre processor #1081

Closed
thomascrocker opened this issue Apr 23, 2021 · 10 comments
Labels: bug (Something isn't working), data issue, enhancement (New feature or request), preprocessor (Related to the preprocessor)

Comments

@thomascrocker
Contributor

Is your feature request related to a problem? Please describe.
I'm writing recipes using a large number of the CORDEX EUR-11 models and need to do some land-sea masking. Unfortunately, some (not all) of the CORDEX models do not have separate sftlf fx files for each ensemble member. Instead, the fx fields exist under ensemble r0i0p0, whereas the rest of the variables are under their own ensemble numbers (r1i1p1 etc.). See for example: https://esgf-index1.ceda.ac.uk/esg-search/search/?offset=0&limit=10&type=Dataset&replica=false&latest=true&domain=EUR-11&rcm_name=RCA4&time_frequency=fx%2Cmon&variable=sftlf%2Ctas&project=CORDEX&facets=rcm_name%2Cproject%2Cproduct%2Cdomain%2Cinstitute%2Cdriving_model%2Cexperiment%2Cexperiment_family%2Censemble%2Crcm_version%2Ctime_frequency%2Cvariable%2Cvariable_long_name%2Ccf_standard_name%2Cdata_node&format=application%2Fsolr%2Bjson
This causes the preprocessor to break, because it searches for the fx files under the same ensemble number as the rest of the variables, where they don't exist.
It would be great if the user could use the syntax developed for specifying multiple ensembles in datasets (see https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/overview.html#recipe-section-datasets), e.g. by specifying something like:

mask_landsea:
      mask_out: sea
      fx_variables:
        - { short_name: sftlf, ensemble: "r(0:9)i(0:9)p(0:9)" }

In the preprocessor. However, the preprocessor interprets the string literally and fails to find the files.
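To make the request concrete, here is a minimal Python sketch of what expanding such a range string could look like. It is not ESMValCore's implementation: `expand_ensemble` is a hypothetical name, and treating `(a:b)` as an inclusive integer range is an assumption based on how the datasets section of the documentation describes the syntax.

```python
import itertools
import re

# Matches one "(start:end)" range, e.g. "(0:9)".
_RANGE = re.compile(r"\((\d+):(\d+)\)")


def expand_ensemble(pattern):
    """Expand every '(start:end)' range in `pattern` into explicit names.

    Hypothetical helper for illustration only. Ranges are assumed to be
    inclusive, so "(0:9)" stands for the ten values 0..9.
    """
    ranges = [
        range(int(start), int(end) + 1)
        for start, end in _RANGE.findall(pattern)
    ]
    # Replace each range with a positional placeholder, then fill in
    # every combination of values.
    template = _RANGE.sub("{}", pattern)
    return [template.format(*combo) for combo in itertools.product(*ranges)]


print(expand_ensemble("r(0:1)i1p(0:1)"))
# ['r0i1p0', 'r0i1p1', 'r1i1p0', 'r1i1p1']
```

With the string from the recipe snippet above, `r(0:9)i(0:9)p(0:9)`, this would produce 1000 candidate ensemble names, so a real implementation would probably want to stop at the first candidate for which files exist.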
It's not possible to use the Natural Earth dataset with this data, since the data is on a rotated pole and thus has multidimensional "latitude" and "longitude" coordinates, which are not yet supported for use with shapefiles. (That's another issue I need to look into soon.)

Would you be able to help out?
Would you have the time and skills to implement the solution yourself?
I can have a look, but I'll need some pointers on where to get started etc.

@thomascrocker thomascrocker added the enhancement label Apr 23, 2021
@thomascrocker
Contributor Author

Tagging @zklaus since I know he has been involved with the regional model work.

@thomascrocker thomascrocker self-assigned this Apr 23, 2021
@thomascrocker thomascrocker added the bug, data issue, and preprocessor labels Apr 23, 2021
@thomascrocker
Contributor Author

This appears to apply to a large number of the CMIP5 and CMIP6 datasets too, i.e. the fx variables are stored under a separate ensemble, r0i0p0, from the other variables.

@thomascrocker
Contributor Author

thomascrocker commented Apr 23, 2021

Ahhh. It looks like when the fx_variables: key is not specified, the preprocessor knows to search different ensemble members. OK, maybe this is actually an issue with the CORDEX CMOR tables, and if I tweak those I can get away without needing the fx_variables key...
esmvalcore._recipe_checks.RecipeError: Requested fx variable 'sftof' not available in any 'fx'-related CMOR table (['mon', 'fx']) for 'CORDEX'

@thomascrocker
Contributor Author

Hmmm. So I updated the CORDEX FX table to include the missing sftof entry. Now if I just have the preprocessor set as

mask_landsea:
      mask_out: sea

the recipe still fails to find the fx file stored under r0i0p0, and then fails because the Natural Earth shapefile can't be used with the rotated grid:

2021-04-23 16:14:06,577 UTC [44120] DEBUG   esmvalcore._config._config:154 Retrieving CORDEX configuration
2021-04-23 16:14:06,577 UTC [44120] DEBUG   esmvalcore._recipe:63 If not present: adding keys from CMOR table to {'short_name': 'sftlf', 'preprocessor': 'test', 'start_year': 1980, 'end_year': 1999, 'exp': 'historical', 'variable_group': 'sftlf', 'diagnostic': 'stats', 'institute': 'SMHI', 'driver': 'NCC-NorESM1-M', 'dataset': 'RCA4', 'project': 'CORDEX', 'ensemble': 'r1i1p1', 'mip': 'fx', 'rcm_version': 'v1', 'domain': 'EUR-11', 'recipe_dataset_index': 0, 'alias': 'RCA4', 'original_short_name': 'tas', 'standard_name': 'air_temperature', 'long_name': 'Near-Surface Air Temperature', 'units': 'K', 'modeling_realm': ['atmos'], 'frequency': 'mon', 'filename': '/net/home/h02/tcrocker/code/EUCP_WP5_Lines_of_Evidence/esmvaltool/esmvaltool_output/recipe_cordex_test_20210423_161405/preproc/stats/tas/tas_EUR-11_NCC-NorESM1-M_RCA4_historical_r1i1p1_v1_mon_1980-1999.nc'}
2021-04-23 16:14:06,577 UTC [44120] DEBUG   esmvalcore._recipe:383 For fx variable 'sftlf', found table 'fx'
2021-04-23 16:14:06,577 UTC [44120] DEBUG   esmvalcore._config._config:154 Retrieving CORDEX configuration
2021-04-23 16:14:06,577 UTC [44120] DEBUG   esmvalcore._data_finder:102 {'exp', 'latestversion', 'domain', 'ensemble', 'dataset', 'institute', 'driver', 'mip', 'short_name', 'rcm_version'}
2021-04-23 16:14:06,597 UTC [44120] DEBUG   esmvalcore._data_finder:220 Skipping non-existent /project/champ/data/cordex/output/EUR-11/SMHI/NCC-NorESM1-M/historical/r1i1p1/RCA4/v1/fx/sftlf/{latestversion}
2021-04-23 16:14:06,598 UTC [44120] DEBUG   esmvalcore._config._config:154 Retrieving CORDEX configuration
2021-04-23 16:14:06,598 UTC [44120] DEBUG   esmvalcore._data_finder:102 {'exp', 'domain', 'ensemble', 'dataset', 'institute', 'driver', 'mip', 'short_name', 'rcm_version'}
2021-04-23 16:14:06,598 UTC [44120] DEBUG   esmvalcore._data_finder:23 Looking for files matching ['sftlf_EUR-11_NCC-NorESM1-M_historical_r1i1p1_SMHI-RCA4_v1_fx*.nc'] in []
2021-04-23 16:14:06,598 UTC [44120] WARNING esmvalcore._recipe:401 Missing data for fx variable 'sftlf'

@thomascrocker
Contributor Author

Ahh, OK, I understand what is going on here. There is a fix hardcoded for CMIP5 here

# change ensemble to fixed r0i0p0 for fx variables

So something similar needs to happen for CORDEX, and possibly CMIP6. The trouble is that I'm not sure the fix is so easy: some models store fx variables under the same ensemble as the regular variables, whereas others have kept the CMIP5 convention and left fx variables under r0i0p0.
I'll have a closer look through the datasets and see if I can summarise the situation.
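One way to handle both conventions is to look under the variable's own ensemble first and fall back to r0i0p0 only if nothing is found. The sketch below illustrates that idea and nothing more: it is not the CMIP5 fix referenced above, nor the change made in the eventual PR. `find_fx_files` is a hypothetical helper, and the flat `<short_name>_*_<ensemble>_*.nc` file layout is an assumption loosely modelled on the filename pattern in the debug log.

```python
from pathlib import Path

# CMIP5-style ensemble under which some models store fixed (fx) fields.
FX_FALLBACK_ENSEMBLE = "r0i0p0"


def find_fx_files(root, short_name, ensemble):
    """Return fx files for `ensemble`, falling back to r0i0p0.

    Hypothetical helper for illustration. It assumes all files live
    directly in `root` and are named '<short_name>_..._<ensemble>_....nc'.
    """
    # dict.fromkeys removes the duplicate if `ensemble` is already r0i0p0
    # while keeping the search order.
    for candidate in dict.fromkeys([ensemble, FX_FALLBACK_ENSEMBLE]):
        pattern = f"{short_name}_*_{candidate}_*.nc"
        matches = sorted(Path(root).glob(pattern))
        if matches:
            return matches
    return []
```

Calling `find_fx_files(data_dir, "sftlf", "r1i1p1")` would return the r1i1p1 files where a model provides them, and the r0i0p0 files otherwise, so neither convention needs to be hardcoded per project.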

@thomascrocker
Contributor Author

thomascrocker commented Apr 26, 2021

OK, I think I have developed a fix for this now. Just putting together a PR.
Tagging @valeriupredoi and @schlunma since it involves a small modification to

def get_input_filelist(variable, rootpath, drs):
which they appear to be the authors of.

@thomascrocker
Contributor Author

OK. PR up.
As an aside. Are there suggestions for linting / code formatting for ESMValTool development? I use VSCode as my IDE and usually have flake8 linting, and black formatting enabled for python. It played havoc with saving the file, changing all the single quotes to double, and also modifying some line wrapping of long lines etc. so in the end I turned it all off.

@zklaus

zklaus commented May 7, 2021

As an aside. Are there suggestions for linting / code formatting for ESMValTool development? I use VSCode as my IDE and usually have flake8 linting, and black formatting enabled for python. It played havoc with saving the file, changing all the single quotes to double, and also modifying some line wrapping of long lines etc. so in the end I turned it all off.

Flake8 should be OK, but we don't use black at the moment. Instead, yapf performs a similar function. There is an ongoing discussion on that topic, though. See ESMValGroup/ESMValTool#2161

@bouweandela
Member

Are there suggestions for linting / code formatting for ESMValTool development?

Please have a look at our contribution guidelines.

@bouweandela
Member

Fixed by #1609.
