Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State of the art FX variables handling without preprocessing #557

Merged
merged 25 commits into from
May 5, 2020

Conversation

valeriupredoi
Copy link
Contributor

This PR is the abridged version of #511 and includes all the modifications from #511 and #439 that @schlunma and myself have implemented and @zklaus toiled to review and suggest a number of very useful changes; it however doesn't include the hacky way of preprocessing fx data as in #511 and #439 since we may want to hold that in the background until implementing the iris cell_measures parsing as @bouweandela suggested; it also includes functionality to solve @ChrisJones-MOHC issue reported in #547
Most of it has been reviewed by @zklaus and should be in good shape, but alas, another pass and testing is needed; review for tests is still needed as well.
Cheers 🍺

@valeriupredoi valeriupredoi added enhancement New feature or request preprocessor Related to the preprocessor labels Mar 6, 2020
@valeriupredoi
Copy link
Contributor Author

I will keep maintaining #439 because @ledm and @LisaBock are still using it to get IPCC stuffs done, but I reckon that upon approval of this PR we can safely close #511 and start thinking how to get us rid of ancestors and implement the fx vars preprocessing via cell_measures and iris magic. @bouweandela and myself can do that in the near future 🍺

@ChrisJones-MOHC
Copy link

thanks @valeriupredoi I tried some testing.

It seems to work perfectly for UKESM1. I processed historical data and it correctly picked up the piControl sftlf file. I used pre-processor:
preproc_weighting:
weighting_landsea_fraction:
area_type: land
fx_files: [{short_name: sftlf, mip: fx, exp: piControl}]

I have also combined it with annual meaning in the pre-processor:
preproc_weighting:
weighting_landsea_fraction:
area_type: land
fx_files: [{short_name: sftlf, mip: fx, exp: piControl}]
annual_statistics:
operator: mean

Both of these bit-compare with raw CMIP6 files I pulled back and processed by hand.

I'll also give it a try with some other models.

@valeriupredoi
Copy link
Contributor Author

Note that now you can throw nothing or anything including the kitchen sink or anything inbetween at the preprocessors that use fx_files, see an example:

  prepocessor_namey:
    annual_statistics:
      operator: mean
    mask_landsea:
      mask_out: sea
      fx_files: [{short_name: sftlf, mip: fx, exp: piControl, ensemble: r1i1p1f2, grid: gn, activity: CMIP}]
    mask_landseaice:
      mask_out: sea
      fx_files: [{short_name: mrsofc, exp: piControl}]
    area_statistics:
      operator: mean
    volume_statistics:
      operator: mean
      fx_files: [{short_name: areacella, mip: fx, exp: piControl, ensemble: r1i1p1f2, grid: gn, activity: CMIP}]

@valeriupredoi
Copy link
Contributor Author

cheers muchly @ChrisJones-MOHC - was just writing a comment-example 🍺

@ChrisJones-MOHC
Copy link

just to confirm secondary test - it works also on CNRM-CM6-1 where it picked up sftlf from amip. Again results agree with processing manually.

esmvalcore/_recipe.py Outdated Show resolved Hide resolved
for fxvar in var['fx_files']}
settings[step]['fx_files'] = fx_files_dict
logger.info(msg, step, pformat(fx_files_dict))
"""Update fx settings depending on the needed method."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is needlessly complicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hang on, looking at it in a jiffy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok simplified now courtesy to @zklaus last suggestions, happy Bouwe or unhappy Bouwe? 🍺

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, much better! I still think the function inside a function _get_fx_vars_from_attribute is not really needed, or do you have a particular reason for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deal with it 😆

@schlunma schlunma added this to the IPCC AR6 milestone Mar 10, 2020
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
valeriupredoi and others added 4 commits March 10, 2020 14:30
Co-Authored-By: Bouwe Andela <bouweandela@users.noreply.github.com>
Co-Authored-By: Klaus Zimmermann <klaus.zimmermann@smhi.se>
@valeriupredoi
Copy link
Contributor Author

waddup, guys? 🍺

weighting_landsea_fraction:
area_type: land
exclude: ['CanESM2', 'reference_dataset']
fx_files: [{'short_name': 'sftlf', 'exp': 'piControl'}, {'short_name': 'sftof', 'exp': 'piControl'}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to use the keyword fx_variables instead of fx_files? The current name suggests you have to give it a path to a certain file which might be confusing (see for example https://esmvaltool.readthedocs.io/projects/esmvalcore/en/latest/recipe/preprocessor.html#extract-shape with the shapefile keyword.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this looks like it could be kept more readable by using the same format as for data sets:

datasets:
  - {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
  - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, start_year: 2001, end_year: 2004, grid: gn}
  - {dataset: EC-EARTH3, alias: custom_alias, project: CMIP6, exp: historical, ensemble: r1i1p1f1, start_year: 2001, end_year: 2004, grid: gn}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with fx_variables -> the user can specify whatever keys they want and in whatever order they want them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the user can specify whatever keys they want and in whatever order they want them

sorry for not being clear, I didn't mean the should add these specific keys and order, just the way it is formatted, so instead of the current:
fx_files: [{'short_name': 'sftlf', 'exp': 'piControl'}, {'short_name': 'sftof', 'exp': 'piControl'}]
have the other format in the documentation:

fx_files: 
  - {'short_name': 'sftlf', 'exp': 'piControl'}
  - {'short_name': 'sftof', 'exp': 'piControl'}

But it's just a suggestion. Polishing nail extensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I saw you now replaced the argument fx_variables on lots of other places as well, but the docstrings still say that these fx_variables then should be a dictionary with {field: filename of fx variable}. There's also some other places where still the name fx_files is used. maybe check whether in these instances that you now changed it's really not the full filename but just the variable name? If you change these couple, then change the other occurrences as well. (e.g def tile_grid_areas(cube, fx_files):)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @valeriupredoi how was the pizza? Did you have a look at this already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great pizza, cheers, I can't remember the toppings though 😁 So here's the thing: the way the user can ask for fx_variables in the recipe (as arg of the preprocessor step) can be either as a list of short names eg fx_variables: ['sftlf', 'sftof'] or as a list of dicts eg fx_variables: [{short_name: sftlf, mip: fx}, {short_name: sftof, mip: fx}] so making the listing look like the variable dictionary will mean that the user will be confused as to why it could also be just a simple list; tile_grid_areas actually takes as argument the fx files and not the variables, so that's why I left it with fx_files in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch though - I just realized I missed adding documentation for these funcs (now added) 🍺

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, if you think this won't cause confusion I'm perfectly fine with this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be fine, RTFM will play a good part here 😁

var = dict(variable)
var_project = variable['project']
get_project_config(var_project)
Copy link
Contributor

@JaroCamphuijsen JaroCamphuijsen Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't seem to use the returned value from get_project_config() here, do you intend to only use the check and error raising? And if so, can't we make it more clear that you're doing that?

Copy link
Contributor Author

@valeriupredoi valeriupredoi Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with an actual RecipeError'

esmvalcore/_recipe.py Outdated Show resolved Hide resolved
@bouweandela
Copy link
Member

ping @valeriupredoi

@valeriupredoi
Copy link
Contributor Author

Soorry folks, on Easter hols (staying at home this year 😁 ), will revisit this bright and early on Tuesday 🐰

@valeriupredoi
Copy link
Contributor Author

ayt, guys, have now plugged them suggestions, what say yous? 🍺

Copy link
Contributor

@JaroCamphuijsen JaroCamphuijsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think it's ready to merge.

@valeriupredoi
Copy link
Contributor Author

cheers @JaroCamphuijsen mate, nice review 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants