Possibility to provide a '*' to the definition of a dataset in a recipe #589

Closed
jservonnat opened this issue Mar 26, 2020 · 5 comments · Fixed by #1609
Labels
enhancement New feature or request preprocessor Related to the preprocessor

Comments

@jservonnat

Hello everyone,
This issue follows our discussion with @valeriupredoi during the Is-ENES3 GA.
With CliMAF we found it very useful to be able to specify a wildcard '*' in our dataset definitions, for instance model='*' or realization='*', to work on all the models or realizations available.
In the same way, we implemented the possibility to specify period='last_XXY', 'first_XXY' or '*', with XX being a number of years, to retrieve the last XX years, the first XX years, or the full period available.
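To make the proposed period syntax concrete, here is a minimal Python sketch of how 'last_XXY', 'first_XXY' and '*' could be resolved against the span of years found on disk. It is not CliMAF's or ESMValCore's implementation: the function name `resolve_period` and the assumption that availability is known as an inclusive (first_year, last_year) pair are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical period syntax from the proposal: 'last_XXY', 'first_XXY' or '*'.
_PERIOD_RE = re.compile(r"^(first|last)_(\d+)Y$")


def resolve_period(spec: str, available: Tuple[int, int]) -> Tuple[int, int]:
    """Turn a period spec into an inclusive (start_year, end_year) range.

    `available` is assumed to be the inclusive (first_year, last_year)
    span of data found on disk for one dataset.
    """
    start, end = available
    if spec == "*":
        # Wildcard: use the full available period.
        return start, end
    match = _PERIOD_RE.match(spec)
    if match is None:
        raise ValueError(f"Unrecognised period specification: {spec!r}")
    which, n_years = match.group(1), int(match.group(2))
    if n_years < 1 or n_years > end - start + 1:
        raise ValueError(f"Cannot select {n_years} years from {start}-{end}")
    if which == "first":
        return start, start + n_years - 1
    # 'last': count backwards from the final available year.
    return end - n_years + 1, end
```

For example, `resolve_period("last_30Y", (1850, 2014))` gives `(1985, 2014)`. Raising an error when more years are requested than exist is a design choice; silently clipping to the available period would be the obvious alternative.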
Do you guys think you could consider adding this functionality?
Cheers,
J.

@valeriupredoi valeriupredoi transferred this issue from ESMValGroup/ESMValTool Mar 26, 2020
@valeriupredoi
Contributor

Nice @jservonnat 🍺 I shall have a look and start implementing this idea. My take would be:

  • wildcard for datasets; we can define an except option;
  • wildcard for experiments, used with the exp: option, i.e. the user asks for an experiment, but if data is unavailable for that experiment the code can choose from others (similar to the current exp: [list], but less restrictive);
  • wildcard for ensemble members (CMIP6);
  • wildcard for years, meaning all available years. This one is tricky, since we'll have to harmonize time boundaries for things like multi-model statistics, so that we don't analyze a whole lot of time only to discard it at the zonal/meridional/multi-model statistics step.
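A dataset wildcard with an except option could be sketched as shell-style pattern matching over the names the data finder knows about. This is a minimal Python sketch, not the ESMValCore data finder: the function `expand_datasets`, its `exclude` parameter, and the idea that a plain list of available dataset names is at hand are all hypothetical. It uses the standard-library `fnmatch.fnmatchcase`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def expand_datasets(pattern: str, available: Iterable[str],
                    exclude: Iterable[str] = ()) -> List[str]:
    """Return available dataset names matching `pattern`, minus excluded ones.

    Both `pattern` and the entries of `exclude` may contain shell-style
    wildcards ('*', '?', '[...]'); matching is case-sensitive.
    """
    excluded = list(exclude)
    return sorted(
        name for name in set(available)
        if fnmatchcase(name, pattern)
        and not any(fnmatchcase(name, ex) for ex in excluded)
    )
```

With the illustrative list `["ACCESS-CM2", "CanESM5", "CESM2", "CESM2-WACCM", "MIROC6"]`, the call `expand_datasets("*", available, exclude=["CESM2*"])` returns `["ACCESS-CM2", "CanESM5", "MIROC6"]`. Allowing patterns in the except list, rather than only exact names, is one way to keep that list short.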

What do you guys reckon @mattiarighi @bouweandela @jvegasbsc 🍺

@valeriupredoi
Contributor

PS - @jservonnat I moved your issue to ESMValCore since this deals with the data finder and logistics within the Core

@valeriupredoi
Contributor

Any suggestions/approvals/nays, @bouweandela @jvegasbsc @mattiarighi? I am planning on starting work on this 🍺

@jvegreg
Contributor

jvegreg commented Apr 2, 2020

  • wildcard for datasets; we can define an except option;

This can be useful, but I fear that the except list will grow quite a bit.

  • wildcard for experiments used with the option exp: i.e. user asks for an experiment but if data is unavailable for that experiment the code can choose from others (similar to the current exp: [list] but less restrictive);

This is trickier: usually you don't have interchangeable experiments; you have ensemble members for that. Unless you are thinking of things like treating CMIP6's historical and HighResMIP's highressst-present as equivalent, but in this case using two different lines to define the CMIP6 and HighResMIP datasets is

  • wildcard for ensemble (CMIP6);

This is becoming mandatory: there are datasets in CMIP6 with lots of members, and it's becoming a problem to keep track of all of them.
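For CMIP6 ensemble members, a wildcard such as `r*i1p1f1` would select every realization of one initialization/physics/forcing combination. A minimal Python sketch, assuming the available variant labels (of the form `r<k>i<l>p<m>f<n>`) are already known; the function names are hypothetical. A numeric sort key is used so that `r10...` does not sort before `r2...`, as plain string sorting would do.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

# CMIP6 variant label: r<realization>i<initialization>p<physics>f<forcing>.
_VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def _variant_key(label: str) -> Tuple[int, ...]:
    """Numeric sort key, so that r2i1p1f1 sorts before r10i1p1f1."""
    match = _VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"Not a CMIP6 variant label: {label!r}")
    return tuple(int(group) for group in match.groups())


def expand_ensemble(pattern: str, available: Iterable[str]) -> List[str]:
    """Return the available variant labels matching a shell-style pattern."""
    matches = {label for label in available if fnmatchcase(label, pattern)}
    return sorted(matches, key=_variant_key)
```

Given the members `["r10i1p1f1", "r1i1p1f1", "r2i1p1f1", "r1i1p1f2"]`, the pattern `"r*i1p1f1"` yields `["r1i1p1f1", "r2i1p1f1", "r10i1p1f1"]`; the `f2` member is left out because its forcing index differs.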

  • wildcard for years - all available years - but this one is tricky since we'll have to harmonize time boundaries for stuff like e.g. multimodel so we don't have to analyze a whole lot of time and discard it at zonal/meridional/multimodel stats

This is mandatory to ease definitions for DCPP and similar decadal and seasonal experiments. It will be a pain to specify all DCPP start dates if we have to provide the exact data range for each one.
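The harmonization concern raised for "all available years" can be illustrated for the multi-model case: once each dataset's full period is known, restricting all of them to their common overlap avoids loading years that multi-model statistics would discard anyway. A minimal Python sketch, assuming each dataset's availability is an inclusive (start_year, end_year) pair; `common_period` is a hypothetical helper, not an ESMValCore function.

```python
from typing import Dict, Tuple

Period = Tuple[int, int]  # inclusive (start_year, end_year)


def common_period(available: Dict[str, Period]) -> Period:
    """Intersect the available periods of several datasets.

    The overlap starts at the latest start year and ends at the
    earliest end year; an error is raised if there is no overlap.
    """
    if not available:
        raise ValueError("No datasets given")
    start = max(period[0] for period in available.values())
    end = min(period[1] for period in available.values())
    if start > end:
        raise ValueError(f"Datasets do not overlap in time: {available}")
    return start, end
```

For example, datasets available over 1850-2014, 1950-2014 and 1850-2005 share the period `(1950, 2005)`. This only applies when datasets are meant to be combined; independent DCPP start dates would each keep their own full range.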

@bouweandela
Member

I agree that it would be a very nice feature to be able to use glob patterns in the variable/dataset definitions in the recipe. It will be some work to implement this though.
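Glob patterns in a recipe dataset definition could work across several facets at once: each facet value in the entry is treated as a pattern and matched against an inventory of facet combinations actually found on disk. A minimal Python sketch, assuming such an inventory exists as a list of dictionaries keyed by recipe facet names (`dataset`, `exp`, `ensemble`); `expand_entry` and the inventory structure are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Facets = Dict[str, str]


def expand_entry(entry: Facets, inventory: List[Facets]) -> List[Facets]:
    """Return every inventory record whose facets match all patterns in `entry`.

    Facet values in `entry` are shell-style glob patterns. Facets absent
    from `entry` are unconstrained; a facet missing from a record is
    compared as an empty string, so only a pattern like '*' matches it.
    """
    return [
        record for record in inventory
        if all(fnmatchcase(record.get(key, ""), pattern)
               for key, pattern in entry.items())
    ]


# Illustrative inventory of what a data finder might report.
inventory = [
    {"dataset": "CanESM5", "exp": "historical", "ensemble": "r1i1p1f1"},
    {"dataset": "CanESM5", "exp": "ssp585", "ensemble": "r1i1p1f1"},
    {"dataset": "MIROC6", "exp": "historical", "ensemble": "r2i1p1f1"},
]
```

Here `expand_entry({"dataset": "*", "exp": "historical", "ensemble": "r1i1p1f1"}, inventory)` keeps only the CanESM5 historical record. Each matched record could then be handled like an ordinary, fully specified dataset entry.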

We will also need to think a bit about how to keep recipes in the ESMValTool repository reproducible if we use this feature. At the moment @mattiarighi tests whether a recipe works with the variables and datasets listed in it, but if a recipe starts to depend on the data available, it becomes harder to test that it actually works, so maybe we would not want to allow this feature in those recipes.

5 participants