obs4MIPs dataset and version are outdated in recipes #2974

Open
bouweandela opened this issue Dec 15, 2022 · 19 comments

Comments

@bouweandela
Member

bouweandela commented Dec 15, 2022

Many of the recipes in ESMValTool use "legacy" names for the dataset and version number of obs4MIPs datasets. The obs4MIPs specification is from 2017, so the naming scheme in those recipes has been considered "legacy" for over 5 years now. I think it's time we update those recipes so they are in line with the current specification. Updating has the advantage that downloading the data automatically from ESGF will actually work.

@bouweandela
Member Author

@ESMValGroup/esmvaltool-coreteam I could make a pull request to update those recipes, if there are no objections?

@remi-kazeroni
Contributor

Thanks for opening the discussion @bouweandela. I think it would be good to align our recipes with the current specifications. Do I understand it correctly that we would not necessarily specify the obs4MIPs dataset versions (i.e. vYYYYMMDD) in our recipes but assume the latest version will be downloaded if not available locally? I'm not sure how important it is to get the dataset versions that were used at the time the recipes were developed. For example, the most used data in recipes is dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7 which is no longer available on ESGF but version: Ed2-8 is. The bottom line is: we may need to ask recipe maintainers if their recipes need special care with respect to the input obs4MIPs datasets, especially if no data are available on ESGF (which I suspect will occur in a few cases...)

@bouweandela
Member Author

bouweandela commented Jan 5, 2023

Do I understand it correctly that we would not necessarily specify the obs4MIPs dataset versions (i.e. vYYYYMMDD) in our recipes but assume the latest version will be downloaded if not available locally?

Yes
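To make the "latest version if none is specified" behaviour concrete, here is a minimal sketch. It is not ESMValCore's actual implementation; the function name `pick_version` and the idea of passing in a plain list of available version strings are assumptions made for illustration only.

```python
import re

# obs4MIPs versions on ESGF follow the vYYYYMMDD pattern mentioned above.
VERSION_PATTERN = re.compile(r"v\d{8}")


def pick_version(available, requested=None):
    """Pick a dataset version from a list of available version strings.

    Hypothetical helper: if the recipe requests a version, that exact
    version must be available; otherwise the most recent vYYYYMMDD wins.
    """
    if requested is not None:
        if requested not in available:
            raise LookupError(f"version {requested!r} is not available")
        return requested
    dated = [v for v in available if VERSION_PATTERN.fullmatch(v)]
    if not dated:
        raise LookupError("no vYYYYMMDD version available")
    # The digits are fixed-width, so string order equals date order.
    return max(dated)
```

With `available = ["v20160610", "v20200101"]`, leaving `requested` unset selects `v20200101`, while a legacy string such as `Ed2-7` raises an error because it does not appear in the list.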

I'm not sure how important it is to get the dataset versions that were used at the time the recipes were developed. For example, the most used data in recipes is dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7 which is no longer available on ESGF but version: Ed2-8 is.

Our broken recipe policy states that recipes can use only data that is publicly available, so recipes that use data that is not or no longer publicly available will need to be updated or marked as broken. To get a better overview of the issue, I created the following table.

Here is a mapping table from datasets and versions currently used in recipes to ESGF datasets:

| Old dataset | Old version | New dataset (as used on ESGF) | Notes |
| --- | --- | --- | --- |
| AIRS | RetStd-v5 | AIRS-2-1 | guess based on the available variables, but the filenames on ESGF differ from those on Levante so there might be some differences |
| AIRS-2-0 | v2 | AIRS-2-0 | already up to date |
| ATSR | ARC-v1.1.1 | ARC-SST-1-1 | |
| CERES-EBAF | Ed2-7 | | not available on ESGF |
| CERES-EBAF | Ed2-8 | CERES-EBAF | for variables rlut, rlutcs, rsdt, rsut, rsutcs |
| CERES-EBAF | Ed2-8 | CERES-EBAF_Surface | for variables rlds, rldscs, rlus, rsds, rsdscs, rsus, rsuscs |
| GPCP-SG | v2.2 | GPCP-V2.2 | |
| GPCP-SG | v2.3 | GPCP-V2.3 | |
| ISCCP | V1.0 | ISCCP | already up to date |
| MODIS | C5 | MODIS-1-0 | |
| SSMI | RSSv07r00 | RSS-v7 | filenames on ESGF differ from those on Levante so there might be some differences |
| SSMI-MERIS | v1-00 | SSMI-MERIS | already up to date |
| TRMM-L3 | 7A | TRMM | |

Note that the version facet needs to be removed or updated to a valid version string as used on ESGF. The level facet can also be removed.
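As a rough illustration of how the mapping could be applied to a single recipe entry, here is a small Python sketch. The dictionary is transcribed from the table above; the function name `update_obs4mips_facets` and the choice to drop both the `version` and `level` facets (rather than replace the version with a valid ESGF one) are assumptions for the example, not part of ESMValTool.

```python
# Legacy (dataset, version) pairs mapped to the dataset name used on ESGF,
# transcribed from the table above. None means no ESGF dataset is listed.
LEGACY_TO_ESGF = {
    ("AIRS", "RetStd-v5"): "AIRS-2-1",
    ("AIRS-2-0", "v2"): "AIRS-2-0",
    ("ATSR", "ARC-v1.1.1"): "ARC-SST-1-1",
    ("CERES-EBAF", "Ed2-7"): None,
    ("GPCP-SG", "v2.2"): "GPCP-V2.2",
    ("GPCP-SG", "v2.3"): "GPCP-V2.3",
    ("ISCCP", "V1.0"): "ISCCP",
    ("MODIS", "C5"): "MODIS-1-0",
    ("SSMI", "RSSv07r00"): "RSS-v7",
    ("SSMI-MERIS", "v1-00"): "SSMI-MERIS",
    ("TRMM-L3", "7A"): "TRMM",
}

# CERES-EBAF Ed2-8 is split into two ESGF datasets depending on the variable.
CERES_TOA_VARS = {"rlut", "rlutcs", "rsdt", "rsut", "rsutcs"}
CERES_SURFACE_VARS = {"rlds", "rldscs", "rlus", "rsds", "rsdscs", "rsus", "rsuscs"}


def update_obs4mips_facets(facets, short_name):
    """Return a copy of a recipe dataset entry that uses the ESGF name.

    Hypothetical helper: entries that are not obs4MIPs, or that do not
    appear in the table, are returned unchanged.
    """
    if str(facets.get("project", "")).lower() != "obs4mips":
        return dict(facets)
    key = (facets.get("dataset"), facets.get("version"))
    if key == ("CERES-EBAF", "Ed2-8"):
        if short_name in CERES_SURFACE_VARS:
            new_name = "CERES-EBAF_Surface"
        elif short_name in CERES_TOA_VARS:
            new_name = "CERES-EBAF"
        else:
            raise ValueError(f"no CERES-EBAF mapping listed for {short_name!r}")
    elif key in LEGACY_TO_ESGF:
        new_name = LEGACY_TO_ESGF[key]
        if new_name is None:
            raise LookupError(f"{key[0]} {key[1]} is not available on ESGF")
    else:
        return dict(facets)
    # Drop the legacy version and level facets, keep everything else.
    updated = {k: v for k, v in facets.items() if k not in ("version", "level")}
    updated["dataset"] = new_name
    return updated
```

For derived variables such as lwcre, the function would need to be called with the input variables (rlut, rlutcs), not with the derived name.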

And here is a list of affected recipes:

recipe_smpi.yml #2991
GPCP-SG v2.2 ['pr']

recipe_perfmetrics_CMIP5.yml
AIRS RetStd-v5 ['hus']
CERES-EBAF Ed2-7 ['lwcre (derived from rlut, rlutcs)', 'rlut', 'rsut', 'swcre (derived from rsut, rsutcs)']
GPCP-SG v2.2 ['pr']

recipe_ecs_scatter.yml
AIRS RetStd-v5 ['hus', 'husStderr']
CERES-EBAF Ed2-7 ['rsdt', 'rsut', 'rsutcs']
TRMM-L3 7A ['pr', 'prStderr']

recipe_quantilebias.yml
GPCP-SG v2.3 ['pr']

recipe_clouds_bias.yml
GPCP-SG v2.2 ['pr']
MODIS C5 ['clt']

recipe_schlund20esd.yml
AIRS RetStd-v5 ['hus']
AIRS-2-0 v2 ['hur']
CERES-EBAF Ed2-7 ['rsdt', 'rsut', 'rsutcs']
GPCP-SG v2.2 ['pr']

recipe_flato13ipcc.yml
CERES-EBAF Ed2-7 ['lwcre (derived from rlut, rlutcs)', 'netcre (derived from rlut, rlutcs, rsut, rsutcs)', 'rlut', 'swcre (derived from rsut, rsutcs)']
GPCP-SG v2.2 ['pr']

recipe_perfmetrics_CMIP5_4cds.yml
AIRS RetStd-v5 ['hus']
CERES-EBAF Ed2-7 ['rlut', 'rsut']
GPCP-SG v2.2 ['pr']

recipe_wenzel16jclim.yml
CERES-EBAF Ed2-7 ['asr (derived from rsdt, rsut)']

recipe_lauer13jclim.yml
CERES-EBAF Ed2-7 ['lwcre (derived from rlut, rlutcs)', 'swcre (derived from rsut, rsutcs)']
GPCP-SG v2.2 ['pr']
MODIS C5 ['clt']

recipe_cmug_h2o.yml
CERES-EBAF Ed2-8 ['rsnstcsnorm (derived from rsdscs, rsdt, rsuscs, rsutcs)']

recipe_ecs_constraints.yml
AIRS RetStd-v5 ['hus']
AIRS-2-0 v2 ['hur']
CERES-EBAF Ed2-7 ['rsdt', 'rsut', 'rsutcs']
GPCP-SG v2.2 ['pr']

recipe_autoassess_landsurface_surfrad.yml
CERES-EBAF Ed2-7 ['rlns (derived from rlds, rlus)', 'rsns (derived from rsds, rsus)']

recipe_smpi_4cds.yml
GPCP-SG v2.2 ['pr']

recipe_validation.yml #3002
CERES-EBAF Ed2-7 ['rsut', 'rtnt (derived from rlut, rsdt, rsut)']

recipe_clouds_ipcc.yml
CERES-EBAF Ed2-7 ['lwcre (derived from rlut, rlutcs)', 'netcre (derived from rlut, rlutcs, rsut, rsutcs)', 'swcre (derived from rsut, rsutcs)']

recipe_ocean_quadmap.yml
ATSR ARC-v1.1.1 ['tos']

recipe_deangelis15nat.yml
CERES-EBAF Ed2-8 ['rsnstcsnorm (derived from rsdscs, rsdt, rsuscs, rsutcs)']
SSMI RSSv07r00 ['prw']

recipe_perfmetrics_land_CMIP5.yml
CERES-EBAF Ed2-8 ['rlds', 'rlus', 'rsds', 'rsus']

bock20jgr/recipe_bock20jgr_fig_6-7.yml
AIRS RetStd-v5 ['hus']
CERES-EBAF Ed2-8 ['lwcre (derived from rlut, rlutcs)', 'rlut', 'rsut', 'swcre (derived from rsut, rsutcs)']
GPCP-SG v2.2 ['pr']

bock20jgr/recipe_bock20jgr_fig_8-10.yml
CERES-EBAF Ed2-7 ['swcre (derived from rsut, rsutcs)']

bock20jgr/recipe_bock20jgr_fig_1-4.yml
GPCP-SG v2.3 ['pr']

ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_atmosphere.yml
GPCP-SG v2.3 ['pr']

recipe_radiation_budget
CERES-EBAF Ed2-7 ['rsut', 'rsutcs', 'rlut', 'rlutcs']

@bouweandela
Member Author

@remi-kazeroni Would you prefer a single pull request with all updates, or a pull request per recipe? And what about the outdated level facet, do we want to keep it or remove it?

@bouweandela bouweandela added this to the v2.8.0 milestone Feb 27, 2023
@remi-kazeroni
Contributor

remi-kazeroni commented Feb 28, 2023

@remi-kazeroni Would you prefer a single pull request with all updates, or a pull request per recipe?

I would prefer to have one pull request per recipe or group of recipes. There are 26 recipes in this list. I think that if we have a single PR for all recipes and need to wait for all recipes to be reviewed by their maintainers/experienced users, this would probably take too long for the release. It would be a good start if we could change a few of these recipes for v2.8.

Also, I now realize that this could potentially have a large impact on our obs4MIPs data pool at DKRZ which has more than 3000 files (90 GB) and which are shipped to JASMIN. Given that these data pools can be used by recipe developers or users of previous versions of the ESMValTool modules, I do not plan to make changes to these data pools. Instead, I am thinking to benefit from the auto-download feature of the Tool to generate a new pool of obs4MIPs data at DKRZ. This could be in our shared download_dir and be later shipped to JASMIN in order to minimize the issues for our users.

And what about the outdated level facet, do we want to keep it or remove it?

I'm not really knowledgeable about this facet and would suggest asking the @ESMValGroup/scientific-lead-development-team for advice.

@bouweandela
Member Author

I am thinking to benefit from the auto-download feature of the Tool to generate a new pool of obs4MIPs data at DKRZ. This could be in our shared download_dir and be later shipped to JASMIN in order to minimize the issues for our users.

Sounds good to me! If you're concerned about the file size and you're sure it's the same file, you could also create some symlinks so the directory with the new version name points to the directory containing the old data.
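The symlink idea could look roughly like the sketch below. The function name `link_new_name` and the assumption that old and new dataset directories live under one common root are hypothetical; the actual layout of the data pool may differ.

```python
import os
from pathlib import Path


def link_new_name(data_root, old_name, new_name):
    """Make data_root/new_name a symlink to data_root/old_name.

    Hypothetical helper: only do this when you are sure that the files
    are identical to the ones published on ESGF under the new name.
    """
    root = Path(data_root)
    old_dir = root / old_name
    new_dir = root / new_name
    if not old_dir.is_dir():
        raise FileNotFoundError(f"{old_dir} does not exist")
    if new_dir.exists() or new_dir.is_symlink():
        raise FileExistsError(f"{new_dir} already exists")
    new_dir.parent.mkdir(parents=True, exist_ok=True)
    # Use a relative target so the pool can be moved or copied as a whole.
    target = os.path.relpath(old_dir, start=new_dir.parent)
    new_dir.symlink_to(target, target_is_directory=True)
    return new_dir
```

A relative link is used here because the pool is shipped to another site; an absolute target would break once the root path changes.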

@remi-kazeroni
Contributor

It looks to me that it is getting a bit late in the release process to start a major update of the obs4MIPs entries in our recipes. I would prefer to focus on fixing broken recipes, properly documenting the release and retesting recipes, rather than this work.

I think it would be great to "ESGFize" our recipes so that ESMValTool could download more data automatically and the tool could be more easily ported on different clusters and data pools. But this would take more than the one week we have left for v2.8. I'd be happy to contribute to these efforts once the release is finished. @bouweandela, would it be agreeable to bump this issue and related PRs to v2.9?

@rbeucher
Contributor

Hi All,

I encountered that issue trying to set up ESMValTool on our system here at ACCESS-NRI.
@remi-kazeroni I would like to set up a pool of obs4MIPs datasets. Can we work on this together?

@hot007

hot007 commented Jul 12, 2023

Hey @rbeucher, I assume you're aware of qv56? The only problem is that it isn't maintained, so, as with CREATE-IP, we may also need to ask NCI to update their data holdings... Using a central ESMValTool pool of obs4MIPs data might be better though!

@rbeucher
Contributor

Hi @hot007, Yes I am. I think that using a pool of data aligned with the ESMValTool group is probably easier to get for now...

@rbeucher
Contributor

@bouweandela @remi-kazeroni What do you do for derived variables like lwcre? Do you derive the variable and save it to a netCDF file?
An example is recipe_perfmetrics_CMIP5, which queries ESGF for lwcre in my case.

@remi-kazeroni
Contributor

Hi @rbeucher, thanks for your interest in the Tool. My recommendation for setting up a pool of obs4MIPs data would be to create it by using the automatic download from ESGF.

The problem here is that quite a few of the obs4MIPs entries in our recipes do not follow the current ESGF standards but older, outdated ones. In many cases, that can be fixed by changing the dataset name (as explained in this #2974 (comment)) and removing the version facet completely. I would recommend continuing to open an issue for each problematic recipe or dataset and then trying a fix in a PR. I do not really have the capacity to work on that myself at the moment, but I could help review PRs. There we would try to make sure that recipes are runnable using obs4MIPs data from ESGF only and also in our testing facility at DKRZ.

But I suspect we may first need to clarify DRS issues on your side, please see this #3293 (comment)

@rbeucher
Contributor

Happy to help with this. It affects quite a lot of recipes so I think it is important we solve the issue.

@bouweandela
Member Author

Pull requests with updates would be most welcome! You may want to have a look at our contribution guidelines to get more familiar with the process.

@rbeucher
Contributor

rbeucher commented Aug 1, 2023

Sure. I'll submit PRs. I'm looking into it.

@zklaus

zklaus commented Nov 14, 2023

Thanks to @rbeucher, we have a huge update. I'm closing this issue as resolved, though it's not impossible that further instances surface in the future. If that happens, we'll address it in a new issue.

🎉 Thanks, @rbeucher! 🎉

@zklaus zklaus closed this as completed Nov 14, 2023
@bouweandela
Member Author

Thanks for the massive effort @rbeucher!

With these changes, there are 10 recipes remaining that are still affected by this issue. I'll re-open the issue so we can keep track of it. @ESMValGroup/esmvaltool-recipe-maintainers If your recipe is listed in this comment, could you please update your recipes according to the table above? #2974 (comment)

esmvaltool/recipes/recipe_schlund20esd.yml:1401:          - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7, tier: 1}
esmvaltool/recipes/recipe_wenzel16jclim.yml:306:          - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7, tier: 1, start_year: 2001}
esmvaltool/recipes/recipe_autoassess_landsurface_surfrad.yml:45:          - {dataset: CERES-EBAF,  project: obs4MIPs,  level: L3B,  version: Ed2-7,  start_year: 2001,  end_year: 2012, tier: 1}
esmvaltool/recipes/recipe_autoassess_landsurface_surfrad.yml:53:          - {dataset: CERES-EBAF,  project: obs4MIPs,  level: L3B,  version: Ed2-7,  start_year: 2001,  end_year: 2012, tier: 1}
esmvaltool/recipes/ipccwg1ar5ch9/recipe_flato13ipcc_figure_96.yml:217:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7, tier: 1, start_year: 2003, end_year: 2011}
esmvaltool/recipes/ipccwg1ar5ch9/recipe_flato13ipcc_figure_96.yml:303:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.2, tier: 1}
esmvaltool/recipes/ipccwg1ar5ch9/recipe_flato13ipcc_figure_96.yml:385:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7, tier: 1, start_year: 2003, end_year: 2011}
esmvaltool/recipes/ipccwg1ar5ch9/recipe_flato13ipcc_figures_92_95.yml:495:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7,
esmvaltool/recipes/ipccwg1ar5ch9/recipe_flato13ipcc_figures_92_95.yml:599:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-7,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_42_a.yml:347:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.3, tier: 1}
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_42_a.yml:465:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_42_a.yml:506:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_42_a.yml:550:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_42_a.yml:623:      - {dataset: CERES-EBAF, project: obs4MIPs, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_43.yml:263:      - {dataset: GPCP-SG, project: obs4mips, level: L3, version: v2.2, tier: 1}
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_43.yml:336:      - {dataset: CERES-EBAF, project: obs4mips, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_43.yml:370:      - {dataset: CERES-EBAF, project: obs4mips, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_43.yml:404:      - {dataset: CERES-EBAF, project: obs4mips, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_fig_3_43.yml:437:      - {dataset: CERES-EBAF, project: obs4mips, level: L3B, version: Ed2-8,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_atmosphere.yml:928:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.3,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_atmosphere.yml:957:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.3,
esmvaltool/recipes/ipccwg1ar6ch3/recipe_ipccwg1ar6ch3_atmosphere.yml:1031:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.3,
esmvaltool/recipes/recipe_smpi_4cds.yml:241:      - {dataset: GPCP-SG, project: obs4MIPs, level: L3, version: v2.2, tier: 1}
esmvaltool/recipes/recipe_ocean_quadmap.yml:59:       - {dataset: ATSR,  project: obs4MIPs,  level: L3,  version: ARC-v1.1.1,  start_year: 2001,  end_year: 2003, tier: 1}
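A listing like the one above could be reproduced with a small script along these lines. This is only a sketch: the function name `find_legacy_entries` is made up, and the heuristic (flag any obs4MIPs entry that still carries a `level` or `version` facet on the same line) will also flag entries whose version is already a valid ESGF version and will miss entries that spread these facets over several lines.

```python
import re
from pathlib import Path

# Match the project facet case-insensitively: some recipes write "obs4mips".
OBS4MIPS = re.compile(r"project:\s*obs4mips\b", re.IGNORECASE)
# A level or version facet with any value on the same line.
LEGACY_FACET = re.compile(r"\b(level|version):\s*[^,}\s]+")


def find_legacy_entries(recipes_dir):
    """Return (path, line number, line) for suspicious obs4MIPs entries."""
    hits = []
    for path in sorted(Path(recipes_dir).rglob("recipe_*.yml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if OBS4MIPS.search(line) and LEGACY_FACET.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumes the script is run from the root of an ESMValTool checkout.
    for path, lineno, line in find_legacy_entries("esmvaltool/recipes"):
        print(f"{path}:{lineno}: {line}")
```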

@bouweandela bouweandela reopened this Nov 14, 2023
rbeucher added a commit to ACCESS-NRI/ESMValTool that referenced this issue Nov 14, 2023
rbeucher added a commit to ACCESS-NRI/ESMValTool that referenced this issue Nov 14, 2023
@valeriupredoi
Contributor

Massive, massive good work from @rbeucher - very many thanks! This can be moved to M2.11 since there are still a few open PRs - even though those are nearly ready, I am fairly sure the release boys @zklaus and @bouweandela would prefer to release this week before 🎅 comes along

@valeriupredoi valeriupredoi modified the milestones: v2.10.0, v2.11.0 Dec 14, 2023
rbeucher added a commit that referenced this issue Feb 20, 2024
@mo-gill
Contributor

mo-gill commented May 2, 2024

Hi, we are currently working on the ESMValTool release for v2.11.0. We're wondering if you'd be able to finalise this issue by the end of next week (Friday 10th May).

Otherwise, please let us know, and we'll move it into the next milestone for you 🙂

@mo-gill mo-gill modified the milestones: v2.11.0, v2.12.0 May 24, 2024

7 participants