This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Comparison of LDEO / ESGF / CEDA Cloud holdings #8

Open
rabernat opened this issue Oct 30, 2020 · 8 comments

Comments

@rabernat
Contributor

@naomi-henderson and @aradhakrishnanGFDL did a very interesting "diff" comparison of the two different cloud data stores.

I'm linking to Naomi's gist to have a persistent record of this: https://gist.github.com/naomi-henderson/d2ea493e8bc705f5551f7fce9c0402e5

It would be great to get CEDA included in this as well once they are up and running (cc @RuthPetrie). This exercise is somewhat related to the discussion of directory structure / catalog format in #7.
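
For readers who have not opened the gist, a holdings "diff" of this kind can be sketched with set operations on dataset keys. This is a minimal illustration, not the notebook's actual code: the key columns below are an assumption based on common CMIP6 facet names, and the function names `dataset_keys` and `diff_holdings` are hypothetical.

```python
import csv
import io

# Assumed key columns identifying one dataset; the real catalogs may differ.
KEY_COLUMNS = (
    "source_id", "experiment_id", "member_id",
    "table_id", "variable_id", "grid_label",
)


def dataset_keys(csv_text, key_columns=KEY_COLUMNS):
    """Return the set of dataset keys (tuples of facet values) in a catalog CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col] for col in key_columns) for row in reader}


def diff_holdings(catalog_a, catalog_b, key_columns=KEY_COLUMNS):
    """Compare two catalogs and report which datasets each one holds."""
    keys_a = dataset_keys(catalog_a, key_columns)
    keys_b = dataset_keys(catalog_b, key_columns)
    return {
        "only_a": keys_a - keys_b,  # held by A but missing from B
        "only_b": keys_b - keys_a,  # held by B but missing from A
        "both": keys_a & keys_b,    # held by both stores
    }
```

Adding a third store such as CEDA would only require one more set and the corresponding intersections and differences.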

@rabernat
Contributor Author

Actually, one useful step would be to schedule this notebook so it runs automatically as a cron job via a GitHub workflow.

There are four catalog files referenced in the notebook:

Of these four, only one (the http one) is available online and can thus be read as part of a CI job. What would it take to get the others online? Can we automate the generation of all of the catalogs?
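
As a rough sketch of what "readable as part of a CI job" could look like, the snippet below fetches a catalog over HTTP and parses it with the standard library only. The function names `parse_catalog` and `fetch_catalog` are hypothetical, and the assumption is that each catalog is a plain or gzip-compressed CSV served at a direct download URL (a GitHub `blob` page URL returns HTML, so the raw file URL would be needed).

```python
import csv
import gzip
import io
import urllib.request


def parse_catalog(raw_bytes):
    """Parse catalog bytes (plain or gzip-compressed CSV) into a list of dict rows."""
    if raw_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    text = raw_bytes.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def fetch_catalog(url, timeout=60):
    """Download a catalog from an HTTP(S) URL and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_catalog(response.read())
```

Keeping the parsing separate from the download means the parsing step can be tested without network access, which is convenient in a scheduled workflow.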

@RuthPetrie

RuthPetrie commented Oct 30, 2020

@naomi-henderson the dataset list we are currently working on can be found at https://github.com/cedadev/cmip6-object-store/blob/master/data/cmip6-datasets_2020-10-27.csv.gz (let me know if you have access issues). Note that this isn't the full IPCC AR6 WG1 priority list but a subset of it. Not all of these datasets are available via our S3 store yet, but this is what we are aiming for in the short term; hopefully it will grow in the future. Your analysis is very interesting (and surprising), and it will help to guide us as we start populating our CMIP6 holdings.

@aradhakrishnanGFDL

  • local.csv
    Of these four, only one (the http one) is available online and can thus be read as part of a CI job. What would it take to get the others online? Can we automate the generation of all of the catalogs?

Hi @rabernat, below is the CSV (intake-esm friendly) I passed along to Naomi. I think this corresponds to local.csv in the notebook: https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/master/intakebuilder/test/intake_uda.csv.gz
Since my CSV lists multiple versions for some of the datasets (which we need to fix at our internal UDA (Unified Data Archive) level), Naomi must have created GFDL-ESGF-lastVersion.csv.
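
One way a "last version" catalog could be derived from rows with multiple versions is sketched below. This is an illustration only, not how GFDL-ESGF-lastVersion.csv was actually produced: the `version` column name, the key columns, and the function name `latest_versions` are assumptions, and version strings are assumed to share one sortable format (for example `v20190308`).

```python
def latest_versions(rows, key_columns, version_column="version"):
    """Keep only the row with the highest version string for each dataset key.

    rows: iterable of dicts, e.g. from csv.DictReader.
    Versions are compared as strings, so they must share one format.
    """
    latest = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        current = latest.get(key)
        if current is None or row[version_column] > current[version_column]:
            latest[key] = row
    return list(latest.values())
```

The result preserves the order in which each dataset key first appears, so the output catalog stays close to the input ordering.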

@naomi-henderson
Contributor

@RuthPetrie, thanks. The CSV file looks just right for this purpose - thanks for the sizes as well!

@rabernat
Contributor Author

@naomi-henderson--if you can refactor your notebook to only pull data from HTTP, I'll set it up to run via a github workflow on this repo.

❤️ automation.

@naomi-henderson
Contributor

@rabernat - yes, working on it ...

@naomi-henderson
Contributor

@rabernat - Here is the same notebook with a 3-way comparison using the catalogs that are available over HTTP: https://gist.github.com/naomi-henderson/4876a860e262c48209f6e981f6d1fe47

@naomi-henderson
Contributor

Hi all - it is very exciting to see all of this data being made so much more available! In particular, we have long regretted that much of the GFDL contribution to CMIP6 was difficult to obtain. The main LLNL-ESGF Search API often only returns URLs for the GFDL thredds server (https://esgf-data1.llnl.gov/thredds) which is problematic. Specifying a list of shards which excludes this server is possible, but a pain in the neck. So researchers who do not have access to the GFDL file system have often just skipped the GFDL models in their analysis.

So the fantastic GFDL models should now have increased visibility! Thanks to all, and hopefully other ESGF Data Nodes will be able to do the same.
