Comparison of LDEO / ESGF / CEDA Cloud holdings #8
Comments
Actually, one useful step would be to schedule this notebook so it runs automatically as a cron job via a GitHub workflow. There are four catalog files referenced in the notebook:
Of these four, only one (the HTTP one) is available online and can thus be read as part of a CI job. What would it take to get the others online? Can we automate the generation of all of the catalogs?
@naomi-henderson the dataset list we are currently working on can be found at https://github.com/cedadev/cmip6-object-store/blob/master/data/cmip6-datasets_2020-10-27.csv.gz (let me know if you have access issues). Note that this isn't the full IPCC AR6 WG1 priority list but a subset of it. Not all of these datasets are available via our S3 store yet, but this is what we are aiming for in the short term; hopefully it will grow in the future. Your analysis is very interesting (and surprising), and it will help to guide us as we start populating our CMIP6 holdings.
Hi @rabernat, below is the CSV (intake-esm friendly) I passed along to Naomi. I think this should correspond to local.csv in the notebook. https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/master/intakebuilder/test/intake_uda.csv.gz
@RuthPetrie, thanks. The CSV file looks just right for this purpose, and thanks for the sizes as well!
@naomi-henderson, if you can refactor your notebook to only pull data from HTTP, I'll set it up to run via a GitHub workflow on this repo. ❤️ automation.
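For anyone following along, here is a minimal sketch of what "only pull data from HTTP" could look like for the `.csv.gz` catalogs linked above, using just the Python standard library. The function names `parse_catalog` and `fetch_catalog` are hypothetical and not taken from the notebook. It also assumes the URL serves the raw file; a GitHub `blob` page returns HTML, so a raw download link would be needed.

```python
import csv
import gzip
import io
import urllib.request


def parse_catalog(raw_bytes):
    """Decode catalog bytes (gzip-compressed or plain CSV) into a list of row dicts."""
    # Gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw_bytes[:2] == b"\x1f\x8b":
        raw_bytes = gzip.decompress(raw_bytes)
    text = raw_bytes.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def fetch_catalog(url, timeout=60):
    """Download a catalog over HTTP(S) and parse it (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_catalog(response.read())
```

Keeping the parsing separate from the download means the parsing step can be checked in CI without network access.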
@rabernat - yes, working on it ...
@rabernat - Here is the same notebook with a 3-way comparison using the HTTP-available catalogs: https://gist.github.com/naomi-henderson/4876a860e262c48209f6e981f6d1fe47
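As a rough illustration of the idea behind an N-way holdings comparison (not the code in the gist), the sketch below reduces each catalog to a set of dataset keys and reports what is common to all stores and what is unique to each. The key columns are an assumption based on common CMIP6 facet names, the names `dataset_keys` and `compare_holdings` are hypothetical, and the sample rows are invented purely to exercise the code.

```python
import csv
import io

# Columns assumed to identify one dataset; adjust to match the real catalogs.
KEY_COLUMNS = (
    "source_id", "experiment_id", "member_id",
    "table_id", "variable_id", "grid_label",
)


def dataset_keys(csv_text, key_columns=KEY_COLUMNS):
    """Return the set of dataset keys found in a catalog given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col] for col in key_columns) for row in reader}


def compare_holdings(catalogs):
    """Return (keys held by every store, {store name: keys only that store holds})."""
    keys = {name: dataset_keys(text) for name, text in catalogs.items()}
    common = set.intersection(*keys.values())
    unique = {}
    for name, own in keys.items():
        others = set().union(*(k for n, k in keys.items() if n != name))
        unique[name] = own - others
    return common, unique


# Invented sample rows, for illustration only.
HEADER = ",".join(KEY_COLUMNS) + "\n"
SAMPLE = {
    "store_a": HEADER
    + "GFDL-CM4,historical,r1i1p1f1,Amon,tas,gr1\n"
    + "GFDL-CM4,historical,r1i1p1f1,Amon,pr,gr1\n",
    "store_b": HEADER + "GFDL-CM4,historical,r1i1p1f1,Amon,tas,gr1\n",
    "store_c": HEADER
    + "GFDL-CM4,historical,r1i1p1f1,Amon,tas,gr1\n"
    + "UKESM1-0-LL,historical,r1i1p1f2,Amon,tas,gn\n",
}

common, unique = compare_holdings(SAMPLE)
```

Plain sets keep the comparison independent of row order and of any extra columns (such as file sizes or paths) that differ between stores.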
Hi all - it is very exciting to see all of this data being made so much more available! In particular, we have long regretted that much of the GFDL contribution to CMIP6 was difficult to obtain. The main LLNL-ESGF Search API often returns only URLs for the GFDL THREDDS server (https://esgf-data1.llnl.gov/thredds), which is problematic. Specifying a list of shards that excludes this server is possible, but a pain in the neck, so researchers who do not have access to the GFDL file system have often just skipped the GFDL models in their analysis. The fantastic GFDL models should now have increased visibility! Thanks to all, and hopefully other ESGF Data Nodes will be able to do the same.
@naomi-henderson and @aradhakrishnanGFDL did a very interesting "diff" comparison of the two different cloud data stores.
I'm linking to Naomi's gist to have a persistent record of this: https://gist.github.com/naomi-henderson/d2ea493e8bc705f5551f7fce9c0402e5
It would be great to get CEDA included in this as well once they are up and running (cc @RuthPetrie). This exercise is somewhat related to the discussion of directory structure / catalog format in #7.