Hourly reanalysis downloading #296
base: develop
Conversation
Codecov Report — Attention: Patch coverage is

@@           Coverage Diff            @@
##           develop     #296   +/-  ##
===========================================
- Coverage    72.49%   70.14%   -2.35%
===========================================
  Files           29       29
  Lines         3690     3815     +125
  Branches       796      819      +23
===========================================
+ Hits          2675     2676       +1
- Misses         826      950     +124
  Partials       189      189
Thanks for putting this together, it's a great addition to the toolkit! The only comment holding me back from an approval was the one regarding the monthly files from ERA5, and then all the base files from the MERRA2. I'm thinking we don't really need them, and should ditch them, but I could be swayed towards leaving it as-is as well.
lat + node_spacing,
lon - node_spacing,
lat - node_spacing,
lon + node_spacing,
I don't quite follow the node_spacing logic; could you explain it a bit?
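For context, the four values in the snippet look like they build a CDS-style bounding box around a point, in the [North, West, South, East] order the CDS API's `area` parameter expects. A minimal sketch under that assumption (the helper name and values here are illustrative, not from the PR):

```python
def bounding_box(lat: float, lon: float, node_spacing: float) -> list[float]:
    """Hypothetical helper: a box of half-width `node_spacing` degrees around
    (lat, lon), ordered [North, West, South, East] as the CDS API expects."""
    return [
        lat + node_spacing,  # North edge
        lon - node_spacing,  # West edge
        lat - node_spacing,  # South edge
        lon + node_spacing,  # East edge
    ]

box = bounding_box(50.0, 8.0, 0.25)
```

If that reading is right, `node_spacing` is just the half-width of the requested grid cell around the target coordinate.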
# set up cds-api client
try:
    c = cdsapi.Client()
except Exception as e:
Is this what's raised from the API? I thought it was more specific for some reason, or are you just trying to capture any failed connection, regardless of reason?
],
"year": None,
"month": None,
"day": [
I wonder if we could do `[f"{i:02d}" for i in range(1, 32)]`, just to trim the size of the list.
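As a quick check on the suggestion (this only demonstrates the comprehension itself, not the PR's surrounding code), the one-liner produces the same zero-padded day strings as the 31-element literal list:

```python
# Zero-padded day-of-month strings "01" through "31"
days = [f"{i:02d}" for i in range(1, 32)]
print(days[0], days[-1], len(days))  # 01 31 31
```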
"31", | ||
], | ||
"time": [ | ||
"00:00", |
Same here, but with `[f"{i:02d}:00" for i in range(24)]`.
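Again purely to illustrate the suggested comprehension (not the PR's actual code), this builds the same 24 hourly time strings as the literal list:

```python
# Hourly time strings "00:00" through "23:00"
hours = [f"{i:02d}:00" for i in range(24)]
print(hours[0], hours[-1], len(hours))  # 00:00 23:00 24
```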
As well as returning the data as a dataframe, the data is also saved as monthly NetCDF files and
a csv file with the concatenated data. These are located in the "save_pathname" directory, with
the "save_filename" prefix. This allows future loading without download from the CDS service.
I was thinking about this, and I think we should just delete the monthly files and save a single NetCDF and csv file instead. That's less onerous on the user, who would otherwise have to delete the originals once they're no longer required.
for month in months:
    # get the file names from the GES DISC site for the year
    result = requests.get(base_url + str(year) + "/%02d" % month)
The string portion could get tidied up to `f"{base_url}{year}/{month:02d}"`, if I followed that correctly.
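A quick sanity check, with placeholder values (the URL below is illustrative, not the real GES DISC base URL), that the suggested f-string builds the same string as the original concatenation:

```python
base_url, year, month = "https://example.com/data/", 2020, 7  # placeholder values

old_style = base_url + str(year) + "/%02d" % month  # original concatenation
f_string = f"{base_url}{year}/{month:02d}"          # suggested rewrite

assert old_style == f_string  # both yield "https://example.com/data/2020/07"
```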
    months = list(range(1, end_date.month + 1, 1))
else:
    months = list(range(1, 12 + 1, 1))
The extra 1 at the end is redundant: a step of 1 is the default for `range`, even when the starting point is changed. If it's just a style choice it doesn't bother me, but I thought I'd flag it just in case.
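To illustrate the point, the two-argument and three-argument forms are equivalent when the step is 1:

```python
# range's step defaults to 1, so the trailing argument can be dropped
months_explicit = list(range(1, 12 + 1, 1))
months_default = list(range(1, 12 + 1))
assert months_explicit == months_default == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```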
This pull request adds two new functions to the `utils/downloader` module to download hourly reanalysis data: `get_era5_hourly` and `get_merra2_hourly`. These functions are modified from `get_era5_monthly` and `get_merra2_monthly`, contributed by @charlie9578, and similarly download data from the CDS and GES DISC services.

Note that it can take a long time to download historical data (~1 day for a 20-year time series). Downloading ERA5 data seems to be faster than MERRA-2 for me, though.