Hourly reanalysis downloading #296
base: develop
Conversation
Codecov Report — Attention: Patch coverage is

@@           Coverage Diff            @@
##           develop     #296   +/-  ##
===========================================
- Coverage    72.49%   70.14%   -2.35%
===========================================
  Files           29       29
  Lines         3690     3815     +125
  Branches       796      819      +23
===========================================
+ Hits          2675     2676       +1
- Misses         826      950     +124
  Partials       189      189
Thanks for putting this together, it's a great addition to the toolkit! The only comment holding me back from an approval was the one regarding the monthly files from ERA5, and then all the base files from the MERRA2. I'm thinking we don't really need them, and should ditch them, but I could be swayed towards leaving it as-is as well.
lat + node_spacing,
lon - node_spacing,
lat - node_spacing,
lon + node_spacing,
I don't quite follow the node_spacing logic; could you explain it a bit?
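For context, the four values in the snippet look like they build a CDS-style bounding box around a point, in the [North, West, South, East] order the CDS API's `area` parameter expects. A minimal sketch under that assumption (the helper name and values here are illustrative, not from the PR):

```python
def bounding_box(lat: float, lon: float, node_spacing: float) -> list[float]:
    """Hypothetical helper: a box of half-width `node_spacing` degrees around
    (lat, lon), ordered [North, West, South, East] as the CDS API expects."""
    return [
        lat + node_spacing,  # North edge
        lon - node_spacing,  # West edge
        lat - node_spacing,  # South edge
        lon + node_spacing,  # East edge
    ]

box = bounding_box(50.0, 8.0, 0.25)
```

If that reading is right, `node_spacing` is just the half-width of the requested grid cell around the target coordinate.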
# set up cds-api client
try:
    c = cdsapi.Client()
except Exception as e:
Is this what's raised from the API? I thought it was more specific for some reason, or are you just trying to capture any failed connection, regardless of reason?
],
"year": None,
"month": None,
"day": [
I wonder if we could do `[f"{i:02d}" for i in range(1, 32)]`, just to trim the size of the list.
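As a quick check on the suggestion (this only demonstrates the comprehension itself, not the PR's surrounding code), the one-liner produces the same zero-padded day strings as the 31-element literal list:

```python
# Zero-padded day-of-month strings "01" through "31"
days = [f"{i:02d}" for i in range(1, 32)]
print(days[0], days[-1], len(days))  # 01 31 31
```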
"31", | ||
], | ||
"time": [ | ||
"00:00", |
Same here, but with `[f"{i:02d}:00" for i in range(24)]`.
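Again purely to illustrate the suggested comprehension (not the PR's actual code), this builds the same 24 hourly time strings as the literal list:

```python
# Hourly time strings "00:00" through "23:00"
hours = [f"{i:02d}:00" for i in range(24)]
print(hours[0], hours[-1], len(hours))  # 00:00 23:00 24
```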
As well as returning the data as a dataframe, the data is also saved as monthly NetCDF files and
a csv file with the concatenated data. These are located in the "save_pathname" directory, with
the "save_filename" prefix. This allows future loading without download from the CDS service.
I was thinking about this, and I think we should just delete the monthly files and save a single NetCDF and csv file instead. That's less onerous on the user, who would otherwise have to delete the originals once they're no longer required.
for month in months:
    # get the file names from the GES DISC site for the year
    result = requests.get(base_url + str(year) + "/%02d" % month)
The string portion could get tidied up to `f"{base_url}{year}/{month:02d}"`, if I followed that correctly.
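A quick sanity check, with placeholder values (the URL below is illustrative, not the real GES DISC base URL), that the suggested f-string builds the same string as the original concatenation:

```python
base_url, year, month = "https://example.com/data/", 2020, 7  # placeholder values

old_style = base_url + str(year) + "/%02d" % month  # original concatenation
f_string = f"{base_url}{year}/{month:02d}"          # suggested rewrite

assert old_style == f_string  # both yield "https://example.com/data/2020/07"
```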
    months = list(range(1, end_date.month + 1, 1))
else:
    months = list(range(1, 12 + 1, 1))
The extra 1 at the end is redundant: a step of 1 is the default for `range`, even when the starting point is changed. If it's just a style choice it doesn't bother me, but I thought I'd flag it just in case.
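To illustrate the point, the two-argument and three-argument forms are equivalent when the step is 1:

```python
# range's step defaults to 1, so the trailing argument can be dropped
months_explicit = list(range(1, 12 + 1, 1))
months_default = list(range(1, 12 + 1))
assert months_explicit == months_default == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```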
This pull request adds two new functions to the `utils/downloader` module to download hourly reanalysis data: `get_era5_hourly` and `get_merra2_hourly`. These functions are modified from `get_era5_monthly` and `get_merra2_monthly`, contributed by @charlie9578, and similarly download data from the CDS and GES DISC services.

Note that it can take a long time to download historical data (~1 day for a 20-year time series). Downloading ERA5 data seems to be faster than MERRA-2 for me, though.