Update load processing; part of #67 #170
Conversation
Many thanks @JanFrederickUnnewehr for contributing again, good job! I reviewed the first bit; more is coming.
scripts/build_load_data.py (outdated)

```python
# Save location
to_fn = Path(f"{rootpath}/data/time_series_60min_singleindex.csv")

logger.info(f"Downloading load data from '{url}'.")
progress_retrieve(url, to_fn)
logger.info(f"Raw load data available at '{to_fn}'.")

opsd_load = load_timeseries_opsd(
    years=slice(*pd.date_range(freq='y', **snakemake.config['snapshots'])[[0, -1]].year.astype(str)),
    fn=to_fn,
    countries=snakemake.config['countries'],
    source=snakemake.config['load']['source'],
)
```
Suggested change — drop the download step and read directly from the URL:

```python
opsd_load = load_timeseries_opsd(
    years=slice(*pd.date_range(freq='y', **snakemake.config['snapshots'])[[0, -1]].year.astype(str)),
    fn=url,
    countries=snakemake.config['countries'],
    source=snakemake.config['load']['source'],
)
```
Directly reading from the URL is fine, as we do not really need the raw data.
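For context, pandas reads remote files transparently, which is presumably what makes `fn=url` work here (a sketch, assuming `load_timeseries_opsd` passes `fn` on to `pd.read_csv`):

```python
import pandas as pd

# pandas fetches the CSV over HTTP itself, so no explicit download step is needed
opsd_load_raw = pd.read_csv(url, index_col=0, parse_dates=True)
```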
OK, I have also thought that it might be useful to separate the downloading and the processing of the load data; then we could control the automatic download like in the data bundle. But maybe this is a bit too much work just for the load data. Perhaps it makes more sense to integrate everything into the load data bundle function.
I think it's fine to do it here. It's good to have as many things as possible outside of the databundle, since the databundle causes extra work to maintain properly. The more is retrieved automatically, the better.
Maybe we could add a switch for whether to manually fill the gaps as in the old timeseries_opsd? See the PR in FRESNA https://github.com/FRESNA/vresutils/pull/14/files ("manual_alterations"): many gaps last only a couple of hours, but others last for many months. Maybe we could add a switch `manual_load_alterations` (or similar) in the config: True/False? Then it's up to the user. A sketch of the wiring follows below.
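A minimal sketch of such a switch; the config key `manual_load_alterations` is the name proposed above, the helper `manual_adjustment` matches the one used later in this PR, and the exact wiring is an assumption:

```python
# hypothetical entry in config.yaml:
#   load:
#     manual_load_alterations: true
if snakemake.config['load'].get('manual_load_alterations', False):
    opsd_load = manual_adjustment(load=opsd_load,
                                  source=snakemake.config['load']['source'])
```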
Hi, check out the new version of
You often copy from start to stop; maybe you can use my function from vresutils for that and then call it for each correction (see the sketch after this comment). Also, if you interpolate, you can add a limit on how many hours should be interpolated at most; before, we used 4 hours.

You update Kosovo and Albania twice, at line 142+ and then again at line 370+, but with different factors. Which one is correct?

You can try to use my suggestions from the PR in vresutils I mentioned before; I also discussed them a while ago with @coroa. But your "alterations" seem more complete. Did you take into account holidays, weekends, etc.?

Now we have a code duplicate, once in vresutils and once here... not sure what to do with it. Is someone else using vresutils? If it's just PyPSA-Eur, then we can delete it there and keep only this version; otherwise we have two different sources people could use, and since they are not exactly the same they produce different results. Super bad for debugging later on.
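A minimal sketch of such a start-to-stop copy helper, assuming an hourly DatetimeIndex; the name `copy_timeslice` and its exact signature are illustrative rather than the vresutils original:

```python
import pandas as pd

def copy_timeslice(load, cntry, start, stop, delta):
    """Overwrite load[cntry] between start and stop with the values
    from delta earlier (e.g. one week before)."""
    start, stop = pd.Timestamp(start), pd.Timestamp(stop)
    if start in load.index and stop in load.index:
        load.loc[start:stop, cntry] = \
            load.loc[start - delta:stop - delta, cntry].values

# hypothetical usage, filling a four-day outage from one week earlier:
# copy_timeslice(opsd_load, 'AL', '2018-10-29 06:00',
#                '2018-11-02 06:00', pd.Timedelta(weeks=1))
```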
scripts/add_electricity.py (outdated)

```python
def load_opsd_loaddata(load_fn=None, countries=None):
    if load_fn is None:
        load_fn = snakemake.input.load

    if countries is None:
        countries = snakemake.config['countries']

    load = pd.read_csv(load_fn, index_col=0, parse_dates=True)
    load = load.filter(items=countries)

    return load
```
this is not needed anymore, right?
Yes, right!
scripts/build_load_data.py (outdated)

```python
# check the number and length of gaps
nan_stats = nan_statistics(opsd_load)

gap_filling_threshold = snakemake.config['load']['gap_filling_threshold']

if nan_stats.consecutive.max() > gap_filling_threshold:
    logger.warning(f"Load data contains consecutive gaps longer than "
                   f"{gap_filling_threshold} hours! Check the dataset carefully!")

# fill large gaps and interpolate the load data
logger.info(f"Gaps of {gap_filling_threshold} hours are filled with data from the "
            "previous week; smaller gaps are interpolated linearly.")
opsd_load = (opsd_load.apply(fill_large_gaps, gapsize=gap_filling_threshold)
                      .interpolate(method='linear', limit=gap_filling_threshold))

# adjust gaps manually
if snakemake.config['load']['adjust_gaps_manuel']:
    logger.info("Load data are adjusted manually.")
    opsd_load = manual_adjustment(load=opsd_load,
                                  source=snakemake.config['load']['source'])

# check the number and length of gaps after adjusting and interpolating
nan_stats = nan_statistics(opsd_load)

if nan_stats.consecutive.max() > gap_filling_threshold:
    logger.warning("Load data contains gaps after manual adjustment. "
                   "Modify the manual_adjustment() function!")
```
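The `fill_large_gaps` helper called above is not part of this diff; a hedged sketch of a week-shift filler, assuming an hourly DatetimeIndex (the PR's actual implementation may use `gapsize` differently):

```python
import pandas as pd

def fill_large_gaps(ds, gapsize):
    """Fill NaNs with the value from one week earlier (sketch).

    `gapsize` mirrors the call above; a fuller version could use it
    to warn when a gap exceeds the one-week shift window.
    """
    shifted = ds.shift(freq=pd.Timedelta(weeks=1))
    return ds.where(ds.notnull(), shifted.reindex(ds.index))
```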
I think we're almost there. I would restructure this part a bit (sorry for the many iterations):
- manual corrections if enabled
- interpolate gaps < gap_size_interpolated (with an info message) if NaNs exist
- fill by weekly shift (with a warning) if NaNs exist
- raise an error if NaNs still exist

Like this we prioritize the manual corrections and do the heuristic sanitizing afterwards; see the sketch after this list.
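A hedged sketch of that order (helper names follow this PR's code where visible; `gap_size_interpolated` is the hypothetical config key from the list above):

```python
limit = snakemake.config['load']['gap_size_interpolated']

# 1. manual corrections if enabled
if snakemake.config['load']['adjust_gaps_manuel']:
    opsd_load = manual_adjustment(load=opsd_load,
                                  source=snakemake.config['load']['source'])

# 2. interpolate small gaps (with an info message) if NaNs exist
if opsd_load.isna().any().any():
    logger.info(f"Interpolating gaps of up to {limit} hours linearly.")
    opsd_load = opsd_load.interpolate(method='linear', limit=limit)

# 3. fill remaining gaps by weekly shift (with a warning) if NaNs exist
if opsd_load.isna().any().any():
    logger.warning("Filling remaining gaps with data from the previous week.")
    opsd_load = opsd_load.apply(fill_large_gaps, gapsize=limit)

# 4. raise an error if NaNs still exist
if opsd_load.isna().any().any():
    raise ValueError("Load data still contains gaps after sanitizing; "
                     "modify manual_adjustment() or check the raw data.")
```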
Should we keep the warning at the beginning that the dataset contains gaps? In lines 411 & 412.

The new order is a good idea!

Sorry, I hit the wrong button :D
That is a good idea. I will integrate it in my code.

The limit is now implemented as:
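Presumably along the lines of the interpolate call shown in the diff above (an assumption, not the verbatim implementation):

```python
opsd_load = opsd_load.interpolate(
    method='linear',
    limit=snakemake.config['load']['gap_filling_threshold'])
```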
The corrections are for two different years: the first one only for years before 2016, the second one for the year 2018.

Holidays are not included. Any idea how to integrate them in an automated way?

When the changes are transferred to master, the previously used functions from vresutils.load are simply no longer imported. Is that what you mean?
Will that work if you run, e.g., 2013-2018? Wouldn't years 2013-2017 then be adapted with the 2018 value?
No, sorry, unfortunately I did it brute force. But maybe, if you add a "holidays" array and then check whether the gap you want to fill falls within this array, the timedelta increases by one. Also, a weekend day should not be filled with a working day... would that work? See the sketch below.
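A minimal sketch of that day-type check, using the third-party `holidays` package; the search loop and the names are hypothetical, not this PR's code:

```python
import pandas as pd
import holidays  # per-country holiday calendars

def is_off_day(ts: pd.Timestamp, country_holidays) -> bool:
    """True if the timestamp falls on a weekend or public holiday."""
    return ts.weekday() >= 5 or ts.date() in country_holidays

def find_fill_source(ts, country_holidays, max_weeks=4):
    """Step back week by week until the candidate day has the same
    day type (working/non-working) as the gap being filled."""
    for weeks in range(1, max_weeks + 1):
        candidate = ts - pd.Timedelta(weeks=weeks)
        if is_off_day(ts, country_holidays) == is_off_day(candidate, country_holidays):
            return candidate
    return None

de_holidays = holidays.Germany()
# find_fill_source(pd.Timestamp('2018-12-25 12:00'), de_holidays)
```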
I think we should remove one of them; otherwise it's just more code to be maintained, which should basically be 1:1 the same, and that's just confusing.
Is it intended that the simulation period of one calculation (one model run) is longer than one year? If so, the function
I would ignore that for the moment.
Surely we should have only one version of the load processing in the end. Let's see what @coroa has to say about this.
After a fruitful discussion with @FabianHofmann, we agreed on the following procedure and code structure:
Is there a way to automate the generation of the file `doc/configuration.rst`?
Hey @JanFrederickUnnewehr, I'm closing this in favour of #211. If you still want to integrate the power statistics time series, there should be a good way. But we can discuss it in the other PR or via mail/phone.
Is part of #67
Changes proposed in this Pull Request
The following update concerns the processing of load data in pypsa-eur.
Added a rule (`build_load_data`) that downloads the latest load data from the OPSD website. The resulting data are cleaned and gaps are filled based on manual filling methods. Before and after filling gaps, the rule provides information about the gaps (length, frequency) via the function `nan_statistics(df)`.
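For reference, a hedged sketch of what `nan_statistics(df)` might compute, per column: the total number of NaNs and the longest consecutive NaN run (the PR's actual implementation may differ):

```python
import pandas as pd

def nan_statistics(df: pd.DataFrame) -> pd.DataFrame:
    def max_consecutive_nans(col):
        # runs of NaNs share one group id between valid values;
        # summing the NaN indicator per group gives each run's length
        return (col.isna().astype(int)
                   .groupby(col.notna().astype(int).cumsum())
                   .sum().max())
    return pd.concat([df.isna().sum(), df.apply(max_consecutive_nans)],
                     keys=['total', 'consecutive'], axis=1)
```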
Checklist

- Newly introduced dependencies are added to `environment.yaml` and `environment.docs.yaml`.
- Changes in configuration options are added in all of `config.default.yaml`, `config.tutorial.yaml`, and `test/config.test1.yaml`.
- Changes in configuration options are documented in `doc/configtables/*.csv` and line references are adjusted in `doc/configuration.rst` and `doc/tutorial.rst`.
- A note for the release notes `doc/release_notes.rst` is amended in the format of previous release notes.

Is there a way to automate the generation of the file `doc/configuration.rst`?

Let's discuss the proposed changes first, before I add them to `doc/release_notes.rst`.