
Update load processing; part of #67 #170

Conversation

JanFrederickUnnewehr (Contributor)

Is part of #67

Changes proposed in this Pull Request

The following update concerns the processing of load data in pypsa-eur.
Added a rule (build_load_data) that downloads the latest load data from the OPSD website.
The resulting data are cleaned and gaps are filled based on manual filling methods. Before and after gap filling, the rule provides information about the gaps (length, frequency) via the function nan_statistics(df); a sketch of such a helper is shown below.
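For illustration, a minimal sketch of what such a gap-statistics helper could look like (hypothetical; the actual nan_statistics in build_load_data.py may differ, and the monthly aggregation shown here is an assumption):

import pandas as pd

def nan_statistics(df):
    # longest run of consecutive NaNs per column
    def max_consecutive_nans(s):
        return (s.isnull().astype(int)
                 .groupby(s.notnull().astype(int).cumsum())
                 .sum().max())
    consecutive = df.apply(max_consecutive_nans)
    total = df.isnull().sum()
    max_total_per_month = df.isnull().resample('M').sum().max()
    return pd.concat([total, consecutive, max_total_per_month],
                     keys=['total', 'consecutive', 'max_total_per_month'], axis=1)

The consecutive column is what later threshold checks like nan_stats.consecutive.max() operate on.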

Checklist

  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to environment.yaml and environment.docs.yaml.
  • Changes in configuration options are added in all of config.default.yaml, config.tutorial.yaml, and test/config.test1.yaml.
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes.

Is there a way to automate the generation of the file doc/configuration.rst?
Let's first discuss the proposed changes before I add them to doc/release_notes.rst.

@FabianHofmann (Contributor) left a comment

Many thanks @JanFrederickUnnewehr for contributing again, good job! I reviewed the first bit; the next part is coming.

scripts/add_electricity.py (outdated review thread, resolved)
scripts/build_load_data.py (3 outdated review threads, resolved)
Comment on lines 408 to 421

# Save location
to_fn = Path(f"{rootpath}/data/time_series_60min_singleindex.csv")

logger.info(f"Downloading load data from '{url}'.")

progress_retrieve(url, to_fn)

logger.info(f"Raw load data available at '{to_fn}'.")

opsd_load = (load_timeseries_opsd(years=slice(*pd.date_range(freq='y', **snakemake.config['snapshots'])[[0,-1]].year.astype(str)),
                                  fn=to_fn,
                                  countries=snakemake.config['countries'],
                                  source=snakemake.config['load']['source']))
Contributor

Suggested change
-# Save location
-to_fn = Path(f"{rootpath}/data/time_series_60min_singleindex.csv")
-logger.info(f"Downloading load data from '{url}'.")
-progress_retrieve(url, to_fn)
-logger.info(f"Raw load data available at '{to_fn}'.")
-opsd_load = (load_timeseries_opsd(years=slice(*pd.date_range(freq='y', **snakemake.config['snapshots'])[[0,-1]].year.astype(str)),
-                                  fn=to_fn,
-                                  countries=snakemake.config['countries'],
-                                  source=snakemake.config['load']['source']))
+opsd_load = (load_timeseries_opsd(years=slice(*pd.date_range(freq='y', **snakemake.config['snapshots'])[[0,-1]].year.astype(str)),
+                                  fn=url,
+                                  countries=snakemake.config['countries'],
+                                  source=snakemake.config['load']['source']))

Directly reading from the URL is fine, as we do not really need the raw data.

Contributor Author

OK, I had also thought that it might be useful to separate the downloading and the processing of the load data; then we could control the automatic downloading of the data like in the data bundle. Maybe this is a bit too much work just for the load data, though. Maybe it makes sense to integrate everything into the load data bundle function.

Contributor

I think it's fine to do it here. It's good to have as many things as possible outside of the data bundle; it causes extra work to maintain it properly. The more that is retrieved automatically, the better.

JanFrederickUnnewehr and others added 8 commits July 15, 2020 14:54
Co-authored-by: FabianHofmann <hofmann@fias.uni-frankfurt.de>
@martacki (Member)

Maybe we could add a switch for whether to manually fill the gaps as in the old timeseries_opsd? See the PR in FRESNA https://github.com/FRESNA/vresutils/pull/14/files ("manual_alterations"): there are many gaps lasting only a couple of hours, but others span many months. Maybe we could add a config switch manual_load_alterations (or similar): True/False? Then it's up to the user.
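For example, such a switch could be read from the config to gate the manual filling; a minimal sketch, assuming a hypothetical key manual_load_alterations under the load section:

# hypothetical config key following the suggestion above, defaulting to False
if snakemake.config['load'].get('manual_load_alterations', False):
    # manual gap filling as in the old timeseries_opsd
    load = manual_adjustment(load=load, source=snakemake.config['load']['source'])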

@JanFrederickUnnewehr (Contributor Author)

Moin, check out the new version of build_load_data. I have tried to include all suggestions in the code. Did I forget anything? We could keep the big manual adjustment part as an example but also shorten it, as it is not complete; see 'GB'.

@martacki (Member)

martacki commented Jul 16, 2020

You often copy from start to stop; maybe you can use my function from vresutils for that:

def copy_timeslice(load, cntry, start, stop, delta):
    start = pd.Timestamp(start)
    stop = pd.Timestamp(stop)
    if start in load.index and stop in load.index:
        load.loc[start:stop, cntry] = load.loc[start-delta:stop-delta, cntry].values
    return load

and then use it as, for example, copy_timeslice(load, 'GR', '2015-08-11 21:00', '2015-08-15 20:00', pd.Timedelta(weeks=1)). That saves many lines of code and makes it more readable.

Also, if you interpolate, you can add a limit for how many hours should be interpolated at most. Before, we used 4 hours:
load[interpolate_countries] = load[interpolate_countries].interpolate(limit=4)

You update Kosovo and Albania twice, line 142+

load['KV'] = load['RS'] * (4.8 / 27.)
load['AL'] = load['MK'] * (4.1 / 7.4)

and then in 370+

load['KV'] = load['RS'] * (5. / 33.)
load['AL'] = load['MK'] * (6.0 / 7.0)

but with different factors. Which one is correct?

You can try to use my suggestions from the vresutils PR I mentioned before; I also discussed them a while ago with @coroa. But your "alterations" seem more complete. Did you take holidays, weekends, etc. into account?

Now we have duplicated code, once in vresutils and once here... not sure what to do about it. Is someone else using vresutils? If it's just PyPSA-Eur, then we can delete it there and keep only this version; otherwise we have two different sources people could use that are not exactly the same, so they produce different results. Super bad for debugging later on.

Comment on lines 200 to 210
def load_opsd_loaddata(load_fn=None, countries=None):
    if load_fn is None:
        load_fn = snakemake.input.load

    if countries is None:
        countries = snakemake.config['countries']

    load = pd.read_csv(load_fn, index_col=0, parse_dates=True)
    load = load.filter(items=countries)

    return load
Contributor

this is not needed anymore, right?

Contributor Author

Yes, right!

Comment on lines 406 to 427
# # check the number and lenght of gaps
nan_stats = nan_statistics(opsd_load)

gap_filling_threshold = snakemake.config['load']['gap_filling_threshold']

if nan_stats.consecutive.max() > gap_filling_threshold:
    logger.warning(f"Load data contains consecutive gaps of longer than '{gap_filling_threshold}' hours! Check dataset carefully!")

# adjust gaps and interpolate load data
logger.info(f"Gaps of {gap_filling_threshold} hours filled with data from previous week. Smaler gaps interpolated linearly.")
opsd_load = opsd_load.apply(fill_large_gaps, gapsize=gap_filling_threshold).interpolate(method='linear', limit=gap_filling_threshold)

# adjust gaps manuel
if snakemake.config['load']['adjust_gaps_manuel']:
    logger.info(f"Load data are adjusted manual.")
    opsd_load = manual_adjustment(load=opsd_load, source=snakemake.config['load']['source'])

# check the number and lenght of gaps after adjustment and interpolating
nan_stats = nan_statistics(opsd_load)

if nan_stats.consecutive.max() > gap_filling_threshold:
    logger.warning(f'Load data contains gaps after manuel adjustment. Modify manual_adjustment() function!')
Contributor

I think we're almost there. I would restructure this part a bit (sorry for the many iterations):

  1. manual corrections, if enabled
  2. interpolate gaps < gap_size_interpolated (with an info message), if NaNs exist
  3. fill by a weekly shift (with a warning), if NaNs exist
  4. raise an error if NaNs still exist

This way we prioritize the manual corrections and do the heuristic sanitizing afterwards, roughly as in the sketch below.
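A minimal sketch of that ordering (not the PR's final code; it reuses manual_adjustment, fill_large_gaps and the existing gap_filling_threshold config key as a stand-in for gap_size_interpolated, and the adjust_gaps_manual key name is an assumption):

threshold = snakemake.config['load']['gap_filling_threshold']

# 1. manual corrections, if enabled (config key name is an assumption)
if snakemake.config['load']['adjust_gaps_manual']:
    opsd_load = manual_adjustment(load=opsd_load, source=snakemake.config['load']['source'])

# 2. interpolate short gaps linearly, if NaNs exist
if opsd_load.isna().any().any():
    logger.info(f"Interpolating gaps of up to {threshold} hours linearly.")
    opsd_load = opsd_load.interpolate(method='linear', limit=threshold)

# 3. fill remaining gaps by a weekly shift, if NaNs exist
if opsd_load.isna().any().any():
    logger.warning("Filling remaining gaps with data from the previous week.")
    opsd_load = opsd_load.apply(fill_large_gaps, gapsize=threshold)

# 4. stop if NaNs still exist
if opsd_load.isna().any().any():
    raise ValueError("Load data still contains gaps after sanitizing.")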

Contributor Author

Should we keep the warning at the beginning that the data set contains gaps (lines 411 & 412)?
The new order is a good idea!

@FabianHofmann (Contributor)

FabianHofmann commented Jul 16, 2020

sorry, I hit the wrong button :D

@JanFrederickUnnewehr (Contributor Author)

> You often copy from start to stop; maybe you can use my function from vresutils for that:
>
> def copy_timeslice(load, cntry, start, stop, delta):
>     start = pd.Timestamp(start)
>     stop = pd.Timestamp(stop)
>     if start in load.index and stop in load.index:
>         load.loc[start:stop, cntry] = load.loc[start-delta:stop-delta, cntry].values
>     return load
>
> and then use it as, for example, copy_timeslice(load, 'GR', '2015-08-11 21:00', '2015-08-15 20:00', pd.Timedelta(weeks=1)). That saves many lines of code and makes it more readable.

That is a good idea. I will integrate it into my code.

> Also, if you interpolate, you can add a limit for how many hours should be interpolated at most. Before, we used 4 hours:
> load[interpolate_countries] = load[interpolate_countries].interpolate(limit=4)

The limit is now implemented as gap_filling_threshold.

> You update Kosovo and Albania twice, line 142+
>
> load['KV'] = load['RS'] * (4.8 / 27.)
> load['AL'] = load['MK'] * (4.1 / 7.4)
>
> and then in 370+
>
> load['KV'] = load['RS'] * (5. / 33.)
> load['AL'] = load['MK'] * (6.0 / 7.0)
>
> but with different factors. Which one is correct?

The corrections are for two different years: the first one is only for years before 2016, the second one for the year 2018.
"Scale parameter selected by energy consumption ratio from IEA Data browser for the year 2017"

> You can try to use my suggestions from the vresutils PR I mentioned before; I also discussed them a while ago with @coroa. But your "alterations" seem more complete. Did you take holidays, weekends, etc. into account?

Holidays are not included. Any idea how to integrate them in an automated way?

> Now we have duplicated code, once in vresutils and once here... not sure what to do about it. Is someone else using vresutils? If it's just PyPSA-Eur, then we can delete it there and keep only this version; otherwise we have two different sources people could use that are not exactly the same, so they produce different results. Super bad for debugging later on.

When the changes are merged into master, the previously used functions from "vresutils.load" are simply no longer imported. Is that what you mean?

@martacki (Member)

martacki commented Jul 16, 2020

> The corrections are for two different years: the first one is only for years before 2016, the second one for the year 2018.
> "Scale parameter selected by energy consumption ratio from IEA Data browser for the year 2017"

Will that work if you run e.g. 2013-2018? Wouldn't years 2013-2017 then be adapted with the 2018 value?

> Holidays are not included. Any idea how to integrate them in an automated way?

No, sorry. Unfortunately I did it by brute force. But maybe, if you add a "holidays" array and then check whether the gap you want to fill is in range of this array, the timedelta could be increased by one. Also, weekend days should not be filled with working days... would that work?
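One hypothetical way to automate that (purely illustrative, not part of this PR) is to shift the copied slice back by a further week when the source window would overlap a holiday, e.g. using the third-party holidays package:

import holidays
import pandas as pd

country_holidays = holidays.Germany(years=[2015])  # example country/year, assumption

def copy_timeslice_avoiding_holidays(load, cntry, start, stop, delta):
    start, stop = pd.Timestamp(start), pd.Timestamp(stop)
    source = pd.date_range(start - delta, stop - delta, freq='H')
    # if the source window touches a holiday, shift back one more week
    if any(ts.date() in country_holidays for ts in source):
        delta += pd.Timedelta(weeks=1)
    load.loc[start:stop, cntry] = load.loc[start - delta:stop - delta, cntry].values
    return load

Weekends could be handled similarly by comparing the weekday pattern of the source and target windows.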

> When the changes are merged into master, the previously used functions from "vresutils.load" are simply no longer imported. Is that what you mean?

I think we should remove one of them; otherwise it's just more code to maintain, which should basically be 1:1 the same, and that's just confusing.

@JanFrederickUnnewehr (Contributor Author)

JanFrederickUnnewehr commented Jul 16, 2020

> Will that work if you run e.g. 2013-2018? Wouldn't years 2013-2017 then be adapted with the 2018 value?

Is it intended that the simulation period of one calculation (one model run) is longer than one year? If so, the function manual_adjustment() must be modified. The old code allows multiple years but only one data source ("ENTSOE_power_statistics"), and the manual adjustment is not complete for all countries and years. This code currently only allows one year at a time but both data sources ("ENTSOE_transparency" or "ENTSOE_power_statistics").
"ENTSOE_power_statistics" is only available until mid-2019, as far as I know.

> Holidays are not included. Any idea how to integrate them in an automated way?

I would ignore that for the moment.

"vresutils.load"

Surely we should have only one version of load processing at the end. Let's see what @coroa has to say about this.

@JanFrederickUnnewehr (Contributor Author)

JanFrederickUnnewehr commented Jul 21, 2020

After a fruitful discussion with @FabianHofmann we agreed on the following procedure:
  • When downloading the data, it is automatically filtered for the simulation period (the snapshots from the config); see the snippet below.
  • Missing countries are "added" in the manual adjustment function.
  • The scaling parameters are not automatically adjusted to the selected simulation year.
  • The scaling factor for the manual "adding" of countries must be changed manually by the user in the code.
  • In the manual adjustment function, a distinction is still made between the two sources in order to be able to include "ENTSOE_transparency" data for future load time series.
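For the first point, a rough illustration (with hypothetical variable names, not necessarily the final code) of restricting the downloaded OPSD time series to the snapshots defined in the config:

import pandas as pd

snapshots = pd.date_range(freq='H', **snakemake.config['snapshots'])
opsd_load = pd.read_csv(url, index_col=0, parse_dates=True)  # url as in the retrieval step above
opsd_load = opsd_load.loc[snapshots[0]:snapshots[-1]]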

Code structure:

  1. Check the raw data and show a warning if the data contains gaps longer than gap_filling_threshold.
  2. Manually adjust the data (if the user sets adjust_gaps_manual: true).
  3. For larger gaps (min = 3 hours, max = 7 days), copy the previous week (see the sketch below).
  4. Interpolate the data (respecting gap_filling_threshold: 3 hours).
  5. Test whether there are still gaps; if yes, stop with a warning and point to the manual adjustment function.
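A hedged sketch of step 3 (a hypothetical helper, not necessarily the PR's code): copy the previous week into a gap, but only for gaps between 3 hours and 7 days long:

import pandas as pd

def fill_gap_from_previous_week(load, cntry, gap_start, gap_end,
                                min_gap=pd.Timedelta(hours=3),
                                max_gap=pd.Timedelta(days=7)):
    gap_start, gap_end = pd.Timestamp(gap_start), pd.Timestamp(gap_end)
    if not (min_gap <= gap_end - gap_start <= max_gap):
        return load  # shorter gaps are interpolated, longer ones need manual adjustment
    week = pd.Timedelta(weeks=1)
    load.loc[gap_start:gap_end, cntry] = load.loc[gap_start - week:gap_end - week, cntry].values
    return load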

Is there a way to automate the generation of the file doc/configuration.rst?
Let's first discuss the proposed changes before I add them to doc/release_notes.rst.

@fneum added this to the Release v0.2.1 milestone on Sep 26, 2020
@FabianHofmann mentioned this pull request on Dec 2, 2020
@FabianHofmann (Contributor)

Hey @JanFrederickUnnewehr, I'm closing this in favour of #211. If you still want to integrate the power statistics time series, there should be a good way to do so. But we can discuss it in the other PR or via mail/phone.
