
Missing pieces towards global PyPSA-Earth #445

Open
4 of 11 tasks
davide-f opened this issue Aug 23, 2022 · 6 comments

@davide-f
Member

davide-f commented Aug 23, 2022

Towards global PyPSA-Earth

In this issue, we track major requirements needed to successfully run the workflow at complete global scale.
I have been running parts of the model using countries=["Earth"], and below I summarize some findings; this list is meant to be extended with additional comments.

  • download_osm_network needed some fixes, but it generally works; however, I had to rerun the workflow several times because the procedure was repeatedly stopped by download limits on the server side. When the user is interested in large areas, it may be better to download the combined/continental chunks rather than each country individually; however, this reduces generalizability and leads to duplication if the user is later interested in smaller areas. Alternatively, delays can be inserted manually to avoid the problem. Some tests may be needed once larger regions are required.
  • Improve parallel capabilities of build_shapes #574
  • clear_osm_network: this is the real big deal. Currently, after one day of uninterrupted execution with the "names_by_shapes" option disabled, we are still far from done: the procedure is stuck at africa_shape.contains. Functions on polygons are very expensive to execute, and we need to work hard on that. Some comments follow:
    • the split_cells_multiple function shall be completely rewritten #601
    • split_cells may be removed in favour of the more general split_cells_multiple (for clarity and to avoid duplication)
    • In build_shapes, we may better preprocess the regions to avoid extra calculations further down the chain. For example, we shall check the parameters of simplify_polys, which polishes and simplifies the shapes, more carefully. The workflow shall work with those "cleaned" shapes, even though they may be slightly distorted with respect to the original ones; the original shapes can still be saved for later use. Also in build_shapes, we may prepare ext_country_shapes, i.e. the unary_union of the country_shapes with the corresponding offshore_shapes, which several scripts could use.
    • all geopandas functions could be reimplemented with dask_geopandas to support multithreading
    • all shape comparisons in the workflow may be revised to preprocess the geometries and restrict detailed comparisons to where they are really necessary
    • functions that compare shapes shall be approximated or simplified. For example, the country flag of a line (if needed) can be derived from one of the two substations the line is connected to.
  • build_osm_network is going to be another big deal: the functions set_substations_ids, fix_overpassing_lines and set_lines_ids already take a long time for the Africa model, and their complexity is O(n^2); we shall improve that.
  • build_cutout works with the feat/era5-monthly-retrieveal branch, yet the global cutout is about 300 GB. The new compression features available in atlite for large cutouts may be tested and used.
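For the download limits hit by download_osm_network, the "manual delays" option could take the form of a retry loop with exponential backoff around each per-country download. This is a minimal, stdlib-only sketch under assumptions: `download_with_retry` and the `fetch` callable are hypothetical names, not part of PyPSA-Earth or of the OSM download tooling it uses, and `retry_on` would have to list whatever exceptions the real downloader raises when the server refuses a request.

```python
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def download_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 10.0,
    max_delay: float = 600.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call `fetch` until it succeeds, waiting longer after each failure.

    Hypothetical helper: the wait doubles after every failed attempt
    (capped at `max_delay`) and gets up to 10 % random jitter, so that
    parallel workers do not hit the server at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, 0.1 * delay)
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.0f s",
                attempt, max_attempts, err, delay,
            )
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Usage would look like `download_with_retry(lambda: fetch_country("NG"))`, where `fetch_country` is a hypothetical stand-in for the per-country download call. Backoff keeps the per-country granularity (and therefore the reusability for smaller areas) while spreading requests out over time.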
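The suggestions to preprocess geometries and to approximate shape comparisons can be illustrated on a toy case: assign each substation to a country with a cheap bounding-box test first, run the exact point-in-polygon test only for candidates that survive, and let each line inherit the country of its first bus (falling back to the second) instead of intersecting the line geometry with the country shapes. This stdlib-only sketch assumes polygons are plain lists of (x, y) vertices without holes; `assign_bus_countries` and `assign_line_countries` are hypothetical names. In the actual workflow the same bounding-box-first strategy is what geopandas' spatial index (`GeoSeries.sindex`, used by `sjoin`) provides.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def bbox(poly: Polygon) -> Tuple[float, float, float, float]:
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Exact ray-casting test: the expensive step we want to run rarely."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_bus_countries(
    buses: Dict[str, Point], countries: Dict[str, Polygon]
) -> Dict[str, Optional[str]]:
    boxes = {c: bbox(p) for c, p in countries.items()}  # computed once
    result: Dict[str, Optional[str]] = {}
    for bus, (x, y) in buses.items():
        result[bus] = None
        for country, (xmin, ymin, xmax, ymax) in boxes.items():
            # cheap rejection before the exact polygon test
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                continue
            if point_in_polygon((x, y), countries[country]):
                result[bus] = country
                break
    return result


def assign_line_countries(
    lines: Dict[str, Tuple[str, str]], bus_country: Dict[str, Optional[str]]
) -> Dict[str, Optional[str]]:
    # country of bus0, falling back to bus1 when bus0 is unassigned
    return {
        line: bus_country.get(b0) or bus_country.get(b1)
        for line, (b0, b1) in lines.items()
    }
```

For a line crossing a border this picks one of the two countries, which is exactly the approximation proposed above; whether that is acceptable depends on how the country flag is used downstream.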
@davide-f changed the title from "Mising pieces to global PyPSA-Earth" to "Missing pieces towards global PyPSA-Earth" on Aug 23, 2022
@ekatef
Member

ekatef commented Oct 17, 2022

@davide-f, thank you for the explanations regarding performance during the discussion. Here are preliminary results from testing the workflow for China (this is not profiling, rather a UX test):

  1. build_shapes does not take that long, about 12 hours, but that feels uncomfortable, especially during the first run, as the script stays silent for all 12 hours. (Maybe I'm missing some logs?) Would it make sense to add some progress tracking to the computational loop itself?

for elem in _:
    df_gadm.loc[elem.index, "pop"] = elem["pop"]

  2. Regarding the parallelization that you mentioned is not yet working for adding the population: could you clarify a bit?

Is it the imap call that is not working here

tqdm(
    pool.imap(_process_func_pop, country_codes),
    total=len(country_codes),
    **tqdm_kwargs,
)

or is it this piece, which is not under imap but is nevertheless slow and could be parallelized as well?

for elem in _:
    df_gadm.loc[elem.index, "pop"] = elem["pop"]

  3. In build_osm_network, the limiting stage at the moment is not (yet) set_substations_ids or set_lines_ids but fix_overpassing_lines: after about 30 hours, only half has been processed. [It is probably a good idea to switch this option off for a first quick run... :)] That is not too bad, since it is clear that something is going on and the end time can be estimated. However, could the performance of this stage also be taken into account when working on performance?
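On the silent 12-hour build_shapes run: where tqdm is already in use, as in the imap snippet quoted above, wrapping the result loop in `tqdm(...)` as well would give the same feedback. A logging-based alternative is a pass-through generator that reports progress every few items, which also ends up in the Snakemake log files. This stdlib-only sketch uses a hypothetical helper `log_progress`, not an existing PyPSA-Earth function.

```python
import logging
import time
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def log_progress(
    items: Iterable[T],
    total: Optional[int] = None,
    every: int = 100,
    label: str = "progress",
) -> Iterator[T]:
    """Yield items unchanged, logging a progress line every `every` items."""
    if every < 1:
        raise ValueError("every must be at least 1")
    start = time.monotonic()
    count = 0
    for count, item in enumerate(items, start=1):
        yield item
        # runs after the consumer has processed `item`
        if count % every == 0:
            suffix = f"/{total}" if total is not None else ""
            logger.info(
                "%s: %d%s items in %.1f s",
                label, count, suffix, time.monotonic() - start,
            )
    logger.info("%s: done, %d items in %.1f s", label, count, time.monotonic() - start)


# Possible use on the loop discussed above (names taken from the snippet):
# for elem in log_progress(_, total=len(country_codes), every=10, label="pop"):
#     df_gadm.loc[elem.index, "pop"] = elem["pop"]
```

Because the generator only wraps an iterator, it can equally be placed around `pool.imap(...)` without changing how the work is distributed.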

@davide-f added this to the Version 1.0 milestone on Feb 1, 2023
@davide-f moved this to Todo in pypsa-earth on Mar 9, 2023
@mnm-matin
Member

I would like to open a PR that makes set_substations_ids and set_lines_ids more efficient.
The only thing holding me back is the lack of input and output dataframes. If someone could provide an input dataframe and the expected output (for the given parameters), that would be very helpful.

@davide-f
Member Author

davide-f commented Jul 18, 2023

Great @mnm-matin !

This task is very interesting, and I'm very happy to support you. I have some ideas on how to do it, and it would be good to discuss them.
This task should also be fairly easy to do.
Shall we have a 30-minute chat about it?

I can provide input and output files for any country in the world. I'd recommend starting the debugging with small countries and then testing a large one.

A good large test case could be the US or China; for a small one, Nigeria should do the job.
What do you think?

@mnm-matin
Member

mnm-matin commented Jul 18, 2023

Thanks @davide-f

That sounds great. Happy to have a meeting. The input and output files (perhaps via Discord) would be awesome.
For set_substations_ids(buses, distance_crs, tol=2000):
input: buses dataframe
output: buses dataframe with the added columns

I will keep the PR limited to set_substations_ids, but the approach should work for line_ids as well.

Large or small countries would be nice for benchmarking. Mainly, I require the input and output files just to make sure I'm getting the right results.
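One possible direction for such a PR is spatial hashing, which avoids the O(n^2) pairwise distance check: bucket buses into square grid cells of side `tol` (in a metric CRS such as the one passed as `distance_crs`), compare each bus only with buses in the same or the eight neighbouring cells, and merge close pairs with a union-find. This is a sketch under explicit assumptions: buses within `tol` metres of each other should share a station id, and merging is transitive (a chain of close buses becomes one station), which may differ from how the current set_substations_ids resolves conflicts, so the results need to be checked against the reference input/output files. `cluster_station_ids` is a hypothetical name, and coordinates are plain (x, y) tuples rather than a GeoDataFrame.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_station_ids(
    coords: Sequence[Tuple[float, float]], tol: float = 2000.0
) -> List[int]:
    """Give points within `tol` of each other (transitively) the same id.

    Coordinates must be in a projected CRS whose unit matches `tol`.
    Cost is close to linear when points are not crowded into few cells.
    """
    parent = list(range(len(coords)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # 1. bucket points into grid cells of side `tol`
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        grid[(math.floor(x / tol), math.floor(y / tol))].append(idx)

    # 2. any pair within `tol` lies in the same or adjacent cells
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbours = grid.get((cx + dx, cy + dy))  # .get: no insert
                if not neighbours:
                    continue
                for i in members:
                    xi, yi = coords[i]
                    for j in neighbours:
                        if j > i and math.hypot(coords[j][0] - xi, coords[j][1] - yi) <= tol:
                            union(i, j)

    # 3. relabel union-find roots as consecutive ids 0, 1, 2, ...
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(coords))]
```

With the buses dataframe projected to `distance_crs`, the returned list could be assigned as a new column; the `j > i` check ensures each pair is tested once even though neighbouring cells are visited from both sides.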

@davide-f
Member Author

@davide-f
Member Author

To track the needed improvements, these are the current run times in hours for the US:

rule                      key         time [h]
download_osm_data         total_time   0.102822
clean_osm_data            total_time   3.603223
build_shapes              total_time   4.601684
build_bus_regions         total_time   0.324454
build_osm_network         total_time  16.631785
build_demand_profiles     total_time   0.059216
build_powerplants         total_time   1.337166
build_renewable_profiles  total_time   0.637599
base_network              total_time   0.105632
add_electricity           total_time   0.059819
simplify_network          total_time   0.211443
cluster_network           total_time   0.019749
solve_network             total_time   0.110048
total_comp_stats          total_time  30.608660

The PRs on build_osm_network by @mnm-matin can help tackle the major bottleneck.
The current PR #650 by @GridGrapher can significantly reduce the computational time of build_shapes.
The next bottleneck to address is clean_osm_network, in particular the function set_countryname_by_shape.
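To put the US timings in proportion, each rule's time can be expressed as a share of total_comp_stats. The numbers below are copied from the table; the listed rules add up to about 27.8 h, less than the 30.6 h total, so the total also covers time not attributed to the rules shown.

```python
# US run times in hours, copied from the table above
times_h = {
    "download_osm_data": 0.102822,
    "clean_osm_data": 3.603223,
    "build_shapes": 4.601684,
    "build_bus_regions": 0.324454,
    "build_osm_network": 16.631785,
    "build_demand_profiles": 0.059216,
    "build_powerplants": 1.337166,
    "build_renewable_profiles": 0.637599,
    "base_network": 0.105632,
    "add_electricity": 0.059819,
    "simplify_network": 0.211443,
    "cluster_network": 0.019749,
    "solve_network": 0.110048,
}
total_comp_stats = 30.608660

listed_sum = sum(times_h.values())
shares = {rule: t / total_comp_stats for rule, t in times_h.items()}

# Top three: build_osm_network ~54.3 %, build_shapes ~15.0 %, clean_osm_data ~11.8 %
for rule, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{rule:<20} {times_h[rule]:6.2f} h  {share:6.1%}")
print(f"sum of listed rules: {listed_sum:.2f} h of {total_comp_stats:.2f} h")
```

This confirms the ordering stated above: build_osm_network alone accounts for more than half of the total, ahead of build_shapes and clean_osm_data.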

Labels: None yet
Projects: Status: In Progress
Development: No branches or pull requests
3 participants