Improved efficiency wrf_remove_urban #71

Merged
merged 29 commits into main from optmize_remove_urb on Jul 14, 2022

Conversation

dargueso
Collaborator

@dargueso dargueso commented Jul 6, 2022

The function that removes urban areas and replaces them with the dominant surrounding land use has been modified to improve efficiency. For each urban grid point, the function searches for the most frequent land use category among the closest NPIX_NLC points that are not water or urban. To decide which grid points are the closest, the function used to compute distances over the entire domain, which slowed down the process considerably. It now only looks over a tile of 2xNPIX_NLC by 2xNPIX_NLC grid points, which reduces the computation time considerably. This wasn't an issue over small domains, but in @matthiasdemuzere's example over Ireland it was (partly due to missing values that shouldn't be there). Using that example's input files, the time is reduced to about 10% of the original. Other minor changes are also introduced, mostly to define auxiliary variables in a cleaner way.
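A rough sketch of the tiled search idea (the function name, category codes, and exact window size below are assumptions for illustration, not the actual w2w code):

import numpy as np

def dominant_nearby_landuse(luse, i, j, npix_nlc, urban_cat=13, water_cat=17):
    # Search only a (2*NPIX_NLC+1) x (2*NPIX_NLC+1) tile around the urban
    # point (i, j) instead of computing distances over the whole domain.
    i0, i1 = max(i - npix_nlc, 0), min(i + npix_nlc + 1, luse.shape[0])
    j0, j1 = max(j - npix_nlc, 0), min(j + npix_nlc + 1, luse.shape[1])
    tile = luse[i0:i1, j0:j1]

    # Distances (in grid-point units) from (i, j) to every point in the tile.
    ii, jj = np.mgrid[i0:i1, j0:j1]
    dist = np.hypot(ii - i, jj - j)

    # Keep only natural points (not urban, not water), take the NPIX_NLC
    # closest ones, and return the most frequent category among them.
    mask = (tile != urban_cat) & (tile != water_cat)
    closest = tile[mask][np.argsort(dist[mask])[:npix_nlc]]
    vals, counts = np.unique(closest, return_counts=True)
    return vals[np.argmax(counts)]

# toy example: 100x100 domain of random categories, urban point at (50, 50)
luse = np.random.choice([2, 5, 10, 13, 17], size=(100, 100))
print(dominant_nearby_landuse(luse, 50, 50, npix_nlc=45))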

@matthiasdemuzere
Owner

Hi @dargueso,

Thanks for this.

I have just tried this development on the d02 data from Ireland. The wrf_remove_urban routine is indeed much more efficient now; it completed in 638.75 seconds. So that's good!

Yet the application was killed in a subsequent step:

--> Create LCZ-based geo_em file
> Processing LP_URB2D ...
> Processing MH_URB2D ...
> Processing STDH_URB2D ...
> Processing HGT_URB2D ...
> Processing LB_URB2D ...
> Processing LF_URB2D ...
> Processing HI_URB2D ...
Killed

I am not sure what is causing this. I'll try to have a look later on.
I also noticed your optimisation did not pass our tests? So this is also something we need to look into ...

@matthiasdemuzere matthiasdemuzere marked this pull request as draft July 7, 2022 08:05
@matthiasdemuzere matthiasdemuzere added the wip work in progress label Jul 7, 2022
@dargueso
Collaborator Author

dargueso commented Jul 7, 2022

Hi @matthiasdemuzere,
Thanks for letting me know. It ran well on my computer, but I will check what may be killing it.
I saw this morning that it didn't pass the tests, so I will check that too.
I'll let you know how it goes.

@matthiasdemuzere
Owner

Mmm ... interesting that it works for you?
I wonder if it could be a memory issue on my machine?
I can try to monitor this a bit better, to see what is happening ...

@dargueso
Collaborator Author

dargueso commented Jul 7, 2022

I checked on two other machines using Python 3.9 and 3.8. It all went fine.
As for the tests, all three failing tests belong to test_wrf_remove_urban, which makes sense and is a good sign. Some of those tests were probably based on how the code was implemented before. I will look into that.

@dargueso
Collaborator Author

dargueso commented Jul 8, 2022

Good morning,
I have fixed the issues that made the tests fail. I have tested it and it now passes all the tests.
@matthiasdemuzere, the error you got is likely due to a memory issue. I tried on an older machine and the process was also killed due to memory. In any case, this error has nothing to do with the changes now implemented in wrf_remove_urban. It is raised later on and is probably due to the domain size. The issue didn't come up before because the code was slow and you didn't run it to completion. I would say the error would be raised in the previous version too if run on the same domain.

@matthiasdemuzere
Owner

Since the remove urban routine can take a while and, as a user, you are left in the dark, I thought it would be a good idea to have a sense of where the routine is at.

So I added a progress bar that monitors the outer loop, and with it the general progress of that routine:

(screenshot of the progress bar output)
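A minimal sketch of the idea, assuming a tqdm-style progress bar wrapped around the outer loop (the actual implementation in the commit may differ):

from tqdm import tqdm

# wrap the outer loop over urban grid points so the user can see progress
urban_points = range(5000)  # stand-in for the real list of urban indices
for idx in tqdm(urban_points, desc='wrf_remove_urban'):
    pass  # replace the urban pixel here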

@matthiasdemuzere
Owner

Almost there ...

Except that the test test_check_hi_values_fails_sample_size_too_small fails, in Python 3.7 only?
Will have to check what is going on there ...

@matthiasdemuzere
Owner

FYI @dargueso

I just tried the d02 Ireland domain on another machine with more memory, and there everything works as well. So it seems indeed to be a memory problem with my (older) laptop.

I wonder whether there is something we can do in this respect? Any ideas @theendlessriver13?
It is in the `Processing HI_URB2D` section, in which I apparently reach 16+ GB of memory usage?

@jkittner
Collaborator

jkittner commented Jul 8, 2022

I'm not sure about the test failure on 3.7; I would have to look a bit deeper into that. Regarding the memory, I think there is not much we can do; it is caused by rasterio (as previous issues showed).

However, testing this patch on my machine against our Zaragoza example, it turns out to be actually slower than the previous version. I did the benchmark like this:

#! /usr/bin/env python3

import sys
import time
import subprocess

def main() -> int:
    times = []
    n = sys.argv[1]  # number of benchmark runs, passed on the command line
    for i in range(int(n)):
        print(f'\033[0;32mrun {i + 1}...\033[0m')
        start = time.monotonic()
        # run the full w2w pipeline on the Zaragoza sample data
        subprocess.run(('w2w', './sample_data', 'lcz_zaragoza.tif', 'geo_em.d04.nc'))
        times.append(time.monotonic() - start)

    print(f'best of {n}: {min(times)}')
    return 0


if __name__ == '__main__':
    raise SystemExit(main())

A best of 10 test returned this result:

before patch: best of 10: 17.3824352240008 s
after patch: best of 10: 32.528999054999986 s

So it looks like it almost got 2x slower?!

It is always an option to run such processes in parallel utilizing multiple cores. Not sure if this is feasible though (it will also use more memory). Work could be distributed by latitude and finally joined together.
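As a rough sketch of the split-by-latitude idea (the worker function is a placeholder, and a real implementation would also need to handle the halo of neighbouring rows each band requires for the nearest-point search):

import numpy as np
from multiprocessing import Pool

def process_band(band):
    # hypothetical worker: clean one latitude band of the land-use field
    # (here it just returns the band unchanged as a placeholder)
    return band

def remove_urban_parallel(luse, n_workers=4):
    # split the domain into latitude bands, process them in parallel,
    # and join the results back together in the original order
    bands = np.array_split(luse, n_workers, axis=0)
    with Pool(n_workers) as pool:
        cleaned = pool.map(process_band, bands)
    return np.concatenate(cleaned, axis=0)

if __name__ == '__main__':
    luse = np.random.randint(1, 18, size=(200, 300))
    print(remove_urban_parallel(luse).shape)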

@jkittner
Collaborator

jkittner commented Jul 8, 2022

This was run on Python 3.9 in an Ubuntu VM with 6 cores and this processor (Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz).

@dargueso
Collaborator Author

dargueso commented Jul 8, 2022

@theendlessriver13, I get the same thing on my new laptop too. I tested the optimisation over large domains, not small/standard ones. It looks like the modifications I introduced slow down the whole process. I need to check this.

@jkittner
Collaborator

jkittner commented Jul 8, 2022

Maybe this also helps a little: currently (prior to the patch) 77.84% of the time is spent in wrf_remove_urban; after the patch it's 88.44%. These are SVGs, so you can click on them and zoom in and out:

  • before the patch (entire program):
    log

  • after the patch (only wrf_remove_urban)
    log

I think we somehow have to avoid looping over each coordinate; I am no expert at all in xarray, but maybe apply_ufunc might be what you need? https://docs.xarray.dev/en/stable/examples/apply_ufunc_vectorize_1d.html. But maybe, to be able to use this, the parts need to be split into separate functions which are then applied...
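A minimal, self-contained sketch of the apply_ufunc pattern from that example (the 1-D kernel and the category codes below are purely illustrative, not the w2w logic):

import numpy as np
import xarray as xr

URBAN, WATER = 13, 17  # assumed category codes, for illustration only

def replace_urban_row(row):
    # toy 1-D kernel: replace urban cells with the most frequent
    # natural (non-urban, non-water) category found in the same row
    natural = row[(row != URBAN) & (row != WATER)]
    if natural.size == 0:
        return row
    vals, counts = np.unique(natural, return_counts=True)
    out = row.copy()
    out[row == URBAN] = vals[np.argmax(counts)]
    return out

luse = xr.DataArray(
    np.random.randint(1, 18, size=(4, 10)),
    dims=('south_north', 'west_east'),
)
# apply_ufunc vectorizes the 1-D kernel over the remaining dimension,
# avoiding an explicit Python loop over every coordinate pair
cleaned = xr.apply_ufunc(
    replace_urban_row,
    luse,
    input_core_dims=[['west_east']],
    output_core_dims=[['west_east']],
    vectorize=True,
)
print(cleaned)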

@dargueso
Collaborator Author

dargueso commented Jul 8, 2022

@theendlessriver13 thanks for the insight, very helpful!
I'm working on this to try to remove the loops, which I think I'm close to.

@dargueso
Collaborator Author

dargueso commented Jul 9, 2022

Good morning,

Looking into the code in detail, _calc_distance_coord had some issues with precision. The algorithm is known to have large rounding errors when using low floating-point precision on small distances. Thus, the selection of the nearest K points was not accurate enough. It wasn't a big deal when looking for a large number of K nearest neighbours, but it can be an issue when K is small. Furthermore, the new green fraction was not exactly as expected, although the error was again small when using a large K.

This part of the code has been revamped. We no longer use brute force to calculate distances, but KD-trees, which makes the calculation much faster for both large and small domains. We were calculating distances from a given grid point to all points in the domain, which is clearly overkill and slows down the process. According to my tests, the sample data is now processed in ~12 seconds and the Ireland domain in ~370 s. This is faster than the original version even for small domains.
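In essence, the approach projects lat/lon onto 3-D Cartesian coordinates and queries a scipy KD-tree for the k nearest neighbours of every point at once; a simplified sketch (not the exact w2w function signature):

import numpy as np
import scipy.spatial as spatial

def nearest_points_kdtree(lat, lon, kpoints, radius=6371.0):
    # Project lat/lon (degrees) onto a sphere so that Euclidean distance in
    # 3-D (chord length) is a good proxy for great-circle distance, avoiding
    # the precision issues of the previous per-pair distance formula.
    phi, theta = np.deg2rad(lat), np.deg2rad(lon)
    xyz = np.column_stack([
        radius * np.cos(phi) * np.cos(theta),
        radius * np.cos(phi) * np.sin(theta),
        radius * np.sin(phi),
    ])
    tree = spatial.KDTree(xyz)
    # one query returns, for every point, the distances and indices of its
    # kpoints nearest neighbours (the point itself is included at distance 0)
    return tree.query(xyz, k=kpoints)

# toy example with random coordinates roughly over Ireland
lat = np.random.uniform(50, 56, 100)
lon = np.random.uniform(-11, -5, 100)
dist, idx = nearest_points_kdtree(lat, lon, kpoints=5)
print(idx.shape)  # (100, 5)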

I wasn't able to remove the loops, but it is more efficient now.

I need to run some tests and clean up the code before committing. I removed a test on the function that calculated distances with the previous method, so I guess we need to add a test for the new function.

dargueso and others added 2 commits July 12, 2022 01:42
Using kdtree instead of brute force to search for N nearest neighbours to replace urban pixels with surrounding natural landscape. Added option to specify area size where the nearest grid points are searched (for optimization). Adapted tests. Removed unused functions.
@dargueso
Collaborator Author

Hi,
I have made some significant changes to the wrf_remove_urban function. It combines optimisation for both small and large domains:

Sample data -> best of 5: 10.821772583
Ireland data (large domain) -> best of 5: 429.429523417

I couldn't avoid loops, but it now loops over a much smaller number of items. Also, it only iterates over one dimension instead of using two nested loops.

Results are not exactly the same as before, because distances are calculated slightly differently (see comment above). There is also a change in how the urban fraction of landusef is transferred to the new land use. Before, it searched again for nearest grid points with a slightly different condition (landusef for urban must be zero) and calculated the mode again. It is now much simpler, because it just grabs the urban fraction and adds it to the new dominant landuse category.
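A tiny sketch of the simplified transfer (the array layout and category indices are assumptions for illustration): the urban fraction is simply added to the new dominant category and then zeroed out.

import numpy as np

# landusef: fractions per (category, south_north, west_east); toy example
landusef = np.random.rand(21, 5, 5)
landusef /= landusef.sum(axis=0)   # fractions sum to 1 in each grid cell

urban_cat, new_cat = 13, 10        # assumed 1-based category indices
j, i = 2, 3                        # an urban grid point

# move the urban fraction onto the new dominant natural category
landusef[new_cat - 1, j, i] += landusef[urban_cat - 1, j, i]
landusef[urban_cat - 1, j, i] = 0.0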

It has passed the tests, except pre-commit.ci (@theendlessriver13, can you help with this?)
It also passes the tests on my local machine using pytest.

I think we should consider labelling this with a new version number, because the output values are not the same.

Cheers,

@jkittner jkittner force-pushed the optmize_remove_urb branch from 4b7f6f0 to 5a01850 on July 12, 2022 07:35
Collaborator

@jkittner jkittner left a comment


I think we could even speed this up more:

diff --git a/w2w/w2w.py b/w2w/w2w.py
index 18d9b43..76c47e7 100644
--- a/w2w/w2w.py
+++ b/w2w/w2w.py
@@ -12,6 +12,7 @@ import scipy.spatial as spatial
 import os, sys
 import argparse
 from argparse import RawTextHelpFormatter
+from multiprocessing import cpu_count
 import traceback
 from typing import Dict, Any
 from typing import List
@@ -405,7 +406,11 @@ def using_kdtree(
     data['y'] = R * np.cos(phi) * np.sin(theta)
     data['z'] = R * np.sin(phi)
     tree = spatial.KDTree(data[['x', 'y', 'z']])
-    distance, index = tree.query(data[['x', 'y', 'z']], k=kpoints)
+    distance, index = tree.query(
+        data[['x', 'y', 'z']],
+        k=kpoints,
+        workers=cpu_count() - 1
+    )
 
     return distance, index

This only takes 7 seconds for me with the sample data. However, maybe this should also become an argument then, if you want to avoid using too many cores on a powerful machine. I was not able to test this on the Ireland data, but it should also reduce that time significantly.
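If we wanted to expose it, a minimal sketch could look like this (the flag name and default are just assumptions):

import argparse

import numpy as np
import scipy.spatial as spatial

parser = argparse.ArgumentParser()
parser.add_argument(
    '--kd-workers', type=int, default=1,
    help='workers passed to KDTree.query (-1 uses all cores)',
)
args = parser.parse_args()

# toy data just to show the query call with a configurable workers argument
xyz = np.random.rand(1000, 3)
tree = spatial.KDTree(xyz)
distance, index = tree.query(xyz, k=5, workers=args.kd_workers)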

@matthiasdemuzere
Owner

I think we could even speed this up more:

diff --git a/w2w/w2w.py b/w2w/w2w.py
index 18d9b43..76c47e7 100644
--- a/w2w/w2w.py
+++ b/w2w/w2w.py
@@ -12,6 +12,7 @@ import scipy.spatial as spatial
 import os, sys
 import argparse
 from argparse import RawTextHelpFormatter
+from multiprocessing import cpu_count
 import traceback
 from typing import Dict, Any
 from typing import List
@@ -405,7 +406,11 @@ def using_kdtree(
     data['y'] = R * np.cos(phi) * np.sin(theta)
     data['z'] = R * np.sin(phi)
     tree = spatial.KDTree(data[['x', 'y', 'z']])
-    distance, index = tree.query(data[['x', 'y', 'z']], k=kpoints)
+    distance, index = tree.query(
+        data[['x', 'y', 'z']],
+        k=kpoints,
+        workers=cpu_count() - 1
+    )
 
     return distance, index

This only takes 7 seconds for me with the sample data. However, maybe this should also become an argument then, if you want to avoid using too many cores on a powerful machine. I was not able to test this on the Ireland data, but it should also reduce that time significantly.

Interesting.

@theendlessriver13: in case you want to test it out with the Ireland files: geo_em.d02.nc and LCZ file.

@matthiasdemuzere
Owner

I checked on my machine, for the large Ireland domain:

  • with the multiprocessing, time to solution:
real	5m48.359s
user	7m12.460s
sys	0m37.772s
  • without mp:
real	6m47.402s
user	6m12.215s
sys	0m31.903s

So not too much time gained.

On the good side: I am now able to process the large domain on my laptop, which was not possible before. So it seems much of the memory usage came from the old version of remove_urb?

dargueso and others added 2 commits July 12, 2022 10:39
Set dkd to _, because we don't access it.
Added some comments
@dargueso
Collaborator Author

I checked on my machine, for the large Ireland domain:

  • with the multiprocessing, time to solution:
real	5m48.359s
user	7m12.460s
sys	0m37.772s
  • without mp:
real	6m47.402s
user	6m12.215s
sys	0m31.903s

So not too much time gained.

On the good side: I am now able to process the large domain on my laptop, which was not possible before. So it seems much of the memory usage came from the old version of remove_urb?

Yep, similar here. I think we should keep it simple considering the limited impact, otherwise we would need to add new options and document them.

@matthiasdemuzere
Owner

Yep, similar here. I think we should keep it simple considering the limited impact, otherwise we would need to add new options and document them.

Yes, @theendlessriver13 and I agreed not to add this multiprocessing option.

dargueso and others added 2 commits July 12, 2022 18:14
Added documentation of the search area for NPIX_NLC
@matthiasdemuzere
Owner

This all looks good to me. Perhaps @theendlessriver13 can have a last look, and then we can merge into main (and update the version and PyPI?)

Collaborator

@jkittner jkittner left a comment


Looking all good now. I just found one typo (not in the new stuff) and simplified the variable unpacking a bit.

Before releasing a new version to PyPI, I'd like to take a look at the import sorting. I noticed that this isn't working as expected and there might actually be some unused imports we can remove.

@matthiasdemuzere
Owner

Maybe I should also update the JOSS paper already? To add the info that is now added in README? I'll do this now ...

Collaborator

@jkittner jkittner left a comment


Just found 3 more typos; maybe we need to regenerate the PDF again and then this is ready to go.

@jkittner jkittner linked an issue Jul 14, 2022 that may be closed by this pull request
@jkittner jkittner marked this pull request as ready for review July 14, 2022 07:04
@jkittner jkittner removed the wip work in progress label Jul 14, 2022
matthiasdemuzere and others added 3 commits July 14, 2022 09:38
Co-authored-by: Jonas Kittner <54631600+theendlessriver13@users.noreply.github.com>
Co-authored-by: Jonas Kittner <54631600+theendlessriver13@users.noreply.github.com>
@jkittner jkittner merged commit 8e9e85d into main Jul 14, 2022
@jkittner jkittner deleted the optmize_remove_urb branch July 14, 2022 08:51