Skip to content

Commit

Permalink
Improved efficiency wrf_remove_urban (#71)
Browse files Browse the repository at this point in the history
* Improved efficiency wrf_remove_urban

* Correct selection of tiles when near borders

* add tqdm for progress bar monitoring?

* Optimize wrf remove urban

* Using kdtree instead of brute force to search for N nearest neighbours to replace urban pixels with surrounding natural landscape. Added option to specify area size where the nearest grid points are searched (for optimization). Adapted tests. Removed ununsed functions.

* Removed unnecessary function and added annotation

* add type annotation

* Set dkd to _, because we don't access it. Added some comments

* Added documentation of searching area for NPIX_NLC

* fix typo, simplify kd tree variable unpacking

* update JOSS with readme info

* Update JOSS/paper.md

* update JOSS pdf

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: matthiasdemuzere <mdemuzere@gmail.com>
Co-authored-by: Jonas Kittner <jonas.kittner@rub.de>
Co-authored-by: dargueso <dargueso@uib.es>
  • Loading branch information
5 people authored Jul 14, 2022
1 parent 33d4747 commit 8e9e85d
Show file tree
Hide file tree
Showing 6 changed files with 122 additions and 114 deletions.
2 changes: 1 addition & 1 deletion JOSS/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ To get to that point, a number of sequential steps are required:

### Step 1: Remove the default urban land cover

The default urban land cover from MODIS is replaced with the dominant surrounding vegetation category, as done in @Li2020. This procedure affects WRF's parameters LU_INDEX, LANDUSEF and GREENFRAC. LU_INDEX is selected as the dominant category from the corresponding argument `--npix-nlc` (default = 45) nearest grid points (excluding ocean, urban and lakes). GREENFRAC is calculated as the mean over all grid points with that dominant vegetation category among the `--npix-nlc` nearest points. For each grid point, if LANDUSEF had any percentage of urban, it is set to zero and the percentage is added to the dominant vegetation category assigned to that grid point.
The default urban land cover from MODIS is replaced with the dominant surrounding vegetation category, as done in @Li2020. This procedure affects WRF's parameters LU_INDEX, LANDUSEF and GREENFRAC. First, an initial number of neighbouring pixels (corresponding argument `--npix-area`, default = `--npix-nlc` ** 2) are selected using the k-d tree algorithm, to find the nearest pixels based on the great circle arc length, specifically [the chord length formula](https://en.wikipedia.org/wiki/Great-circle_distance). Afterwards, the LU_INDEX is set by sampling the dominant category from the corresponding argument `--npix-nlc` (default = 45) nearest initial grid points (excluding ocean, urban and lakes). GREENFRAC is calculated as the mean over all grid points with that dominant vegetation category among the `--npix-nlc` nearest points. For each grid point, if LANDUSEF had any percentage of urban, it is set to zero and the percentage is added to the dominant vegetation category assigned to that grid point.

Resulting output: **geo_em.d0X_NoUrban.nc**

Expand Down
Binary file modified JOSS/paper.pdf
Binary file not shown.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ Important notes
* At the end of running `W2W`, a note is displayed that indicates the `nbui_max` value, e.g. for the sample data: `Set nbui_max to 5 during compilation, in order to optimize memory storage`. This is especially relevant for users that work with the BEP or BEP+BEM urban parameterization schemes (`sf_urban_physics = 2 or 3`, respectively). See also `num_urban_nbui` in [WRF's README.namelist](https://github.com/wrf-model/WRF/blob/master/run/README.namelist) for more info.
* Make sure to set `use_wudapt_lcz=1` (default is 0) and `num_land_cat=41` (default is 21) in WRF's `namelist.input` when using the LCZ-based urban canopy parameters.
* The outputs of this tool have only been tested with the most recent [WRF version 4.3.x](https://github.com/wrf-model/WRF/releases/tag/v4.3). So we advise you to work with this version as well, which is now able to ingest the urban LCZ classes by default.
* It is possible to specify the number of nearest pixels (NPIX_NLC) to determine the dominant natural landuse in the surroundings of each urban pixel. The most frequent landuse among those pixels is used to replace the urban pixels. The distance used to find the nearest pixels is based on the great circle arc length, specifically the chord length formula (https://en.wikipedia.org/wiki/Great-circle_distance). For performance reasons, the nearest pixels are searched using a k-d tree algorithm, instead of brute forcing over all possible pixels. Because only land natural pixels are considered to calculate the most frequent land use, we need to filter out water and other urban pixels. Thus, we need to specify the initial number of pixels (NPIX_AREA) that the k-d tree algorithm will select, which will be larger than NPIX_NLC. By default, NPIX_AREA = NPIX_NLC**2 pixels will be selected and the nearest NPIX_NLC (default is 45) pixels that are not water or urban will be drawn from that initial selection. Because it is actually an area around the urban pixel, it is referred to as NPIX_AREA.


Arguments
Expand All @@ -65,8 +66,9 @@ Arguments
```
-b --built-lcz = LCZ classes considered as urban (DEFAULT: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
-l --lcz-band = Band to use from LCZ file (DEFAULT: 0). For maps produced with LCZ Generator, use 1
-f --frc-threshold = FRC_URB2D treshold value to assign pixel as urban (DEFAULT: 0.2)
-f --frc-threshold = FRC_URB2D threshold value to assign pixel as urban (DEFAULT: 0.2)
-n --npix-nlc = Number of pixels to use for sampling neighbouring natural land cover (DEFAULT: 45)
-a --npix_area = Area in number of pixels to look for the NPIX_NLC nearest number of pixels for sampling neighbouring natural land cover (DEFAULT: NPIX_NLC**2)
--lcz-ucp = Specify a custom lookup table for the LCZ-based Urban Canopy Parameters
```

Expand Down
1 change: 1 addition & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ install_requires =
rasterio>=1.1.6
rioxarray
scipy
tqdm
xarray[io]
importlib-metadata;python_version<"3.9"
importlib-resources;python_version<"3.9"
Expand Down
19 changes: 2 additions & 17 deletions tests/w2w_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@

import w2w.w2w
from w2w.w2w import _add_frc_lu_index_2_wrf
from w2w.w2w import _calc_distance_coord
from w2w.w2w import _check_hi_values
from w2w.w2w import _check_lcz_wrf_extent
from w2w.w2w import _compute_hi_distribution
Expand Down Expand Up @@ -289,20 +288,6 @@ def test_check_lcz_integrity_clean_file_written(tmpdir, info_mock):
assert os.listdir(tmpdir) == ['Shanghai_clean.tif']


@pytest.mark.parametrize(
('coords', 'expected'),
(
pytest.param((89, 0, 90, 1), 111194, id='near northern pole'),
pytest.param((-89, 0, -90, 1), 111194, id='near southern pole'),
pytest.param((-60, -10, -70, -12), 1115764, id='southern eastern hemisphere'),
pytest.param((-1, 0, 1, 2), 314498, id='across equator'),
pytest.param((45, 179, 46, -179), 191461, id='across dateline'),
),
)
def test_calc_distance_coord(coords, expected):
assert _calc_distance_coord(*coords) == pytest.approx(expected, abs=1)


@pytest.mark.parametrize(
('dst_file', 'dst_nu_file'),
(
Expand All @@ -315,7 +300,7 @@ def test_wrf_remove_urban(tmpdir, dst_file, dst_nu_file, info_mock):
{'dst_file': dst_file, 'dst_nu_file': os.path.join(tmpdir, dst_nu_file)}
)
old_ds = xr.open_dataset(info.dst_file)
wrf_remove_urban(info=info, NPIX_NLC=9)
wrf_remove_urban(info=info, NPIX_NLC=9, NPIX_AREA=25)
ds = xr.open_dataset(info.dst_nu_file)
# check lused 13 was reclassified to 12
assert ds.LU_INDEX.values[0][2][2] == 12
Expand Down Expand Up @@ -350,7 +335,7 @@ def test_wrf_remove_urban_output_already_exists_is_overwritten(tmpdir, info_mock
)
assert os.listdir(tmpdir) == ['5by5_new.nc']
m_time_old = os.path.getmtime(info.dst_nu_file)
wrf_remove_urban(info=info, NPIX_NLC=9)
wrf_remove_urban(info=info, NPIX_NLC=9, NPIX_AREA=25)
assert m_time_old != os.path.getmtime(info.dst_nu_file)


Expand Down
210 changes: 115 additions & 95 deletions w2w/w2w.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from rasterio.warp import reproject, Resampling
from rasterio.transform import Affine
from scipy.stats import mode, truncnorm
import scipy.spatial as spatial
import os, sys
import argparse
from argparse import RawTextHelpFormatter
Expand All @@ -24,6 +25,7 @@
from scipy import stats
import pyproj
from pyproj import CRS, Transformer
from tqdm.auto import tqdm

if sys.version_info >= (3, 9): # pragma: >=3.9 cover
import importlib.metadata as importlib_metadata
Expand Down Expand Up @@ -99,7 +101,7 @@ def main(argv: Optional[Sequence[str]] = None) -> int:
metavar='',
type=float,
dest='FRC_THRESHOLD',
help='FRC_URB2D treshold value to assign pixel as urban ' '(DEFAULT: 0.2)',
help='FRC_URB2D threshold value to assign pixel as urban ' '(DEFAULT: 0.2)',
default=0.2,
)
parser.add_argument(
Expand All @@ -112,11 +114,23 @@ def main(argv: Optional[Sequence[str]] = None) -> int:
'natural land cover (DEFAULT: 45)',
default=45,
)

parser.add_argument(
'-a',
'--npix-area',
metavar='',
type=int,
dest='NPIX_AREA',
help='Area in number of pixels to look for the nearest number of pixels'
'for sampling neighbouring natural land cover (DEFAULT: NPIX_NLC**2)',
default=None,
)
parser.add_argument(
'--lcz-ucp',
type=str,
help='File with custom LCZ-based urban canopy parameters',
)

args = parser.parse_args(argv)

# check if a custom LCZ UCP file was set and read it
Expand Down Expand Up @@ -149,9 +163,11 @@ def main(argv: Optional[Sequence[str]] = None) -> int:
print(
f'{FBOLD}--> Replace WRF MODIS urban LC with ' f'surrounding natural LC{FEND}'
)

wrf_remove_urban(
info=info,
NPIX_NLC=args.NPIX_NLC,
NPIX_AREA=args.NPIX_AREA,
)

print(f'{FBOLD}--> Create LCZ-based geo_em file{FEND}')
Expand Down Expand Up @@ -365,34 +381,35 @@ def check_lcz_integrity(info: Info, LCZ_BAND: int) -> None:
lcz.rio.to_raster(info.src_file_clean, dtype=np.int32)


def _calc_distance_coord(
lat1: float, lon1: float, lat2: float, lon2: float
) -> xr.DataArray:
def using_kdtree(data: pd.DataFrame, kpoints: int) -> NDArray[np.int_]:

'''Calculate distance using coordinates
This uses the spherical law of cosines
'''
Extract nearest kpoints to each point given a list of them.
The input data contains at least latitude ('lat') and longitude ('lon').
The distance is calculated with great circle arc length.
Based on https://stackoverflow.com/q/43020919/190597
https://en.wikipedia.org/wiki/Great-circle_distance
earth_radius = 6371000 # Earth radius in m
lat1r = lat1 * np.pi / 180.0
lat2r = lat2 * np.pi / 180.0
lon1r = lon1 * np.pi / 180.0
lon2r = lon2 * np.pi / 180.0

d = (
np.arccos(
np.sin(lat1r) * np.sin(lat2r)
+ np.cos(lat1r) * np.cos(lat2r) * np.cos(lon2r - lon1r)
)
* earth_radius
)
Output:
distance: distances between nearest k points and each point using circle arch length.
index: indices of nearest k points for each point.
'''

return d
R = 6371
phi = np.deg2rad(data['lat'])
theta = np.deg2rad(data['lon'])
data['x'] = R * np.cos(phi) * np.cos(theta)
data['y'] = R * np.cos(phi) * np.sin(theta)
data['z'] = R * np.sin(phi)
tree = spatial.KDTree(data[['x', 'y', 'z']])
_, index = tree.query(data[['x', 'y', 'z']], k=kpoints)
return index


def wrf_remove_urban(
info: Info,
NPIX_NLC: int,
NPIX_AREA: int,
) -> None:

'''Remove MODIS urban extent from geo_em*.nc file'''
Expand All @@ -406,89 +423,92 @@ def wrf_remove_urban(
greenf = dst_data.GREENFRAC.squeeze()
lat = dst_data.XLAT_M.squeeze()
lon = dst_data.XLONG_M.squeeze()
orig_num_land_cat = dst_data.NUM_LAND_CAT

# New arrays to hold data without urban areas
newluse = luse.values.copy()
newluf = luf.values.copy()
newgreenf = greenf.values.copy()
orig_num_land_cat = dst_data.NUM_LAND_CAT

# Convert urban to surrounding natural characteristics
for i in dst_data.south_north:
for j in dst_data.west_east:
if luse.isel(south_north=i, west_east=j) == 13:
data_coord = pd.DataFrame(
{
'lat': lat.values.ravel(),
'lon': lon.values.ravel(),
'luse': luse.values.ravel(),
}
)

dis = _calc_distance_coord(
lat.where((luse != 13) & (luse != 17) & (luse != 21)),
lon.where((luse != 13) & (luse != 17) & (luse != 21)),
lat.isel(south_north=i, west_east=j),
lon.isel(south_north=i, west_east=j),
)
ERROR = '\033[0;31m'
ENDC = '\033[0m'

disflat = (
dis.stack(gridpoints=('south_north', 'west_east'))
.reset_index('gridpoints')
.drop_vars(['south_north', 'west_east'])
)
aux = luse.where(
dis <= disflat.sortby(disflat).isel(gridpoints=NPIX_NLC), drop=True
)
m = stats.mode(aux.values.flatten(), nan_policy='omit')[0]
newluse[i, j] = int(m)
luf_2D = luf.values.reshape(luf.shape[0], -1)
if orig_num_land_cat > 20: # USING MODIS_LAKE
data_coord['luf_natland'] = list(
(luf_2D[12, :] == 0) & (luf_2D[16, :] == 0) & (luf_2D[20, :] == 0)
)
data_coord['luf_urb'] = list((luf_2D[12, :] != 0))
else: # USING MODIS (NO LAKES)
data_coord['luf_natland'] = list((luf_2D[12, :] == 0) & (luf_2D[16, :] == 0))
data_coord['luf_urb'] = list((luf_2D[12, :] != 0))
if NPIX_AREA == None:
NPIX_AREA = NPIX_NLC**2
if NPIX_AREA > luse.size:
raise ValueError(
f'{ERROR}ERROR: The area you selected is larger than the domain size\n'
f'You chose an area of {NPIX_AREA} pixels and the domain is {luse.size} pixels\n'
f'Reduce NPIX_AREA. \n\n'
f'Exiting...{ENDC}'
)

auxg = (
greenf.where(
dis <= disflat.sortby(disflat).isel(gridpoints=NPIX_NLC),
drop=True,
)
.where(aux == newluse[i, j])
.mean(dim=['south_north', 'west_east'])
)
newgreenf[:, i, j] = auxg

if luf.isel(south_north=i, west_east=j, land_cat=12) > 0.0:
if orig_num_land_cat > 20: # USING MODIS_LAKE
dis = _calc_distance_coord(
lat.where(
(luf.isel(land_cat=12) == 0.0)
& (luf.isel(land_cat=16) == 0.0)
& (luf.isel(land_cat=20) == 0.0)
),
lon.where(
(luf.isel(land_cat=12) == 0.0)
& (luf.isel(land_cat=16) == 0.0)
& (luf.isel(land_cat=20) == 0.0)
),
lat.isel(south_north=i, west_east=j),
lon.isel(south_north=i, west_east=j),
)
else: # USING MODIS (NO LAKES)
dis = _calc_distance_coord(
lat.where(
(luf.isel(land_cat=12) == 0.0)
& (luf.isel(land_cat=16) == 0.0)
),
lon.where(
(luf.isel(land_cat=12) == 0.0)
& (luf.isel(land_cat=16) == 0.0)
),
lat.isel(south_north=i, west_east=j),
lon.isel(south_north=i, west_east=j),
)
ikd = using_kdtree(data_coord, min(luse.size, NPIX_AREA))

disflat = (
dis.stack(gridpoints=('south_north', 'west_east'))
.reset_index('gridpoints')
.drop_vars(['south_north', 'west_east'])
)
aux = luse.where(
dis <= disflat.sortby(disflat).isel(gridpoints=NPIX_NLC), drop=True
)
m = stats.mode(aux.values.flatten(), nan_policy='omit')[0]
newlu = int(m) - 1
# newlu = int(mode(aux.values.flatten())[0])-1
newluf[newlu, i, j] += luf.isel(
south_north=i, west_east=j, land_cat=12
).values
newluf[12, i, j] = 0.0
data_coord['luse_urb'] = data_coord.luse == 13
data_coord['luse_natland'] = (
(data_coord.luse != 13) & (data_coord.luse != 17) & (data_coord.luse != 21)
)

# Replacing urban pixels with surrounding dominant natural land use category
data_urb = data_coord.where(data_coord.luse_urb).dropna()

for iurb in tqdm(data_urb.index, desc='Looping through urban grid pixels'):

i, j = np.unravel_index(iurb, luse.shape)
data_luse_natland = (
data_coord.loc[ikd[iurb, :]].where(data_coord.luse_natland).dropna()
)

try:
aux_kd = data_luse_natland.iloc[:NPIX_NLC]
mkd = data_luse_natland.iloc[:NPIX_NLC].luse.mode()[0]
except (IndexError, KeyError):
err = traceback.format_exc()
print(
f'{ERROR}ERROR: Not enough natural land points in selected area\n'
f'You chose to sample {NPIX_NLC} pixels over an area of {NPIX_AREA} '
f'Increase NPIX_AREA. \n\n'
f'{err}\n'
f'Exiting ...{ENDC}'
)
sys.exit(1)

gkf = np.take(
greenf.values.reshape(greenf.shape[0], -1),
aux_kd[aux_kd.luse == mkd.item()].index,
axis=1,
).mean(axis=1)
newluse[i, j] = int(mkd)
newgreenf[:, i, j] = gkf

# Set urban LANDUSEF to 0 and move that fraction to dominant natural category
data_coord['newluse'] = newluse.ravel()
data_luf = data_coord.where(data_coord.luf_urb).dropna()

for iurb_luf in data_luf.index:
i, j = np.unravel_index(iurb_luf, luse.shape)
newluf[int(data_luf.loc[iurb_luf]['newluse']) - 1, i, j] += luf.isel(
south_north=i, west_east=j, land_cat=12
).values
newluf[12, i, j] = 0.0

dst_data.LU_INDEX.values[0, :] = newluse[:]
dst_data.LANDUSEF.values[0, :] = newluf[:]
Expand Down

0 comments on commit 8e9e85d

Please sign in to comment.