urban-environments/dataset-collection at master · mprakhar/urban-environments

History

Name		Name	Last commit message	Last commit date
parent directory ..
.ipynb_checkpoints		.ipynb_checkpoints
DeepSat-data-exploration.ipynb		DeepSat-data-exploration.ipynb
Pipeline-to-create-Urban-Environments-dataset.ipynb		Pipeline-to-create-Urban-Environments-dataset.ipynb
README.md		README.md
Urban-Atlas-vector-data-exploration.ipynb		Urban-Atlas-vector-data-exploration.ipynb
__init__.py		__init__.py
urbanatlas.py		urbanatlas.py

README.md

Creating the Urban Environments dataset

To build a consistent satellite and GIS data ingestion pipeline, we have developed, and made extensive use of, the pysatml module in Python. The development page for that package can be found here. That page offers more details and tutorials on how to use that module to streamline the processing and ingestion of typical raster and vector data formats that are common in remote-sensing and GIS data analysis. You can simply install the pysatml module via pip:

pip install pysatml

To construct training and validation datasets, we combine ground-truth labels from the Urban Atlas large-scale land use classification survey with satellite imagery obtained from Google Maps. Our process consists of the following three main steps:

Downloading land use survey files from Urban Atlas at this link. Unfortunately, this first step is a manual process.
Generating an appropriate list of locations - as (lat,lon) tuples - where to sample satellite imagery
Downloading satellite imagery from Google Maps Static API (API key needed)

We plan to make the actual dataset available after further curation and ensuring that this complies with all applicable data licenses of the dataset used. In the meantime, we describe below the detailed procedure used to construct this dataset.

The entire pipeline is implemented and documented in this notebook.

Obtaining shape files for ground truth labels

First, manually download GIS polygon data for ground truth, available as shapefiles at the Urban Atlas website. Unfortunately there is no way to automate this because of the confirmation web forms used on the Urban Atlas website.

You can save these shapefiles (along with their respective projection files) to /home/data/urban-atlas/shapefiles/, or to a folder of your choosing, and manually edit the paths in the notebook.

The paper uses the shapefiles for several cities, some of which are expected to be more "similar" to each other than others. We have experimented with data for several other cities, however we decided to only include the following six cities in the analysis in the paper:

Athens
Barcelona
Berlin
Madrid
Budapest
Roma

Ingesting and processing Urban Atlas shapefiles

To process the vector data (shapefiles), we have developed the UAShapeFile class that encapsulates much of the functionality needed for shapefile data ingestion, sampling, etc. See the code in the urbanatlas.py. This is a lighweight wrapper around a GeoDataFrame object from the geopandas Python module. A sample usage is as follows:

>>> from urbanatlas import UAShapeFile
>>> myshapefile = "ro001l_bucharest.shp"
>>> mycity = UAShapeFile(myshapefile)

Let's look at the amount of land classified by each land use class across three example cities.

As expected, there is a large disparity in the classes, with a lot more land (by surface) classified as agricultural or forests as opposed to, e.g., airports. In the figure, we show the land use classes we analyze in the paper.

Creating ground truth validation raster grids

First, let's crop a window of width W x W (in km) centered at the city center:

>>> W = 25 # in Km
>>> window = (W, W)
>>> mycity_crop = mycity.crop_centered_window(city_center, window)

The UAShapeFile class allows to compute a ground raster of a given grid size:

>>> grid_cell = 100
>>> grid_size = (grid_cell, grid_cell)
>>> raster, locations_grid, cur_classes = mycity_crop.extract_class_raster(grid_size=grid_size)

This step can be skipped in the case of the six cities above, for which the data (files sample_locations_raster.csv and ground_truth_class_raster.npz) can be found in this repository under processed-data.

Selecting appropriate samples

Now, let's generate locations to download imagery for from the shapefile, using a stratified sampling procedure that takes into account the imbalances in the classes.

>>> N_SAMPLES_PER_CLASS = 1250
>>> MAX_SAMPLES_PER_POLY = 50
>>> locations_train = mycity.generate_sampling_locations(thresh_area=thresh_area, \
                                                     n_samples_per_class=N_SAMPLES_PER_CLASS,\
                                                     max_samples=MAX_SAMPLES_PER_POLY)

This step can be skipped in the case of the six cities above, for which the data (files additional_sample_locations.csv) can be found in this repository under processed-data.

An example of polygon data and sampling locations is given in the below figure.

Downloading satellite imagery

With a list of locations to sample (and their respective classes), we can now acquire imagery at each of the locations. There are a number of different ways to go about it, depending, of course, on the type and characteristics of the imagery needed. For visual-spectrum imagery, perhaps the easiest-to-use online source is Google Maps. For this, there are two steps to follow:

Create a Google Maps Static API key
Edit the second section in the Urban Environments pipeline notebook.

Example satellite images across the 20 original classes in the Urban Atlas survey is given in the figure below.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset-collection

dataset-collection

README.md

Creating the Urban Environments dataset

Obtaining shape files for ground truth labels

Ingesting and processing Urban Atlas shapefiles

Creating ground truth validation raster grids

Selecting appropriate samples

Downloading satellite imagery

Files

dataset-collection

Directory actions

More options

Directory actions

More options

Latest commit

History

dataset-collection

Folders and files

parent directory

README.md

Creating the Urban Environments dataset

Obtaining shape files for ground truth labels

Ingesting and processing Urban Atlas shapefiles

Creating ground truth validation raster grids

Selecting appropriate samples

Downloading satellite imagery