- Visualization
- Data loading (patches of labels and imagery to be used in a random forest, PyTorch, or TF model)
- Data downloading (documentation walk-throughs mostly)
- Interactive map for displaying and discussing input and model output
- Interactive re-labeling tool (see the landcover repo)
- Bring your own data (BYOD)
- "Default" models for segmentation and detection
- Model evaluation metrics computation over an area / split
- recipes_and_guides: various tutorials on how to do things with GDAL, Google Earth Engine, QGIS, and other geospatial tools and Python packages.
- visualization:
- RasterLabelVisualizer: a class for visualizing raster mask labels and hardmax or softmax model predictions, instantiated with a user-provided mapping from category names to colors.
- ImageryVisualizer: a collection of static methods for chipping and viewing multi-band spectral imagery that rasterio can load, such as Landsat 8, Sentinel-2, NAIP, and SRTM DEM data.
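As a generic illustration of the idea behind RasterLabelVisualizer (this is a sketch, not this repo's API; the category names and colors are made-up examples), a user-provided mapping from category names to colors can be turned into a matplotlib colormap and applied to a label mask or hardmax prediction:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

# Hypothetical category-to-color mapping; substitute your own classes and colors
categories = {0: ("background", "#000000"), 1: ("water", "#1f78b4"), 2: ("forest", "#33a02c")}

cmap = ListedColormap([color for _, color in categories.values()])
norm = BoundaryNorm(boundaries=list(categories) + [max(categories) + 1], ncolors=len(categories))

mask = np.random.randint(0, 3, size=(256, 256))  # stand-in for a label mask or hardmax prediction
plt.imshow(mask, cmap=cmap, norm=norm, interpolation="nearest")
plt.colorbar(ticks=list(categories), label="category")
plt.show()
```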
Here we define various terms we use internally to describe satellite data. Throughout, we use the term imagery to mean both multispectral satellite/aerial imagery and other types of raster-based data sources (e.g. digital elevation maps, synthetic aperture radar data, gridded weather data, etc.) that may be consumed by a model or used as labels.
Source imagery that was sectioned by the data provider so that individual files do not exceed a size convenient for download. We do not modify these; they are stored for archival purposes.
Scenes in this sense do not have to correspond to the original "scenes" that the satellite sensor imaged (e.g. a Landsat scene); they can be cloud-masked and composited, and cut along arbitrary boundaries.
Examples:
- Landsat 8 imagery downloaded over certain extents from Google Earth Engine
- SRTM DEM data over an extent
- A GeoTIFF of commercial satellite imagery from a partner
Large-ish image patches and label masks that are generated from scenes and vector data sources, and stored as "analysis-ready" data. Labels that come as polygons in shapefile or GeoJSON format are turned into pixel masks at this stage.
These should be of a size convenient for blobfuse caching and manual inspection in desktop GIS applications. Reasonable sizes are on the order of millions of pixels, e.g. from 2,000 by 2,000 to 20,000 by 20,000 pixels.
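As a minimal sketch of the rasterization step (the file paths and the "class_id" attribute name are assumptions; the exact tooling is up to you), polygon labels can be burned into a pixel mask aligned with an imagery tile using geopandas and rasterio:

```python
import geopandas as gpd
import rasterio
from rasterio import features

# Hypothetical inputs: a polygon label file with a "class_id" attribute, and an imagery tile
labels = gpd.read_file("labels.geojson")

with rasterio.open("tile.tif") as tile:
    labels = labels.to_crs(tile.crs)  # match the tile's coordinate system
    shapes = ((geom, class_id) for geom, class_id in zip(labels.geometry, labels["class_id"]))

    # Burn the polygons into a pixel mask on the tile's grid
    mask = features.rasterize(
        shapes,
        out_shape=(tile.height, tile.width),
        transform=tile.transform,
        fill=0,  # background value for unlabeled pixels
        dtype="uint8",
    )
    profile = tile.profile.copy()

# Write the label mask as a single-band GeoTIFF alongside the imagery tile
profile.update(count=1, dtype="uint8", nodata=0)
with rasterio.open("tile_labels.tif", "w", **profile) as dst:
    dst.write(mask, 1)
```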
Smaller image patches and label masks sized specifically to be consumed by a model during training/evaluation. These can be cut from tiles on-the-fly during training and evaluation, or offline in a chipping step. If the latter, chips are stored in shards.
A shard is a set of homogeneously sized image chips that are stored together on disk (as a NumPy array, TFRecord, pickle file, etc.) with dimensions (num_chips_in_shard, channels, chip_height, chip_width).
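A minimal sketch of the offline chipping step (the chip size and file names are assumptions): read a tile, cut non-overlapping chips, and store them together as one shard.

```python
import numpy as np
import rasterio

CHIP_SIZE = 256  # assumed chip size; pick whatever your model consumes

with rasterio.open("tile.tif") as src:
    tile = src.read()  # shape: (channels, height, width)

chips = []
_, height, width = tile.shape
for row in range(0, height - CHIP_SIZE + 1, CHIP_SIZE):
    for col in range(0, width - CHIP_SIZE + 1, CHIP_SIZE):
        chips.append(tile[:, row:row + CHIP_SIZE, col:col + CHIP_SIZE])

# Shard with dimensions (num_chips_in_shard, channels, chip_height, chip_width)
shard = np.stack(chips)
np.save("shard_0000.npy", shard)
```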
Any unit of imagery. It can be a tile or a chip or an area that does not conform to the other more precisely defined units.
Preprocessing may include:
- combining different sources of data (e.g. Landsat 8 imagery and a DEM) by resampling
- re-projecting them to a common coordinate system
- normalizing and gamma correcting pixel values (see the sketch after this list)

These steps can happen either during the creation of tiles from scenes or during the creation of chips from tiles.
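As one example of such a step, here is a minimal sketch of per-band normalization followed by gamma correction (the scaling scheme and gamma value are assumptions; real pipelines often use sensor-specific constants instead of per-band min/max):

```python
import numpy as np

def normalize_and_gamma(patch, gamma=1.0 / 2.2):
    """Scale each band of a (channels, height, width) array to [0, 1], then gamma correct."""
    patch = patch.astype("float32")
    band_min = patch.min(axis=(1, 2), keepdims=True)
    band_max = patch.max(axis=(1, 2), keepdims=True)
    scaled = (patch - band_min) / np.maximum(band_max - band_min, 1e-6)
    return np.power(scaled, gamma)
```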
To ensure reproducibility:
- A set of tiles (and shards of chips, if any were made) should be treated as immutable once an experiment has been run using them as input data.
- Fully document the procedures used to create tiles and chips from original scenes (ideally the steps should be entirely programmatic).
- The queries and steps used to obtain the scenes from the data provider should also be well-documented and repeatable.
To use the interactive re-training tool, be aware that the tiles need to contain all the data required by your model. For example, if you have two overlapping scenes, one containing Landsat 8 imagery at 30 m spatial resolution and the other containing digital elevation data at 1 m spatial resolution, you can write a script to re-project both to 1 m resolution in a shared coordinate system and create multiple 2,000 by 2,000 pixel GeoTIFF tiles containing the two data sources stacked together. The tool will pass extents of these tiles to your model.
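A minimal sketch of such a script using rasterio (the paths, target CRS, resolution, and tile size are all placeholders):

```python
import os

import numpy as np
import rasterio
from rasterio.enums import Resampling
from rasterio.vrt import WarpedVRT
from rasterio.warp import calculate_default_transform
from rasterio.windows import Window, transform as window_transform

TARGET_CRS = "EPSG:32610"  # placeholder shared coordinate system
TILE_SIZE = 2000

os.makedirs("tiles", exist_ok=True)

with rasterio.open("landsat8_scene.tif") as ls_src, rasterio.open("dem_scene.tif") as dem_src:
    # Compute a 1 m output grid covering the Landsat scene in the target CRS
    dst_transform, dst_width, dst_height = calculate_default_transform(
        ls_src.crs, TARGET_CRS, ls_src.width, ls_src.height, *ls_src.bounds, resolution=1.0
    )
    vrt_opts = dict(crs=TARGET_CRS, transform=dst_transform, width=dst_width,
                    height=dst_height, resampling=Resampling.bilinear)

    # Warp both sources onto the same grid so they can be stacked band-wise
    with WarpedVRT(ls_src, **vrt_opts) as ls_vrt, WarpedVRT(dem_src, **vrt_opts) as dem_vrt:
        profile = ls_vrt.profile.copy()
        profile.update(driver="GTiff", count=ls_vrt.count + dem_vrt.count,
                       dtype="float32", width=TILE_SIZE, height=TILE_SIZE)

        # Cut the stacked data into 2,000 by 2,000 pixel GeoTIFF tiles
        for row in range(0, dst_height - TILE_SIZE + 1, TILE_SIZE):
            for col in range(0, dst_width - TILE_SIZE + 1, TILE_SIZE):
                window = Window(col, row, TILE_SIZE, TILE_SIZE)
                stacked = np.concatenate(
                    [ls_vrt.read(window=window), dem_vrt.read(window=window)], axis=0
                ).astype("float32")
                profile["transform"] = window_transform(window, dst_transform)
                with rasterio.open(f"tiles/tile_{row}_{col}.tif", "w", **profile) as dst:
                    dst.write(stacked)
```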
Other preprocessing steps, such as normalizing pixel values, can still be carried out in the tiles-to-chips step; just be aware that in that case these steps add latency when using the tool.