Adam J. Stewart edited this page Jun 9, 2021 · 15 revisions

Datasets

There are several ways to classify our datasets, and each has trade-offs.

Benchmark vs. Generic

  1. BenchmarkDataset: contains both images and targets (e.g. COWC, VHR-10, CV4A Kenya)
  2. GenericDataset: contains only images or targets (e.g. Landsat, Sentinel, CDL, Chesapeake)

The problem with this classification is that we want to be able to combine two "generic" datasets to get a single "benchmark" dataset. For example, we need a way for users to specify an image source (e.g. Landsat, Sentinel) and a target source (e.g. CDL, Chesapeake). It isn't yet clear how one would do this.
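One possible direction, sketched below in plain Python, is a wrapper that queries an image source and a target source with the same spatial index and merges the results. Everything here is hypothetical: `GenericDataset`, `PairedDataset`, the `BoundingBox` query type, and the `"image"`/`"mask"` keys are illustrative names rather than an existing API, and the string payloads stand in for real raster data.

```python
from typing import Any, Dict, Tuple

# Hypothetical query type: (minx, miny, maxx, maxy) in a shared coordinate system.
BoundingBox = Tuple[float, float, float, float]


class GenericDataset:
    """Hypothetical image-only or target-only dataset indexed by bounding box."""

    def __init__(self, key: str, fill: Any) -> None:
        self.key = key    # name under which this source stores its data
        self.fill = fill  # stand-in for real raster data

    def __getitem__(self, query: BoundingBox) -> Dict[str, Any]:
        # A real implementation would read the pixels that fall inside `query`.
        return {self.key: self.fill, "bbox": query}


class PairedDataset:
    """Combines an image source and a target source into one 'benchmark' dataset."""

    def __init__(self, images: GenericDataset, targets: GenericDataset) -> None:
        self.images = images
        self.targets = targets

    def __getitem__(self, query: BoundingBox) -> Dict[str, Any]:
        # Query both sources with the same bounding box and merge the results.
        sample = dict(self.images[query])
        sample.update(self.targets[query])
        return sample


landsat = GenericDataset("image", "landsat-pixels")
cdl = GenericDataset("mask", "cdl-labels")
paired = PairedDataset(landsat, cdl)
sample = paired[(0.0, 0.0, 256.0, 256.0)]
```

The key assumption is that both sources can be indexed by the same kind of query. With integer indices, the two datasets would have to be aligned sample by sample, which is unlikely for independent sources such as Landsat and CDL.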

Chip vs. Tile vs. Region

  1. Chip: pre-defined chips/patches (e.g. COWC, VHR-10, DOTA)
  2. Tile: possibly-overlapping tiles we need to sample chips/patches from (e.g. Landsat, Sentinel, CV4A Kenya)
  3. Region: static maps of stitched-together data (e.g. CDL, Chesapeake Land Cover, static Google Earth imagery)
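To make the "Tile" case concrete, here is a minimal sketch of sampling fixed-size chips from a larger tile. It assumes the tile is already in memory as a 2D grid (a list of rows); the `Window` layout and the function names `random_chip_windows` and `extract_chip` are hypothetical, and a real implementation would more likely use windowed reads from the raster file.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical window layout: (row, col, height, width) in pixel coordinates.
Window = Tuple[int, int, int, int]


def random_chip_windows(
    tile_height: int, tile_width: int, chip_size: int, n: int, seed: int = 0
) -> List[Window]:
    """Draw n random square windows that lie fully inside the tile."""
    if chip_size > tile_height or chip_size > tile_width:
        raise ValueError("chip_size must fit inside the tile")
    rng = random.Random(seed)  # seeded for reproducible sampling
    windows = []
    for _ in range(n):
        row = rng.randint(0, tile_height - chip_size)  # inclusive upper bound
        col = rng.randint(0, tile_width - chip_size)
        windows.append((row, col, chip_size, chip_size))
    return windows


def extract_chip(tile: Sequence[Sequence[int]], window: Window) -> List[List[int]]:
    """Slice one chip out of a tile stored as a list of rows."""
    row, col, height, width = window
    return [list(r[col:col + width]) for r in tile[row:row + height]]


# Toy 8x8 tile whose pixel value encodes its own position.
tile = [[r * 8 + c for c in range(8)] for r in range(8)]
windows = random_chip_windows(8, 8, chip_size=4, n=5)
chips = [extract_chip(tile, w) for w in windows]
```

Chip datasets can skip this step entirely, while Region datasets could reuse it by treating the whole map as one very large tile.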

Again, we need to be able to combine datasets from different categories into a single data loader.

Idea: what if we write our own DataLoader class that takes one or more Datasets? As long as the datasets share a standard indexing method, a single loader can draw from all of them. I haven't seen a custom DataLoader before, but implementing one as a subclass should be doable.
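A rough sketch of this idea follows. It is written in plain Python rather than as a subclass of PyTorch's `DataLoader` so that it stays self-contained; `MultiDatasetLoader`, `ConstantSource`, and the tuple-based query are all hypothetical names, and the "standard method for indexing" is assumed to be a shared query object that every dataset accepts in `__getitem__`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple

Query = Tuple[int, int]  # hypothetical shared index, e.g. a tile (row, col)


class ConstantSource:
    """Toy dataset: returns the query tagged with this source's key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def __getitem__(self, query: Query) -> Dict[str, Any]:
        return {self.key: f"{self.key}@{query}"}


class MultiDatasetLoader:
    """Iterates over shared queries and merges samples from several datasets."""

    def __init__(
        self, datasets: Sequence[Any], queries: Iterable[Query], batch_size: int = 2
    ) -> None:
        if not datasets:
            raise ValueError("at least one dataset is required")
        self.datasets = list(datasets)
        self.queries = list(queries)
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[Dict[str, Any]]]:
        batch: List[Dict[str, Any]] = []
        for query in self.queries:
            sample: Dict[str, Any] = {}
            for dataset in self.datasets:
                # Every dataset is indexed with the same query object.
                sample.update(dataset[query])
            batch.append(sample)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final, possibly smaller, batch


loader = MultiDatasetLoader(
    [ConstantSource("image"), ConstantSource("mask")],
    queries=[(0, 0), (0, 1), (1, 0)],
    batch_size=2,
)
batches = list(loader)
```

In a PyTorch setting, the list of queries would probably come from a sampler and the batches would be collated into tensors; both are left out here to keep the sketch focused on the shared indexing.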

Transforms

Models
