
rio merge alternative optimized for large RGBA rasters


mapbox/rio-merge-rgba


rio merge-rgba


Installation for development

Python 3.6+ required.

pip install -r requirements-dev.txt
pre-commit install  # install the git hooks that run the formatter

Description

A rio merge alternative optimized for large RGBA tifs

rio merge-rgba is a CLI with nearly identical arguments to rio merge. Both accomplish the same task: merging many rasters into one.

$ rio merge-rgba --help
Usage: rio merge-rgba [OPTIONS] INPUTS... OUTPUT

Options:
  -o, --output PATH            Path to output file (optional alternative to a
                               positional arg for some commands).
  --bounds FLOAT...            Output bounds: left bottom right top.
  -r, --res FLOAT              Output dataset resolution in units of
                               coordinate reference system. Pixels assumed to
                               be square if this option is used once,
                               otherwise use: --res pixel_width --res
                               pixel_height
  -f, --force-overwrite        Do not prompt for confirmation before
                               overwriting output file
  --precision INTEGER          Number of decimal places of precision in
                               alignment of pixels
  --co NAME=VALUE              Driver specific creation options. See the
                               documentation for the selected output driver
                               for more information.
  --help                       Show this message and exit.

The differences are in the implementation. rio merge-rgba:

  1. only accepts 4-band RGBA rasters
  2. writes the destination data to disk rather than an in-memory array
  3. reads/writes in windows corresponding to the destination block layout
  4. once a window is filled with data values, the rest of the source files are skipped for that window
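The four implementation points above can be sketched in plain Python. This is a hypothetical illustration, not the tool's code: `Window`, `block_windows`, `read_window` and `merge_rgba_windowed` are made-up names, rasters are lists of `(r, g, b, alpha)` tuples already aligned to the destination grid, and the destination is kept in memory for brevity, whereas the real tool writes each block to the output file. Earlier sources are assumed to take precedence.

```python
from typing import Iterator, List, NamedTuple, Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, alpha)
Raster = List[List[Pixel]]         # rows of pixels


class Window(NamedTuple):
    col_off: int
    row_off: int
    width: int
    height: int


def block_windows(width: int, height: int, blockx: int, blocky: int) -> Iterator[Window]:
    """Tile the destination into block-sized windows; edge blocks are clipped."""
    for row_off in range(0, height, blocky):
        for col_off in range(0, width, blockx):
            yield Window(col_off, row_off,
                         min(blockx, width - col_off),
                         min(blocky, height - row_off))


def read_window(raster: Raster, win: Window) -> Raster:
    """Stand-in for a windowed read: copies only the pixels inside `win`."""
    return [row[win.col_off:win.col_off + win.width]
            for row in raster[win.row_off:win.row_off + win.height]]


def merge_rgba_windowed(sources: Sequence[Raster], width: int, height: int,
                        blockx: int, blocky: int) -> Tuple[Raster, int]:
    """Merge aligned RGBA rasters block by block (hypothetical sketch).

    Returns the merged raster and how many source windows were read.
    """
    dest: Raster = [[(0, 0, 0, 0)] * width for _ in range(height)]
    reads = 0
    for win in block_windows(width, height, blockx, blocky):
        block = read_window(dest, win)  # starts fully transparent
        for src in sources:
            data = read_window(src, win)
            reads += 1
            for r in range(win.height):
                for c in range(win.width):
                    # alpha is the only nodata signal: fill empty pixels with valid ones
                    if block[r][c][3] == 0 and data[r][c][3] != 0:
                        block[r][c] = data[r][c]
            if all(px[3] != 0 for row in block for px in row):
                break  # window is full, so the remaining sources are skipped
        # "write" the finished block back to the destination
        for r, row in enumerate(block):
            dest[win.row_off + r][win.col_off:win.col_off + win.width] = row
    return dest, reads


# Usage: a 4x4 mosaic in 2x2 blocks; the first source only covers the left half.
RED, GREEN, CLEAR = (255, 0, 0, 255), (0, 255, 0, 255), (0, 0, 0, 0)
left_half = [[RED, RED, CLEAR, CLEAR] for _ in range(4)]
full_green = [[GREEN] * 4 for _ in range(4)]
merged, reads = merge_rgba_windowed([left_half, full_green], 4, 4, 2, 2)
# reads == 6: left-half windows need one read, right-half windows need two
```

The `reads` counter makes the early exit visible: windows already filled by the first source never touch the second one.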

Why windowed and why write to disk?

Memory efficiency. You'll never load more than 2 * blockxsize * blockysize pixels into numpy arrays at one time, assuming garbage collection is infallible and there are no memory leaks.
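A back-of-the-envelope check of that bound, reading the factor of 2 as presumably one source window plus one destination window. It assumes 8-bit RGBA samples (4 bytes per pixel) and a hypothetical 256x256 block size; the 10436x10436 size comes from the benchmark table. Only array bytes are counted, so Python, GDAL cache and other overhead included in measured memory are ignored.

```python
def window_bytes(blockx: int, blocky: int, bands: int = 4, bytes_per_sample: int = 1) -> int:
    """Bytes held by one block-sized array with `bands` samples per pixel."""
    return blockx * blocky * bands * bytes_per_sample


def peak_windowed_bytes(blockx: int, blocky: int, bands: int = 4, bytes_per_sample: int = 1) -> int:
    """Two block-sized arrays at a time: one source window, one destination window."""
    return 2 * window_bytes(blockx, blocky, bands, bytes_per_sample)


def whole_raster_bytes(width: int, height: int, bands: int = 4, bytes_per_sample: int = 1) -> int:
    """What an in-memory merge must hold for the full destination array alone."""
    return width * height * bands * bytes_per_sample


MiB = 1024 ** 2
print(peak_windowed_bytes(256, 256) / MiB)             # 0.5
print(round(whole_raster_bytes(10436, 10436) / MiB))   # 415
```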

While this does mean reading and writing to disk more frequently, having spatially aligned data with identical block layouts (like scenetifs) can make that as efficient as possible. Also...

Why only RGBA?

rio merge is more flexible with regard to nodata. It relies on reads with masked=True to handle that logic across all cases.

By contrast, rio merge-rgba requires RGBA images because reading with masked=False and using the alpha band as the sole indicator of nodata gives huge speedups over the rio merge approach: roughly 40x faster in my test cases. The exact reason for the discrepancy is TBD, but since the windowed approach reads and writes more intensively, we need to keep IO as efficient as possible.

Why not improve rio merge?

I tried, but the speed advantage comes from avoiding masked reads. Once we improve the efficiency of masked reads, or invent a more efficient mechanism for handling nodata masks, we can pull the windowed approach back into rasterio.

Benchmarks

Very promising. Tested on Landsat scenes, 23 rasters in total; I created reduced-resolution versions of each to test the performance characteristics as sizes increase.

Resolution (m)  Raster size  rio merge memory (MB)  merge_rgba memory (MB)  rio merge time (s)     merge_rgba time (s)
300             1044x1044    135                    83                      1.10                   0.70
150             2088x2088    220                    107                     3.20                   1.90
90              3479x3479    412                    115                     8.85                   3.10
60              5219x5219    750                    121                     25.00                  7.00
30              10436x10436  1 GB+ (crashed)        145                     crashed at 38 minutes  19.80

Note that the "merge_aligned" referred to in the charts is the same command under its earlier name:

[Charts: memory use (mem) and run time (speed)]

Note about pixel alignment

Since the inclusion of the full cover window in rasterio, there is a possibility of including an additional bottom row or right column if the bounds of the destination are not directly aligned with the source.

In rio merge, which reads the entire raster at once, this can manifest itself as one additional row and column on the bottom right edge of the image. The image within remains consistent.

With merge_rgba.py, using the default full cover window could introduce errors within the image at block window boundaries, e.g. where a 257x257 window is read into a 256x256 destination. To avoid this, we effectively embed a reimplementation of rasterio's get_window that uses the round operator, which improves the chances that pixel boundaries are snapped to the appropriate bounds.
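The difference between the two snapping strategies shows up along a single axis. This is a hedged sketch: `window_1d` is a hypothetical helper, not rasterio's get_window or the tool's reimplementation; "full cover" is modeled here as flooring the start edge and ceiling the stop edge, and the 30 m pixel size and the 1e-6 coordinate error are illustrative values.

```python
import math
from typing import Tuple


def window_1d(start: float, stop: float, origin: float, pixel_size: float,
              mode: str = "round") -> Tuple[int, int]:
    """Convert a coordinate interval to (offset, length) in pixels along one axis.

    Columns: origin is the raster's left edge x, pixel_size is +xres.
    Rows (north-up): origin is the top edge y, pixel_size is -yres.
    """
    a = (start - origin) / pixel_size
    b = (stop - origin) / pixel_size
    if mode == "full_cover":
        # include every pixel the interval touches, even by a sliver
        lo, hi = math.floor(a), math.ceil(b)
    elif mode == "round":
        # snap each edge to the nearest pixel boundary
        lo, hi = round(a), round(b)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return lo, hi - lo


# A 256-pixel block whose edges carry a tiny floating-point error (30 m pixels).
left, right = 7680.000001, 15360.000001
print(window_1d(left, right, 0.0, 30.0, "full_cover"))  # (256, 257)
print(window_1d(left, right, 0.0, 30.0, "round"))       # (256, 256)
```

Python's round() rounds exact halves to the nearest even integer, which only matters when an edge falls exactly halfway between two pixel boundaries.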

You may see small differences between rio merge and merge_rgba as a result but they should be limited to the single bottom row and right-most column.

Note about resampling

Neither rio merge nor rio merge-rgba allows for bilinear resampling. The "resampling" is effectively a crude nearest-neighbor copy via np.copyto. This means that pixel shifts of up to 1/2 cell can occur if inputs are misaligned.
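The 1/2 cell bound follows from snapping a fractional grid offset to the nearest whole pixel: the leftover shift can never exceed half a cell. A sketch with a hypothetical `nearest_neighbor_offset` helper, assuming 30 m cells for the ground-distance figures.

```python
from typing import Tuple


def nearest_neighbor_offset(offset_px: float) -> Tuple[int, float]:
    """Snap a fractional grid offset to whole pixels, as a plain array copy must.

    Returns the integer offset used and the residual shift (in pixels)
    left in the output.
    """
    snapped = round(offset_px)
    return snapped, offset_px - snapped


# Source grid misaligned from the destination by various fractions of a cell.
for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    snapped, residual = nearest_neighbor_offset(frac)
    # with 30 m cells, the ground shift is residual * 30
    print(f"{frac:.1f} px -> {snapped} px, shift {residual:+.1f} px ({residual * 30:+.0f} m)")
```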
