Python 3.6+ required.
pip install -r requirements-dev.txt
pre-commit install # for formatter options
A rio merge alternative optimized for large RGBA tifs
rio merge-rgba is a CLI with nearly identical arguments to rio merge. They accomplish the same task, merging many rasters into one.
    $ rio merge-rgba --help
    Usage: rio merge-rgba [OPTIONS] INPUTS... OUTPUT

    Options:
      -o, --output PATH      Path to output file (optional alternative to a
                             positional arg for some commands).
      --bounds FLOAT...      Output bounds: left bottom right top.
      -r, --res FLOAT        Output dataset resolution in units of
                             coordinate reference system. Pixels assumed to
                             be square if this option is used once,
                             otherwise use: --res pixel_width --res
                             pixel_height
      -f, --force-overwrite  Do not prompt for confirmation before
                             overwriting output file
      --precision INTEGER    Number of decimal places of precision in
                             alignment of pixels
      --co NAME=VALUE        Driver specific creation options.See the
                             documentation for the selected output driver
                             for more information.
      --help                 Show this message and exit.
The differences are in the implementation. rio merge-rgba:
- only accepts 4-band RGBA rasters
- writes the destination data to disk rather than an in-memory array
- reads/writes in windows corresponding to the destination block layout
- once a window is filled with data values, the rest of the source files are skipped for that window
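The per-window logic in the list above can be sketched in plain Python. This is an illustration only, not the code in merge_rgba.py: `merge_window` is a hypothetical helper, nested lists of `(r, g, b, a)` tuples stand in for numpy arrays, and an alpha of 0 is assumed to mean "no data".

```python
def merge_window(sources, width, height):
    """Fill one destination window; the first valid pixel wins.

    Each source is a list of rows of (r, g, b, a) tuples covering the
    same window. Returns the filled window and how many sources were read.
    """
    empty = (0, 0, 0, 0)
    dest = [[empty] * width for _ in range(height)]
    remaining = width * height  # destination pixels still without data
    sources_read = 0
    for src in sources:
        if remaining == 0:
            break  # window is full, skip the rest of the source files
        sources_read += 1
        for y in range(height):
            for x in range(width):
                # Alpha is the only nodata signal: fill empty pixels only.
                if dest[y][x][3] == 0 and src[y][x][3] != 0:
                    dest[y][x] = src[y][x]
                    remaining -= 1
    return dest, sources_read


R, G, B, E = (255, 0, 0, 255), (0, 255, 0, 255), (0, 0, 255, 255), (0, 0, 0, 0)
left_half = [[R, E], [R, E]]
right_half = [[E, G], [E, G]]
full = [[B, B], [B, B]]
window, sources_read = merge_window([left_half, right_half, full], 2, 2)
```

Here the third source is never read, because the first two already fill every pixel of the window.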
Memory efficiency. You'll never load more than 2 * blockxsize * blockysize pixels into numpy arrays at one time, assuming garbage collection is infallible and there are no memory leaks.
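To put rough numbers on that bound, here is a back-of-the-envelope sketch. The 256x256 block size and one byte per band sample (uint8 RGBA) are assumptions chosen for illustration; the helper names are hypothetical, and 10436x10436 is the largest raster size in the benchmark table.

```python
def peak_window_pixels(blockxsize, blockysize):
    """Upper bound on pixels held in arrays at once, as quoted above."""
    return 2 * blockxsize * blockysize


def pixels_to_bytes(pixels, bands=4, bytes_per_sample=1):
    """Convert a pixel count to bytes (assumes 4-band uint8 data)."""
    return pixels * bands * bytes_per_sample


# Assumed 256x256 blocks: about half a MiB of pixel data at a time.
windowed_bytes = pixels_to_bytes(peak_window_pixels(256, 256))

# For comparison, one whole 10436x10436 RGBA raster held in memory.
whole_raster_bytes = pixels_to_bytes(10436 * 10436)

print(windowed_bytes)      # 524288
print(whole_raster_bytes)  # 435640384
```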
While this does mean reading and writing to disk more frequently, having spatially aligned data with identical block layouts (like scenetifs) can make that as efficient as possible. Also...
rio merge is more flexible with regard to nodata. It relies on reads with masked=True to handle that logic across all cases.
By contrast, rio merge-rgba requires RGBA images because, by reading with masked=False and using the alpha band as the sole source of nodata-ness, we get huge speedups over the rio merge approach: roughly 40x faster for my test cases. The exact reasons behind the discrepancy are TBD, but since we're reading and writing more intensively with the windowed approach, we need to keep IO as efficient as possible.
I tried, but the speed advantage comes from avoiding masked reads. Once we improve the efficiency of masked reads, or invent a more efficient mechanism for handling nodata masks, we can pull the windowed approach back into rasterio.
Very promising. The test data is a set of Landsat scenes, 23 rasters in total. I created reduced-resolution versions of each in order to test the performance characteristics as sizes increase.
resolution | raster size | rio merge Memory (MB) | merge_rgba Memory (MB) | rio merge Time (s) | merge_rgba Time (s) |
---|---|---|---|---|---|
300 | 1044x1044 | 135 | 83 | 1.10 | 0.70 |
150 | 2088x2088 | 220 | 107 | 3.20 | 1.90 |
90 | 3479x3479 | 412 | 115 | 8.85 | 3.10 |
60 | 5219x5219 | 750 | 121 | 25.00 | 7.00 |
30 | 10436x10436 | 1GB+ crashed | 145 | crashed at 38 minutes | 19.80 |
Note that the "merge_aligned" referred to in the charts is the same command, just renamed.
Since the inclusion of the full cover window in rasterio, there is a possibility of including an additional bottom row or right column if the bounds of the destination are not directly aligned with the source.
In rio merge, which reads the entire raster at once, this can manifest itself as one additional row and column on the bottom right edge of the image. The image within remains consistent.
With merge_rgba.py, if we used the default full cover window, errors may appear within the image at block window boundaries where e.g. a 257x257 window is read into a 256x256 destination. To avoid this, we effectively embed a reimplementation of rasterio's get_window using the round operator, which improves our chances that the pixel boundaries are snapped to appropriate bounds.
You may see small differences between rio merge and merge_rgba as a result but they should be limited to the single bottom row and right-most column.
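The difference between a full cover window and a rounded one can be shown in one dimension. This is a simplified sketch, not rasterio's get_window or the code in merge_rgba.py; both function names, the 30-unit resolution, and the 3-unit misalignment are assumptions for illustration.

```python
import math


def window_full_cover(left, right, origin, res):
    """Column range that fully covers [left, right]: floor start, ceil stop."""
    start = math.floor((left - origin) / res)
    stop = math.ceil((right - origin) / res)
    return start, stop


def window_rounded(left, right, origin, res):
    """Column range snapped to the nearest pixel boundary on both sides."""
    start = round((left - origin) / res)
    stop = round((right - origin) / res)
    return start, stop


# A 256-pixel destination block whose bounds sit 3 units (0.1 pixel)
# off the source grid.
res = 30.0
left = 256 * res + 3.0
right = left + 256 * res

full_start, full_stop = window_full_cover(left, right, 0.0, res)
snap_start, snap_stop = window_rounded(left, right, 0.0, res)

print(full_stop - full_start)  # 257 columns: one too many for the block
print(snap_stop - snap_start)  # 256 columns: matches the destination block
```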
Neither rio merge nor rio merge-rgba allows for bilinear resampling. The "resampling" is done effectively with a crude nearest neighbor via np.copyto. This means that pixel shifts of up to 1/2 cell can occur if inputs are misaligned.
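The half-cell limit follows from snapping each input to the nearest whole pixel of the destination grid. A minimal sketch of that arithmetic, assuming a hypothetical `snap_offset` helper and a 30-unit cell size (neither comes from the project itself):

```python
def snap_offset(src_origin, dst_origin, res):
    """Return (whole-pixel offset used, residual shift in pixels).

    The residual is the part of the misalignment that nearest-neighbor
    copying cannot correct.
    """
    exact = (src_origin - dst_origin) / res
    snapped = round(exact)
    return snapped, exact - snapped


# A source whose origin is 42 units from the destination origin is
# 1.4 pixels away; it is copied at offset 1, leaving a shift of about 0.4.
offset, shift = snap_offset(42.0, 0.0, 30.0)
```

Because `round` picks the nearest integer, the absolute residual never exceeds half a pixel.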