-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dataset::read as #374
base: master
Are you sure you want to change the base?
Add Dataset::read as #374
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
use ptr::null_mut; | ||
use std::convert::TryInto; | ||
use std::mem::MaybeUninit; | ||
use std::mem::{size_of, MaybeUninit}; | ||
use std::{ | ||
ffi::NulError, | ||
ffi::{CStr, CString}, | ||
|
@@ -12,6 +12,7 @@ use std::{ | |
use crate::cpl::CslStringList; | ||
use crate::errors::*; | ||
use crate::raster::RasterCreationOption; | ||
use crate::raster::{Buffer3D, GdalType, RasterIOExtraArg, ResampleAlg}; | ||
use crate::utils::{_last_cpl_err, _last_null_pointer_err, _path_to_c_string, _string}; | ||
use crate::vector::{sql, Geometry, OwnedLayer}; | ||
use crate::{ | ||
|
@@ -20,10 +21,10 @@ use crate::{ | |
}; | ||
|
||
use gdal_sys::{ | ||
self, CPLErr, GDALAccess, GDALDatasetH, GDALMajorObjectH, OGRErr, OGRGeometryH, OGRLayerH, | ||
OGRwkbGeometryType, | ||
self, CPLErr, GDALAccess, GDALDatasetH, GDALMajorObjectH, GDALRWFlag, GDALRasterIOExtraArg, | ||
OGRErr, OGRGeometryH, OGRLayerH, OGRwkbGeometryType, | ||
}; | ||
use libc::{c_double, c_int, c_uint}; | ||
use libc::{c_double, c_int, c_uint, c_void}; | ||
|
||
#[cfg(all(major_ge_3, minor_ge_1))] | ||
use crate::raster::Group; | ||
|
@@ -282,6 +283,23 @@ impl<'a> Default for LayerOptions<'a> { | |
} | ||
} | ||
|
||
// This defines multiple ways to layout an image in memory, based on GDAL Python bindings | ||
// which have either 'band' or 'pixel' interleave: | ||
// https://github.com/OSGeo/gdal/blob/301f31b9b74cd67edcdc555f7e7a58db87cbadb2/swig/include/gdal_array.i#L2300 | ||
pub enum ImageInterleaving { | ||
/// This means the image is stored in memory with first the first band, | ||
/// then second band and so on | ||
Band, | ||
/// This means the image is stored in memory with first the value of all bands | ||
/// for the first pixel, then the same for second pixel and so on | ||
Pixel, | ||
} | ||
|
||
pub enum BandSelection { | ||
Subset(Vec<i32>), | ||
All, | ||
} | ||
|
||
// GDAL Docs state: The returned dataset should only be accessed by one thread at a time. | ||
// See: https://gdal.org/api/raster_c_api.html#_CPPv48GDALOpenPKc10GDALAccess | ||
// Additionally, VRT Datasets are not safe before GDAL 2.3. | ||
|
@@ -741,6 +759,115 @@ impl Dataset { | |
Ok(self.child_layer(c_layer)) | ||
} | ||
|
||
/// Read a [`Buffer<T>`] from this dataset, where `T` implements [`GdalType`]. | ||
/// | ||
/// # Arguments | ||
/// * `window` - the window position from top left | ||
/// * `window_size` - the window size (width, height). GDAL will interpolate data if `window_size` != `buffer_size` | ||
/// * `buffer_size` - the desired size of the 'Buffer' (width, height) | ||
/// * `e_resample_alg` - the resample algorithm used for the interpolation. Default: `NearestNeighbor`. | ||
/// * `interleaving`- The output buffer image layout (see `ImageInterleaving`) | ||
/// * `bands` - A subset of bands to select or BandSelection::All to read all bands | ||
/// | ||
/// # Example | ||
/// | ||
/// ```rust | ||
/// # fn main() -> gdal::errors::Result<()> { | ||
/// use gdal::{Dataset, ImageInterleaving, BandSelection}; | ||
/// use gdal::raster::ResampleAlg; | ||
/// let dataset = Dataset::open("fixtures/m_3607824_se_17_1_20160620_sub.tif")?; | ||
/// let size = 2; | ||
/// let buf = dataset.read_as::<u8>((0, 0), dataset.raster_size(), (size, size), Some(ResampleAlg::Bilinear), ImageInterleaving::Pixel, BandSelection::All)?; | ||
/// assert_eq!(buf.size, (size, size, dataset.raster_count() as usize)); | ||
/// assert_eq!(buf.data, [103, 116, 101, 169, 92, 108, 94, 163, 92, 112, 93, 179, 89, 109, 91, 181]); | ||
/// let buf = dataset.read_as::<u8>((0, 0), dataset.raster_size(), (size, size), Some(ResampleAlg::Bilinear), ImageInterleaving::Band, BandSelection::All)?; | ||
/// assert_eq!(buf.data, [103, 92, 92, 89, 116, 108, 112, 109, 101, 94, 93, 91, 169, 163, 179, 181]); | ||
/// # Ok(()) | ||
/// # } | ||
/// ``` | ||
pub fn read_as<T: Copy + GdalType>( | ||
&self, | ||
window: (isize, isize), | ||
window_size: (usize, usize), | ||
buffer_size: (usize, usize), | ||
e_resample_alg: Option<ResampleAlg>, | ||
interleaving: ImageInterleaving, | ||
bands: BandSelection, | ||
) -> Result<Buffer3D<T>> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking out loud here: Why don't we just expose There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also wondering if this should be paired with exposing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good points. One thing I wasn't sure is if For example currently I do like the idea of exposing RasterIO (and AdviseRead possibly). Then I feel it would make sense to also expose these on the And then maybe still keep some high-level There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure about that either! 😅 Actually, it's more like a thin wrapper around the C API, with inspiration from the C++ API, while leveraging safety benefits of Rust when possible. I personally would like to focus more on the ergonomics, but there are good, well-considered arguments against that. There have been lots of cooks in this kitchen, and lacking a project BDFL, my default position is to try to be consistent, yet look for ways burn-down friction and entropy along the way.
I'm too ignorant of the project history to offer an explanation for that. It probably best scratched an itch at the time.
This may be a simple situation of convention. I've worked on a number of different geospatial raster processing systems, and I can't think of one that doesn't assum row-major order. I should also note that if Something that might help is to understand the use case for this PR. Is calling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good questions, let me try to give some more motivations. The reason I started working on this is because I was working on a xyz tiler service (similar to rio-tiler) to serve XYZ tiles to a web frontend from cloud-optimized GeoTiffs stored on a blobstore. This is a case where I always want to read all bands of the tiff because either the tiffs are RGB or they have more than 3 bands but then I'm gonna do some band combinations to turn into 3 bands for visualization. I do already have a similar service in Python and I'm using Now, granted I could do this by reading band-by-band and recombining with
All in all, if the goal is for this crate to be a wrapper around the C API (that seem like a good guideline to me) but inspired by the GDAL C++ API, then I feel exposing I get the code duplication argument though, but then I would almost argue that if we have to pick one function to expose, it should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @julienr Interesting and helpful context. Thanks for taking the time to write it up! I hope I didn't come across as questioning your need; I just thought for the sake of one of the other contributors having the use case fleshed out would help with thinking about the best long-term API. I have a few additional reflections on the above, in case it's helpful (ignore if not), but I'll need to defer to a more senior contributor to determine the appropriate direction.
We have a rust-based XYZ tiler as well, and I avoided
FWIW, we create an in-memory Dataset where we assemble the bands, apply color correction, etc., and then use the built-in GDAL PNG encoder to render. Negligible overhead compared to the S3 network transfer. And GDAL appears to take care of any reshaping needed.
I agree, but even with the C/C++ API, GDAL doesn't help much, unless you start looking at GDAL Warp and the pixel functions. I aspire to write some wrappers for that API, but I've run away from the dragons therein several times 😅! But maybe I'm missing a part of the API here. 🤔 I've found
That may be so. I wonder if it depends on how big your blocks are, and if allocating
Maybe I'm wrong, but if it's truly a COG format with
Based on your arguments, I agree. But I'd go all the way and expose as much of the API as you can without leaking
I think it's a minor consideration. We could consider traits for a common interface and share functions as we're able, but a lot of this is just baked into GDAL. |
||
let resample_alg = e_resample_alg.unwrap_or(ResampleAlg::NearestNeighbour); | ||
|
||
let mut options: GDALRasterIOExtraArg = RasterIOExtraArg { | ||
e_resample_alg: resample_alg, | ||
..Default::default() | ||
} | ||
.into(); | ||
|
||
let options_ptr: *mut GDALRasterIOExtraArg = &mut options; | ||
|
||
let (mut bands, band_count) = match bands { | ||
BandSelection::Subset(bands) => { | ||
let band_count = bands.len(); | ||
(bands, band_count) | ||
} | ||
BandSelection::All => { | ||
let band_count = self.raster_count() as usize; | ||
let bands = (1_i32..band_count as i32 + 1_i32).collect(); | ||
(bands, band_count) | ||
} | ||
}; | ||
|
||
let pixels = buffer_size.0 * buffer_size.1 * band_count; | ||
let mut data: Vec<T> = Vec::with_capacity(pixels); | ||
let size_t = size_of::<T>() as i64; | ||
|
||
let (pixel_space, line_space, band_space) = match interleaving { | ||
ImageInterleaving::Band => (0, 0, 0), | ||
ImageInterleaving::Pixel => ( | ||
size_t * band_count as i64, | ||
buffer_size.0 as i64 * size_t * band_count as i64, | ||
size_t, | ||
), | ||
}; | ||
|
||
// Safety: the GDALRasterIOEx writes | ||
// exactly pixel elements into the slice, before we | ||
// read from this slice. This paradigm is suggested | ||
// in the rust std docs | ||
// (https://doc.rust-lang.org/std/vec/struct.Vec.html#examples-18) | ||
let rv = unsafe { | ||
gdal_sys::GDALDatasetRasterIOEx( | ||
self.c_dataset, | ||
GDALRWFlag::GF_Read, | ||
window.0 as c_int, | ||
window.1 as c_int, | ||
window_size.0 as c_int, | ||
window_size.1 as c_int, | ||
data.as_mut_ptr() as *mut c_void, | ||
buffer_size.0 as c_int, | ||
buffer_size.1 as c_int, | ||
T::gdal_ordinal(), | ||
band_count as i32, | ||
bands.as_mut_ptr() as *mut c_int, | ||
pixel_space, | ||
line_space, | ||
band_space, | ||
options_ptr, | ||
) | ||
}; | ||
if rv != CPLErr::CE_None { | ||
return Err(_last_cpl_err(rv)); | ||
} | ||
|
||
unsafe { | ||
data.set_len(pixels); | ||
}; | ||
|
||
Ok(Buffer3D { | ||
size: (buffer_size.0, buffer_size.1, band_count), | ||
data, | ||
}) | ||
} | ||
|
||
/// Set the [`Dataset`]'s affine transformation; also called a _geo-transformation_. | ||
/// | ||
/// This is like a linear transformation preserves points, straight lines and planes. | ||
|
@@ -1327,4 +1454,131 @@ mod tests { | |
let mut ds = Dataset::open(fixture("roads.geojson")).unwrap(); | ||
assert!(ds.start_transaction().is_err()); | ||
} | ||
|
||
#[test] | ||
fn test_dataset_read_as_pixel_interleaving() { | ||
let ds = Dataset::open(fixture("m_3607824_se_17_1_20160620_sub.tif")).unwrap(); | ||
print!("{:?}", ds.raster_size()); | ||
let (width, height) = (4, 7); | ||
let band_count = ds.raster_count() as usize; | ||
|
||
// We compare a single dataset.read_as() to reading band-by-band using | ||
// band.read_as() | ||
let ds_buf = ds | ||
.read_as::<u8>( | ||
(0, 0), | ||
(width, height), | ||
(width, height), | ||
Some(ResampleAlg::Bilinear), | ||
ImageInterleaving::Pixel, | ||
BandSelection::All, | ||
) | ||
.unwrap(); | ||
assert_eq!(ds_buf.size, (width, height, band_count)); | ||
|
||
for band_index in 0..band_count { | ||
let band = ds.rasterband(band_index as isize + 1).unwrap(); | ||
let band_buf = band | ||
.read_as::<u8>( | ||
(0, 0), | ||
(width, height), | ||
(width, height), | ||
Some(ResampleAlg::Bilinear), | ||
) | ||
.unwrap(); | ||
assert_eq!(band_buf.size, (width, height)); | ||
for i in 0..height { | ||
for j in 0..width { | ||
assert_eq!( | ||
band_buf.data[i * width + j], | ||
ds_buf.data[i * width * band_count + j * band_count + band_index], | ||
); | ||
} | ||
} | ||
} | ||
} | ||
|
||
#[test] | ||
fn test_dataset_read_as_band_interleaving() { | ||
let ds = Dataset::open(fixture("m_3607824_se_17_1_20160620_sub.tif")).unwrap(); | ||
let size: (usize, usize) = (4, 7); | ||
let band_count = ds.raster_count() as usize; | ||
// We compare a single dataset.read_as() to reading band-by-band using | ||
// band.read_as() | ||
let ds_buf = ds | ||
.read_as::<u8>( | ||
(0, 0), | ||
size, | ||
size, | ||
Some(ResampleAlg::Bilinear), | ||
ImageInterleaving::Band, | ||
BandSelection::All, | ||
) | ||
.unwrap(); | ||
assert_eq!(ds_buf.size, (size.0, size.1, band_count)); | ||
|
||
for band_index in 0..band_count { | ||
let band = ds.rasterband(band_index as isize + 1).unwrap(); | ||
let band_buf = band | ||
.read_as::<u8>((0, 0), size, size, Some(ResampleAlg::Bilinear)) | ||
.unwrap(); | ||
assert_eq!(band_buf.size, size); | ||
assert_eq!( | ||
band_buf.data, | ||
ds_buf.data[band_index * size.0 * size.1..(band_index + 1) * size.0 * size.1] | ||
); | ||
} | ||
} | ||
|
||
#[test] | ||
fn test_dataset_read_as_band_selection() { | ||
let ds = Dataset::open(fixture("m_3607824_se_17_1_20160620_sub.tif")).unwrap(); | ||
let size: (usize, usize) = (4, 7); | ||
// We compare a single dataset.read_as() to reading band-by-band using | ||
// band.read_as() | ||
let ds_buf = ds | ||
.read_as::<u8>( | ||
(0, 0), | ||
size, | ||
size, | ||
Some(ResampleAlg::Bilinear), | ||
ImageInterleaving::Band, | ||
BandSelection::Subset(vec![1, 3]), | ||
) | ||
.unwrap(); | ||
assert_eq!(ds_buf.size, (size.0, size.1, 2)); | ||
|
||
for (i, band_index) in vec![1, 3].iter().enumerate() { | ||
let band = ds.rasterband(*band_index as isize).unwrap(); | ||
let band_buf = band | ||
.read_as::<u8>((0, 0), size, size, Some(ResampleAlg::Bilinear)) | ||
.unwrap(); | ||
assert_eq!(band_buf.size, size); | ||
assert_eq!( | ||
band_buf.data, | ||
ds_buf.data[i * size.0 * size.1..(i + 1) * size.0 * size.1] | ||
); | ||
} | ||
} | ||
|
||
#[test] | ||
fn test_dataset_read_as_buffer_size() { | ||
let ds = Dataset::open(fixture("m_3607824_se_17_1_20160620_sub.tif")).unwrap(); | ||
let size: (usize, usize) = (4, 7); | ||
let buffer_size: (usize, usize) = (8, 14); | ||
let band_count = ds.raster_count() as usize; | ||
let ds_buf = ds | ||
.read_as::<u8>( | ||
(0, 0), | ||
size, | ||
buffer_size, | ||
Some(ResampleAlg::Bilinear), | ||
ImageInterleaving::Band, | ||
BandSelection::All, | ||
) | ||
.unwrap(); | ||
// We only assert that we get the right buffer size back because checking for explicit | ||
// values is convoluted since GDAL handles the decimation by doing some interpolation | ||
assert_eq!(ds_buf.size, (buffer_size.0, buffer_size.1, band_count)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is heavily inspired by
RasterBand::read_as
since a lot of the logic is similar. The main change is that theRasterIO
function on dataset needs some direction about which bands to read ( thebands
param).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow this comment, but I'm wondering, is it not possible to specify which bands to read? IOW, is reading all the bands the only option here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely possible. I'll try to expose this in a way that make it easy to read all bands if you don't want to bother, but also gives you the power to be more specific.