torchgeo reads dataset as all zeros while rasterio can read the contents #2466

Open
erob-archim opened this issue Dec 10, 2024 · 1 comment
Labels
datasets Geospatial or benchmark datasets

Comments

@erob-archim

Description

Hi, I'm not exactly sure what's causing this issue, though I have a suspicion it might be related to compression of geotiffs.

When data is downloaded from the UK Environment Agency's LIDAR data programme, torchgeo reads it as all zeros.

This can be rectified by running the file through gdal_merge.py - I'm unsure why.

This is not an issue when reading directly with rasterio.

Steps to reproduce

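Run the script below. It downloads the relevant data, then reads both the original (compressed) file and a copy rewritten by gdal_merge.py, using rasterio and torchgeo in turn.
Checking each read for all zeros, I get: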

Rasterio compressed all zero: False
Rasterio uncompressed all zero: False
torchgeo compressed all zero: True
torchgeo uncompressed all zero: False

import subprocess
from pathlib import Path

import rasterio
from torchgeo.datasets import RasterDataset

subprocess.run(["wget", "https://api.agrimetrics.co.uk/tiles/collections/survey/national_lidar_programme_dsm/2023/1/SP7015?subscription-key=public", "-O", "test_file.zip"])
subprocess.run(["unzip", "test_file.zip"])

test_file = Path("DSM_SP7015_P_12740_20230402_20230404.tif")
uncompressed_file = test_file.with_suffix(".uncompressed.tif")

with rasterio.open(test_file) as src:
    rasterio_compressed = src.read()
    rasterio_compressed_all_zero = (rasterio_compressed == 0).all()

torchgeo_compressed_dataset = RasterDataset(test_file)
torchgeo_compressed = torchgeo_compressed_dataset[torchgeo_compressed_dataset.bounds]["image"]
torchgeo_compressed_all_zero = (torchgeo_compressed == 0).all()

uncompressed_file.unlink(missing_ok=True)
subprocess.run(["gdal_merge.py", "-ot", "Float32", "-of", "GTiff", "-o", str(uncompressed_file), str(test_file)])

with rasterio.open(uncompressed_file) as src:
    rasterio_uncompressed = src.read()
    rasterio_uncompressed_all_zero = (rasterio_uncompressed == 0).all()

torchgeo_uncompressed_dataset = RasterDataset(uncompressed_file)
torchgeo_uncompressed = torchgeo_uncompressed_dataset[torchgeo_uncompressed_dataset.bounds]["image"]
torchgeo_uncompressed_all_zero = (torchgeo_uncompressed == 0).all()

print(f"Rasterio compressed all zero: {rasterio_compressed_all_zero}")
print(f"Rasterio uncompressed all zero: {rasterio_uncompressed_all_zero}")
print(f"torchgeo compressed all zero: {torchgeo_compressed_all_zero}")
print(f"torchgeo uncompressed all zero: {torchgeo_uncompressed_all_zero}")

Version

ff3d087 (latest main)

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Dec 10, 2024
@adamjstewart
Collaborator

I'm able to reproduce this issue without TorchGeo using only rasterio. The following code:

import rasterio
import rasterio.merge

filename = 'DSM_SP7015_P_12740_20230402_20230404.tif'
with rasterio.open(filename) as src:
    dest, out_transform = rasterio.merge.merge([src])
    print((dest == 0).all())

gives the following output when run:

lib/python3.13/site-packages/rasterio/merge.py:321: UserWarning: Ignoring nodata value. The nodata value, -3.4028234663852886e+38, cannot safely be represented in the chosen data type, float32. Consider overriding it using the --nodata option for better results. Falling back to first source's nodata value.
  warnings.warn(
True

The warning message there is probably related. I'm not sure why rasterio is returning all zeros just because it can't handle the nodata value. I think this bug is worth reporting to rasterio.
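
As a quick sanity check on the warning text, here is a minimal sketch using only the standard library. It tests whether the nodata value printed above survives a round trip through float32. The helper name fits_float32 and the constants are hypothetical, introduced only for illustration; this does not reproduce rasterio's own check, which may apply a different rule.

```python
import struct

# Nodata value exactly as printed in the rasterio warning above.
NODATA = -3.4028234663852886e+38

# Largest finite float32, (2 - 2**-23) * 2**127, computed exactly in float64.
FLOAT32_MAX = (2 - 2**-23) * 2.0**127


def fits_float32(value: float) -> bool:
    """Hypothetical helper: True if value survives float64 -> float32 -> float64."""
    try:
        # Packing raises OverflowError when the value is outside float32 range.
        packed = struct.pack("<f", value)
    except OverflowError:
        return False
    return struct.unpack("<f", packed)[0] == value


print(fits_float32(NODATA))    # True
print(NODATA == -FLOAT32_MAX)  # True
```

If this holds, the printed nodata value is exactly the most negative finite float32, so it looks representable in float32 and the warning itself may be part of what is worth reporting to rasterio.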
