Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_raster doesn't maintain CRS for xarray 0.19 #417

Closed
JackLidge opened this issue Oct 6, 2021 · 19 comments
Closed

to_raster doesn't maintain CRS for xarray 0.19 #417

JackLidge opened this issue Oct 6, 2021 · 19 comments
Labels
question Further information is requested

Comments

@JackLidge
Copy link

I've got some code which reads in data from .BSQ files using gdal, creates an xarray dataset and then reprojects and writes it out using rioxarray. This used to work fine but recently the CRS information hasn't been present in the written out datasets. When I downgrade the xarray version, this then works fine. I'm wondering if the way to assign a CRS to a dataset has changed (either in xarray or rioxarray I'm not sure) between version 0.16 and 0.19.

An example of the code being used is below (I can provide a full reproduceable code sample if required, but thought I'd check I'm not using some functions incorrectly first).

Code Sample, a copy-pastable example if possible

wkt = 'PROJCS["WGS_1984_UTM_Zone_47N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",99.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]'
ds.rio.write_crs(wkt, inplace=True)
ds = ds.rio.reproject('+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs', resolution=res)
ds = xr.where((ds == -9999) | (ds == 0), 15000, ds)
    
ds_out = ds.to_array(dim='band')
os.makedirs(os.path.dirname(tmp_fname), exist_ok=True)
ds_out.rio.to_raster(f'{tmp_fname}', dtype=np.int16, nodata=15000)

Problem description

This produces GeoTiffs with no CRS information in them. Running gdalinfo on a dataset written out using the above method (xarray v0.19.0) results in the following output:

Driver: GTiff/GeoTIFF
Files: out.tif
Size is 11721, 11721
Origin = (198675.613085082266480,4703357.477482775226235)
Pixel Size = (10.000000000000000,-10.000000000000000)
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  198675.613, 4703357.477)
Lower Left  (  198675.613, 4586147.477)
Upper Right (  315885.613, 4703357.477)
Lower Right (  315885.613, 4586147.477)
Center      (  257280.613, 4644752.477)
Band 1 Block=512x512 Type=Int16, ColorInterp=Gray
Band 2 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 3 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 4 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 5 Block=512x512 Type=Int16, ColorInterp=Undefined

Expected Output

Downgrading to xarray v.0.16.2:

Driver: GTiff/GeoTIFF
Files: out.tif
Size is 11721, 11721
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 48N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 48N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",105,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["World - N hemisphere - 102°E to 108°E - by country"],
        BBOX[0,102,84,108]],
    ID["EPSG",32648]]
Data axis to CRS axis mapping: 1,2
Origin = (198675.613085082266480,4703357.477482775226235)
Pixel Size = (10.000000000000000,-10.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  198675.613, 4703357.477) (101d20'14.65"E, 42d25'26.45"N)
Lower Left  (  198675.613, 4586147.477) (101d23'49.89"E, 41d22'13.61"N)
Upper Right (  315885.613, 4703357.477) (102d45'38.51"E, 42d27'38.48"N)
Lower Right (  315885.613, 4586147.477) (102d47'50.32"E, 41d24'20.88"N)
Center      (  257280.613, 4644752.477) (102d 4'23.34"E, 41d55' 2.76"N)
Band 1 Block=512x512 Type=Int16, ColorInterp=Gray
Band 2 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 3 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 4 Block=512x512 Type=Int16, ColorInterp=Undefined
Band 5 Block=512x512 Type=Int16, ColorInterp=Undefined

Environment Information

For xarray=0.16.2:

rioxarray version 0.1.1
rasterio version 1.1.8
GDAL version 3.1.4
Python version 3.8.6
OS Information Linux-5.4.0-1057-aws-x86_64-with-glibc2.10

For xarray=0.19:

rioxarray version 0.1.1
rasterio version 1.1.8
GDAL version 3.1.4
Python version 3.8.6
OS Information Linux-5.4.0-1057-aws-x86_64-with-glibc2.10

Installation method

Conda install from environment.yml file, can provide full conda list output or environment.yml parameters if required, but both are quite long.

@JackLidge JackLidge added the bug Something isn't working label Oct 6, 2021
@snowman2
Copy link
Member

snowman2 commented Oct 6, 2021

@JackLidge can you upgrade the version of rioxarray to the latest version and confirm whether this issue still persists?

@JackLidge
Copy link
Author

Yep, I upgraded to 0.7.1 and the same issue still occurred.

Weird that conda was picking such an old version of rioxarray to install.

@snowman2
Copy link
Member

snowman2 commented Oct 6, 2021

What happens when you do this?

wkt = 'PROJCS["WGS_1984_UTM_Zone_47N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",99.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]'
ds.rio.write_crs(wkt, inplace=True)
ds = ds.rio.reproject('+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs', resolution=res)
ds = xr.where((ds == -9999) | (ds == 0), 15000, ds)
    
os.makedirs(os.path.dirname(tmp_fname), exist_ok=True)
ds.rio.to_raster(f'{tmp_fname}', dtype=np.int16, nodata=15000)

@snowman2
Copy link
Member

snowman2 commented Oct 6, 2021

Also, what does ds_out contain?

@JackLidge
Copy link
Author

ds_out is just a DataArray containing the same data in ds. Print out of ds_out below:

<xarray.DataArray (band: 5, y: 11721, x: 11721)>
array([[[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        ...,
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]],

       [[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        ...,
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]],

       [[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        ...,
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]],

       [[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        ...,
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]],

       [[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        ...,
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]]],
      dtype=int16)
Coordinates:
  * x            (x) float64 1.987e+05 1.987e+05 ... 3.159e+05 3.159e+05
  * y            (y) float64 4.703e+06 4.703e+06 ... 4.586e+06 4.586e+06
    spatial_ref  int64 0
  * band         (band) <U5 'blue' 'green' 'red' 'nir' 'mask'

I think previously it flagged an error using to_raster on a dataset, so I converted a dataset to a dataarray when using to_raster. Probably an issue with previously using such an old version of rioxarray.

No issues with writing out a raster just using ds, but still no CRS information being provided from gdalinfo.

@snowman2
Copy link
Member

snowman2 commented Oct 6, 2021

Are you able to provide the source raster?

@JackLidge
Copy link
Author

JackLidge commented Oct 6, 2021

EDIT: File upload now below.

This is the code I've been using to open the file to get it into an xarray reading for reading out:

import gdal
dataset = gdal.Open(in_fname)
data = dataset.ReadAsArray()
ulx, xres, _, uly, _, yres = dataset.GetGeoTransform()
x_list = np.linspace(0, dataset.RasterXSize, dataset.RasterXSize+1)
y_list = np.linspace(0, dataset.RasterYSize, dataset.RasterYSize+1)

x_coords = []
y_coords = []
for x in x_list:
    x_coords.append(ulx + (x * xres))
    
for y in y_list:
    y_coords.append(uly + (y * yres))
ds = xr.DataArray(data=data, dims=['bands', 'y', 'x'],
     coords={'x': x_coords[:-1], 'y': y_coords[:-1], 'bands': ['blue', 'green', 'red', 'nir']})
ds = ds.to_dataset(dim='bands')

@snowman2
Copy link
Member

snowman2 commented Oct 6, 2021

Side question: Is there a reason you aren't using rioxarray.open_rasterio?

ds = rioxarray.open_rasterio(in_fname)

Your logic seems like the reason is related to #296. However, the final dataset doesn't match that, so maybe not.

@JackLidge
Copy link
Author

I wrote the script which does this processing a while ago - at the time this was the only method I could find to get everything to work in the way it does currently. I can't remember exactly why I used this specific combination of gdal, xarray and rioxarray to do everything.

At the moment it works fine by just holding xarray at v0.16 but if this isn't can't be solved for whatever reason then I'll probably go back and rewrite it to see if it will work using open_rasterio. I'll run some checks on opening this file with open_rasterio later on today and see if that fixes the issues with writing out CRS.

@snowman2
Copy link
Member

snowman2 commented Oct 7, 2021

It seems I may need that environment.yml file. My environment doesn't seem to like the file:

With osgeo.gdal:

ERROR 4: `T47TQG_20210919T035541_TOA_rad_10m_atm.bsq' not recognized as a supported file format.

With rasterio:

RasterioIOError: 'T47TQG_20210919T035541_TOA_rad_10m_atm.bsq' not recognized as a supported file format.

My environment:

rioxarray (0.7.2.dev0) deps:
  rasterio: 1.2.9
    xarray: 0.19.0
      GDAL: 3.3.2

Other python deps:
     scipy: None
    pyproj: 3.2.1

System:
    python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)  [GCC 9.4.0]
   machine: Linux-5.4.0-88-generic-x86_64-with-glibc2.31

@snowman2
Copy link
Member

snowman2 commented Oct 7, 2021

I am wondering if it is a file format where there are multiple files and some of them are missing in the zip you uploaded?

@JackLidge
Copy link
Author

There is an .hdr file which might be needed to download it, although I've never explicited checked whether it's needed for loading it or not. So I've re-uploaded the .bsq file with its corresponding .hdr file. I've also added the environment.yml file used in my conda environment.
Link here

@snowman2
Copy link
Member

snowman2 commented Oct 7, 2021

This works for me:

import rasterio
import rioxarray

with rasterio.open("T47TQG_20210919T035541_TOA_rad_10m_atm.bsq") as rds, rasterio.vrt.WarpedVRT(rds, crs="EPSG:32648") as wds:
    xds = rioxarray.open_rasterio(wds)
    original_crs = xds.rio.crs
xds = xarray.where((xds == -9999) | (xds == 0), 15000, xds)
xds.rio.write_crs(original_crs, inplace=True)
xds.rio.write_nodata(15000, inplace=True)
xds.rio.to_raster('out.tif', dtype="int16")
out_xds = rioxarray.open_rasterio("out.tif")
print(out_xds.rio.crs)
EPSG:32648

@snowman2
Copy link
Member

snowman2 commented Oct 7, 2021

Updated for: https://corteva.github.io/rioxarray/stable/getting_started/manage_information_loss.html

import rasterio
import rioxarray

with rasterio.open("T47TQG_20210919T035541_TOA_rad_10m_atm.bsq") as rds, rasterio.vrt.WarpedVRT(rds, crs="EPSG:32648") as wds:
    xds = rioxarray.open_rasterio(wds)
    original_attrs = xds.attrs.copy()
    original_encoding = xds.encoding.copy()
    original_crs = xds.rio.crs

xds = xarray.where((xds == -9999) | (xds == 0), 15000, xds)

# preserve lost informaiton
xds.rio.update_attrs(original_attrs, inplace=True)
xds.rio.update_encoding(original_encoding, inplace=True)
xds.rio.write_crs(original_crs, inplace=True)
xds.rio.write_nodata(15000, inplace=True)

# write to file
xds.rio.to_raster('out_attrs.tif', dtype="int16")

@snowman2
Copy link
Member

snowman2 commented Oct 7, 2021

Note: The hdr file was what was needed to work. The environment I used was the same as I posted above ref

@JackLidge
Copy link
Author

Yep, can confirm that works. Thanks very much for all your help, I'll look at reconfiguring this script to use that method.

I haven't seen the vrt.WarpedVRT function before, is the CRS supplied there the input or output CRS? I wasn't fully sure based off scanning the rasterio docs for the function.

@snowman2
Copy link
Member

snowman2 commented Oct 8, 2021

Glad to hear that it worked. The crs kwarg is the destination CRS. Here are the API docs: https://rasterio.readthedocs.io/en/latest/api/rasterio.vrt.html?highlight=warpedvrt#rasterio.vrt.WarpedVRT

@snowman2
Copy link
Member

snowman2 commented Oct 8, 2021

Going to close as this issue seems to be resolved.

@snowman2 snowman2 closed this as completed Oct 8, 2021
@snowman2 snowman2 added question Further information is requested and removed bug Something isn't working labels Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants