Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessing gdal-created metadata from python #251

Closed
djhoese opened this issue Dec 12, 2019 · 5 comments
Closed

Accessing gdal-created metadata from python #251

djhoese opened this issue Dec 12, 2019 · 5 comments

Comments

@djhoese
Copy link

djhoese commented Dec 12, 2019

I have a tiledb array that I've created from a geotiff using gdal_translate. When I open the array in python I can't find a way to access some of the metadata that gdalinfo is able to print out:

Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2019:12:12 19:10:18

It seems that this information is added to the .aux.xml file inside the tiledb directory. I'm guessing tiledb-py doesn't use that at all? Similar to how translating a geotiff to a PNG makes an aux.xml that is not part of the PNG standard but can still be picked up by GDAL?

@normanb
Copy link
Contributor

normanb commented Dec 12, 2019

@djhoese there are several ways to get to this metadata with Python. You can open the .aux.xml directly in Python, or use the virtual filesystem inside of TileDB.

However an easier way is to use Rasterio, we have more detail about the Rasterio integration with TileDB here. Once you are using Rasterio with GDAL3 and TileDB then the commands rio info and direct access with rasterio.open will provide this metadata.

We are moving away from using a sidecar file to using TileDB array metadata (in v1.7+), however this does not impact the ways to access this metadata in GDAL or Rasterio.

@djhoese
Copy link
Author

djhoese commented Dec 13, 2019

Thanks @normanb. I have GDAL 3.0.2 from conda-forge and tiledb and I'm not able to access the metadata in the .xml file. I tried rio info on the command line, xarray.open_rasterio, and rasterio.open. If I load the original geotiff I can see the datetime metadata I'm looking for.

XML Contents:

<PAMDataset>
  <SRS dataAxisToSRSAxisMapping="1,2">PROJCS["unknown",GEOGCS["unknown",DATUM["unknown",SPHEROID["GRS 1980",6378137,298.257222096042]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433]],PROJECTION["Geostationary_Satellite"],PARAMETER["central_meridian",-75],PARAMETER["satellite_height",35786023],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],EXTENSION["PROJ4","+proj=geos +sweep=x +lon_0=-75 +h=35786023 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs"]]</SRS>
  <GeoTransform> -5.4348948850560002e+06,  5.0100432200000000e+02,  0.0000000000000000e+00,  5.4348948850560002e+06,  0.0000000000000000e+00, -5.0100432200000000e+02</GeoTransform>
  <Metadata domain="IMAGE_STRUCTURE">
    <MDI key="DATA_TYPE">Byte</MDI>
    <MDI key="NBITS">8</MDI>
    <MDI key="X_SIZE">21696</MDI>
    <MDI key="Y_SIZE">21696</MDI>
  </Metadata>
  <Metadata>
    <MDI key="AREA_OR_POINT">Area</MDI>
    <MDI key="TIFFTAG_DATETIME">2019:12:12 19:10:18</MDI>
  </Metadata>
</PAMDataset>

Open GeoTIFF with xarray:

In [2]: import xarray as xr
In [3]: ds = xr.open_rasterio('GOES-16_ABI_RadF_true_color_20191212_191018_GOES-East.tif')
In [4]: ds
Out[4]:
<xarray.DataArray (band: 4, y: 21696, x: 21696)>
[1882865664 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1 2 3 4
  * y        (y) float64 5.435e+06 5.434e+06 5.434e+06 ... -5.434e+06 -5.435e+06
  * x        (x) float64 -5.435e+06 -5.434e+06 ... 5.434e+06 5.435e+06
Attributes:
    transform:         (501.004322, 0.0, -5434894.885056, 0.0, -501.004322, 5...
    crs:               +proj=geos +lon_0=-75 +h=35786023 +x_0=0 +y_0=0 +ellps...
    res:               (501.004322, 501.004322)
    is_tiled:          1
    nodatavals:        (nan, nan, nan, nan)
    scales:            (1.0, 1.0, 1.0, 1.0)
    offsets:           (0.0, 0.0, 0.0, 0.0)
    AREA_OR_POINT:     Area
    TIFFTAG_DATETIME:  2019:12:12 19:10:18

Open TileDB Array that was created with gdal_translate and has the XML file internally:

In [3]: ds = xr.open_rasterio('goes16_fldk_true_color_20191212_191018')                                                                                                                                                                                                       
In [4]: ds                                                                                                                                                                                                                                                                    
Out[4]: 
<xarray.DataArray (band: 4, y: 21696, x: 21696)>
[1882865664 values with dtype=float64]
Coordinates:
  * band     (band) int64 1 2 3 4
  * y        (y) float64 5.435e+06 5.434e+06 5.434e+06 ... -5.434e+06 -5.435e+06
  * x        (x) float64 -5.435e+06 -5.434e+06 ... 5.434e+06 5.435e+06
Attributes:
    transform:   (501.004322, 0.0, -5434894.885056, 0.0, -501.004322, 5434894...
    crs:         +proj=geos +lon_0=-75 +h=35786023 +x_0=0 +y_0=0 +ellps=GRS80...
    res:         (501.004322, 501.004322)
    is_tiled:    1
    nodatavals:  (nan, nan, nan, nan)
    scales:      (1.0, 1.0, 1.0, 1.0)
    offsets:     (0.0, 0.0, 0.0, 0.0)

@normanb
Copy link
Contributor

normanb commented Dec 13, 2019

@djhoese I am able to see the datetime field with gdalinfo against a tiledb array.

For rasterio, these fields are seen as tag, I added support for additional tiff tags to be reported in this PR against xarray - pydata/xarray#3249 but it looks like I need to make this more general for other formats within rasterio, I can do that and create an example.

@normanb
Copy link
Contributor

normanb commented Dec 13, 2019

For rasterio I am able to see all the metadata tags (including datetime) but this does need to be copied over for xarray. I have pasted below my interactive rio session.

rio insp meta_test 
Rasterio 1.0.25 Interactive Inspector (Python 3.7.2)
Type "src.meta", "src.read(1)", or "help(src)" for more information.
>>> src.meta
{'driver': 'TileDB', 'dtype': 'uint8', 'nodata': None, 'width': 21696, 'height': 21696, 'count': 3, 'crs': CRS.from_wkt('PROJCS["unknown",GEOGCS["unknown",DATUM["unknown",SPHEROID["GRS 1980",6378137,298.257222096042]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433]],PROJECTION["Geostationary_Satellite"],PARAMETER["central_meridian",-75],PARAMETER["satellite_height",35786023],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],EXTENSION["PROJ4","+proj=geos +sweep=x +lon_0=-75 +h=35786023 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs"]]'), 'transform': Affine(501.004322, 0.0, -5434894.885056,
       0.0, -501.004322, 5434894.885056)}
>>> src.tags()
{'AREA_OR_POINT': 'Area', 'TIFFTAG_DATETIME': '2019:12:12 19:10:18'}

@ihnorton
Copy link
Member

ihnorton commented Feb 2, 2022

Please comment and reopen if needed.

@ihnorton ihnorton closed this as completed Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants