
Catalog

felix edited this page Mar 15, 2017 · 3 revisions

Main elements the catalog should contain

Products

  • Sentinel raw zip file
  • Derived Sentinel ARD zip
  • One layer in a set of layers, e.g. the topography layer in Mastermap
  • Any physical thing we distribute
  • Property mappings can go into the importer config

Collection

  • Collection of raw Sentinel zip files
  • Mastermap
  • Land Cover map
  • Any collection of products (A collection may have just one product)
  • Collection metadata holds the parts that are common to all products in the collection
  • Metadata will be expressed in the topcat gemini2 format

Collection naming convention

Naming convention as follows:
project/productset/x/y/z/version

Where x/y/z are for product-specific differentiation.
The top-level (current) product has no version segment. Previous versions have vX appended.
Version numbers start at 1, except in very special cases, as these are all release products.

E.g.
Top level
eodip/sentinel1/ard/backscatter

Previous versions
eodip/sentinel1/ard/backscatter/v2
eodip/sentinel1/ard/backscatter/v1

Records for previous data that has expired and been deleted should be removed from the catalog.

Examples:
eodip/sentinel1/ard/backscatter
scotland-gov-gi/lidar-1/processed/dsm
scotland-gov-gi/lidar-1/processed/dsm/gridded/27700/100000
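The naming convention above can be split into its parts mechanically. The following is a minimal sketch only: `CollectionName` and `parse_collection_name` are hypothetical names, not part of any existing catalog code, and it assumes a trailing segment of the form vN is always a version marker.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical helper for illustration; not part of the catalog codebase.
_VERSION_RE = re.compile(r"^v(\d+)$")


class CollectionName(NamedTuple):
    project: str
    productset: str
    parts: Tuple[str, ...]   # the x/y/z differentiators, possibly empty
    version: Optional[int]   # None for the top-level (current) collection


def parse_collection_name(name: str) -> CollectionName:
    """Split project/productset/x/y/z[/vN] into its components."""
    segments = name.strip("/").split("/")
    if len(segments) < 2 or any(not s for s in segments):
        raise ValueError(f"expected project/productset[/...]: {name!r}")
    version = None
    match = _VERSION_RE.match(segments[-1])
    # Assumption: a trailing vN segment is a version, never a differentiator.
    if match and len(segments) > 2:
        version = int(match.group(1))
        segments = segments[:-1]
    return CollectionName(segments[0], segments[1], tuple(segments[2:]), version)


current = parse_collection_name("eodip/sentinel1/ard/backscatter")
previous = parse_collection_name("eodip/sentinel1/ard/backscatter/v2")
```

Here `current.version` is None because the top-level name carries no version segment, while `previous.version` is 2.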

Catalog should

  • Contain the spatial extents associated with a product allowing them to be queried efficiently
  • Contain the properties that are unique to that (type of) product. These constitute the dimensions against which products are filtered, such as the temporal extents.
  • Show how the lineage of a product relates to other products in the catalog
  • Allow the lineage to be queried in a flexible fashion, e.g. products derived from a product, or ancestors of a product. Also complex queries, such as collections containing products derived from the products of other collections (collection lineage).
  • Record the details of the process by which a product was derived
  • Provide exportable metadata about the product and collection in a known format (gemini). This should align with topcat.
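The lineage queries listed above (derived products, ancestors, collection lineage) can be illustrated with a small in-memory graph. This is a sketch under stated assumptions, not the catalog's implementation: `LineageGraph`, `derived_collections`, the product identifiers and the `eodip/sentinel1/raw` collection name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


class LineageGraph:
    """Hypothetical in-memory lineage store: edges are (source, derived)."""

    def __init__(self, edges: Iterable[Tuple[str, str]]):
        self._children: Dict[str, Set[str]] = defaultdict(set)
        self._parents: Dict[str, Set[str]] = defaultdict(set)
        for source, derived in edges:
            self._children[source].add(derived)
            self._parents[derived].add(source)

    @staticmethod
    def _walk(start: str, neighbours: Mapping[str, Set[str]]) -> Set[str]:
        # Iterative depth-first walk; the start node itself is not returned.
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(start)
        return seen

    def descendants(self, product: str) -> Set[str]:
        """All products derived, directly or indirectly, from `product`."""
        return self._walk(product, self._children)

    def ancestors(self, product: str) -> Set[str]:
        """All products that `product` was derived from."""
        return self._walk(product, self._parents)


def derived_collections(graph: LineageGraph,
                        collection_of: Mapping[str, str],
                        collection: str) -> Set[str]:
    """Collections holding products derived from products of `collection`."""
    result: Set[str] = set()
    for product, owner in collection_of.items():
        if owner == collection:
            for derived in graph.descendants(product):
                if derived in collection_of:
                    result.add(collection_of[derived])
    result.discard(collection)
    return result


# Illustrative data only: two raw products feed one ARD product.
graph = LineageGraph([("raw_1", "ard_1"), ("raw_2", "ard_1")])
collection_of = {
    "raw_1": "eodip/sentinel1/raw",
    "raw_2": "eodip/sentinel1/raw",
    "ard_1": "eodip/sentinel1/ard/backscatter",
}
```

In a relational store the same walks would typically be expressed as recursive queries over a lineage table; the graph here only shows the shape of the questions being asked.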

Other Principles

  • Any record is created at the point a product / collection is added / created in the system. That data should not change from that point, except to accommodate changes in catalog structure or to correct errors.
  • A recreated product is a new product and should be treated accordingly
  • The record of a deleted / overwritten product should be deleted from the catalog.

Data Formats

  • Spatial Data = WGS84
  • File sizes = Bytes

Tech

Postgres can store JSON in a binary format (jsonb). Jsonb fields can be indexed and queried along with conventional scalar and relational data. Jsonb enables a simple table to be polymorphic: in our case a single table can store data from products with many different properties, as those property values can be stored in indexed jsonb.

  • Jsonb is good for structured data that doesn't need to be queried in a malleable fashion.
  • Conventional scalar data fields and relational structures are good for data that needs to be queried in a malleable fashion, and for common dimensions that would be queried across products, e.g. spatial and temporal extents.
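The split between conventional columns and jsonb can be sketched as follows. Everything here is an assumption for illustration: the `product` table, its column names, the `polarisation` property and the `%s` placeholder style (as used by common Python Postgres drivers) are not taken from the catalog itself, and the spatial extent column is left out to keep the sketch free of extensions. The jsonb containment operator `@>` and GIN indexes on jsonb are standard Postgres features.

```python
import json
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical schema: table and column names are illustrative only.
SCHEMA_SQL = """
CREATE TABLE product (
    id          bigserial   PRIMARY KEY,
    collection  text        NOT NULL,
    start_time  timestamptz NOT NULL,              -- common dimension: plain column
    end_time    timestamptz NOT NULL,              -- common dimension: plain column
    size_bytes  bigint      NOT NULL,              -- file size in bytes
    properties  jsonb       NOT NULL DEFAULT '{}'  -- product-specific values
);
CREATE INDEX product_time_idx ON product (start_time, end_time);
CREATE INDEX product_properties_idx ON product USING GIN (properties);
"""


def build_product_query(collection: str,
                        start: datetime,
                        end: datetime,
                        properties: Optional[Dict[str, Any]] = None
                        ) -> Tuple[str, List[Any]]:
    """Return (sql, params) selecting products that overlap [start, end].

    Common dimensions are filtered on plain columns; product-specific
    properties are matched with jsonb containment (@>).
    """
    sql = (
        "SELECT id FROM product "
        "WHERE collection = %s "
        "AND start_time <= %s AND end_time >= %s"
    )
    # Overlap test: the product starts before the window ends
    # and ends after the window starts.
    params: List[Any] = [collection, end, start]
    if properties:
        sql += " AND properties @> %s::jsonb"
        params.append(json.dumps(properties, sort_keys=True))
    return sql, params


sql, params = build_product_query(
    "eodip/sentinel1/ard/backscatter",
    datetime(2017, 1, 1),
    datetime(2017, 2, 1),
    {"polarisation": "VV"},  # hypothetical product-specific property
)
```

The function only builds the statement and its parameters; executing it against a database is left to whichever driver is in use.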

collection 2 S2_raw 1 ****> Sentinel_ARD

collection 1 S2_raw 2 _|*****> Product