remove caom2 package from VO-DML
initial draft CAOM document
pdowler committed Aug 29, 2024
1 parent 1be32a7 commit 76b53a4
Showing 8 changed files with 5,042 additions and 3,042 deletions.
202 changes: 179 additions & 23 deletions CAOM.tex
@@ -38,7 +38,11 @@ \section*{Conformance-related definitions}

\section{Introduction}

???? Write something ????
The Common Archive Observation Model (CAOM) is a metadata model that describes
astronomical data stored in archives and makes that data findable and accessible
(the first two of the FAIR principles: Findable, Accessible, Interoperable,
Reusable). CAOM places no constraints on the format of the data itself and thus does
not, on its own, make the data interoperable or reusable.

\subsection{Role within the VO Architecture}

@@ -67,7 +71,43 @@ \section{Use Cases}

\subsection{Describing Data}

\subsection{Operating a Data Management System}
The primary design goal of CAOM is to describe astronomical data so that
users (astronomers) can query archives and discover data that is suitable
for a specific research goal or project.

The metadata includes descriptive quantities to help users discover data,
such as coverage in position, energy, and time; logical data types like
images, spectra, and time series; and some origin metadata like
telescope, instrument, and proposal information.

The model also includes a structure that captures some of the important relationships
between data products, such as different products of an observation that have had
different processing applied and new derived observations that are created by
combining data from several other observations. CAOM also describes the way a
data product is made up of one or more physically stored components while remaining
loosely coupled with the storage system itself.

CAOM provides a transparent way to express data access rights so users can see
which data they have access to and even query for data that will soon be available.

Most importantly, new kinds of data products and areas of study sometimes require
the model to evolve (and gain new features) to better describe new data and enable
the queries that new research requires. CAOM has a well-defined mechanism to support
evolution while still bringing all legacy data forward with minimal effort.

\subsection{Implementation and Operations}

CAOM also supports a range of data management functions that make it implementable
and robust for large scale archive operations. The integrity of metadata that is
created, stored, and accessed can be verified using a metadata checksum algorithm.
The metadata checksums can also be used to optimise interactions like database
transactions and test database persistence and other forms of serialisation for
completeness and correctness.
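
As a rough illustration of the idea, the sketch below computes a checksum over the metadata fields of one entity and uses it to test a serialisation round trip. It is not the CAOM metadata checksum algorithm, which is not specified in this section: the choice of SHA-256, the sorted field order, the skipping of null values, and the function names `metadata_checksum` and `verify_roundtrip` are all assumptions made for the example.

```python
import hashlib


def metadata_checksum(fields: dict) -> str:
    """Return a hex digest over the non-null fields, in sorted name order.

    Hypothetical scheme for illustration only; not the CAOM algorithm.
    """
    digest = hashlib.sha256()
    for name in sorted(fields):
        value = fields[name]
        if value is None:
            continue  # in this sketch, null fields do not contribute
        # Separate names and values with a zero byte so that adjacent
        # strings cannot run together and produce the same digest.
        digest.update(name.encode("utf-8"))
        digest.update(b"\x00")
        digest.update(str(value).encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()


def verify_roundtrip(original: dict, restored: dict) -> bool:
    """True when the restored metadata has the same checksum as the original."""
    return metadata_checksum(original) == metadata_checksum(restored)
```

A persistence test could then store an instance, read it back, and call `verify_roundtrip` on the two sets of field values.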

The model is designed to support robust synchronisation of observation metadata
between the origin and other mirror sites through modification timestamps and
the use of metadata checksums to ensure that metadata transport does not introduce
any corruption or incompleteness (details elsewhere ???).
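
One way such a synchronisation pass might be organised is sketched below, assuming each site can list a URI, a last-modified timestamp, and a metadata checksum for every observation. The `ObservationState` class and `plan_sync` function are hypothetical names for this example and do not describe an existing CAOM API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObservationState:
    """Summary of one observation at a site (hypothetical structure)."""
    uri: str
    last_modified: datetime
    checksum: str


def plan_sync(origin, mirror, since):
    """Return the URIs that should be copied from origin to mirror.

    Only observations modified after `since` are considered; of those,
    an observation is copied when the mirror lacks it or holds a copy
    whose checksum differs from the origin's.
    """
    mirror_by_uri = {state.uri: state for state in mirror}
    to_copy = []
    for state in origin:
        if state.last_modified <= since:
            continue  # unchanged since the previous pass
        existing = mirror_by_uri.get(state.uri)
        if existing is None or existing.checksum != state.checksum:
            to_copy.append(state.uri)
    return to_copy
```

After copying, the mirror would recompute the checksum of each received observation and compare it with the origin's value to detect corrupted or incomplete transfers.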

\section{Model Overview}

@@ -78,29 +118,19 @@ \section{Model Overview}
\label{fig:core}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM2datatypes.png}
\caption{CAOM Data Types}
\label{fig:datatypes}
\end{figure}

\subsection{Entity}

\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM4entities.png}
\caption{CAOM Entities}
\label{fig:entity}
\end{figure}

\subsection{Observation}

An Observation is ...
An Observation is the result of some activity to create data. Data acquisition by a
telescope creates a SimpleObservation, usually with one Plane describing the raw
data.

\subsection{Plane}

A Plane is ...
A Plane is a single data product within an Observation, usually with one or more
Artifacts that correspond to stored resources (usually files). Processing
that transforms the data in a Plane in some fashion usually creates a new Plane in
the same Observation. Common processing, such as calibration to remove the instrument
signature, changes the calibration level of the Plane.
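
The containment described above can be sketched with a few simple classes. This is a minimal illustration, not the model definition: the field names, the use of an integer for the calibration level, and the helper `add_calibrated_plane` (which assumes calibration raises the level by one) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A stored resource, usually a file (hypothetical minimal form)."""
    uri: str


@dataclass
class Plane:
    """One data product within an observation."""
    uri: str
    calibration_level: int  # assumption: larger value means more processing
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Observation:
    """Result of an activity that creates data."""
    uri: str
    planes: List[Plane] = field(default_factory=list)


def add_calibrated_plane(obs: Observation, raw: Plane,
                         uri: str, artifact_uri: str) -> Plane:
    """Record processing of `raw` as a new plane in the same observation."""
    plane = Plane(uri=uri,
                  calibration_level=raw.calibration_level + 1,
                  artifacts=[Artifact(artifact_uri)])
    obs.planes.append(plane)
    return plane
```

The raw plane is left untouched; processing adds a sibling plane, so both products remain discoverable under the same observation.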

\subsection{Artifact}

@@ -114,12 +144,55 @@ \subsection{Part}
of such an Artifact. The meaning of Part depends on the type of resource that the
Artifact refers to, as described by the Artifact.contentType.

TBD: keep, deprecate, or remove?
Note: Part is not shown in the UML diagrams. TBD: keep, deprecate, or remove?

\subsection{Chunk}

A Chunk describes a single data array using WCS (World Coordinate System) concepts.
TBD: keep, deprecate or remove?

Note: Chunk is not shown in the UML diagrams. TBD: keep, deprecate or remove?

\subsection{Data Types}
Some of the classes in the model are intended to be used as data types (e.g. column
types in a database, exposed as such in a TAP service).

\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM2datatypes.png}
\caption{CAOM Data Types}
\label{fig:datatypes}
\end{figure}

\subsection{Vocabularies}

CAOM uses a mixture of enumerations and vocabulary references. As it has evolved,
several concepts began as enumerations and were later converted to use a vocabulary
when it became clear that the IVOA Vocabularies process was a more appropriate way
to support the gradual evolution of the set of concepts needed by the community.

\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM3vocabularies.png}
\caption{Enumerations and Vocabularies}
\label{fig:vocab}
\end{figure}

\subsection{Entity}

The Entity concept defines the common metadata necessary to persist and validate
instances of classes within the model. Practically, the entity classes in the
model are related by one-to-many composition and thus set a lower bound on
implementations (e.g. in a relational mapping, each entity has to be in a separate
table, so one table per entity is the minimum number of tables required
to persist complete instances).
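
A minimal relational sketch of this one-table-per-entity mapping follows, using SQLite and assuming only three entity classes (Observation, Plane, Artifact). The table and column names are hypothetical; a real mapping would carry the full set of entity and model fields.

```python
import sqlite3

# One table per entity; each child row points at its parent, which
# expresses the one-to-many composition between the entity classes.
SCHEMA = """
CREATE TABLE observation (
    id  TEXT PRIMARY KEY,
    uri TEXT NOT NULL
);
CREATE TABLE plane (
    id             TEXT PRIMARY KEY,
    observation_id TEXT NOT NULL REFERENCES observation(id),
    uri            TEXT NOT NULL
);
CREATE TABLE artifact (
    id       TEXT PRIMARY KEY,
    plane_id TEXT NOT NULL REFERENCES plane(id),
    uri      TEXT NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three entity tables in the given database."""
    conn.executescript(SCHEMA)


def entity_tables(conn: sqlite3.Connection) -> list:
    """List the table names present, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]
```

Data-type classes, by contrast, could be flattened into columns of their owning entity's table, which is why only the entities fix the minimum table count.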

\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM4entities.png}
\caption{CAOM Entities}
\label{fig:entity}
\end{figure}


% include external generated file so this tex doc is easy to edit
\input{generated.tex}
@@ -129,8 +202,91 @@ \section{Changes from Previous Versions}

\subsection{Changes from OpenCADC CAOM-2.4}

TODO
\subsubsection{General Changes}

- change `Plane.position.bounds` to be mandatory

- change `Plane.energy.bounds` to be mandatory

- change `Plane.time.bounds` to be mandatory

- change `Plane.polarization.states` to require at least 1 value

The above changes mean that each of the position/energy/time/polarization objects
has one mandatory field, so a query with a single `is not null` constraint
can be used to detect whether the object is present.
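
For example, with a flattened `plane` table in which the position object is stored as nullable columns, a single constraint on the mandatory bounds column selects planes that have position metadata. The table layout, column names, and sample values below are hypothetical and only illustrate the query pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plane ("
    " uri TEXT PRIMARY KEY,"
    " position_bounds TEXT,"      # null when the plane has no position object
    " position_resolution REAL)"  # optional even when position is present
)
conn.executemany(
    "INSERT INTO plane VALUES (?, ?, ?)",
    [
        ("caom:TEST/obs1/raw", "circle 10.0 20.0 0.5", None),
        ("caom:TEST/obs2/raw", None, None),
        ("caom:TEST/obs3/raw", "circle 30.0 -5.0 0.1", 0.8),
    ],
)

# Because bounds is mandatory within position, this one constraint is
# enough to tell whether the position object exists at all.
rows = conn.execute(
    "SELECT uri FROM plane WHERE position_bounds IS NOT NULL ORDER BY uri"
).fetchall()
planes_with_position = [row[0] for row in rows]
```

Without a mandatory field, the same test would need an `OR` over every column belonging to the object.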

- add `ArtifactDescription` entity to support providing descriptions with links
(e.g. in a DataLink output)

- add `Artifact.descriptionID` to refer to a shared `ArtifactDescription`

- add `Proposal.reference` as optional proposal metadata (URI to web page, paper, etc)

- split `Entity` into a base `Entity` class with main properties and a `CaomEntity`
suitable for having child entities (by composition); one or both could be extracted
and re-used in other models (TBD)

\subsubsection{Radio Support}

For radio observations, many properties such as field-of-view, spatial and spectral resolution are dependent on frequency. Modern,
wideband facilities can have large frequency-dependent variation in these properties within a single observation.

- add `Plane.position.minBounds` (Shape) to describe variable coverage (bounds is already max bounds)

- add `Plane.position.maxAngularScale` (Interval) to describe min/max scale of signal/objects in the data

- add `Plane.energy.resolution` (double) to describe the absolute resolution (representative value, probably mean/pixel)

- add `Plane.energy.resolutionBounds` (Interval) to describe the min/max absolute resolution when it varies across the data

- add `Plane.time.exposureBounds` (Interval) to describe the min/max exposure time when it varies across the data

- change `Plane.energy.restwav` to `Plane.energy.rest` so the name makes sense with different profiles (quantities and units)

- remove `Plane.position.timeDependent` as it was only used to explain why Plane.position.bounds was null because of tracking mode

- add `Observation.telescope.trackingMode` and refer to a non-existent IVOA vocabulary to describe the
tracking/pointing of the telescope during the observation; null indicates sidereal tracking (for backwards compatibility)

- add `Plane.uv` (Visibility) to describe UV-plane (expect: only used when dataProductType=visibility)

- add `Plane.uv.distance` (Interval) to describe the min and max distance in the UV plane

- add `Plane.uv.distributionEccentricity` (double); mandatory or optional within Visibility?

- add `Plane.uv.distributionFill` (double); mandatory or optional within Visibility?

- change `Plane.polarization.states` to refer to a (non-existent) vocabulary (replaces PolarizationState enum) that could be extracted from WCS, ObsCore, and community usage/extensions

\subsubsection{Use of Identifiers}

- replace `Observation.observationID` (String) with `Observation.uri` (URI) to be the complete self-contained identifier; values would be used in `DerivedObservation.members` to refer to other observations

- replace `Plane.productID` (String) with `Plane.uri` (URI) to be the complete self-contained identifier; values would be used in `Plane.provenance.inputs` to refer to other planes

- remove `Plane.creatorID` because it is essentially redundant with `Plane.uri`

A `publisherID` value is strictly outside the core model because the value must be changed (regenerated) when CAOM metadata is synced from one publisher to a different publisher.

\subsubsection{Reconcile with IVOA Usage}

- change `Plane.dataProductType` to refer directly to the IVOA product-type vocabulary

- change `Artifact.productType` to refer directly to the IVOA DataLink Core (semantics) vocabulary

- change `Plane.observable.ucd` to refer directly to IVOA UCD1+

- add `Plane.position.calibration` and refer to a non-existent IVOA vocabulary that could be extracted from the ObsCore optional section

- add `Plane.energy.calibration` (as above)

- add `Plane.time.calibration` (as above)

- add `Plane.observable.calibration` (as above)

- remove SampledInterval in favour of separate Interval and Interval[] columns in Energy, Time, CustomAxis

- remove MultiPolygon in favour of separate Polygon and MultiShape columns; SegmentType and Vertex removed (unused)

% NOTE: IVOA recommendations must be cited from docrepo rather than ivoabib
% (REC entries there are for legacy documents only)