Simplify documentation of datamodels and usage of plugins (#977)
# Description
I keep having problems creating my own plugins and datamodels, and there are so many inconsistent examples around.

Here I try to clarify some of these things and add a small, easy-to-find section on how to make entities.

## Type of change
- [ ] Bug fix & code cleanup
- [ ] New feature
- [ ] Documentation update
- [ ] Test update

## Checklist for the reviewer
This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new features have an accompanying new test?
- [ ] Has the documentation been updated as necessary?
francescalb authored Oct 28, 2024
2 parents 7035047 + 25e94a3 commit 33c841f
Showing 5 changed files with 118 additions and 79 deletions.
77 changes: 0 additions & 77 deletions doc/user_guide/concepts.md
@@ -269,83 +269,6 @@ Relations are currently not explored in metadata, but are included because of
their generality.
However, relations are heavily used in [collections].


### Representing an entity
Let's start by making a "Person" entity, where we want to describe his/her name, age and skills.

```json
{
  "uri": "http://onto-ns.com/meta/0.1/Person",
  "meta": "http://onto-ns.com/meta/0.3/EntitySchema",
  "description": "A person.",
  "dimensions": [
    {
      "name": "N",
      "description": "Number of skills."
    }
  ],
  "properties": [
    {
      "name": "name",
      "type": "string",
      "description": "Full name."
    },
    {
      "name": "age",
      "type": "float",
      "unit": "years",
      "description": "Age of person."
    },
    {
      "name": "skills",
      "type": "string",
      "shape": ["N"],
      "description": "List of skills."
    }
  ]
}
```

First we have "uri", which identifies the entity, "meta", which tells that this is an instance of the entity schema (hence an entity), and "description", a human-readable description.
Then comes "dimensions".
In this case one dimension named "N", which is the number of skills the person has.
Finally we have the properties; "name", "age" and "skills".
We see that "name" is represented as a string, "age" as a floating point number with unit years and "skills" as an array of strings, one for each skill.


### SOFT7 representation
Based on input from [SOFT7], DLite also supports a slightly shortened representation of entities.
In this representation, the "Person" entity from the above example looks like:

```json
{
  "uri": "http://onto-ns.com/meta/0.1/Person",
  "description": "A person.",
  "dimensions": {
    "N": "Number of skills."
  },
  "properties": {
    "name": {
      "type": "string",
      "description": "Full name."
    },
    "age": {
      "type": "float",
      "unit": "years",
      "description": "Age of person."
    },
    "skills": {
      "type": "string",
      "shape": ["N"],
      "description": "List of skills."
    }
  }
}
```

In this representation, the `meta` field defaults to the entity schema if it is left out.
Dimensions and properties are dictionaries (JSON objects) keyed by the dimension or property name, instead of arrays.
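Since both representations carry the same information, converting between them is mechanical. Below is a minimal standard-library Python sketch of the conversion from the array-based form to the shortened SOFT7 form. `to_soft7` is a hypothetical helper written for illustration, not part of the DLite API, and it drops `meta` only when it equals the entity schema URI used in the example above.

```python
import json

# Entity schema URI used as "meta" in the array-based example above.
ENTITY_SCHEMA = "http://onto-ns.com/meta/0.3/EntitySchema"


def to_soft7(entity: dict) -> dict:
    """Convert an entity from the array-based representation to the
    shortened SOFT7-style representation (hypothetical helper)."""
    result = {"uri": entity["uri"]}
    # "meta" can be left out when it is the entity schema (the default).
    if entity.get("meta", ENTITY_SCHEMA) != ENTITY_SCHEMA:
        result["meta"] = entity["meta"]
    result["description"] = entity.get("description", "")
    # List of {"name": ..., "description": ...} -> {name: description}
    result["dimensions"] = {
        dim["name"]: dim.get("description", "")
        for dim in entity.get("dimensions", [])
    }
    # List of {"name": ..., <other keys>} -> {name: {<other keys>}}
    result["properties"] = {
        prop["name"]: {key: value for key, value in prop.items() if key != "name"}
        for prop in entity.get("properties", [])
    }
    return result


person = {
    "uri": "http://onto-ns.com/meta/0.1/Person",
    "meta": ENTITY_SCHEMA,
    "description": "A person.",
    "dimensions": [{"name": "N", "description": "Number of skills."}],
    "properties": [
        {"name": "name", "type": "string", "description": "Full name."},
        {"name": "age", "type": "float", "unit": "years",
         "description": "Age of person."},
        {"name": "skills", "type": "string", "shape": ["N"],
         "description": "List of skills."},
    ],
}

print(json.dumps(to_soft7(person), indent=2))
```

The printed structure should be equivalent to the SOFT7 example above (same keys and values, only the whitespace differs).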

references
----------

73 changes: 73 additions & 0 deletions doc/user_guide/datamodels.md
@@ -0,0 +1,73 @@
Representing a datamodel (entity)
----------------------------------

The underlying structure of DLite datamodels is described under [concepts].

Here, a set of rules for creating a datamodel is presented.

Note that several other representations are available, as can be seen in the
examples and tests in the repository.

We present only one representation here, since mixing representations might
be confusing. Note, however, that the YAML and JSON representations are interchangeable.

A generic example with some comments for clarity can be seen below.

```yaml
uri: http://namespace/version/name
description: A description of what this datamodel represents.
dimensions: # Named dimensions referred to in the property shapes. Simplest to represent as a dict; set to {} if there are no dimensions
  name_of_dimension: description of dimension
properties:
  name_of_property1:
    description: What this property is
    type: ref # Can be any of string, float, double, int, ref, ...
    unit: unit # Can be omitted if the property has no unit
    shape: [name_of_dimension] # Can be omitted if the property is a scalar
    $ref: http://namespace/version/name_of_referenceddatamodel # Only if type is ref
```
The keywords in the datamodel have the following meaning:
* `uri`: A URI that uniquely identifies the datamodel.
* `description`: A human description that describes what this datamodel represents.
* `dimensions`: Dimensions of the properties (referred to by the property shape). Several properties may share the same dimension, but they do not have to. Each dimension is described by:
  - the name of the dimension
  - a human description of the dimension

  In the example below there is one dimension, with name "N" and description "Number of skills."
* `properties`: Sub-parts of the datamodel that describe the individual data fields. A property has a name and is further specified by the following keywords:
  - `description`: Human description of the property.
  - `type`: Data type of the property. E.g. "blob5", "boolean", "uint", "int32", "string", "string10", "ref", ...
  - `$ref`: Optional. URI of a sub-datamodel. Only used if type is "ref".
  - `unit`: Optional. The unit. E.g. "kg", "km/h", ... Can be omitted if the property has no unit.
  - `shape`: Optional. Describes the dimensionality of the property as a list of dimension names. E.g. `[N]`. Can be omitted if the property has no shape, i.e. the property always holds a single value. This is equivalent to a 0-dimensional array, i.e. `shape: []`.

The datamodel below has three properties: "name", "age" and "skills". We see that "name" is represented as a string, "age" as a floating point number with unit years and "skills" as an array of strings, one for each skill.


A slightly more realistic example is the "Person" entity, where we want to describe his/her name, age and skills:

```yaml
uri: http://onto-ns.com/meta/0.1/Person
description: A person.
dimensions:
  N: Number of skills.
properties:
  name:
    description: Full name.
    type: string
  age:
    description: Age of person.
    type: float
    unit: years
  skills:
    description: List of skills.
    type: string
    shape: [N]
```
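To make the keyword rules above concrete, here is a minimal Python sketch of a structural check. It works on the "Person" datamodel written as a plain dict, i.e. what a YAML or JSON loader would be expected to return for the file above (reading YAML itself needs a third-party package such as PyYAML, so that step is left out). `check_datamodel` is a hypothetical helper: it covers only the rules listed here, treats `dimensions` as required following the comment in the generic example, assumes each shape entry is a plain dimension name, and is no substitute for `dlite-validate`.

```python
# The "Person" datamodel above as a plain dict.
PERSON = {
    "uri": "http://onto-ns.com/meta/0.1/Person",
    "description": "A person.",
    "dimensions": {"N": "Number of skills."},
    "properties": {
        "name": {"description": "Full name.", "type": "string"},
        "age": {"description": "Age of person.", "type": "float", "unit": "years"},
        "skills": {"description": "List of skills.", "type": "string", "shape": ["N"]},
    },
}


def check_datamodel(dm: dict) -> list:
    """Return a list of rule violations (empty if none were found).

    Hypothetical checker for the keyword rules above; not part of DLite.
    """
    errors = []
    # Top-level keywords used in the recommended representation.
    for key in ("uri", "description", "dimensions", "properties"):
        if key not in dm:
            errors.append(f"missing top-level keyword '{key}'")
    dims = dm.get("dimensions", {})
    for name, prop in dm.get("properties", {}).items():
        # Every property needs a description and a type.
        for key in ("description", "type"):
            if key not in prop:
                errors.append(f"property '{name}': missing '{key}'")
        # $ref is only meaningful for properties of type "ref".
        if "$ref" in prop and prop.get("type") != "ref":
            errors.append(f"property '{name}': '$ref' given but type is not 'ref'")
        # Every name in the shape must be a declared dimension.
        for dim in prop.get("shape", []):
            if dim not in dims:
                errors.append(f"property '{name}': undeclared dimension '{dim}'")
    return errors


print(check_datamodel(PERSON))
```

For the "Person" datamodel the list should be empty; each message names the property and the rule it breaks.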


dlite-validate
--------------
The [dlite-validate tool](./tools.md#dlite-validate) can be used to check whether a specific representation (in a file) is a valid DLite datamodel.


[concepts]: https://sintef.github.io/dlite/user_guide/concepts.html
1 change: 1 addition & 0 deletions doc/user_guide/index.rst
@@ -6,6 +6,7 @@ User Guide
:caption: Contents

concepts
datamodels
type-system
exceptions
collections
34 changes: 32 additions & 2 deletions doc/user_guide/storage_plugins.md
@@ -1,5 +1,5 @@
Storage plugins
===============
Storage plugins / Drivers
=========================

Content
-------
@@ -28,6 +28,36 @@ It also comes with a specific `Blob` and `Image` storage plugin, that can load a
Storage plugins can be written in either C or Python.


How to make storage plugins available
-------------------------------------

As described below, it is possible (and often advisable) to create specific drivers (storage plugins) for your data.
Additional storage plugins (drivers) can be made available by setting the environment variables
`DLITE_STORAGE_PLUGIN_DIRS` or `DLITE_PYTHON_STORAGE_PLUGIN_DIRS`, e.g.:
```bash
export DLITE_STORAGE_PLUGIN_DIRS=/path/to/new/folder:$DLITE_STORAGE_PLUGIN_DIRS
```

Within Python, the path to the directory containing plugins can be added as follows:

```python
import dlite
dlite.python_storage_plugin_path.append("/path/to/plugins/dir")
```
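The `export` line above can also be mirrored from Python, for example before launching a subprocess that uses DLite and should see extra plugin directories. The sketch below uses only the standard library. `prepend_to_path_var` is a hypothetical helper, and it assumes DLite splits these variables on the platform path separator (`os.pathsep`, which is the `:` used in the `export` example on Linux and macOS). It does not assume that an already imported `dlite` module notices the change.

```python
import os


def prepend_to_path_var(var: str, directory: str) -> str:
    """Prepend `directory` to the path-list environment variable `var`,
    keeping any existing entries (mirrors the `export` line above)."""
    current = os.environ.get(var)
    os.environ[var] = directory if not current else directory + os.pathsep + current
    return os.environ[var]


# Placeholder directories; replace them with your own.
prepend_to_path_var("DLITE_STORAGE_PLUGIN_DIRS", "/path/to/new/folder")
prepend_to_path_var("DLITE_PYTHON_STORAGE_PLUGIN_DIRS", "/path/to/plugins/dir")
```

Child processes started afterwards (e.g. with `subprocess.run`) inherit the updated variables.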

Drivers are often tied to very specific datamodels (entities).
DLite will find these datamodels if the path to their directory is set with the
environment variable `DLITE_STORAGES` or added within Python with `dlite.storage_path.append()`, similar to what is described above for drivers.


```{attention}
During development, DLite will often fail unexpectedly. This is typically caused by an error in either the
datamodel or the driver.
Setting the environment variable `DLITE_PYDEBUG` (e.g. `export DLITE_PYDEBUG=`) gives Python debugging information,
which provides more details about errors in the driver.
It is advisable to first check that the datamodel is valid with the command `dlite-validate datamodelfilename`.
```
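When debugging a failing driver it can be handy to set `DLITE_PYDEBUG` for a single run only, instead of exporting it in your shell. A minimal standard-library sketch follows. `run_with_pydebug` is a hypothetical helper that sets the variable to an empty string, exactly as `export DLITE_PYDEBUG=` does, and `my_script.py` in the usage comment is a placeholder for your own code.

```python
import os
import subprocess
import sys


def run_with_pydebug(args: list) -> subprocess.CompletedProcess:
    """Run `args` as a subprocess with DLITE_PYDEBUG set (to an empty
    string, like `export DLITE_PYDEBUG=`), capturing its output."""
    env = dict(os.environ, DLITE_PYDEBUG="")
    return subprocess.run(args, env=env, capture_output=True, text=True)


# Usage (my_script.py is a placeholder for a script using your datamodel/driver):
#   result = run_with_pydebug([sys.executable, "my_script.py"])
#   print(result.returncode)
#   print(result.stderr)
```

Only the child process sees the variable; the environment of the calling process is left unchanged.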

Using storages implicitly from Python
-------------------------------------
For convenience DLite also has an interface for creating storages implicitly.
12 changes: 12 additions & 0 deletions doc/user_guide/tools.md
@@ -3,6 +3,18 @@ Tools
DLite comes with a small set of tools.


dlite-validate
--------------
The dlite-validate tool can be used to check if a specific representation (in a file) is a valid DLite datamodel.

It can be run as follows:
```bash
dlite-validate filename.yaml # or json
```

It will then return a list of errors if it is not a valid datamodel.
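If you want to run the check from a Python script or test suite, a thin wrapper around the command-line tool is enough. The sketch below is hypothetical (`run_dlite_validate` is not part of DLite). It only captures the exit status and output, and makes no assumption about which exit code or error format `dlite-validate` uses. It returns `None` when the tool cannot be found on `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def run_dlite_validate(filename: str) -> Optional[subprocess.CompletedProcess]:
    """Run `dlite-validate <filename>` and capture its output.

    Returns None if the dlite-validate executable cannot be found on PATH.
    """
    exe = shutil.which("dlite-validate")
    if exe is None:
        return None
    return subprocess.run([exe, filename], capture_output=True, text=True)


# Usage: inspect the captured output yourself.
#   result = run_dlite_validate("Person.yaml")
#   if result is not None:
#       print(result.returncode, result.stdout, result.stderr)
```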


dlite-getuuid
-------------
This is a handy small tool for generating a random UUID or getting the UUID corresponding to a URI.