Serialise as EMMO datasets #796
…ter/software/dlite into 652-serialise-data-models-to-tbox
- Changed some blank nodes to classes and named literals. - Updated dataset figure.
…/dlite into 652-serialise-data-models-to-tbox
…/dlite into 652-serialise-data-models-to-tbox
OK, removed the chemistry dataset.
Added more info to the PR description.
I do not understand why we need to specify this string in the
Also, the example test_dataset1_save.py creates an instance of a dataset from some hard-coded data. Would it be possible to show how to instantiate a dataset coming from an external file, since that seems to me a more common scenario? I would suggest using the Fluid dataset from OntoTrans' OTE demonstration: isobaric_liquids_nist.xlsx and its associated datamodel. Note that the spreadsheet has three tabs (Benzene, water, and hexane), but these are not specified in the datamodel's dimensions (please check if this is correct).
Question: where are the prefixes EMMO and MAP defined in
About the new functions: suppose a datamodel, mappings, and an instance of a particular dataset have already been created and stored in the KB. Which function should be used to add a different dataset having the same datamodel and mappings?
I agree that this additional mapping is probably more confusing than helpful. I just included this relation since it was in the figure you and Emanuele made. Note that String and StringData are two different concepts in EMMO. The fact that
Good point. The test_dataset1_save.py file is intended for unit testing of the new
MAP is pre-defined and imported from
>>> from dlite.dataset import EMMO
>>> EMMO.Atom
'https://w3id.org/emmo#EMMO_eb77076b_a104_42ac_a065_798b2d2809ad'
For that you can use
Good. Suppose one wants to create new instances of datasets whose mappings are generic, like:
A particular instance should have the Substance specified, e.g.
But the rest of the mapping should be the same. How can instances be created that share the same mappings but differ in a few triplets?
…/dlite into 652-serialise-data-models-to-tbox
…/dlite into 652-serialise-data-models-to-tbox
Approved, but proper documentation and examples of use should have high priority for this new functionality to have any real value (for others than the core developers).
Since the functionality is needed in the development of SS1 in OpenModel with a short due date, I think we can approve the functionality, provided that the documentation has high priority.
We describe the datasets at the TBOX level. At this level, the simple mappings like But you are right that at the individual level we can add simple relations. That is definitely useful. Let's discuss it and make a new PR for that.
Description
This PR implements a new dlite.dataset module for serialisation of DLite datamodels and instances to an RDF representation based on EMMO, following the representation shown in the figure below.
The main interface is exposed by four new functions:
- add_dataset(): stores datamodel+mappings to a triplestore
- add_data(): stores an instance (or datamodel) to a triplestore
- get_dataset(): loads datamodel+mappings from a triplestore
- get_data(): loads an instance (or datamodel) from a triplestore
Question: Is the naming of these functions understandable? The term dataset comes from EMMO, but as a user of DLite it may be confusing. Maybe save_datamodel(), save_instance(), load_datamodel(), load_instance() would be more intuitive?
Two tests are added using a datamodel matching what is shown in the figure.
test_dataset1_save.py first loads a FluidData datamodel and documents it semantically with the following mappings
Note the use of emmo:isDescriptionFor relations in the mappings. They are stored as-is in the triplestore. The map:mapsTo relations are translated to rdfs:subClassOf when serialised to the triplestore.
Then it uses the add_dataset() function in the new dlite.dataset module to store the datamodel and mappings as RDF in a local triplestore. The content of the triplestore now corresponds to the figure below.
Then it creates two FluidData instances and stores them (using the add_data() function) as RDF in a local triplestore as well. Each instance is represented as an individual with an rdf:JSON data property containing the instance data.
Finally, the triplestore is serialised to a turtle file.
test_dataset2_load.py loads the turtle file into a local triplestore and reconstructs the FluidData datamodel as well as the mappings using the get_dataset() function.
Using the get_hash() method, it checks that the reconstructed FluidData datamodel is exactly equal to the original datamodel.
Finally, it loads the two instances using the get_data() function and checks that they are exactly equal to the two original instances.
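The round-trip check in the second test can be sketched in isolation. This is not how DLite's get_hash() is implemented; it is a stand-in that hashes a canonical JSON dump with SHA-256. The content_hash() helper and the datamodel dict below are hypothetical and only loosely resemble a real FluidData datamodel.

```python
import hashlib
import json


def content_hash(obj):
    """Return a SHA-256 hex digest of a canonical JSON dump of `obj`."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical datamodel in a dict representation (assumed for illustration).
original = {
    "uri": "http://example.com/meta/0.1/FluidData",
    "dimensions": {"ntimes": "Number of time steps."},
    "properties": {
        "Temperature": {"type": "float64", "shape": ["ntimes"], "unit": "K"},
    },
}

# Simulate a save/load cycle (here: through a JSON string instead of RDF).
reloaded = json.loads(json.dumps(original))

# Equal hashes indicate that the content survived the round trip unchanged.
assert content_hash(reloaded) == content_hash(original)
```

Comparing hashes rather than the objects themselves keeps the check independent of key order and of how the object was reconstructed.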