This document is a high-level description of the ontology used in this Scala-based project.
The approach chosen for this project, which developed organically, is first and foremost a domain-specific one. This means that instead of developing the ontology first, starting from scratch, I represented the data in code, using Scala as the strongly and statically typed programming language of choice. More technical details about this aspect can be found in the README.md.
Another implication of the domain-specific approach is that it is model-driven. The first steps were to model the entities by abstracting from the initial set of data (CSV and XML files, which are implementation details). This view, approach and methodology fit well with the established discipline of software engineering. In such a view, the concepts and technologies of the Semantic Web and Linked Data world are technical details. This is expressed by the term "semantic technology".
The alternative approach of domain-agnostic and/or data-centric application development was also considered. The purpose of this document and of my implementation of the project is not to dismiss this possibility as a bad or lesser choice. Both approaches (and others) are fitting and worth considering, depending on several factors. As always in the software world, everything has trade-offs and is embedded in a particular culture and skill set. One of the reasons for choosing a model-driven approach was simply to experiment with the idea and its implementation.
After modelling the concepts of the project in code, I went on to develop the ontology. The rest of the document focuses on this: the ontology and its development.
The model, which was based on and abstracted from the data files, can be found in the entities folder. The Scala code should be quite readable and understandable. An (arguably less understandable) visual representation as a UML diagram looks like this:
(Hint: the diagram is easier to view directly at https://raw.githubusercontent.com/edufuga/LinkedDataScala/main/LinkedDataUML.svg)
This UML diagram was generated with scala-uml. Other tools exist as well.
The UML diagram is mentioned here only as one possible visualization of the model. I still think the code is nicer, but I'm obviously biased.
One thing to notice is that the code is not only strongly and statically typed (by using Scala), but also uses very explicit types. This is also known as "tiny types": instead of using primitive (in a conceptual sense) types such as Strings, Integers, etc., the type of each field (attribute, member, property) is declared very explicitly as a dedicated type of its own.
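To illustrate the idea, here is a minimal Scala sketch of tiny types. The names DepartmentCode, DepartmentName, EmployeeCount and the fields of Department are hypothetical and not necessarily the ones used in the entities folder. The project's actual types might also be value classes or opaque types instead of plain case classes.

```scala
// Hypothetical tiny types, for illustration only; the project's entities
// folder may use different names and fields.
final case class DepartmentCode(value: String)
final case class DepartmentName(value: String)

final case class EmployeeCount(value: Int) {
  // A tiny type can also enforce an invariant that a raw Int cannot.
  require(value >= 0, s"EmployeeCount must be non-negative, got $value")
}

final case class Department(
    code: DepartmentCode,
    name: DepartmentName,
    employees: EmployeeCount
)

object TinyTypesDemo {
  def main(args: Array[String]): Unit = {
    val research =
      Department(DepartmentCode("RD"), DepartmentName("Research"), EmployeeCount(12))
    println(research.name.value)

    // With plain Strings for code and name, swapping them would compile.
    // With tiny types, the following line is rejected by the compiler:
    // Department(DepartmentName("Research"), DepartmentCode("RD"), EmployeeCount(12))
  }
}
```

The commented-out line shows the main benefit. Two fields that would otherwise both be plain Strings can no longer be swapped by accident.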
This is fine in the code, but the question of whether the ontology should be equally explicit can and should be posed as well. My current approach is to keep the ontology conceptually closer to the data, by using primitive types such as strings and numbers (integer, double, float, etc.). This may or may not change once the ontology is used as part of the semantic technology driving the code implementation. It is also open for discussion.
The ontology is written in the Web Ontology Language (OWL). More specifically, it was developed with Protégé, a desktop ontology editor. The result was serialized via the OWL API (the Java library Protégé is built on) to formats such as RDF/Turtle and OWL/XML. Both files can be found in the repository:
- OWL2 XML Syntax: linked_data_ontology_rdf.owl
- RDF/Turtle: linked_data_ontology_rdf.ttl
These files can be opened with Protégé to visualize, discuss, review and evolve the ontology.
The ontology can be visualized inside Protégé after installing the OWLViz plugin and enabling it (Window > Tabs > OWLViz). The outcome looks like the following screenshot:
Notice the German names, which come from the localization with rdfs:label annotations. These differ from the (English) IRIs.
Strangely enough, the alphabetically first language (Catalan) is not shown in the GUI of Protégé.
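An rdfs:label is a string with a language tag, so the name a tool displays depends on some language preference. The following Scala sketch shows one plausible selection strategy: try the preferred languages in order, and fall back to the IRI's local name. It is an assumption for illustration only and does not reproduce Protégé's actual rendering logic. The IRI and the Catalan and German labels are hypothetical.

```scala
// Hypothetical, minimal model of language-tagged rdfs:label values.
final case class LangLabel(text: String, lang: String)
final case class Entity(iri: String, labels: List[LangLabel])

object Labels {
  // Local name of an IRI: the part after the last '#' or '/'.
  def localName(iri: String): String =
    iri.substring(math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/')) + 1)

  // Assumed strategy: the label in the first preferred language that has one,
  // otherwise the IRI's local name.
  def displayName(e: Entity, preferred: List[String]): String =
    preferred
      .flatMap(lang => e.labels.find(_.lang == lang))
      .headOption
      .map(_.text)
      .getOrElse(localName(e.iri))

  def main(args: Array[String]): Unit = {
    val department = Entity(
      "http://example.org/ontology#Department",
      List(LangLabel("Departament", "ca"), LangLabel("Abteilung", "de"))
    )
    println(displayName(department, List("de", "ca"))) // Abteilung
    println(displayName(department, List("fr")))       // Department
  }
}
```

Under a strategy like this, which label "wins" is decided by the preference order, not by the alphabetical order of the language tags.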
Object properties such as hasDepartment and hasEmployee are not functional properties. This is done in order to represent a (conceptual) list of entities. Specifically, the Organisation entity contains a list of Departments.
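In RDF terms, a non-functional object property simply allows several triples with the same subject and predicate. Here is a minimal Scala sketch of how a List of departments maps onto one hasDepartment triple per element. The Triple type and the example.org IRIs are hypothetical.

```scala
// Hypothetical minimal triple type and IRIs, for illustration only.
final case class Triple(subject: String, predicate: String, obj: String)

final case class Department(iri: String)
final case class Organisation(iri: String, departments: List[Department])

object NonFunctionalProperty {
  val HasDepartment = "http://example.org/ontology#hasDepartment"

  // One hasDepartment triple per list element. A non-functional property
  // allows several of these for the same subject.
  def toTriples(org: Organisation): Set[Triple] =
    org.departments.map(d => Triple(org.iri, HasDepartment, d.iri)).toSet

  def main(args: Array[String]): Unit = {
    val acme = Organisation(
      "http://example.org/data#acme",
      List(Department("http://example.org/data#sales"), Department("http://example.org/data#research"))
    )
    // Print in N-Triples style.
    toTriples(acme).foreach(t => println(s"<${t.subject}> <${t.predicate}> <${t.obj}> ."))
  }
}
```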
This is one instance of the clear difference between the semantic world and the world of programming languages.
More concretely, while RDF has containers and collections (such as rdf:Seq and rdf:List), OWL has no equivalent.
Regardless of whether we want to use lists in our OWL model, this is still a limitation in the expressiveness of OWL. Arguably, a list is more a syntactic than a semantic detail or concept, but this should be a decision taken by the modeller. Some concepts and abstractions do benefit from explicit grouping and/or ordering, as well as from stating whether the contents are homogeneous or not.
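The loss of ordering can be made concrete. An RDF graph is a set of triples, so writing a list out as repeated property values and reading it back yields a set. Here is a Scala sketch under simplifying assumptions: plain strings stand in for IRIs, and the ex: prefix is hypothetical.

```scala
// Hypothetical triple type; plain strings stand in for IRIs.
final case class Triple(subject: String, predicate: String, obj: String)

object ListRoundTrip {
  val HasDepartment = "ex:hasDepartment"

  // Write a list out as repeated property values. The graph is a Set,
  // so duplicates collapse and no order is kept.
  def toTriples(subject: String, values: List[String]): Set[Triple] =
    values.map(v => Triple(subject, HasDepartment, v)).toSet

  // Read the values back: the best we can recover is a Set.
  def fromTriples(subject: String, graph: Set[Triple]): Set[String] =
    graph.collect { case Triple(`subject`, HasDepartment, v) => v }

  def main(args: Array[String]): Unit = {
    val original = List("ex:sales", "ex:research", "ex:sales")
    val recovered = fromTriples("ex:acme", toTriples("ex:acme", original))
    println(s"$original -> $recovered")
  }
}
```

Both the original order and the duplicate entry are lost in the round trip. A Scala List keeps both.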
This was documented in a much shorter fashion in comments such as the following:
The usage of this non-functional property (i.e. a subject may have more than one value for it, which enables an indirect list representation) is shown in the following screenshot, in the Manchester syntax used by Protégé:
In the previous screenshot, under Organisation, the keyword only is the Manchester syntax for the allValuesFrom property restriction. Compare this with Organisation.scala, where the list is declared explicitly using the corresponding type.
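The two readings of "only" also differ semantically. The compiler checks Scala's List[Department]. OWL, by contrast, works under the open-world assumption. The restriction does not reject a value that is not known to be a Department. Instead, it lets a reasoner infer that the value is one. The following Scala sketch implements that single inference step, roughly corresponding to the allValuesFrom rule of the OWL 2 RL profile. The triple encoding and all names are hypothetical. A real reasoner applies many rules repeatedly until nothing new is derived.

```scala
// Hypothetical triple type; CURIE-like strings stand in for IRIs.
final case class Triple(s: String, p: String, o: String)

// Encodes "cls SubClassOf: property only filler" (owl:allValuesFrom).
final case class AllValuesFrom(cls: String, property: String, filler: String)

object AllValuesFromRule {
  val RdfType = "rdf:type"

  // One inference step: if x is a cls, and (x property y), then y is a filler.
  // Under the open-world assumption nothing is rejected; a type is added.
  def infer(graph: Set[Triple], r: AllValuesFrom): Set[Triple] = {
    val members = graph.collect { case Triple(x, RdfType, c) if c == r.cls => x }
    graph.collect {
      case Triple(x, p, y) if p == r.property && members.contains(x) =>
        Triple(y, RdfType, r.filler)
    }
  }

  def main(args: Array[String]): Unit = {
    val graph = Set(
      Triple("ex:acme", RdfType, "ex:Organisation"),
      Triple("ex:acme", "ex:hasDepartment", "ex:sales")
    )
    val restriction = AllValuesFrom("ex:Organisation", "ex:hasDepartment", "ex:Department")
    infer(graph, restriction).foreach(println)
  }
}
```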
Most properties (i.e. functions, field accesses) are represented as data properties. In other words, the range of the property is essentially either xsd:string or xsd:nonNegativeInteger. This is conceptually closer to the raw data than the representation in code, which (as previously stated) uses the tiny-types idea of wrapping primitive types in explicitly declared and semantically meaningful types.
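The following Scala sketch shows how a value wrapped in a tiny type could be unwrapped into a typed literal for such a data property. The tiny types are hypothetical, and the string escaping is deliberately minimal (backslash and double quote only). The output uses Turtle literal syntax and assumes the usual xsd: prefix is declared.

```scala
// Hypothetical tiny types, for illustration only.
final case class DepartmentName(value: String)
final case class EmployeeCount(value: Int)

object DataPropertyLiterals {
  // Unwrap the tiny type into a Turtle literal typed as xsd:string.
  // Minimal escaping only: backslash and double quote.
  def nameLiteral(name: DepartmentName): String = {
    val escaped = name.value.replace("\\", "\\\\").replace("\"", "\\\"")
    "\"" + escaped + "\"^^xsd:string"
  }

  // xsd:nonNegativeInteger admits only values >= 0, so negatives are rejected.
  def countLiteral(count: EmployeeCount): Either[String, String] =
    if (count.value < 0) Left(s"${count.value} is not a valid xsd:nonNegativeInteger")
    else Right("\"" + count.value + "\"^^xsd:nonNegativeInteger")

  def main(args: Array[String]): Unit = {
    println(nameLiteral(DepartmentName("Research")))
    println(countLiteral(EmployeeCount(12)))
    println(countLiteral(EmployeeCount(-1)))
  }
}
```

In the ontology, only the plain literal and its xsd datatype remain. The meaning carried by the wrapper type lives in the property name instead.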
The pattern of declaring (data or object) properties as standalone functions on one side, and "using" them in the form of property restrictions on the other (anonymous class expressions that the named class is declared a subclass of), is a recurring one. This contrasts with the much simpler (but tighter, more coupled) modelling of data structures in programming languages. Conceptually, the result is the same: representing a field, property (in programming-language parlance) or member of a class/type.
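To make the contrast with a programming-language data structure explicit, here is a hypothetical Scala model of that OWL pattern. Properties are declared on their own, and a class refers to them only through restrictions. The renderer emits a small, simplified subset of the Manchester syntax shown in Protégé. It is an illustration, not the OWL API, and the data property hasName is invented for the example.

```scala
// Hypothetical minimal model of the OWL pattern: properties are declared
// on their own, and classes refer to them only through restrictions.
sealed trait Property { def name: String }
final case class ObjectProperty(name: String) extends Property
final case class DataProperty(name: String) extends Property

sealed trait Restriction
final case class AllValuesFrom(property: Property, filler: String) extends Restriction  // "only"
final case class SomeValuesFrom(property: Property, filler: String) extends Restriction // "some"

// A named class whose superclasses are anonymous restriction class expressions.
final case class ClassFrame(name: String, superClasses: List[Restriction])

object Manchester {
  def renderProperty(p: Property): String = p match {
    case ObjectProperty(n) => s"ObjectProperty: $n"
    case DataProperty(n)   => s"DataProperty: $n"
  }

  def renderRestriction(r: Restriction): String = r match {
    case AllValuesFrom(p, filler)  => s"${p.name} only $filler"
    case SomeValuesFrom(p, filler) => s"${p.name} some $filler"
  }

  def renderClass(c: ClassFrame): String =
    if (c.superClasses.isEmpty) s"Class: ${c.name}"
    else s"Class: ${c.name}\n    SubClassOf: " + c.superClasses.map(renderRestriction).mkString(", ")

  def main(args: Array[String]): Unit = {
    val hasDepartment = ObjectProperty("hasDepartment")
    val hasName = DataProperty("hasName") // hypothetical data property
    val organisation = ClassFrame(
      "Organisation",
      List(AllValuesFrom(hasDepartment, "Department"), AllValuesFrom(hasName, "xsd:string"))
    )
    List(hasDepartment, hasName).map(renderProperty).foreach(println)
    println(renderClass(organisation))
  }
}
```

In a Scala case class, the field, its type and its owner are written together. Here, the same information is spread over a property declaration and a restriction on the class.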
The explicit distinction between "object properties" and "data properties" also seems strange from a programming perspective. Why is there such a distinction, when the range already tells exactly what a property is meant to be used for? This seems redundant, unless I'm still missing an important fact or idea.
This section contains a compact selection of resources and references that were useful for developing the ontology and for gaining (more) knowledge about ontologies. Furthermore, the techniques and methodologies of ontology engineering needed to be considered.
Concretely, the question of how to bring together software engineering and semantic technology was (and still is) a recurring theme behind this project and my approach to it.
- https://protege.stanford.edu/
- http://protegeproject.github.io/protege/getting-started/
- http://protegeproject.github.io/protege/views/class-description/
- http://protegeproject.github.io/protege/class-expression-syntax/
- https://protegewiki.stanford.edu/wiki/ProtegeOWL_API_Programmers_Guide
- https://www.dbis.informatik.uni-goettingen.de/Teaching/SWPr-SS19/owlapi.pdf
- https://github.com/protegeproject/owlviz
- https://www.w3.org/TR/2004/REC-owl-guide-20040210/
- https://www.w3.org/TR/2012/REC-owl2-quick-reference-20121211/
- https://www.w3.org/2007/OWL/refcardA4
OpenHPI course on Knowledge Graphs
- Week 4: Ontologies as Key to Knowledge Representation
- Week 5: Ontological Engineering for Smarter Knowledge Graphs