Demonstration of a SKOS-backed semi-open vocabulary for MIME types #363
In response to Competency Question 3, "Which content types carry Answers to that question should return the various ...
{
"@type": "observable:ContentDataFacet",
...
"observable:mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
...
}
... This would be discoverable via taxonomic knowledge that that MIME type is a narrower concept than |
This patch adds initial support for SKOS taxonomies as a hierarchy-enabled alternative to UCO's current string-vocabularies practice.
While the typical SKOS namespace is prefixed, the import of SKOS is done by referencing an OWL-DL compatible subset of SKOS as a version IRI import. (See SKOS Reference C.3.)
Paul Brandt identified the SKOS strategy used in this patch, though his initial draft was in another repository. I ported the property to UCO's ontology repository due to needing to satisfy references for SHACL.
This patch is a reduction from the first definition of `core:TaxonomicConcept` drafted for UCO CP-99.
References:
* [OC-140] (CP-99) UCO should provide a SKOS taxonomy of device types for observable:deviceType
* #363
* https://www.w3.org/TR/skos-reference/#namespace-documents Section C.3, "SKOS RDF Schema - OWL 1 DL Sub-set (informative)"
Co-authored-by: Paul Brandt <paul@brandt.name>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Dublin Core's `dcterms:FileFormat` class is an existing class that represents file formats, including but not limited to IANA Media Types. (The reference to IANA Media Types is found on the `dcterms:format` property, where the IANA list is recommended, but not required.)
`dcterms:FileFormat` is defined as an `rdfs:Class`. In order for UCO, as an OWL ontology, to use this class as a range of a property (especially `observable:mimeType`), it needs to be designated as an `owl:Class`.
Note there are some things this patch does not do:
* This patch does not import Dublin Core. It appears Dublin Core's model is incompatible with OWL 2 DL. For instance, the property `dcterms:format` appears to violate the OWL 2 DL separation of datatype properties (range of literals) from object properties (range of objects), on review of its sub-property `dcterms:extent` having a range of literals.
* This patch does not extend `FileFormat`'s superclasses to also be OWL Classes.
* This patch does not adapt the `dcterms:format` property noted above, because in addition to its OWL 2 DL compatibility issues, its semantic scope extends beyond data formats.
Compatibility notes:
Extending `dcterms:FileFormat` to be both an OWL Class and RDFS Class is compatible with OWL 2 DL. Reviewing this record:
https://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/#Parsing_of_the_Ontology_Header_and_Declarations Table 5 Row 2
An ontology that also designates `dcterms:FileFormat` as an `rdfs:Class` would have the pair of `owl:Class` and `rdfs:Class` statements reduced to `owl:Class` only as part of its OWL parse.
If an ontology has chosen to import Dublin Core as a whole, it has already implicitly made a commitment to use OWL 2 FULL instead of OWL 2 DL.
References:
* #363
* https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#FileFormat
* https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#format
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
For compatibility with other graph data models that use IANA Media Types, the required range is set to `dcterms:FileFormat`. References: * #363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This patch adds a class hierarchy to distinguish between known IANA Media Types and Media Types known to not be registered with IANA. A unit test is added to demonstrate how mimeType being objects can also enable hierarchical searches, even between IANA and non-IANA types. A follow-on patch will generate validation result files. References: * #363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * #363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
…feature A follow-on patch will generate a Make-managed file. References: * #363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
One part of the Solution of this proposal has been posted, in PR 377. The earlier commits document the rationale for some class selection and extensions. This PR includes demonstration in unit tests of how MIME taxons can be used while looking close to string-literal properties (see The MIME demonstrations in that PR also include demonstration of a hierarchy between Media Types, including some "aliasing" relationships. I cannot yet share the repository that generates the IANA Media Registry, but here are a few examples of taxons included in its generated monolithic resource:
<http://purl.org/NET/mediatypes/application/gzip>
    a
        dcterms:FileFormat ,
        skos:Concept
        ;
    dcam:memberOf dcterms:IMT ;
    skos:exactMatch <https://taxonomy.unifiedcyberontology.org/uco/mime/0.0.1/application/gzip> ;
    .
<https://taxonomy.unifiedcyberontology.org/uco/mime/0.0.1/application/gzip>
    a
        prov:Entity ,
        uco-types:IANAMediaType
        ;
    rdfs:isDefinedBy <https://www.iana.org/assignments/media-types/application/gzip> ;
    skos:exactMatch <http://purl.org/NET/mediatypes/application/gzip> ;
    skos:inScheme uco-mime:MIMEScheme ;
    skos:notation "application/gzip" ;
    .
<https://taxonomy.unifiedcyberontology.org/uco/mime/0.0.1/application/tar+gzip>
    a uco-types:NonIANAMediaType ;
    rdfs:comment "The media type suffix +gzip is registered in RFC 8460, Section 6.3."@en ;
    rdfs:seeAlso <https://www.rfc-editor.org/rfc/rfc8460.html#section-6.3> ;
    skos:broader
        <https://taxonomy.unifiedcyberontology.org/uco/mime/0.0.1/application/gzip> ,
        <https://taxonomy.unifiedcyberontology.org/uco/mime/0.0.1/application/tar>
        ;
    skos:notation "application/tar+gzip" ;
    .
I selected those excerpts because:
There is also a At the end of the day, the (pending-release) taxonomy will permit discovery of all XML files in a knowledge base by including this in a SPARQL query:
?nMimeType
    skos:exactMatch* /
    skos:broaderTransitive* /
    prov:alternateOf* /
    skos:exactMatch* /
    skos:notation "text/xml" ;
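To make the traversal in that property path concrete, here is a minimal pure-Python sketch of the same idea over a handful of in-memory triples. It is only an illustration, not the unit-test code from the PR: the shortened `purl:` and `uco-mime:` labels are stand-ins for the full IRIs in the excerpts above, the helper names are hypothetical, and it follows asserted `skos:broader` edges directly where the query uses `skos:broaderTransitive` (which a SKOS-aware store would normally derive by inference).

```python
from collections import deque

# Stand-in triples modelled on the gzip / tar+gzip excerpts above.
# The shortened labels are assumptions, not the real IRIs.
TRIPLES = [
    ("purl:application/gzip", "skos:exactMatch", "uco-mime:application/gzip"),
    ("uco-mime:application/gzip", "skos:exactMatch", "purl:application/gzip"),
    ("uco-mime:application/gzip", "skos:notation", "application/gzip"),
    ("uco-mime:application/tar", "skos:notation", "application/tar"),
    ("uco-mime:application/tar+gzip", "skos:broader", "uco-mime:application/gzip"),
    ("uco-mime:application/tar+gzip", "skos:broader", "uco-mime:application/tar"),
    ("uco-mime:application/tar+gzip", "skos:notation", "application/tar+gzip"),
]

# One entry per "predicate*" step of the property path, in order.
PATH = ["skos:exactMatch", "skos:broader", "prov:alternateOf", "skos:exactMatch"]


def closure(start, predicate, triples):
    """Nodes reachable from `start` via zero or more `predicate` edges."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def find_media_types(notation, triples=TRIPLES):
    """Subjects whose path ends at a node carrying the given skos:notation."""
    hits = []
    for subject in {s for s, _, _ in triples}:
        frontier = {subject}
        for predicate in PATH:
            frontier = closure(frontier, predicate, triples)
        if any(s in frontier and p == "skos:notation" and o == notation
               for s, p, o in triples):
            hits.append(subject)
    return sorted(hits)
```

With this toy data, asking for `application/gzip` also returns the `tar+gzip` taxon and the purl.org alias, which is the kind of hierarchical and aliasing discovery that a plain string comparison would miss.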
As an aside, it was surprisingly complex to determine how to work with forward slashes in prefixed concepts between Turtle, JSON-LD, and SPARQL. Each syntax handles forward-slash differently. Further details for those curious are in this bug report and its attached PR. Fortunately, usage of |
A follow-on commit will regenerate Make-managed files. References: * ucoProject/UCO#357 * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#357 * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
The objects will temporarily use the drafting namespace while the taxonomy is being implemented. This review also caught a typo of MIME types - JPEGs should be `image/jpeg`, not `image/jpg`. A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
The objects will temporarily use the drafting namespace while the taxonomy is being implemented. A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist What needs to be done before submitting this to solutions review? |
The taxonomy repository has been posted here, with some notations that it is currently experimental and needs committee review: https://github.com/ucoProject/UCO-Taxonomy-MIME We should now have sufficient demonstration available to show whether these changes to |
This continues to be an invalid change proposal under the CDO charter foundational principles, as discussed at length for the issue proposing changing controlled vocabularies to use semantic individuals rather than strings. When the invalidity of the proposal for changing vocabs to semantic individuals was discussed, the ball was placed in the proposer's court to suggest an approach that allowed the rigor of semantic individuals while allowing the required flexibility of strings. I have seen no such suggestion, and this specific change proposal does not offer one; rather, it is simply a single subcase of the broader proposal that was rejected. |
Re: Rejection of broader proposal I do not believe the proposal was rejected, but rather withheld due to lack of technical knowledge. We have significantly more technical knowledge now. Re:
Only one person has declared those principles foundational, and other similar-strength language, and we have already seen those principles (I suspect the act one you're referencing) be used to attempt to move UCO away from a technology that is essential for any interoperability with other RDF-based technologies, which would be a significant act of UCO's isolation if enacted. The entire UCO design document is still under committee review. Re: incompleteness I agree on lag being a possibility. The taxonomy does create a need for an operational tempo. In support of a regular deployment tempo, maintenance of the taxonomy for adding is almost entirely scriptable. Re:
You missed some details where the semi-open vocabulary enforcement pattern of gently (
A firmer Re: Keeping I have not seen arguments against the benefits of this proposal that strings make more difficult.
I still believe UCO's current pattern of semi-open vocabularies will hit a technical insurmountability with "Extension" vocabularies that are sets of strings that somehow use a non-public namespace. I admit the implementation in this MIME proposal does not yet get all the way to using the pattern I think the other properties will need. Other properties, especially If this proposal is not accepted today, I will adjust it. The MIME taxonomy does by itself serve a need in CASE-Corpora (DCAT uses |
This is absolutely NOT one person declaring those principles foundational. As discussed before, I fully recognize the benefits of having a formally defined taxonomy (some of which you identify above) and would love to have one. However, we cannot require users of UCO to refer to values directly as members of such a formal taxonomy for all the reasons discussed. It is very impractical for the large majority of UCO target users to have ontological knowledge and more specifically knowledge of how UCO uses ontology to define taxonomy individuals simply to express values outside what UCO has defined, especially for values such as MIMEType and kindOfRelationship where any UCO taxonomy or any other is guaranteed to be incomplete. |
This was not voted on today for a few reasons. Not necessarily all of them are listed here:
A further note on the current uncontrolled nature of Looking forward, I still believe IRI-identified graph individuals will be a necessity as a full replacement for the current UCO semi-open vocabulary design. I think the resolution, using classes developed for this proposal, would involve something like this query as part of a
SELECT $this
WHERE {
    FILTER NOT EXISTS {
        ?nTaxon
            a/rdfs:subClassOf* types:MIMEFormat ;
            skos:notation $this ;
            .
    }
}
I think that is a correct way, but using a I think this will need to be exercised further before returning for committee review. Likely, it will be tried in CASE-Corpora as that graph grows, particularly with an eye towards "govdocs1".
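For readers less familiar with SPARQL-based constraints, the check expressed by that query can be paraphrased in a few lines of Python: a string value is reported when no taxon typed as `types:MIMEFormat` (or a subclass of it) carries that string as its `skos:notation`. This is a rough sketch under stated assumptions, not the SHACL implementation: the function names are hypothetical, each class is assumed to have at most one superclass, and treating `types:IANAMediaType` as a subclass of `types:MIMEFormat` in the usage note is an assumption made for the example.

```python
ROOT = "types:MIMEFormat"


def is_mime_format(cls, subclass_of):
    """Walk a single-parent subclass map upward, mirroring rdfs:subClassOf*."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls == ROOT:
            return True
        seen.add(cls)
        cls = subclass_of.get(cls)
    return False


def unmatched_values(values, taxons, subclass_of):
    """Return the string values that no MIMEFormat taxon carries as a notation.

    `taxons` is an iterable of (class, notation) pairs.
    """
    known = {notation for cls, notation in taxons
             if is_mime_format(cls, subclass_of)}
    return [value for value in values if value not in known]
```

Given a taxon for `image/jpeg` only, `unmatched_values(["image/jpeg", "image/jpg"], ...)` would report `image/jpg`, which is the kind of typo caught by hand in one of the commits above.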
This reverts commit f4a0111, reversing changes made to 296497e. Reverting from `unstable` branch due to scheduling for post-1.0.0. References: * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#363 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Background
In UCO, many types are identified by a string as opposed to a thing, i.e., an IRI-backed node in a graph. The advantage of the former is that a new type-by-string is easy to create when the particular type is missing from the ontology. The disadvantages are:
The advantages of the latter, i.e., type-by-IRI, are the opposite of type-by-string's disadvantages. At the same time, type-by-IRI has a disadvantage of its own: extending the available types with a new one requires knowledge of RDF(S), OWL, and/or the data model or ontology that defines the other individuals being supplemented.
The purpose of this issue is to lay the foundation that is necessary to gain data/experience with users adding a type-by-IRI in UCO.
Objective / Purpose
The purpose of this issue is to lay the foundation that is necessary to gain data/experience with issues that users might run into when adding a type-by-IRI in UCO.
Requirements
Requirement 1
UCO shall have access to a SKOS-vocabulary that specifies individuals to represent each and every mime-type as defined by the IANA Media Types registry, in order to use these individuals to specify the type of a medium registered in UCO.
Requirement 2
The resulting taxonomy shall align with the standard two-tier scheme as defined by the IANA Media Type Registry:
Requirement 3
The SKOS-vocabulary shall be serialised in Turtle.
Requirement 4
Loosely-coupled: Any modification to the SKOS-vocabulary shall not imply a change to the UCO-ontology.
Requirement 5
Manageability: Any modification to the IANA Media Type Registry shall effect an update to the SKOS-vocabulary, preferably mechanically.
Requirement 6
Continuity & maintainability: Any modification to the SKOS-vocabulary shall result in a new version.
Requirement 7
Provenance: Any Media (Content) Type or Subtype added to the SKOS-vocabulary that originates from the UCO or CASE community, shall be categorised as such. This implies that for any Media (Content) Type / Media Subtype pair that exists, its provenance is maintained.
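As a rough illustration of how Requirements 2, 5, 6 and 7 could be met mechanically, the sketch below turns one registry notation into a Turtle entry shaped like the excerpts shown earlier in this thread. It is an assumption-laden sketch, not the generator behind the taxonomy: the function name is hypothetical, prefix declarations are omitted, and whether non-IANA entries carry `skos:inScheme` is assumed rather than confirmed here.

```python
BASE = "https://taxonomy.unifiedcyberontology.org/uco/mime"


def concept_turtle(notation, version, registered):
    """Render one media type as a Turtle entry.

    notation   -- "type/subtype" string (the two tiers of Requirement 2)
    version    -- taxonomy version placed in the IRI (Requirement 6)
    registered -- True if the pair comes from the IANA registry (Requirement 7)
    """
    top_level, _, subtype = notation.partition("/")
    if not top_level or not subtype:
        raise ValueError(f"expected 'type/subtype', got {notation!r}")
    cls = "uco-types:IANAMediaType" if registered else "uco-types:NonIANAMediaType"
    lines = [f"<{BASE}/{version}/{notation}>", f"    a {cls} ;"]
    if registered:
        # Only IANA-registered types point back at the registry page.
        lines.append(
            f"    rdfs:isDefinedBy <https://www.iana.org/assignments/media-types/{notation}> ;"
        )
    lines.append("    skos:inScheme uco-mime:MIMEScheme ;")
    lines.append(f'    skos:notation "{notation}" ;')
    lines.append("    .")
    return "\n".join(lines)
```

Because the output depends only on the registry row and a version string, re-running such a script after a registry update would regenerate the vocabulary without touching the UCO ontology itself, which is the loose coupling Requirement 4 asks for.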
Risk / Benefit analysis
Benefits
The benefit of the stated objective is that data about, and experience from, users adding a type-by-IRI to the vocabulary become available. It is then possible to investigate how to improve user acceptance and minimise the technical knowledge required for adding a new type in this way.
The benefits of having a vocabulary of IANA Media Types available are:
`http://purl.org/dc/terms/MediaType` in their graph design.
Risks
Except in relation to the semi-openness of the vocabulary, the submitter is unaware of risks associated with this change.
Consequences
The intention of this CR is that the type-by-string design will be replaced by a type-by-IRI design. The foreseen consequences (not necessarily comprehensive) are as follows:
`observable:mimeType` to become an `owl:ObjectProperty`.
Competencies demonstrated
Competency 1
Competency 2
`<substring>` (string as ordered characters) somewhere in their name, abbreviation or description? No constraints apply to the number of characters used in the substring; the search is agnostic to diacritical characters, i.e., an `a` in the substring finds `ā`, `ă`, `ä` and similar characters.
Competency 3
`zip`?
Competency 4
As a security service provider, I want to reference `application/tar`, and I don't care whether it is an IANA media type or not. I've always said application/tar, it's been coded like that in my product for a decade, and my customers know I mean 'tape archive' when I say that.
Competency 5
`uco-something:IANAMediaType` | `uco-something:NonIANAMediaType`]?
`uco-something:IANAMediaType` | `uco-something:NonIANAMediaType`]
Competency 6
`uco-something:IANAMediaType` or `uco-something:NonIANAMediaType`?
`uco-something:IANAMediaType` or `uco-something:NonIANAMediaType`, according to specification.
Solution suggestion
The taxonomy converts the IANA Media Types registry into SKOS under a UCO namespace, following a mostly two-tier skos:ConceptScheme:
Note that some extension media types not part of IANA are defined for various reasons, and may or may not be submitted in the future for standardization to IANA. These extensions follow the non-registration practice of RFC 6838, Section 3.4, and all include the string `/x-uco-`.
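The naming convention in the note above is simple enough to check mechanically. A minimal sketch, assuming the marker always sits at the start of the subtype; the function name and the example notation `application/x-uco-example` are hypothetical and not taken from the taxonomy:

```python
def is_uco_extension(notation: str) -> bool:
    """True when the subtype starts with the x-uco- marker, i.e. the
    notation contains "/x-uco-" right after the top-level type."""
    top_level, separator, subtype = notation.partition("/")
    return bool(top_level) and separator == "/" and subtype.startswith("x-uco-")
```

For example, `is_uco_extension("application/x-uco-example")` is true, while `is_uco_extension("application/gzip")` is false.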
This repository's primary product is a monolithic ontology and taxonomy file, serialized in Turtle, mime.ttl.
(This repository is undergoing NIST review for release. If you are interested in providing early feedback, please contact @ajnelson-nist .)
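Competency 2 above asks for a substring search over names, abbreviations and descriptions that ignores diacritics. One way a consumer of the taxonomy's label strings might approach this is sketched below, using only Python's standard `unicodedata` module. The function names are hypothetical, and the sketch assumes that stripping Unicode combining marks after NFD decomposition is an acceptable reading of "agnostic to diacritical characters".

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_labels(substring, labels):
    """Return the labels containing `substring`, ignoring diacritics on both sides."""
    needle = strip_diacritics(substring)
    return [label for label in labels if needle in strip_diacritics(label)]
```

The comparison stays case-sensitive here, since the competency question does not say anything about case.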
UCO could subclass `dcterms:MediaType` with a new class `uco-types:IANAMediaType`, and a sibling `uco-types:NonIANAMediaType`, in order to support Requirement 7 and Competency 4.
Coordination
develop