IRI reuse and synonyms #17
Why would re-using existing IRIs be a problem? .. Wasn't the existence of different terms with identical meanings (such as name, title) - due to blindness about what vocabularies are out there - one of the major issues of the Semantic Web? It's a major hindrance to automated agents in interpreting distributed RDF data, and the integration of multiple RDF datasets. I suppose my question is: why confound even the simple, intuitive concept of vocabulary (re-)use? Perhaps I'm misunderstanding the main rationale behind this issue .. Why would it lead to "messy" RDF data, and why is this related to normalization? I understand that this leads to an up-front burden for the developer - but that problem could be solved in a different way (see e.g., BioPortal, which greatly facilitates the discovery of relevant ontologies). |
@darth-willy the problem is not the re-use of existing IRIs per se. The problem is that users are forced to refer to those IRIs verbatim (modulo namespace prefixes) whenever they are used. To illustrate, suppose I want to use a particular collection of concepts from several namespaces, and I want to tell my users what they are. Right now I have to explicitly list all of those URIs, such as:
And if you are debugging your RDF and looking at a term such as dc:author, is it on my approved list? Hmm, not easy to tell. You have to carefully examine the whole list. It would be much more user-friendly if I could bundle up this entire collection of URIs, from many namespaces, into a single coherent package and use a common prefix for that entire bundle of concepts, so that others could use it like this:
Then if you see dc:author in your data you will instantly know that it is not on my approved list, whereas if it shows up as :author then you know it is on the list. This is quite analogous to what can already be done in programming languages. Obviously a mechanism would have to be developed to support these name associations or renaming, and it would be nice if it could be done both on an individual basis, such as picking one specific URI from an ontology, and on a group basis, such as combining all of the URIs from both FOAF and PROV. (Conflicts would obviously have to be resolved also, if two sources use the same local names.) I think there are two basic approaches that could address this need. One would be to define a property that is used for renaming URIs:
dc:title rdf:prefUri :title .
This would act somewhat like a one-way owl:sameAs: when the processor sees :title it treats it as dc:title. The other basic approach would be to define a higher-level binding syntax, roughly like what programming languages use for importing libraries. For example, in JavaScript you can pull in an externally defined object (that has sub-names) and associate it with your own local name:
var React = require('react');
I don't know what mechanism would be easiest. It would be nice to explore some ideas. |
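To make the first approach a bit more concrete, here is a minimal sketch of how a processor might apply such renaming statements. It assumes the rdf:prefUri property proposed above (it is not an existing RDF term), a placeholder myapp namespace, and triples modeled as plain string arrays; `resolveTerm` and `resolveTriple` are hypothetical helper names.

```javascript
// Sketch only: rdf:prefUri is the property proposed in this thread, not an
// existing RDF term, and the myapp namespace below is an assumed placeholder.
const DC = "http://purl.org/dc/elements/1.1/";
const APP = "http://example.org/myapp#";

// Each entry mirrors a statement such as: dc:title rdf:prefUri :title .
const prefUriStatements = [
  [DC + "title", APP + "title"],
  [DC + "creator", APP + "author"],
];

// One-way lookup from the locally preferred IRI back to the original IRI.
const originalOf = new Map(
  prefUriStatements.map(([original, preferred]) => [preferred, original])
);

// Treat a preferred IRI as its original; leave every other term untouched.
function resolveTerm(term) {
  return originalOf.has(term) ? originalOf.get(term) : term;
}

// Triples are modeled as plain [subject, predicate, object] string arrays.
function resolveTriple(triple) {
  return triple.map((term) => resolveTerm(term));
}

console.log(resolveTriple(["http://example.org/book1", APP + "title", "Moby Dick"]));
```

The lookup is deliberately one-directional, matching the "one-way owl:sameAs" behaviour described above: :title resolves to dc:title, but dc:title is never rewritten to :title.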
Maybe I'm just misunderstanding.. But firstly, I don't know why one would have to explicitly list all URIs of re-used concepts up front, i.e., create an "approved list". Do you mean restricting authors regarding which predicates, types, .. should be used in data, e.g., which will be added to your repository (i.e., for consistency purposes)? This renaming of URIs with your own prefix merely seems to obfuscate their provenance, and, as you say, will require support for resolving (possibly, a chain of) re-named URIs. If you are referring to the effort of having to refer to multiple namespaces, this seems quite minimal; and, in fact, in line with the overall philosophy of utilizing a distinct namespace to group domain- or application-specific URIs. Wouldn't throwing concepts from different namespaces, i.e., from different domains, into a single, personal namespace break this design goal? |
If you have control over your RDF authors, then yes you could restrict your authors to the approved list of predicates and classes (for example). But even if you don't control your RDF authors, such as if you are integrating data from external sources, then often there will be a set of predicates and classes (for example) that you already know how to handle -- the approved list -- and when new ones show up from new data sources, you might expand the approved list. So it is useful both to be able to distinguish easily between terms that are on the approved list and terms that are not, and to be able to work with the approved terms using a single namespace. But the desire for easy URI renaming or synonyms goes beyond that also. It would also help when two or more URIs are discovered for the same entity, and this happens a lot. For simpler processing it would be easiest if you could simply declare that the URIs are synonyms, and indicate which URI is the preferred synonym, and then use only that URI within your application, instead of having to deal with all of them.
That is exactly one of the goals that this issue is intended to address. Right now if I want to reuse someone else's URI, I cannot use a distinct namespace to group it into my application-specific namespace. In other words, a URI cannot belong to more than one group, because it only has one namespace. It is only and forever in its original application-specific namespace group. This does not make sense when the goal is to reuse common URIs. Perhaps this means that we need to somehow separate the grouping mechanism (which currently is done with namespaces) from the unique identification mechanism (which is done with URIs). |
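As a rough illustration of separating grouping from identification, the sketch below keeps group membership in a table of its own, so one IRI can sit in several groups regardless of its namespace. The group names ("myapp", "contacts") and the helpers `addToGroup`, `groupsOf`, and `partition` are all hypothetical; nothing like this is standardized.

```javascript
// Sketch only: group names and helper functions are hypothetical.
const FOAF = "http://xmlns.com/foaf/0.1/";
const DC = "http://purl.org/dc/elements/1.1/";

// Group name -> set of member IRIs, independent of each IRI's namespace.
const groups = new Map();

function addToGroup(group, ...iris) {
  if (!groups.has(group)) groups.set(group, new Set());
  for (const iri of iris) groups.get(group).add(iri);
}

// An IRI may belong to any number of groups.
function groupsOf(iri) {
  return [...groups]
    .filter(([, members]) => members.has(iri))
    .map(([name]) => name);
}

// Split the terms seen in incoming data into approved and not-yet-approved.
function partition(group, iris) {
  const members = groups.get(group) ?? new Set();
  return {
    approved: iris.filter((iri) => members.has(iri)),
    other: iris.filter((iri) => !members.has(iri)),
  };
}

addToGroup("myapp", FOAF + "name", DC + "title");
addToGroup("contacts", FOAF + "name", FOAF + "mbox");

console.log(groupsOf(FOAF + "name"));
console.log(partition("myapp", [DC + "title", DC + "creator"]));
```

With a table like this, checking whether a term is on the approved list becomes a set lookup rather than a scan over namespace prefixes, and foaf:name keeps its original IRI while belonging to both groups.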
Ah, @dbooth-boston , I think I am beginning to get what you are after :-) Yes, we have struggled with this over the years. |
Yes, exactly. Those are some of the use cases and workarounds that would be addressed if we had better standard mechanisms to address this issue. |
Honestly, I don't think anything new is necessary for this functionality.
Define your own app terminology in your own namespace. If you just want to
copy a full vocabulary into your namespace, a few lines of curl + perl
suffice to automatically generate a ttl file containing your app
terminology. Assign every property its 'canonical' full URI. No need to
invent novel predicates, rdfs:subPropertyOf or skos:broader would already
work. If you have an issue with their semantics (as they can be regarded
as implying non-equivalence -- formally they don't), you can just create
such a property in your namespace.
Assuming we work with rdfs:subPropertyOf, and we define your app
terminology as superproperties of vocabulary-specific properties (or the
other way around, as you wish), your app terminology contains statements
such as
@prefix : <..../myapp#> .
skos:prefLabel rdfs:subPropertyOf :prefLabel.
Assuming that you want to retrieve the original URIs of all vocabulary
elements in your app terminology, then use
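PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri
WHERE {
  ?uri rdfs:subPropertyOf ?myuri .
  # Keep only superproperties that live in the app namespace.
  FILTER(regex(str(?myuri), '/myapp#'))
}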
Filtering for namespace URIs is current practice for checking if an
element is part of one particular vocabulary (*this* should be improved,
actually). A nicer alternative in this case would be to just create *one*
superproperty of all properties you want to use (say, :appProperty), and
then retrieve ?uri rdfs:subPropertyOf :appProperty.
Wrt. a few lines of curl + perl (+ grep): The following is sufficient to
"import" skos (or any other canonically formatted RDF/XML vocabulary):
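#!/bin/bash
# Fetch the SKOS RDF/XML file (-L follows redirects) and emit one
# rdfs:subPropertyOf statement per declared property.
curl -L http://www.w3.org/2009/08/skos-reference/skos.rdf | \
perl -e '
  my $localname = "";
  while (<>) {
    # Derive the vocabulary prefix from the xml:base attribute.
    if (m/.*xml:base=.*/) {
      my $skos = $_;
      $skos =~ s/.*xml:base=\"([^\"]*)\".*\n/$1/;
      print "PREFIX skos: <".$skos."#>\n";
    }
    # Remember the local name of the resource currently being described.
    if (m/^\s*<rdf:Description[^>]*rdf:about/) {
      $localname = $_;
      $localname =~ s/.*rdf:about=\".*[#\/]([^#\/]+)\".*\n/$1/;
    }
    # When the resource is typed as some kind of Property, emit the mapping.
    if (m/<rdf:type[^>]*Property\"/) {
      if ($localname ne "") {
        print "skos:".$localname." rdfs:subPropertyOf :".$localname." .\n";
        $localname = "";
      }
    }
  };
'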
Note that this is a proof of concept, but that I do *not* suggest relying
on this particular code. Nor would I say that it wouldn't be nice to
have an application that creates such things automatically in a more
generic fashion. However, while such an application *should* be part of
the RDF ecosystem, there's no need to enrich the RDF *vocabulary* to get
this functionality.
On .12.2018 at 20:21, David Booth <notifications@github.com> wrote:
> I think there are two basic approaches that could address this need. One
> would be to define a property that is used for renaming URIs
> dc:title rdf:prefUri :title .
Well, this can just be rdfs:subPropertyOf or skos:broader.
> The other basic approach would be to define a higher-level binding
> syntax, roughly like what programming languages use for importing
> libraries. For example, in JavaScript you can pull in an externally
> defined object (that has sub-names) and associate it with your own
> local name:
> var React = require('react');
In fact, this would feel more natural to a programmer. A bit like import
in Java. However, why not create a TTL/SPARQL/etc. preprocessor that uses
the trick above to turn this into conventional RDF or SPARQL? Instead of
changing the languages, this could be a standalone library, then. And in
case it is getting popular, one can think about integrating its component
into the general RDF vocabulary stack. (Rather than the other way around.)
In fact, I can see a lot of problems with the latter ("higher-level")
approach, as it leads people to import too greedily without proper name
clash resolution. (Or, if name clashes are automatically resolved, with
limited transparency to the end user how this is done.) Hence, having this
as a pre-processor rather than a language component allows us to inspect
and to verify the intermediate representation. But then, we're no longer
talking about revising RDF, but about a specific tool for RDF vocabulary
management.
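
A minimal sketch of such a pre-processor, to show how small it could be. The "import X as Y" directive is made up purely for illustration (it is not Turtle or SPARQL syntax), and `preprocess` is a hypothetical function name. It rewrites each directive into the rdfs:subPropertyOf statement from the trick above and refuses to rebind a local name, so name clashes surface instead of being resolved silently.

```javascript
// Sketch only: the "import X as Y" directive is invented for illustration and
// is not part of Turtle or SPARQL.
const DIRECTIVE = /^\s*import\s+(\S+)\s+as\s+(\S+)\s*$/;

function preprocess(source) {
  const bound = new Map(); // local name -> imported term
  const out = [];
  for (const line of source.split("\n")) {
    const match = DIRECTIVE.exec(line);
    if (!match) {
      out.push(line); // ordinary Turtle passes through unchanged
      continue;
    }
    const [, imported, local] = match;
    // Refuse to silently rebind a local name to a different term.
    if (bound.has(local) && bound.get(local) !== imported) {
      throw new Error(`name clash: ${local} is already bound to ${bound.get(local)}`);
    }
    bound.set(local, imported);
    out.push(`${imported} rdfs:subPropertyOf ${local} .`);
  }
  return out.join("\n");
}

const input = [
  "@prefix : <http://example.org/myapp#> .",
  "import skos:prefLabel as :prefLabel",
  "import dc:title as :title",
].join("\n");

console.log(preprocess(input));
```

Because the output is conventional Turtle, the intermediate representation can be inspected and verified before it is loaded anywhere.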
|
@HughGlaser wrote:
This sounds like a vocabulary mapping problem where you need to map external vocabularies into the vocabulary your application logic is defined in. The mapping could be defined through rules, may be context dependent, and may define preferences when the input provides multiple choices. The data could be pushed through the rules in an eager processing model, or pulled in a lazy processing model. The inability to find a mapping for a new data source could send a signal that developer attention is needed. To put that differently, some processes are fully automated whilst others bring humans into the loop for collaborative problem solving. |
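A small sketch of the eager variant of this idea, under stated assumptions: the rule shape (`source`, `target`, `rank`), the myapp namespace, and the `mapRecord` helper are all hypothetical. Each rule maps a source predicate to an application predicate, a lower rank expresses the preference when several inputs compete for the same target, and predicates with no rule are collected so that a developer can be alerted.

```javascript
// Sketch only: the rule shape, ranks, and the myapp namespace are assumptions.
const FOAF = "http://xmlns.com/foaf/0.1/";
const RDFS = "http://www.w3.org/2000/01/rdf-schema#";
const APP = "http://example.org/myapp#";

// Lower rank wins when several source predicates map to the same target.
const rules = [
  { source: FOAF + "name", target: APP + "name", rank: 1 },
  { source: RDFS + "label", target: APP + "name", rank: 2 },
];

// Eager model: map every [predicate, value] pair of a record up front.
function mapRecord(pairs) {
  const best = new Map(); // target -> { value, rank }
  const unmapped = [];
  for (const [predicate, value] of pairs) {
    const rule = rules.find((r) => r.source === predicate);
    if (!rule) {
      unmapped.push(predicate); // signal that developer attention is needed
      continue;
    }
    const current = best.get(rule.target);
    if (!current || rule.rank < current.rank) {
      best.set(rule.target, { value, rank: rule.rank });
    }
  }
  const mapped = Object.fromEntries(
    [...best].map(([target, entry]) => [target, entry.value])
  );
  return { mapped, unmapped };
}

console.log(
  mapRecord([
    [RDFS + "label", "A. Smith"],
    [FOAF + "name", "Alice Smith"],
    ["http://example.org/other#nick", "al"],
  ])
);
```

A lazy variant would apply the same rules only when a target predicate is actually requested; the rule table itself would not need to change.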
@draggett wrote
Yes, this is pretty much what Fresnel provides. Having agreed there is a problem here, what do you think is the best way to make it easier for newbies (and others) to surmount or avoid it? |
This is one of the topics I proposed for the W3C Graph Data workshop in early March. What would really help is to gather some concrete use cases that we can forge ideas against. It relates to interest in higher level frameworks and easier rule languages. |
Ok, I think I'm discerning a few different, albeit related, topics here (feel free to correct): (1) Useful to be able to work with the approved terms using a single namespace. To me it seems that these are separate issues (although one solution could partially address multiple of them). The first issue seems related to ease of use, but, as mentioned, could obfuscate the provenance (and thus, meaning) of these terms, simply to avoid utilizing a few more prefixes.. Issues 2-4 seem very much related and pertain to data discovery. As noted by others, they could be (partially) resolved by introducing a built-in "synonym" predicate, possibly supported by a sameAs service. By checking whether a new term is a synonym of an approved term or not, one can meet issue 2. |
@darth-willy, yes that is a pretty good summary. You could think of these topics as being independent, but I think it's helpful to step back and take a broader view of them. What they have in common is the tension between the rigidity of a single global naming space versus the need to create RDF applications locally and independently. Again I'll make an analogy of RDF being like assembly language. At the assembly language level, there is only one global variable space. But as higher-level languages were developed, computer scientists realized that it is very beneficial to support local naming spaces, and provide mechanisms for mapping them into the underlying global naming space. By analogy, this has not yet been developed for RDF. I do not yet know what will be the best mechanisms to address these issues -- we still need more creative ideas -- but I think at least one element should include the ability to easily indicate a preferred synonym. sameAs services can be quite useful, but one cautionary note: in the end they can only suggest synonymous URIs. They cannot be authoritative for all applications. This is because different applications need to make different judgement calls about which URIs are synonymous enough for that application's purpose. Attempting to universally decide that two URIs are synonymous leads to deep and unsolvable philosophical questions about identity. |
@dbooth-boston I'm still a bit confused by your analogy to namespaces in programming languages. In particular, (a) how this relates to the prior notion of "re-packaging" terms into a local, application-specific namespace, and (b) that a namespace mechanism does not exist for RDF. It seems like the former would be similar to re-packaging a class like You make a good point when saying that synonyms can be application-specific. E.g., for some applications, |
I disagree very strongly.
True.
Yes - and that's why it is a worthless task. |
@dbooth-boston wrote:
@HughGlaser wrote:
Sorry, I should have been clearer. Certainly a sameAs service that a developer has chosen to use for a particular application can be authoritative for that application. What I meant was that general purpose sameAs services -- such as sameas.org -- cannot be authoritative for all applications: they can only suggest synonyms, because developers still need to make their own final choices of what they consider synonyms. |
@darth-willy wrote:
Indeed, and this is supported by JSON-LD contexts. It is also likely to be important when dealing with graph databases that don't support namespaces explicitly, and when we want to use RDF as an interchange framework between different graph databases. |
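For readers who have not used JSON-LD, here is a minimal sketch of what a context does. The document and the `expandKeys` helper are illustrative assumptions: the helper only handles top-level terms defined as plain IRI strings, which is far less than the real JSON-LD expansion algorithm does.

```javascript
// Sketch only: expandKeys is a hypothetical helper covering simple string
// term definitions, not the full JSON-LD expansion algorithm.
const doc = {
  "@context": {
    name: "http://xmlns.com/foaf/0.1/name",
    title: "http://purl.org/dc/elements/1.1/title",
  },
  "@id": "http://example.org/people/alice",
  name: "Alice",
};

// Replace each local key with the IRI its context entry points to.
function expandKeys(document) {
  const context = document["@context"] ?? {};
  const expanded = {};
  for (const [key, value] of Object.entries(document)) {
    if (key === "@context") continue;
    // Keys without a string definition (such as "@id") are kept as they are.
    const iri = typeof context[key] === "string" ? context[key] : key;
    expanded[iri] = value;
  }
  return expanded;
}

console.log(expandKeys(doc));
```

The application works with the short local names, while the context keeps the tie to the original external IRIs.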
[Catching up on an earlier comment that I missed.] @chiarcos, yes, that is a work-around that could be used. But I think the need is general enough that we really should have better support for it in the RDF ecosystem. I don't think rdfs:subPropertyOf or skos:broader are ideal for this, because they are already used for completely different purposes that would be detrimental to conflate. I also agree that a pre-processor might turn out to be the best approach. But I think we still need more ideas on the table. |
The discussion has been mostly in terms of predicates, but should this apply also to instance URIs? |
@azaroth42 , yes, I think the need exists for instance URI synonyms also. |
I would like to differentiate (here and in some other issues) between changes (or not) of the fundamental RDF concepts and what serializations can offer. My reading of the thread is that this is an issue for the latter and not the former. JSON-LD has, by and large, solved this issue through the Words of warnings, though:
In spite of the potential pitfalls, I think the fundamental approach of JSON-LD, yielding |
Interesting idea! I wonder how an |
In theory, RDF authors should reuse
existing IRIs, rather than minting their own. But this makes
for messy RDF and increases the up-front burden on developers.
Consider a typical RDF project that integrates data from
multiple sources, and needs to connect that data into its own
vocabulary. The resulting data involves both the normalized
vocabulary and the non-normalized source vocabularies,
intermixed. The developers might be happy to adopt existing
concepts like foaf:name (for a person's name) and dc:title (for
a document title) into the project's normalized vocabulary.
But by using those existing IRIs instead of minting their
own IRIs in their own namespace (such as myapp:name and
myapp:title), it becomes hard to distinguish IRIs of the normalized
vocabulary from IRIs of the non-normalized source vocabularies.
Ideally a project should be able to use its own preferred names
(and namespaces), like myapp:name and myapp:title, while still
tying those names to existing external IRIs, such as foaf:name
and dc:title.
owl:sameAs is not great for this. It is too heavyweight
for simple synonyms, and it is only for OWL individuals --
not classes. Furthermore, it provides no way to indicate
which IRI is locally preferred. It would be good to have a
simple standard way to rename IRIs or define IRI synonyms.