Annotation alsoKnownAs <uri> #21

Closed
azaroth42 opened this issue Feb 18, 2015 · 31 comments

@azaroth42
Collaborator

Requirement:
In order to deduplicate annotations across multiple systems, it would be useful to know where the annotations were originally harvested from. Also, if a client assigns an internal URI (such as a UUID) to the annotation, recording this in the model would be valuable so the client can later re-discover the annotation.

Discussion:
In the CG, the model included oa:equivalentTo. This seems a much broader issue than just ours -- is there a better relationship that we can make use of? iana:via? prov:wasDerivedFrom?

@paolociccarese
Contributor

Prov-o property http://www.w3.org/TR/prov-o/#wasDerivedFrom
"A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity."

@azaroth42 azaroth42 self-assigned this Jul 10, 2015
@azaroth42
Collaborator Author

👍 to prov:wasDerivedFrom as per @paolociccarese, but do we need it explicitly in the model?

@azaroth42 azaroth42 added the tpac label Oct 21, 2015
@akuckartz

👍 for using PROV-O

@azaroth42
Collaborator Author

Timbl: Could use rel=canonical

@azaroth42
Collaborator Author

User focused rationale: (Shevski)

  • Don't want to see duplicates in search results
  • Want to have a unique identifier / canonical URI to ensure that replies can be merged across copies

@azaroth42
Collaborator Author

Proposal: Client SHOULD assign a unique identifier to the annotation, and might consider using a UUID.
[timbl, iherman,rswick,etc]
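
As a rough illustration of this proposal, a client could mint a UUID URN locally before the annotation ever reaches a server. This is only a sketch: the helper name `new_annotation` and the `id` / `target` keys are assumptions made for the example, not something the model defines at this point in the discussion.

```python
import uuid


def new_annotation(target: str) -> dict:
    """Build a bare annotation carrying a client-assigned identifier.

    Hypothetical helper: the "id" and "target" keys are illustrative only.
    """
    # uuid4() is random, so the client needs no coordination with any server.
    return {"id": uuid.uuid4().urn, "target": target}


anno = new_annotation("http://example.com/page")
print(anno["id"])  # a urn:uuid:... value, different on every run
```

Because the identifier is generated on the client, the same value can later be used to re-discover the annotation wherever it ends up being stored.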

@rtroncy

rtroncy commented Oct 27, 2015

+1 !

@tilgovi
Contributor

tilgovi commented Oct 27, 2015

👍

@tilgovi
Contributor

tilgovi commented Oct 27, 2015

If the annotation specifies its own ID in the form of a URI, all of this is solved. Using JSON-LD as an example, you could dereference http://example.com/anno.json and see "@id": "http://canonical.example.com/p/foobar123.json" and that's totally okay.

If the annotation is being re-published because some system wants to add information to it, then it's maybe not owl:sameAs anymore. It's a different resource, with possibly new content, so prov:wasDerivedFrom makes sense. But so does rel=canonical.

Anyway, is this a data model issue or a protocol issue? It's tagged as "model" here, but I'm not sure it really is an issue with the model.

@azaroth42
Collaborator Author

Proposal post-TPAC:

  • Use rel=via instead of rel=canonical in the model. This would be iana:via (json-ld key: via) on the Annotation, with the object being another URI for the Annotation.

Rationale: rel=canonical is too strong. If the @id is the dereferenceable HTTP URI, and the referenced URI is just a UUID, then the canonical one is the HTTP URI already. Instead we want to say that we got this resource via this other resource. We could ALSO assert canonical if necessary, but that would require quite some coordination amongst systems. To me (personally, non editor, non chair, yadda yadda) that seems like something to add once we have deployment experience.

Rationale: As we've seen with other PROV terms, the implications and unexpected/unintended side effects are many and varied. For a system intended to be simple and "webby", reusing existing terms from web-friendly ontologies such as the IANA link relations, and originally defined in Atom for just this same purpose, seems preferable.

@iherman
Member

iherman commented Nov 3, 2015

On 4 Nov 2015, at 01:43, Rob Sanderson notifications@github.com wrote:

Proposal post-TPAC:

Use rel=via instead of rel=canonical in the model. This would be iana:via (json-ld key: via) on the Annotation, with the object being another URI for the Annotation.
Rationale: rel=canonical is too strong. If the @id is the dereferenceable HTTP URI, and the referenced URI is just a UUID, then the canonical one is the HTTP URI already. Instead we want to say that we got this resource via this other resource. We could ALSO assert canonical if necessary, but that would require quite some coordination amongst systems. To me (personally, non editor, non chair, yadda yadda) that seems like something to add once we have deployment experience.

Right. We can try to formulate, formally, a feature as 'at risk' when going to CR, based on implementation experience, or something like that.

@iherman
Member

iherman commented Nov 7, 2015

Unfortunately, using iana:via may lead to problems. See the (huge!) thread in mnot/I-D#39 started by Erik Wilde which is still not fully resolved; the essence of it is "link relations for RDF?". (The discussion degenerated into how rel relations are registered, what the HTML5 does, why is the IANA registration broken, etc, etc, etc. Do we want to go there?)

Call me chicken, but I do not believe we should be part of the discussion if we do not really really need it. Let us try to live without it (alas!, I would say).

@BigBlueHat
Member

Related to this issue and the discussion happening on #96 (and of course the comparison happening on #102), ActivityStreams has a url property which works similarly to iana:via, and probably more obviously so in terms of what one would expect inside that package. In AS2 it can be xsd:anyURI or an as:Link. Personal preference would be for just xsd:anyURI + the ability for that to be an array (which is inherent in JSON-LD afaik).

@azaroth42
Collaborator Author

Regarding the IANA discussion linked, it's mostly Tantek and Elf arguing and adding off-topic noise preventing Mark and Erik from actually making progress. The discussion that is contentious is the HTML5 list, which we don't need to refer to, and as a wiki page I doubt we could normatively do so anyway. Also note that the issue is closed... so there must have been some resolution :)

Thus I stand by my proposal for iana:via, as especially it also gets us first, next, prev and last relationships which we'll need if AS doesn't pan out.

@iherman
Member

iherman commented Nov 30, 2015

Well... Yes, the discussion linked has gone into a discussion between Tantek and Elf, but the original issue, as raised by Erik, is still open. What would be the full URI of the property? Because https://www.iana.org/assignments/link-relations/via does not de-reference, which is a big no-no for linked data... We could use https://www.iana.org/assignments/link-relations/link-relations.xhtml#via which does de-reference (but it does so ignoring #via), but it is really not a proper URI for an RDF property. What URI do you have in mind? Is there a URI that ensures any kind of interoperability with other usages of link relations in RDF?

I think it is a really sad state of affairs, and the issue raised by Erik is right on the spot. But, at this moment, it is still not resolved.

@azaroth42
Collaborator Author

Given that we won't need to import iana:next/prev/first/last, a slight amendment to the proposal:

  • Create a new relationship oa:via, with the same definition as iana:via. If it gets sorted out, we can swap it before Last Call with no change to anything testable.

@BigBlueHat
Member

Works for me.

@iherman
Member

iherman commented Dec 4, 2015

Me too.

On 4 Dec 2015, at 04:56, BigBlueHat notifications@github.com wrote:

Works for me.

@shepazu
Member

shepazu commented Dec 16, 2015

via is only appropriate if the annotation was first published to one service and shared from there. Some annotations are first "published" locally, on the user's system, and later shared elsewhere, and some annotations may be published in multiple services (e.g. multiple URIs) simultaneously. Wouldn't a UUID be a better way to model this?

@BigBlueHat
Member

@shepazu they're different animals. 🐱 🐶

via has the purpose described here: a chain of locations this annotation has been. One of those items MAY be (though I don't think we're defining it in this specific issue) a UUID URN, but either way a UUID is not a better way to model "this". It may still be useful as a non-dereferenceable deduplicator thing (similar to a Message-ID in email).

@paolociccarese
Contributor

I am not sure having multiple 'via' values is going to be a good idea as we will not be able to distinguish the original annotation from its copies.

@azaroth42
Collaborator Author

To address the 'which of these URIs is canonical' question, I propose (quite simply) that we allow both via and canonical 😸

There may not be a canonical URI, for example if the client doesn't provide a URI at all and sends the annotation to multiple servers. It would be unwise for servers to assert a canonical URI without instruction from the client, as we could end up with many competing canonical URIs.

So the processing requirements would be:

  • If a server receives an Annotation with a URI in id from a client, it SHOULD put it in via and put its own URI for the annotation in id. (e.g. a push scenario for acquisition of the anno)
  • If a federating server discovers and harvests an Annotation with a URI in id, then it MUST put it in via and use its own URI for its copy. (e.g. a pull scenario for acquisition of the anno)
  • Servers MUST maintain any asserted canonical URI, and there MUST be at most one canonical URI asserted. It MAY also be in via.
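
The processing requirements above could be sketched roughly as follows. This is a minimal Python illustration, not an agreed algorithm: the function name `accept_annotation` is hypothetical, the example URIs are made up, and appending the previous `id` to the end of `via` (oldest first) is an assumption, since the ordering of `via` is not settled at this point.

```python
from copy import deepcopy


def accept_annotation(incoming: dict, own_uri: str) -> dict:
    """Return the copy a server would store for an incoming annotation.

    Hypothetical helper: moves any existing "id" onto the end of "via",
    assigns own_uri as the new "id", and leaves "canonical" untouched.
    """
    stored = deepcopy(incoming)  # never mutate the caller's copy

    via = stored.get("via", [])
    if isinstance(via, str):  # tolerate a single URI instead of a list
        via = [via]
    via = list(via)

    previous_id = stored.get("id")
    if previous_id is not None and previous_id != own_uri:
        via.append(previous_id)  # assumed order: oldest first, newest last
    if via:
        stored["via"] = via

    stored["id"] = own_uri
    # "canonical" is neither added nor changed: only the client asserts it.
    return stored


harvested = accept_annotation(
    {
        "id": "http://a.example/anno1",  # made-up URIs for illustration
        "canonical": "urn:uuid:abc",
        "target": "http://example.com/page",
    },
    "http://b.example/anno9",
)
print(harvested["via"])  # ['http://a.example/anno1']
```

The same function covers both the push and the pull scenario; the only difference between them in the text above is whether moving `id` into `via` is a SHOULD or a MUST.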

@iherman
Member

iherman commented Dec 17, 2015

@azaroth42, just to clarify

  • whichever process creates the original annotation may add a unique, canonical URI to the annotation, and that URI should not be changed by any other process, right?
  • via may contain several URIs, thereby providing a sort of breadcrumb trail? Or does via contain a single URI?

@akuckartz

The title of this issue seems to be obsolete.

@BigBlueHat
Member

👍 for canonical! I was wondering about the publish offline first scenario with regards to id and via.

I think it would look like:

Offline:

{
  "id": "urn:uuid:1234-567...",
  "target": "http://...."
}

Published online later:

{
  "id": "http://annotations.example/blah-blah",
  "canonical": "urn:uuid:1234-567...",
  "target": "http://...."
}

Aggregated elsewhere:

{
  "id": "http://other.example/blah-blah-again",
  "via": ["http://annotations.example/blah-blah"],
  "canonical": "urn:uuid:1234-567...",
  "target": "http://...."
}

Does that make sense?

Should the (offline) first example also re-state its id in the canonical value, and if so, should that be a requirement? Thought being that if it were already there, then future systems MUST leave it alone and MUST move the value of id to the via breadcrumb/chain/thing.

Other than that question, I think this thing sings pretty sweetly now. 🎶 🐦

@azaroth42
Collaborator Author

Is via ordered, or just a set of URIs?

@iherman
Member

iherman commented Jan 13, 2016

@azaroth42 I guess an ordered list makes sense, it provides a breadcrumb. Let us go for this to close this:-)

@iherman
Member

iherman commented Jan 13, 2016

Telco decision to go for ordered list, http://www.w3.org/2016/01/13-annotation-irc#T16-58-04

@tilgovi
Contributor

tilgovi commented Jan 13, 2016

I don't see the motivation for addressing de-duplication anywhere. Can anyone summarize? I can understand why a system may wish to de-duplicate, but couldn't they do that based on the content itself?

@BigBlueHat
Member

@tilgovi if I'm trying to syndicate / distribute an update to the canonical annotation back to all these peers / aggregators, then depending on content wouldn't work.
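
To make that concrete, here is a small Python sketch of grouping copies by identifier rather than by content. The helper names `identity_key` and `group_copies` and all the URIs are hypothetical; falling back to `id` when no `canonical` is present is an assumption, not something decided in this thread.

```python
from collections import defaultdict


def identity_key(anno: dict) -> str:
    """Key deciding whether two records are copies of one annotation.

    Prefers "canonical"; falls back to "id" when none was asserted
    (the fallback is an assumption made for this sketch).
    """
    return anno.get("canonical") or anno["id"]


def group_copies(annotations: list) -> dict:
    """Map each identity key to the ids of all records sharing it."""
    groups = defaultdict(list)
    for anno in annotations:
        groups[identity_key(anno)].append(anno["id"])
    return dict(groups)


records = [
    {"id": "urn:uuid:abc", "target": "t"},  # offline original
    {"id": "http://a.example/1", "canonical": "urn:uuid:abc", "target": "t"},
    {
        "id": "http://b.example/2",
        "via": ["http://a.example/1"],
        "canonical": "urn:uuid:abc",
        "target": "t (edited)",  # content differs, identity does not
    },
]
print(group_copies(records))
```

All three records land in one group even though the third has different content, which is the case where content-based matching would miss a copy that needs the update.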

