DID, DID-URL, identification (DID view <-> SW view) #183

Closed
iherman opened this issue Feb 3, 2020 · 21 comments
Labels: editorial (Editors should update the spec then close), pending close (Issue will be closed shortly if no objections)

Comments

@iherman
Member

iherman commented Feb 3, 2020

The issue of DID vs. DID-URL vs. identification came up during the F2F meeting, and the subsequent discussion made me realize that there may be a difference between what the DID document says and what the Semantic Web says.

My understanding is as follows (and my understanding may be flawed):

  • a DID (e.g., did:example:abcdefg) "identifies" the subject (which is specified as part of the DID information a.k.a. DID document)
  • a DID-URL, e.g., did:example:abcdefg/path1/path2/path3 identifies the same subject as its "DID part", i.e., did:example:abcdefg.

Put it another way, the 'non-did' part of the DID-URL (i.e., the path, fragment, query, or param) does not affect the action of "identification" itself.

Compare it to the way the Semantic Web considers things:

  • a (HTTP) URL, identifies a resource (there is no formal assignment to a subject)
  • URLs of the form http://www.example.org and http://www.example.org/path1/path2/path3 are strictly different, and they are considered to identify different resources. (It may happen that these two are aliases and dereference to the same content, but that is beside the point.)

This means that if, for example, in a Semantic Web set of statements (e.g., in a Verifiable Claim...) I use the following two URIs:

  • did:example:abcdefg
  • did:example:abcdefg/path1/path2/path3

then the two views will clash: the resources that are identified from the Semantic Web point of view are different, whereas they are considered to be identical from the point of view of the DID model.

We may be o.k. and comfortable with this. But even if we are, we may want to make it clear somewhere in the document imho, because it may be a source of confusion.

@iherman iherman added the editorial Editors should update the spec then close label Feb 3, 2020
@peacekeeper
Contributor

peacekeeper commented Feb 3, 2020

@iherman

Put it another way, the 'non-did' part of the DID-URL (i.e., the path, fragment, query, or param) does not affect the action of "identification" itself.

I disagree. What part of the spec is it that gives this impression? I would argue that your examples did:example:abcdefg and did:example:abcdefg/path1/path2/path3 are also considered different resources, just like it has always been in the Semantic Web model. I don't see why DIDs would change that.

I would argue that in the Semantic Web, two different URIs could potentially identify the same resource, e.g. https://markus.name/#me and https://danubetech.com/markus. The fact that they identify the same resource could be indicated with an owl:sameAs statement, or somehow implied, etc. You don't know whether they identify the same or different resources just from the identifier alone.

In the same way, different DID URLs may or may not identify the same resource. You would have to find out by reading the appropriate DID method spec, or matrix parameter specification, or some application documentation in order to know that.

So I don't really see how there are two views?

@iherman
Member Author

iherman commented Feb 3, 2020

@peacekeeper,

I disagree. What part of the spec is it that gives this impression? I would argue that your examples did:example:abcdefg and did:example:abcdefg/path1/path2/path3 are also considered different resources.

The spec says (as @talltree pointed at during the meeting):

DID subject
The entity the DID document is about. That is, the entity identified by the DID [...]
DID URL
A DID plus an optional DID path, optional ? character followed by a DID query, and optional # character followed by a DID fragment.

Take the example did:example:abcdefg and did:example:abcdefg/path1/path2/path3. The latter is a "DID plus an optional DID path", i.e., the two URI-s share the DID, therefore they share the subject. Do we want to say that the generic (and loose) term "(Web) Resource" that the URI-s identify and the "subject" that the DID identifies formally in the DID doc are different notions? If so, that does mean that if I look at these as URI-s used on the Semantic Web then they refer to two different Web resources, but they identify the same abstract "thing" (the subject). That is what I said and what is not intuitive for a Semantic Web person (and deserves being spelled out in the document).

I would argue that in the Semantic Web, two different URIs could potentially identify the same resource, e.g. https://markus.name/#me and https://danubetech.com/markus. The fact that they identify the same resource could be indicated with an owl:sameAs statement, or somehow implied, etc.

That is almost correct, but one has to be cautious. On the Semantic Web those two URI-s are, by default, distinct and identify two different resources. An external entity (user) could record a statement (axiom) whereby:

<https://markus.name/#me> owl:sameAs <https://danubetech.com/markus>

but that must be stated explicitly as an additional fact in a, say, knowledge graph. An OWL reasoner could then make various deductions but, for example, a pure RDFS reasoner could not. For a pure RDFS reasoner, those two resources remain distinct.
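
To make the distinction concrete, here is a minimal sketch in plain Python (not a real RDFS or OWL reasoner; the function name `same_resource` and the tuple-based "graph" are assumptions made up for illustration). Without equality reasoning the two URIs stay distinct; only when the explicit owl:sameAs statement is present and the reasoner honours it are they treated as the same resource.

```python
OWL_SAME_AS = "owl:sameAs"

def same_resource(a, b, triples, use_owl=False):
    """Return True if a and b are known to denote the same resource."""
    if a == b:
        return True
    if not use_owl:
        # RDFS-style view (simplified): no equality reasoning,
        # so two distinct URIs remain two distinct resources.
        return False
    # OWL-style view (simplified): follow explicitly stated sameAs links,
    # treating them as symmetric and transitive.
    neighbours = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return b in seen

me = "https://markus.name/#me"
markus = "https://danubetech.com/markus"
graph = [(me, OWL_SAME_AS, markus)]

print(same_resource(me, markus, []))                   # no axiom recorded
print(same_resource(me, markus, graph))                # axiom present, no OWL reasoning
print(same_resource(me, markus, graph, use_owl=True))  # axiom present, OWL reasoning
```

Only the last call reports the two URIs as the same resource, which mirrors the point that the equivalence has to be both stated and reasoned over.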

@dlongley
Contributor

dlongley commented Feb 3, 2020

@iherman,

Take the example did:example:abcdefg and did:example:abcdefg/path1/path2/path3. The latter is a "DID plus an optional DID path", i.e., the two URI-s share the DID, therefore they share the subject. Do we want to say that the generic (and loose) term "(Web) Resource" that the URI-s identify and the "subject" that the DID identifies formally in the DID doc are different notions? If so, that does mean that if I look at these as URI-s used on the Semantic Web then they refer to two different Web resources, but they identify the same abstract "thing" (the subject). That is what I said and what is not intuitive for a Semantic Web person (and deserves being spelled out in the document).

did:example:abcdefg and did:example:abcdefg/path1/path2/path3 identify, or refer to, different "resources". That the second URL contains did:example:abcdefg in it is important to the resolution process, in the same way that https://example.com is important to the resolution process for https://example.com/path1/path2/path3, yet these two example.com URLs likewise refer to different "resources".

If you were to make two Verifiable Credentials, the first with a credentialSubject identified by did:example:abcdefg and the second with a credentialSubject identified by did:example:abcdefg/path1/path2/path3, they would NOT refer to the same thing.

There is no difference between DID URLs and HTTPS URLs in this respect; Semantic Web software should treat them the same way. We should fix any language in the DID Core spec that might lead one to believe there is a difference.

Note: Historically, "DIDs" were at least partially designed to be a more portable/decentralized/self-sovereign form of a "WebID" and a "DID Document" was somewhat of an analog to a WebID's associated "Profile Document". All of these technologies are intended to play nicely together and with the broader (including non-Sem Web via JSON-LD tech) community.

@peacekeeper
Contributor

Take the example did:example:abcdefg and did:example:abcdefg/path1/path2/path3. The latter is a "DID plus an optional DID path", i.e., the two URI-s share the DID, therefore they share the subject.

I don't see the problem. In your example did:example:abcdefg and did:example:abcdefg/path1/path2/path3, the DID subject is in both cases did:example:abcdefg. But the identified resource is different, since the URIs are different.

Two different HTTP URIs also identify different resources, even if they share the same domain name.

Do we want to say that the generic (and loose) term "(Web) Resource" that the URI-s identify and the "subject" that the DID identifies formally in the DID doc are different notions?

Yes the "DID subject" and the "resource identified by a DID URI" are different notions.

@iherman
Member Author

iherman commented Feb 3, 2020

@dlongley said:

If you were to make two Verifiable Credentials, the first with a credentialSubject identified by did:example:abcdefg and the second with a credentialSubject identified by did:example:abcdefg/path1/path2/path3, they would NOT refer to the same thing.

and @peacekeeper said:

In your example did:example:abcdefg and did:example:abcdefg/path1/path2/path3, the DID subject is in both cases did:example:abcdefg. But the identified resource is different since the URIs are different.

I must admit these two statements still sound a bit contradictory in my mind. Of course, if I start with what @dlongley said then there is indeed no problem whatsoever. But the statement of @peacekeeper does not make this crystal clear. Actually, he also said:

...
Yes the "DID subject" and the "resource identified by a DID URI" are different notions.

So how do these two notions differ? Can we clearly put this into the spec?

@dlongley
Contributor

dlongley commented Feb 3, 2020

@iherman,

When @peacekeeper said this:

In your example did:example:abcdefg and did:example:abcdefg/path1/path2/path3, the DID subject is in both cases did:example:abcdefg. But the identified resource is different since the URIs are different.

If you substitute HTTPS URLs, it would be similar to saying this:

"In your example https://example.com and https://example.com/path1/path2/path3, the host is in both cases example.com. But the identified resource is different since the URIs are different."

Yes the "DID subject" and the "resource identified by a DID URI" are different notions.
So how do these two notions differ? Can we clearly put this into the spec?

The DID subject is a component of a DID URL, much like a host is a component of an HTTPS URL. Note that you may refer directly to the DID subject itself using a DID URL that contains only the DID subject. Adding anything else to the DID URL would refer, necessarily, to something different.

The "host" part in an HTTPS URL is relevant to the resolution process in the same way the "DID subject" is: an HTTPS URL is resolved relative to the host and a DID URL is resolved relative to the DID subject. But, in both cases, different URLs refer to different things.
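
The analogy can be sketched in a few lines of Python. This is only an illustration: `did_of` is a hypothetical helper (not part of any DID library) that strips the path, query, and fragment from a DID URL, and `urlsplit` from the standard library is used for the HTTPS side.

```python
from urllib.parse import urlsplit

def did_of(did_url):
    """Hypothetical helper: keep only the DID, dropping path, query, and fragment."""
    for sep in "/?#":
        did_url = did_url.split(sep, 1)[0]
    return did_url

https_a = "https://example.com"
https_b = "https://example.com/path1/path2/path3"
did_a = "did:example:abcdefg"
did_b = "did:example:abcdefg/path1/path2/path3"

# The component used for resolution is shared...
print(urlsplit(https_a).hostname == urlsplit(https_b).hostname)
print(did_of(did_a) == did_of(did_b))

# ...but the identifiers themselves, and hence the resources, differ.
print(https_a == https_b)
print(did_a == did_b)
```

In both pairs the shared component compares equal while the full identifiers do not, which is the sense in which different URLs refer to different things.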

@iherman
Member Author

iherman commented Feb 3, 2020

The DID subject is a component of a DID URL, much like a host is a component of an HTTPS URL.

yeah, after I put in my previous comment, I figured out something like that. And that is fine. But the spec is not clear about this (which was my original issue in the first place!). It says:

DID subject
The entity the DID document is about. That is, the entity identified by the DID and described by the DID document.

No mention of that 'component' thing. It also includes statements like "interacting with the DID subject", "Communicating with the DID subject", "authentication of the DID subject", etc., which suggests it is some sort of an external entity (yes, much like a host).

But, in §5.1, which formally defines the DID syntax, there is no mention of the DID subject being a component of a DID URL. In §6.2 the DID subject is defined as follows: "The DID subject is denoted with the id property". Well, the id property (at least in JSON-LD) can have any URI as a value, i.e., per specification, https://example.com/path1/path2/path3 can be used as a value for id, too. This means that, at least per specification, the DID Subject is not a component, it is the whole URI.

I believe we may be along the same lines, saying that the subject is analogous to an HTTPS host component, but the spec needs, imho, improvements to make these things clear.

@iherman
Member Author

iherman commented Feb 17, 2020

(I did not want to pursue this discussion while the Grand Compromise PR was open…)

@dlongley, @peacekeeper, @talltree, I try to collect my thoughts below, based on the discussion in this thread. I hope my summary is, technically, sound. (My problems stem from a possible presentation that I may have to do on “DID for dummies”…).

Strictly speaking, the DID Core specification defines two somewhat orthogonal notions:

  1. DID: is a special type of URI, akin, in its structure, to URNs, and is used to “identify” a subject.
    • This term is not directly defined in the terminology section. There is an indirect definition via the DID Subject: “…the entity identified by the DID and described by the DID document.” (That statement is also present in the introduction).
    • The General Syntax section has a separate did term as part of the ABNF, and there is a (by nature non-normative!) note that equates the general term of a DID with the ABNF. (This MUST be made normative!)
  2. DID URL: is a URL, insofar as it is used to locate a (Web) Resource. This resource is not (necessarily) the subject; it is, for example, a specific part of the corresponding DID document.
    • In contrast to the definition of a DID in the terminology section the corresponding entry for a DID URL is purely syntactic: “DID URL A DID plus an optional DID path, optional ? character followed by a DID query, and optional # character followed by a DID fragment.”. It is completely silent on its functional role; I believe it should specify it there.
    • The very same note referred to above defines the usage of a DID URL. Again, that is non-normative; this MUST be normative.

I see these two notions as, albeit connected to one another, completely different. One is an “abstract” URN-like URI, the other is a URL, i.e., a locator. Their role, their usage, etc., are different. I believe the confusion comes from the fact that they are, sort of, smushed together in the document. This leads to an error in the text in the abstract which says “DIDs are URLs”. Well no, they are not. It also leads to, sort of, hidden definitions (which messed me up): for example, the title of §5.1 Generic DID syntax (https://w3c.github.io/did-core/) is a misnomer: that section defines the DID as well as the DID URLs (in fact, most of the ABNF is on DID URLs…). In general, this duality is not made explicit enough.

Here is what I would propose to do: Section 5 should be restructured to make the difference much clearer. The structure could be something like:

  • 5.Identifiers
    • 5.1 DID
      • 5.1.1 Generic DID syntax (which would only include the did ABNF)
      • 5.1.2 Method-specific syntax
      • 5.1.3 Normalization
      • 5.1.4 Persistence
    • 5.2. DID URL
      • 5.2.1 Generic DID URL syntax (which would include the rest of the ABNF)
      • 5.2.2 Generic DID parameters
      • 5.2.3 Path
      • 5.2.4 Query
      • 5.2.5 Fragment
    • Relationship between DIDs and DID URLs (some further clarification, if needed)
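
To illustrate the DID vs. DID URL split proposed above, here is a rough Python sketch. The regular expression is a simplified approximation of the `did` ABNF rule and is an assumption, not the normative grammar; `classify` is a made-up name. DID parameters are ignored, and only the path, query, and fragment separators from the terminology entry are recognised.

```python
import re

# Simplified approximation of the `did` rule (assumption, not normative):
# "did:" method-name ":" method-specific-id
DID_RE = re.compile(r"did:[a-z0-9]+:(?:[A-Za-z0-9._-]*:)*[A-Za-z0-9._-]+")

def classify(identifier):
    """Return 'DID', 'DID URL', or 'neither' for the given string."""
    if DID_RE.fullmatch(identifier):
        return "DID"
    match = DID_RE.match(identifier)
    # A DID followed by a path, query, or fragment is treated as a DID URL.
    if match and identifier[match.end()] in "/?#":
        return "DID URL"
    return "neither"

for candidate in (
    "did:example:abcdefg",
    "did:example:abcdefg/path1/path2/path3",
    "did:example:123456789abcdefghi#keys-2",
    "https://example.com/path1",
):
    print(candidate, "->", classify(candidate))
```

Note that, syntactically, a bare DID also satisfies the "DID plus optional parts" definition of a DID URL; the sketch reports it as "DID" only to keep the two roles apart.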

Some other, more random thoughts and issues on the same subject:

  • §5.7 Fragments says:

    • Implementers are strongly discouraged from using a DID fragment for anything other than a method-independent reference into the DID document to identify a component of a DID document (for example, a unique public key description or service endpoint).

      The specification must be much more normative than that, because we define a completely new URL. It should say SHOULD NOT instead of “strongly discouraged”. I actually wonder whether it shouldn't be MUST NOT.

  • I would not shy away from some bike shedding on whether the term “DID URL” is indeed the right term. I know URL has a clear meaning in IETF land but, alas!, it has lost this clear meaning on Web land: by now the standard reference to URLs in W3C specifications is the WhatWG URL Living Standard. (Let us not go into a discussion whether this is a good thing or a bad thing. We should take it as a fact of life.) The URL Living Standard defines parsing rules that, if one looks at it more closely, are in fact parsing rules for URIs and not (only) URLs. I have not checked the latest versions of the WhatWG based libraries (say, in node.js), but I would expect (I would hope!) that they would parse DID URLs as well as DIDs properly. But, if so, the term “URL” has become ambiguous, and we still have the opportunity to steer clear of a possible confusion of terms that may bite us later.

  • §7.2 DID Subject says that the value of id MUST be a single valid DID. In light of the definitions that is the way it should be. However, when getting to JSON-LD land, where id is aliased to @id, the situation becomes a bit complex: in JSON-LD/RDF land @id is the (RDF) identifier of the (RDF) subject which, per the RDF Concepts and Abstract Syntax spec is a URI (more exactly, to make things even more complex, an IRI…). I.e., it would be perfectly fine, in JSON-LD, to use a DID URL as a value of id in a DID document. However, I believe that this would not make sense from a DID point of view. I can see two possibilities:

    1. The DID spec makes it very explicit that, in a did-ld-json document, the value of id is restricted, compared to the general JSON-LD term (the same holds for the value of controller).
    2. We use a separate subject key, disjoint from the id (i.e., @id) term. This has the extra complication that each DID Document object becomes identified by a blank node:-(

    I believe (1) above is o.k., but must be made very explicit in the spec (currently, this issue is not mentioned).

My apologies if this note has become a bit too long…

(@msporny, I am happy to create a PR for the spec along these lines at some point if you prefer, but I wanted to get some general agreement first. Also, my understanding may still be wrong…)

@iherman
Member Author

iherman commented Feb 18, 2020

The URL Living Standard defines parsing rules that, if one looks at it more closely, are in fact parsing rules for URIs and not (only) URLs. I have not checked the latest versions of the WhatWG based libraries (say, in node.js), but I would expect (I would hope!) that they would parse DID URLs as well as DIDs properly.

Well... the discrepancy is bigger than I thought. There is an online viewer for whatwg-url at https://jsdom.github.io/whatwg-url/. Unfortunately, though DIDs and DID URLs are parsed, the result is not exactly what we would expect. See, for example

  • DID example: the protocol (did) is recognized, but the method plus the method-specific identifier is considered as a 'path'.
  • DID with path, query and fragment: query and fragment properly recognized, path is merged with the method-specific identifier.

Looking at this reinforces my feeling that we should not call this a “DID URL”, i.e., we should keep away from the “URL” term. The change can be as simple as calling it a “DID Locator” instead.

@talltree
Contributor

talltree commented Mar 1, 2020

@iherman I apologize for not seeing/responding to this earlier. Let me summarize my feedback on both of your previous comments.

First, +1 to your first overall suggestion that in the DID Core spec, we should reorganize Section 5 on Identifiers into the subsections you suggest. I think that would be an excellent clarification.

Secondly, I applaud your deep dive into the applicability of the term "URL". I think most of us have been working off the general W3C definition of URLs being the subset of URIs that locate resources on the Web. But your analysis of the WHAT-WG URL parsing reminded me to revisit the path component rules in the RFC 3986 ABNF. Here's an excerpt of the relevant rules:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   path          = path-abempty    ; begins with "/" or is empty
                 / path-absolute   ; begins with "/" but not "//"
                 / path-noscheme   ; begins with a non-colon segment
                 / path-rootless   ; begins with a segment
                 / path-empty      ; zero characters

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

Indeed, our ABNF for DIDs uses the path-rootless rule, which means that everything after the scheme name is considered part of the path until it either terminates or ends in a query or fragment separator.

This is in contrast to "conventional" URLs that have an authority component preceded by two forward slashes. With those "rooted" URIs, the next forward slash always indicates the start of a path. But not so with "rootless" URIs.
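
The same effect can be observed with a generic RFC 3986-style splitter. The sketch below uses Python's standard `urllib.parse.urlsplit`, which is not the WHATWG algorithm, so it only illustrates the rooted vs. rootless distinction; the example DID URL is made up for the purpose.

```python
from urllib.parse import urlsplit

rooted = urlsplit("https://example.com/path1/path2")
rootless = urlsplit("did:example:123456789abcdefghi/path1/path2?query=1#frag")

for parts in (rooted, rootless):
    print("scheme:   ", parts.scheme)
    print("authority:", repr(parts.netloc))
    print("path:     ", parts.path)
    print("query:    ", parts.query)
    print("fragment: ", parts.fragment)
    print()
```

For the HTTPS URL the authority is `example.com` and the path starts at the next slash. For the DID URL the authority is empty and the path is `example:123456789abcdefghi/path1/path2`, i.e., the method name and method-specific identifier are merged into the path, while the query and fragment are still split off as expected.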

So if I understand it, the case you are making is that the term "URL" is so deeply associated with URIs that use a rooted authority component that it will be misleading for us to use the term for "DID URLs" because they do not use a rooted authority component.

And thus that a term like "DID Locator" is more accurate.

I believe you are right. How do others feel?

@iherman
Member Author

iherman commented Mar 2, 2020

So if I understand it, the case you are making is that the term "URL" is so deeply associated with URIs that use a rooted authority component that it will be misleading for us to use the term for "DID URLs" because they do not use a rooted authority component.

I did not realize this difference, thanks for following this up, @talltree. But this is perfectly in line with the whatwg compliant library operation (see #183 (comment)).

@peacekeeper
Contributor

@iherman

I think you are right that the terms DID and DID URL can be more clearly explained and separated, and I think your new structure for Section 5 makes a lot of sense.

I'm not familiar with the WhatWG work, and I don't really understand why the use of the term URL is a problem. I thought any URI that can be dereferenced to a representation of a resource is a URL? Should we add a double-slash // to introduce a proper authority component that is composed of the method name and method-specific ID?

I can see two possibilities:

i. The DID spec makes it very explicit that, in a did-ld-json document, the value of id is restricted, compared to the general JSON-LD term (the same holds for the value of controller).
ii. We use a separate subject key

I'm pretty sure it's the first, shouldn't be a problem to define the JSON-LD DID document in this way I think?

@iherman
Member Author

iherman commented Mar 2, 2020

I'm not familiar with the WhatWG work, and I don't really understand why the use of the term URL is a problem. I thought any URI that can be dereferenced to a representation of a resource is a URL? Should we add a double-slash // to introduce a proper authority component that is composed of the method name and method-specific ID?

The term URL is a problem because the WhatWG has made a decision to, essentially, dump the term 'URI' and use the term 'URL' only. They then created a new document which is used these days as a reference for HTML related documents, API-s, and implementations.

We may decide to ignore this, stick to our guns, and only care about the original IETF specifications. My personal advice is not to do that: if at some point we would like to see DIDs served by browsers, just as many of them can handle, say, mailto or ftp, we want to avoid a source of confusion. And this is one of those. For us not to use the term DID URL but, as I proposed, DID Locator is a totally harmless, easy thing to do which does not cost us a penny...

I'm pretty sure it's the first, shouldn't be a problem to define the JSON-LD DID document in this way I think?

Well, we would then define some sort of a 'profile' of JSON-LD insofar as we restrict the possible value of id. Which actually we already do: I presume it would not be appropriate, in a DID document, to use an http url as the value of id (i.e., designating the subject). We would have to make this clear in the document in words, and also as part of the JSON schema that, I presume, we will also have...

@msporny
Member

msporny commented Mar 2, 2020

I presume it would not be appropriate, in a DID document, to use an http url as the value of id (i.e., designating the subject).

That's true, but you may use URLs elsewhere in the DID Document, like in service descriptions... and, fundamentally, this is linked data and we should be able to point elsewhere on the Web by using HTTPS URLs. The early versions of the DID spec didn't restrict id to being a DID only, seems like that has changed... in the early days, we wanted to support https URLs and integrate w/ the web. Now that there is a did:web method, maybe we don't need to do that?

In any case, just highlighting that this isn't as simple of a decision as it may seem at first.

@iherman
Member Author

iherman commented Mar 2, 2020

@msporny I understand and, of course, I was not suggesting imposing a restriction whereby all identifiers should be did-s. Indeed, that would be crazy (e.g., due to the service parameters). Actually, a different example is:

"id": "did:example:123456789abcdefghi#keys-2"

where a DID URL (as opposed to a DID) is used to identify the key, and that should be perfectly fine.

The only thing I was proposing is that the value of the term controller, as well as the value of the id when used as an id for the DID Document (i.e., when designating the subject) must be restricted to a DID. I do not think that could be expressed in a JSON-LD context; this is something an accompanying (alas! non-normative) JSON-Schema could do.
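
As a rough sketch of what such a check could look like (written in plain Python rather than JSON Schema), the function below only constrains the top-level id and controller and leaves nested identifiers alone. The names `check_did_document` and `DID_RE` are assumptions, the regular expression is a simplified approximation of the `did` ABNF, controller is assumed to be a single string, and the `publicKey` key is used purely for illustration.

```python
import re

# Simplified approximation of the `did` ABNF rule (assumption, not normative).
DID_RE = re.compile(r"did:[a-z0-9]+:(?:[A-Za-z0-9._-]*:)*[A-Za-z0-9._-]+")

def check_did_document(doc):
    """Return a list of problems; an empty list means the restricted values are bare DIDs."""
    problems = []
    if "id" not in doc:
        problems.append("id is missing")
    for key in ("id", "controller"):
        if key not in doc:
            continue
        value = doc[key]
        # Only the document-level id and controller are restricted to a DID.
        if not (isinstance(value, str) and DID_RE.fullmatch(value)):
            problems.append(f"{key} must be a DID, got {value!r}")
    return problems

doc = {
    "id": "did:example:123456789abcdefghi",
    # Nested identifiers may be DID URLs; they are deliberately not checked here.
    "publicKey": [{"id": "did:example:123456789abcdefghi#keys-2"}],
}

print(check_did_document(doc))
print(check_did_document({"id": "did:example:123456789abcdefghi#keys-2"}))
print(check_did_document({"id": "https://example.com/path1/path2/path3"}))
```

The first document passes, while a DID URL or an HTTPS URL as the document-level id is reported as a problem.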

@peacekeeper
Contributor

@dmitrizagidulin may have an opinion on this issue; during a recent DID Resolution call we discussed whether multiple DID documents could exist under a single DID, e.g. did:web:example.com/user1, did:web:example.com/user2, etc.

Personally I don't think this is a good idea; the whole point of DIDs is that each DID subject has its own DID. But I thought I'd mention it since it came up recently.

@dmitrizagidulin

@iherman

I presume it would not be appropriate, in a DID document, to use an http url as the value of id (i.e., designating the subject).

As @peacekeeper said above, this is an area of active discussion for the did:web DID method.

Specifically, it is appropriate (for a did:web document) to have an http URL as the value of the id/subject. The active discussion is whether or not a URL with a path is allowed as the subject. The current consensus is leaning towards 'no'. (Which incidentally means that if one wants to have different DIDs at different paths of a single domain, those paths would need to be encoded into the authority part of the DID.)

@iherman
Member Author

iherman commented Mar 3, 2020

I must admit the possibility of having an HTTP URL as an identifier for a subject sounds strange to me. If there is such a thing as did:web, then this should be used for a subject identifier. But I was not part of the discussions so far, so it is difficult for me to take a final decision.

Administratively: this issue has yielded several issues, which is entirely my fault (#183 (comment)). The core of this issue was the did vs. did url, which now has a separate PR (#212). I would think that this issue would deserve its own issue in this group. Would it be possible to create a separate issue, summarizing the DID Resolution call discussion?

@peacekeeper
Contributor

Would it be possible to create a separate issue, summarizing the DID Resolution call discussion?

I agree this makes sense. @dmitrizagidulin do you want to do that (since you're closest to it), or do you want me to try?

@iherman
Member Author

iherman commented Mar 6, 2020

The PR-s #212 and #214 took care of most of this issue. I have created two separate issues (#217 and #218) for the two further issues that were touched upon in this issue and have not been solved (and, therefore, were not handled by #212 and #214). I therefore propose to close this current issue.

@msporny @burnburn @brentzundel

@brentzundel
Member

Closing
