Authoritative Contained Resource Data #352
Aligns in spirit with what was decided.
I'm just going on about this: dct:modified and stat:mtime aren't defined as the relevant times for the resource, as detailed in the comments. It can't be. It has to be defined in other terms.
I haven't read all the comments in this thread, so apologies if this has already been addressed. But I just have a couple of comments/questions. Reading this part is confusing for me:
It says "MUST", but then it says "unless that information is inapplicable to the server". Wouldn't that be an implementation detail of the server, and thus something irrelevant in the protocol description? Also, as an app developer reading this I could be misled to think that all servers provide the data, because it says MUST. But as I understand it, that's not true: depending on some implementation details of the server, some containers won't return this data. Keeping that in mind, why not just use MAY in that sentence? Also, I'm not sure why there are two values indicating what seems to be the same information: As an app developer, it was confusing to know which one to use, so I think I just picked one at random. Now it mentions that
@NoelDeMartin Thank you. The discussion on "MUST, unless" happened here: #352 (comment) . "unless" has to do with inapplicability. We may change to "SHOULD" but this is to be worked out in #343 or the next release of the spec. Unfortunately MAY is not strong enough for a requirement to enable the feature. stat:mtime may be deprecated in a later version of the spec ( #352 (comment) ). Indeed, rely on dcterms:modified for the foreseeable future.
All good, except for the wrong RFC6570 template, to which I propose a correction.
Implementer's note: the Community Solid Server supports the text of this PR via configuration: https://github.com/solid/community-server-recipes/blob/feat/spec-352/metadata/config-metadata.json.
Co-authored-by: Ruben Verborgh <ruben@verborgh.org>
I have bad feelings about this, as we are now saying that it is OK for a server to not track the modification of representations of a resource and give the modification times of just a part of it. It seems likely to me that this will cause many problems for caching in the future, and that servers will serve stale data, and that it will result in difficulty for decentralization efforts.
Nevertheless, I acknowledge that it is now an internally consistent structure in line with NSS behavior and that RFC7232 is very liberal on this point, as it says:
The last-modified time would usually be the most recent time that any of those parts were changed.
and, what this does is to say "usually, yes, but not for containers in Solid".
There ought to be a 👃 reaction
<dt about="#contained-resource-metadata-rdf-type" id="contained-resource-metadata-rdf-type" property="skos:prefLabel"><code>rdf:type</code></dt>
<dd about="#contained-resource-metadata-rdf-type" property="skos:definition">A class whose URI is the expansion of the <em>URI Template</em> [<cite><a class="bibref" href="#bib-rfc6570">RFC6570</a></cite>] <code>http://www.w3.org/ns/iana/media-types/{+iana-media-type}#Resource</code>, where <code>iana-media-type</code> corresponds to a value from the IANA Media Types [<cite><a class="bibref" href="#bib-iana-media-types">IANA-MEDIA-TYPES</a></cite>].</dd>
<dt about="#contained-resource-metadata-stat-size" id="contained-resource-metadata-stat-size" property="skos:prefLabel"><code>stat:size</code></dt>
<dd about="#contained-resource-metadata-stat-size" property="skos:definition">A non-negative integer giving the size of the resource in bytes.</dd>
This item in particular should be discussed in the security considerations section
Do you consider stat:size (I assume that's what your comment is referring to - GitHub UI trips me up sometimes) to be more of a concern than the others?
Had this text earlier in the Note #contained-resource-metadata-considerations :
Servers are encouraged to consider omitting authoritative data about a contained resource when an agent is unauthorized to read the contained resource.
Removed it as per Kjetil's suggestion: #352 (comment)
Can revive and put it under #security-considerations but note that it may be equivalent / already captured by:
Servers are strongly discouraged from exposing information beyond the minimum amount necessary to enable a feature.
If you'd like more specific considerations that should be mentioned in there, we can do that as well. Have something in mind?
Provided that a server can choose to omit this data (under the "inapplicability" clause), this is fine as-is.
Knowing the size of a file that one may not otherwise have access to read can be extremely useful data when looking to exploit weaknesses in a server. I would have serious reservations about including that information on a Storage that holds sensitive personal data.
Yes, I believe that since we know that we cannot have a server check the authz of every child resource, it is inappropriate to leave it as an implementation detail; we'd need to specify a concrete mechanism. I think it would be better for this to be a SHOULD, as per RFC2119, but for now, it seems that it could be left out under the inapplicability clause.
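To make the trade-off discussed in this thread concrete, here is a minimal Python sketch of a server that omits authoritative metadata for contained resources the requesting agent cannot read. Everything in it is hypothetical: the `children` mapping, the `can_read` callback, and the choice to keep the containment entry while dropping the metadata are assumptions for illustration, not behavior defined by the PR.

```python
def visible_metadata(children, can_read):
    """Return per-child metadata, emptied for children the agent cannot read.

    children: dict mapping a child path to a dict of metadata values
              (hypothetical shape, e.g. {"stat:size": 42}).
    can_read: callable taking a path and returning True when the agent
              is authorized to read that resource (hypothetical hook).
    """
    listing = {}
    for path, metadata in children.items():
        # Assumption: the child stays listed; only its metadata is withheld.
        listing[path] = dict(metadata) if can_read(path) else {}
    return listing


children = {
    "/lvl-1/public.ttl": {"stat:size": 120},
    "/lvl-1/private.ttl": {"stat:size": 98765},
}
listing = visible_metadata(children, lambda path: path.endswith("public.ttl"))
```

Note that this calls `can_read` once per child, which is exactly the per-child authorization cost that the comment above says a server cannot always afford.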
<dt about="#contained-resource-metadata-stat-size" id="contained-resource-metadata-stat-size" property="skos:prefLabel"><code>stat:size</code></dt>
<dd about="#contained-resource-metadata-stat-size" property="skos:definition">A non-negative integer giving the size of the resource in bytes.</dd>
<dt about="#contained-resource-metadata-dcterms-modified" id="contained-resource-metadata-dcterms-modified" property="skos:prefLabel"><code>dcterms:modified</code></dt>
<dd about="#contained-resource-metadata-dcterms-modified" property="skos:definition">The date and time when the resource was last modified.</dd>
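As a rough illustration of the three properties defined in this PR (rdf:type, stat:size, dcterms:modified), the following Python sketch builds the Turtle description of one contained resource. It is a sketch under stated assumptions: the function names are made up, the template expansion is a hand-rolled approximation of RFC 6570 reserved expansion for a single variable (it does not pass existing percent-encoded triplets through unchanged), the `stat:`, `dcterms:`, and `xsd:` prefixes are assumed to be declared elsewhere, and the HTTP date is assumed to carry a time zone such as GMT.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from urllib.parse import quote

MEDIA_TYPE_NS = "http://www.w3.org/ns/iana/media-types/"
# Characters left unencoded by reserved ("+") expansion: reserved + unreserved.
RESERVED_AND_UNRESERVED = ":/?#[]@!$&'()*+,;=-._~"


def media_type_class(media_type):
    """Approximate the expansion of {+iana-media-type} in the class template."""
    expanded = quote(media_type, safe=RESERVED_AND_UNRESERVED)
    return MEDIA_TYPE_NS + expanded + "#Resource"


def http_date_to_xsd(value):
    """Convert an HTTP date (as used in Last-Modified) to xsd:dateTime in UTC."""
    moment = parsedate_to_datetime(value).astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def contained_resource_turtle(path, media_type, size, last_modified):
    """Render the metadata of one contained resource as Turtle (prefixes assumed)."""
    return (
        f"<{path}>\n"
        f"  a <{media_type_class(media_type)}> ;\n"
        f"  stat:size {size} ;\n"
        f'  dcterms:modified "{http_date_to_xsd(last_modified)}"^^xsd:dateTime .'
    )


turtle = contained_resource_turtle(
    "/lvl-1/lvl-2/foo.ttl", "text/turtle", 42, "Sun, 05 Dec 2021 17:45:02 GMT"
)
```

Because the `+` operator leaves `/` unencoded, `text/turtle` expands to a class URI ending in `text/turtle#Resource` rather than `text%2Fturtle#Resource`.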
Assuming that a server provides a Last-Modified header and assuming that we do not want to break how caching works for browsers, I do not see how this (dcterms:modified or stat:mtime) can be implemented without causing a cascade of updates to the root of a Storage.
For example, consider a resource at /lvl-1/lvl-2/foo.ttl with the following response headers:
Last-Modified: Sun, 05 Dec 2021 17:45:02 GMT
ETag: "abcd"
The container at /lvl-1/lvl-2/ has the following headers:
Last-Modified: Tue, 07 Dec 2021 12:23:15 GMT
ETag: "1234"
That container would include the following triples:
</lvl-1/lvl-2/>
dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime ;
ldp:contains </lvl-1/lvl-2/foo.ttl> .
</lvl-1/lvl-2/foo.ttl>
dcterms:modified "2021-12-05T17:45:02Z"^^xsd:dateTime .
The container at /lvl-1/ has the following headers:
Last-Modified: Tue, 07 Dec 2021 19:37:22 GMT
ETag: "zyxw"
And the following body:
</lvl-1/>
dcterms:modified "2021-12-07T19:37:22Z"^^xsd:dateTime ;
ldp:contains </lvl-1/lvl-2/> .
</lvl-1/lvl-2/>
dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime .
Now consider that a client adds a triple to /lvl-1/lvl-2/foo.ttl. The result will lead to the Last-Modified being updated on /lvl-1/lvl-2/foo.ttl. The content of that resource has changed, so the ETag header also changes.
Because the Last-Modified header changes, the representation of /lvl-1/lvl-2/ now also changes to reflect the new status of the contained resource. As a consequence, and in order to ensure that clients receive the latest version of /lvl-1/lvl-2/, the Last-Modified and ETag headers of this container resource must change.
Because the Last-Modified header for /lvl-1/lvl-2/ changes, the representation of /lvl-1/ also needs to change, which leads to a further cascade of changes to the root of the storage.
There are three ways around this, as I see it:
- Do not include the dcterms:modified or stat:mtime triples (i.e. ignore this part of the spec)
- Don't worry about breaking how browser caching works (i.e. ignoring app/client needs)
- Put this data in a separate (auxiliary) resource that doesn't lead to a cascade of change.
I prefer the third option because it makes it possible to include this data without breaking how browsers interact with HTTP resources, but absent that approach, the only route forward I see (while staying roughly in line with this spec) is to simply not include the modified time of contained resources (again, arguing "inapplicability").
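The cascade described in this comment can be sketched in a few lines of Python. This is only an illustration of the argument, not of any server's implementation: the `store` dictionary, the `touch` function, and its `cascade` flag are hypothetical, and timestamps are plain strings for brevity.

```python
def ancestor_containers(path):
    """Return the containers above `path`, nearest first, ending at the root."""
    containers = []
    current = path.rstrip("/")
    while current:
        # Drop the last path segment to reach the parent container.
        current = current.rsplit("/", 1)[0]
        containers.append(current + "/")
    return containers


def touch(store, path, when, cascade):
    """Record a modification and return every path whose Last-Modified changed.

    With cascade=True, each ancestor container is treated as modified because
    its representation embeds the child's dcterms:modified value.
    """
    store[path] = when
    changed = [path]
    if cascade:
        for container in ancestor_containers(path):
            store[container] = when
            changed.append(container)
    return changed


store = {
    "/": "2021-12-07T19:37:22Z",
    "/lvl-1/": "2021-12-07T19:37:22Z",
    "/lvl-1/lvl-2/": "2021-12-07T12:23:15Z",
    "/lvl-1/lvl-2/foo.ttl": "2021-12-05T17:45:02Z",
}
changed = touch(store, "/lvl-1/lvl-2/foo.ttl", "2021-12-08T09:00:00Z", cascade=True)
```

With `cascade=False` only the resource itself is updated, which corresponds to the reading where a container's Last-Modified tracks containment changes only; the container body would then carry a newer dcterms:modified for the child than its own Last-Modified suggests.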
Yes, @acoburn , I have been arguing the same, an auxiliary resource solves all problems, it can represent the entirety of the resource (and it makes sense to add data for each representation too if needed), avoids cascading problems, can be subject to separate authorization, can be used to track any changes and should be at least as easy to implement.
However, we haven't been able to find consensus around that for 0.9, so indeed, the result is that these data may be stale when conditional requests are used. I disagree with the conclusion, but as the RFC7232 is fairly vague at this point, I have chosen to accept it for 0.9, sincerely hoping we can revisit for 1.0.
Because the Last-Modified header changes, the representation of /lvl-1/lvl-2/ now also changes to reflect the new status of the contained resource.
While the representation data of /lvl-1/lvl-2/ changes because of the dcterms:modified value of /lvl-1/lvl-2/foo.ttl, the Last-Modified header of /lvl-1/lvl-2/ need not change (though changing it is not prohibited), as there were no changes to the containment triples ( #server-container-last-modified ).
A recommendation (or advisement) along the following lines may be necessary:
When the resource metadata of an existing contained resource changes, the server MUST send a weak entity-tag when responding to container’s request URI.
I want to mention again that the requirements introduced by this PR do not expect a container (/lvl-1/lvl-2/) to include resource metadata about itself (</lvl-1/lvl-2/> dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime .). (I don't care if that's not the point - it easily introduces complications/assumptions to the discussion that we are better off without.) And again, if resource metadata about the container itself is desired - I may have missed the discussion that calls for it - the specification needs to say so because both the generation and protection of those resource metadata need to apply. On a related note:
When a server plans to include resource metadata about the container in the response of the same container, the server should determine the Last-Modified header value as a regular change to the representation data. (It can send a strong entity-tag as usual.)
This does mean that apps/users will see stale data. From an impl perspective (given the choice), I would rather not provide the data than provide it in a way that confuses users.
I am really trying to be constructive here, but I don't see how this is implementable, given this text
As it stands, the Protocol uses the same requirement levels as in the RFC for Last-Modified and ETag headers. Servers might not include either Last-Modified or ETag headers, and so clients can't absolutely rely on them for being there.
That aside, https://datatracker.ietf.org/doc/html/rfc7232#section-2.4 and https://datatracker.ietf.org/doc/html/rfc7232#section-6 suggest that headers with entity-tags will have higher precedence.
The server needs to enable the use of conditional requests so that the client can eventually get a 200/304/412 or whatever.
Perhaps I'm missing something here but as mentioned in the previous comment, I don't see a staleness issue around entity-tags.
If a client is only making decisions with Last-Modified/If-*-Since (when Last-Modified is provided by the server), then the representation metadata in the response will not help them to differentiate between a change to containment triples and any other change to the representation. In this particular scenario, the client has to make a GET request on the container.
We can certainly revisit that in 1.0.
For now, I suggest to add the following and go ahead with 0.9:
When the resource metadata of an existing contained resource changes, the server MUST send a weak entity-tag when responding to container’s request URI.
If the entity tag changes from x to y, there is no issue, but that is orthogonal to the weakness indicator
Good. Then in what case would it not change from x to y?
Any time the representation changes, the ETag changes. There is no argument about that. I am only responding to the proposed requirement to use weak ETags. And to be clear, I am not opposed to using weak ETags, but we can't claim that weak ETags will solve this issue
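To illustrate why the weakness indicator is orthogonal to whether the tag changes, here is a small Python sketch of the strong and weak comparison functions from RFC 7232 (section 2.3.2), together with an If-None-Match check, which uses weak comparison. The function names and the simplified single-tag handling are assumptions made for this example; a real implementation would also handle lists of tags and `*`.

```python
def parse_etag(tag):
    """Split an entity-tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    is_weak = tag.startswith("W/")
    return is_weak, tag[2:] if is_weak else tag


def weak_match(a, b):
    """Weak comparison: opaque tags match, regardless of the W/ prefix."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def strong_match(a, b):
    """Strong comparison: neither tag is weak and the opaque tags match."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def if_none_match_status(current_etag, client_etag):
    """Status for a conditional GET carrying a single If-None-Match tag."""
    return 304 if weak_match(current_etag, client_etag) else 200


# The opaque tag changed from "x" to "y": the client gets a fresh 200
# whether or not the tags are marked weak.
status_weak = if_none_match_status('W/"y"', 'W/"x"')
status_strong = if_none_match_status('"y"', '"x"')
```

In both cases the outcome depends only on the opaque tag changing, so marking the tag weak does not by itself prevent or cause stale responses.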
The weak entity-tag is suggested because a strong entity-tag wouldn't be correct when only the resource metadata changes, all while still allowing the client to make conditional requests. Right at this second, I don't have a strong opinion on adding that requirement because I do think it comes directly from the RFC, but it may be reasonable to mention.
That aside, I acknowledge that there is (always) room to discuss/work this out further but it doesn't need to be a blocker for 0.9. (We mark a handful of things to be revisited for 1.0, and I don't see why it can't be done here as well.)
✅ following discussion and agreements in 12/15 editor meeting
Issues concerning:
The concept #authoritative-information describes an RDF constraint that's analogous to "authoritative metadata" as per https://www.w3.org/2001/tag/doc/mime-respect - the sender's HTTP message is considered to be authoritative.
A server receiving the message can either respect the sender's intentions or respond with a redirect or error message. For example, in #server-protect-authoritative-resource-data , when a client requests to update a container's description including authoritative data about the contained resources, the server will reject the request.
The concept #authoritative-resource-data specialises #authoritative-information in that it is used in context of resource descriptions. #authoritative-contained-resource-data is one application of authoritative information in the container description.