Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authoritative Contained Resource Data #352

Merged
merged 24 commits into from
Dec 15, 2021

Conversation

csarven
Copy link
Member

@csarven csarven commented Nov 17, 2021

Issues concerning:


The concept #authoritative-information describes an RDF constraint that's analogous to "authoritative metadata" as per https://www.w3.org/2001/tag/doc/mime-respect - the sender's HTTP message is considered to be authoritative.

Server receiving the message can either respect sender's intentions or responds with redirect or error message. For example, in #server-protect-authoritative-resource-data , when a client requests to update a container's description including authoritative data about the contained resources, the server will reject.

The concept #authoritative-resource-data specialises #authoritative-information in that it is used in context of resource descriptions. #authoritative-contained-resource-data is one application of authoritative information in the container description.

@kjetilk kjetilk requested a review from a team November 17, 2021 16:32
@kjetilk kjetilk added this to the Release 0.9 milestone Nov 17, 2021
@kjetilk kjetilk linked an issue Nov 17, 2021 that may be closed by this pull request
Copy link
Contributor

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aligns in spirit with what was decided.

protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
Copy link
Member

@kjetilk kjetilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just going on about this: dct:modified and stat:mtime isn't defined as the relevant times for the resource, as detailed in the comments. It can't be. It has to be defined in other terms.

protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
@csarven csarven mentioned this pull request Nov 25, 2021
@NoelDeMartin
Copy link

I haven't read all the comments in this thread, so apologies if this has already been addressed. But I just have a couple of comments/questions.

Reading this part is confusing for me:

Servers MUST include resource metadata about contained resources as part of the container description, unless that information is inapplicable to the server.

It says "MUST", but then it says "unless that information is inapplicable to the server". Wouldn't that be an implementation detail of the server, and thus something irrelevant in the protocol description? Also, as an app developer reading this I could be mislead to think that all servers provide the data, because it says MUST. But as I understand it, that's not true, depending on some implementation details of the server, some containers won't return this data.

Keeping that in mind, why not just use MAY in that sentence?

Also, I'm not sure why there are two values indicating what seems to be the same information: dcterms:modified and stat:mtime. Is it just for compatibility with current implementations? Or is there an important difference between the two?

As an app developer, it was confusing to know which one to use, so I think I just picked one at random. Now it mentions that dcterms:modified has to match the Last-Modified header, so I suppose that's the "cannonical" one?

@csarven
Copy link
Member Author

csarven commented Dec 5, 2021

@NoelDeMartin Thank you. The discussion on "MUST, unless" happened here: #352 (comment) . "unless" has to do with inapplicability. We may change to "SHOULD" but this is to be worked out in #343 or next release of the spec. Unfortunately MAY is not strong enough for a requirement to enable the feature.

stat:mtime may be deprecated in a later version of the spec ( #352 (comment) ). Indeed rely on dcterms:modified for the foreseeable future.

RubenVerborgh added a commit to CommunitySolidServer/Recipes that referenced this pull request Dec 5, 2021
Copy link
Contributor

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, except for the wrong RFC6570 template, to which I propose a correction.

protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Outdated Show resolved Hide resolved
protocol.html Show resolved Hide resolved
@RubenVerborgh
Copy link
Contributor

Implementer's note: the Community Solid Server supports the text of this PR via configuration: https://github.com/solid/community-server-recipes/blob/feat/spec-352/metadata/config-metadata.json.
Live version for testing at https://drive.verborgh.org/.

csarven and others added 2 commits December 5, 2021 21:48
Co-authored-by: Ruben Verborgh <ruben@verborgh.org>
Co-authored-by: Ruben Verborgh <ruben@verborgh.org>
Copy link
Member

@kjetilk kjetilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have bad feelings about this, as we are now saying that it is OK for a server to not track the modification of representations of a resource and give the modification times of just a part of it. It seems likely to me that this will cause many problems for caching in the future, and that servers will serve stale data, and that it will result in difficulty for decentralization efforts.

Nevertheless, I acknowledge that it is now an internally consistent structure in line with NSS behavior and that RFC7232 is very liberal on this point, as it says:

The last-modified time would usually be the most recent time that any of those parts were changed.

and, what this does is to say "usually, yes, but not for containers in Solid".

There ought to be a 👃 reaction

<dt about="#contained-resource-metadata-rdf-type" id="contained-resource-metadata-rdf-type" property="skos:prefLabel"><code>rdf:type</code></dt>
<dd about="#contained-resource-metadata-rdf-type" property="skos:definition">A class whose URI is the expansion of the <em>URI Template</em> [<cite><a class="bibref" href="#bib-rfc6570">RFC6570</a></cite>] <code>http://www.w3.org/ns/iana/media-types/{+iana-media-type}#Resource</code>, where <code>iana-media-type</code> corresponds to a value from the IANA Media Types [<cite><a class="bibref" href="#bib-iana-media-types">IANA-MEDIA-TYPES</a></cite>].</dd>
<dt about="#contained-resource-metadata-stat-size" id="contained-resource-metadata-stat-size" property="skos:prefLabel"><code>stat:size</code></dt>
<dd about="#contained-resource-metadata-stat-size" property="skos:definition">A non-negative integer giving the size of the resource in bytes.</dd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This item in particular should be discussed in the security considerations section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you consider stat:size (I assume that's what your comment is referring to - GitHub UI trips me up sometimes) to more of a concern than the others?

Had this text earlier in the Note #contained-resource-metadata-considerations :

Servers are encouraged to consider omitting authoritative data about a contained resource when an agent is unauthorized to read the contained resource.

Removed it as per Kjetil's suggestion: #352 (comment)

Can revive and put it under #security-considerations but note that it may be equivalent / already captured by:

Servers are strongly discouraged from exposing information beyond the minimum amount necessary to enable a feature.


If you'd like more specific considerations that should be mentioned in there, we can do that as well. Have something in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided that a server can choose to omit this data (under the "inapplicability" clause), this is fine as-is.

Knowing the size of a file that one may not otherwise have access to read, can be extremely useful data when looking to exploit weaknesses in a server. I would have serious reservations about including that information on a Storage that holds sensitive personal data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe that since we do know that we cannot have a server check the authz of every child resource, then is inappropriate to leave it as an implementation detail, we'd need to specify a concrete mechanism. I think it would be better for this to be a SHOULD, as per RFC2119, but for now, it seems that it could be left out under the inapplicability clause.

<dt about="#contained-resource-metadata-stat-size" id="contained-resource-metadata-stat-size" property="skos:prefLabel"><code>stat:size</code></dt>
<dd about="#contained-resource-metadata-stat-size" property="skos:definition">A non-negative integer giving the size of the resource in bytes.</dd>
<dt about="#contained-resource-metadata-dcterms-modified" id="contained-resource-metadata-dcterms-modified" property="skos:prefLabel"><code>dcterms:modified</code></dt>
<dd about="#contained-resource-metadata-dcterms-modified" property="skos:definition">The date and time when the resource was last modified.</dd>
Copy link
Member

@acoburn acoburn Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that a server provides a Last-Modified header and assuming that we do not want to break how caching works for browsers, I do not see how this (dcterms:modified or stat:mtime) can be implemented without causing a cascade of updates to the root of a Storage.

For example, consider a resource at /lvl-1/lvl-2/foo.ttl with the following response headers:

Last-Modified: Sun, 05 Dec 2021 17:45:02 GMT
ETag:  "abcd"

The container at /lvl-1/lvl-2/ has the following headers:

Last-Modified: Tue, 07 Dec 2021 12:23:15 GMT
ETag: "1234"

That container would include the following triples:

</lvl-1/lvl-2/> 
    dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime ;
    ldp:contains </lvl-1/lvl-2/foo.ttl> .
</lvl-1/lvl-2/foo.ttl>
    dcterms:modified "2021-12-05T17:45:02Z"^^xsd:dateTime .

The container at /lvl-1/ has the following headers:

Last-Modified: Tue, 07 Dec 2021 19:37:22 GMT
ETag: "zyxw"

And the following body:

</lvl-1/> 
    dcterms:modified "2021-12-07T19:37:22Z"^^xsd:dateTime ;
    ldp:contains </lvl-1/lvl-2/> .
</lvl-1/lvl-2/>
    dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime .

Now consider that a client adds a triple to /lvl-1/lvl-2/foo.ttl. The result will lead to the Last-Modified being updated on /lvl-1/lvl-2/foo.ttl. The content of that resource has changed, so the ETag header also changes.

Because the Last-Modified header changes, the representation of /lvl-1/lvl-2/ now also changes to reflect the new status of the contained resource. As a consequence, and in order to ensure that clients receive the latest version of /lvl-1/lvl-2/, the Last-Modified and ETag headers of this container resource must change.

Because the Last-Modified header for /lvl-1/lvl-2/ changes, the representation of /lvl-1/ also needs to change, which leads to a further cascade of changes to the root of the storage.

There are three ways around this, as I see it:

  • Do not include the dcterms:modified or stat:mtime triples (i.e. ignore this part of the spec)
  • Don't worry about breaking how browser caching works (i.e. ignoring app/client needs)
  • Put this data in a separate (auxiliary) resource that doesn't lead to a cascade of change.

I prefer the third option because it makes it possible to include this data without breaking how browsers interact with HTTP resources, but absent that approach, the only route forward I see (while staying roughly in line with this spec) is to simply not include the modified time of contained resources (again, arguing "inapplicability").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @acoburn , I have been arguing the same, an auxiliary resource solves all problems, it can represent the entirety of the resource (and it makes sense to add data for each representation too if needed), avoids cascading problems, can be subject to separate authorization, can be used to track any changes and should be at least as easy to implement.

However, we haven't been able to find consensus around that for 0.9, so indeed, the result is that these data may be stale when conditional requests are used. I disagree with the conclusion, but as the RFC7232 is fairly vague at this point, I have chosen to accept it for 0.9, sincerely hoping we can revisit for 1.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the Last-Modified header changes, the representation of /lvl-1/lvl-2/ now also changes to reflect the new status of the contained resource.

While the representation data of /lvl-1/lvl-2/ changes because of the dcterms:modified value of /lvl-1/lvl-2/foo.ttl , the Last-Modified header of /lvl-1/lvl-2/ need not change - nor prohibited - as there were no changes to the containment triples ( #server-container-last-modified ).

A recommendation (or advisement) along the following lines may be necessary:

When the resource metadata of an existing contained resource changes, the server MUST send a weak entity-tag when responding to container’s request URI.


I want to mention again that the requirements introduced by this PR does not expect a container (/lvl-1/lvl-2/) to include resource metadata about itself (</lvl-1/lvl-2/> dcterms:modified "2021-12-07T12:23:15Z"^^xsd:dateTime .). (I don't care if that's not the point - it easily introduces complications/assumptions to the discussion that we are better off without.) And again, if resource metadata about the container itself is desired - I may have missed the discussion that calls for it - the specification should needs to say so because both the generation and protection of those resource metadata needs to apply. On a related note:

When a server plans to include resource metadata about the container in the response of the same container, the server should determine the Last-Modified header value as a regular change to the representation data. (It can send a strong entity-tag as usual.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does mean that apps/users will see stale data. From an impl perspective (given the choice), I would rather not provide the data than provide it in a way that confuses users.

I am really trying to be constructive here, but I don't see how this is implementable, given this text

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands, the Protocol uses the same requirement levels as in the RFC for Last-Modified and ETag headers. Servers might not include either Last-Modified or ETag headers, and so clients can't absolutely rely on them for being there.

That aside,

https://datatracker.ietf.org/doc/html/rfc7232#section-2.4 and https://datatracker.ietf.org/doc/html/rfc7232#section-6 suggests that headers with entity-tags will have higher precedence.

Server needs to enable the use of conditional requests so that client can eventually get a 200/304/412 or whatever.

Perhaps I'm missing something here but as mentioned in previous comment, I don't see a staleness issue around entity-tags.

If a client is only making decisions with Last-Modified/If-*-Since (when Last-Modified is provided by the server), then the representation metadata in the response will not help them to differentiate between a change to containment triples or any change to the representation. In this particular scenario, client has to make a GET request on the container.

We can certainly revisit that in 1.0.

For now, I suggest to add the following and go ahead with 0.9:

When the resource metadata of an existing contained resource changes, the server MUST send a weak entity-tag when responding to container’s request URI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entity tag changes from x to y, there is no issue, but that is orthogonal to the weakness indicator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Then in what case would it not change from x to y?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time the representation changes, the ETag changes. There is no argument about that. I am only responding to the proposed requirement to use weak ETags. And to be clear, I am not opposed to using weak ETags, but we can't claim that weak ETags will solve this issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weak entity-tag is suggested because strong entity-tag wouldn't be correct when only the resource metadata changes. All meanwhile allowing the client to make conditional requests. Right at this second, I don't have a strong opinion on adding that requirement because I do think it comes directly from the RFC but it may be reasonable to mention.

That aside, I acknowledge that there is (always) room to discuss/work this out further but it doesn't need to be a blocker for 0.9. (We mark handful of stuff to be revisited for 1.0, and I don't see why it can't be done here as well.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@justinwb justinwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ following discussion and agreements in 12/15 editor meeting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Specify existing practice for Container data about contents
7 participants