Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etag propagation is not always synchronous #3988

Closed
individual-it opened this issue Jun 20, 2022 · 11 comments
Closed

etag propagation is not always synchronous #3988

individual-it opened this issue Jun 20, 2022 · 11 comments
Labels

Comments

@individual-it
Copy link
Member

in some cases etags are not propagated synchronous, e.g. tests like apiWebdavEtagPropagation1/moveFileFolder.feature:184 or apiWebdavEtagPropagation2/copyFileFolder.feature:188 fail randomly in CI

@butonic could you please define cases where the etag propagation is allowed to be async, so that we can adjust the tests accordingly

@butonic
Copy link
Member

butonic commented Jun 27, 2022

A proper implementation on the client and server side would use the Prefer: respond-async header as defined in https://datatracker.ietf.org/doc/html/rfc7240#section-4.1 AFAICT that is not the case and the tests currently correctly test synchronous etag propagation.

@dragotin @felix-schwarz IIRC the clients need the file id in the response of a PUT request, right? What about the etag?

We can add Prefer: respond-async, wait=5 to indicate that clients are willing to wait 5sec for a response. But these headers all only apply the the resource that was PUT.

I would argue that a Prefer: respond-async, wait=5 header would allow the server to respond synchronously with all the metadata that is available for the resource, eg fileid, mtime, size, etag ... while at the same time allowing async etag propagation up the tree. respond-async means that the client does not need any response, which is true if clients would generate uuid fileid they send along in a PUT request, TUS upload or MKCOL request.

@dragotin
Copy link
Contributor

As of now, the clients expect the ETag in response to the PUT (or the last action in the chunked uploads).
See https://github.com/owncloud/client/blob/master/src/libsync/propagateuploadng.cpp#L572 for the Desktop example, if the etag is empty, that results in an error.

We do not want to change that now.

That is also what we are aiming for with the asynchronous upload: The ETag is returned immediately with the last PUT, and the server takes care to set/change the ETag accordingly.

Note that theETag just needs to change, no further semantics, so it should be not so difficult for the server to even do that synchronous.

@phil-davis
Copy link
Contributor

Note: we generally see this issue in tests that do a MOVE or COPY or some upload-overwrite, the response to the request has an eTag for the particular resource. But also the eTAG of parent folders up to the root is expected to change. The delay (async behavior) is observed when the test scenario "quickly" does API requests that check the eTag of the parent, grand-parent etc folders.

So, IMO, the current implementation is OK - clients will get an appropriate eTAG in the response for the resource that they just uploaded, copied, moved... Other clients are polling higher folders to "notice" an eTag change, and then discover and sync the changes from the server to their local storage. Those clients might not notice the higher-level eTag change for a few seconds. (Typically in practice we see that the higher-level eTags are updated within 1 second anyway). For clients that are polling at around 30 second intervals anyway, 1 or 2 seconds delay in the root eTag changing is not really going to be significant.

@dragotin
Copy link
Contributor

Ah, seems I overlooked that it is about the ETag change of the directories above the uploaded item, not the item itself.

I agree, that sounds ok if that does not happen immediately.

@felix-schwarz
Copy link

@butonic At the moment, the iOS client uses neither oc-fileid nor oc-etag. It instead does a subsequent PROPFIND on the expected destination path to fetch all metadata on the file in one go.

So the expectation from the iOS client is that when the server response to the upload request has been received:

  • a subsequent PROPFIND (Depth 0) on the expected destination path should work and not return an error
  • a subsequent PROPFIND (Depth 1) on the parent folder should include the file in its response

The iOS client should not run into any issues if the Etag updates to all parent folders is happening with delay / asynchronously. The only effect of that should be that updates on other devices that include the new file are delayed.

Regarding returning metadata in the upload response, as discussed in the past, I'd love to see a mechanism for clients to specify which properties of an uploaded file should be returned in the response through special request and response HTTP headers. F.ex. by tunneling a WebDAV request/response via oc-dav-request and oc-dav-response headers.

@butonic
Copy link
Member

butonic commented Jun 27, 2022

With regard to the latter, could you point me to an issue? I'd like to propose using the Prefer header here as well, eg. Prefer: include-annotations="display.subject" as described by https://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#sec_Preferenceincludeannotationsodatainc However, I assume the annotations are not meant to solve that case.

Unfortunately, the IANA preference registry has no property we could use to tell the server which properties to return in the response.

Related: #3782

But all this is not about the original question: when is async etag propagation allowed? Do clients expect the root etag to change synchronously?

Maybe a simple Prefer: return=representation or Prefer: return=minimal as described in https://datatracker.ietf.org/doc/html/rfc7240#section-4.2 is sufficient. IMO no. etag propagation should always be async, but I remeber reading a webdav rfc text mentioning that the etag of a collection has to change if any of its contained collections changed ...

@dragotin
Copy link
Contributor

dragotin commented Jun 27, 2022

I think it can be summarized:

  • Async propagation of the ETag up the dir tree is fine. Clients can deal with that. Testsuite needs to as well.
  • Clients make benefit if the last action (last move for example) returns the ETag, remote permissions and FileID metadata header. If one of them is missing that either causes an error (FileID) or additional subsequent PROPFINDs. Both is not optimal.
    I do not think that we have to add new headers now. That is implicit, yes, but for long time, and to no harm.

@butonic
Copy link
Member

butonic commented Jun 27, 2022

Which brings us back to owncloud/core#14531 (comment)

Only if you talk to broken servers. An ETag shouldn't say anything about the contents of a collection, so you shouldn't rely on this behavior if you are trying to build a standard client.

Instead, you can use the ctag (or better yet) use the sync-collection.

and owncloud/core#14531 (comment)

The problem with using the ETag is that it does have a specific definition, and that definition is incompatible with how you are using the etag.

👋 to @evert if he is still around. Thx for all the insights you provided over the years! It is purely on our side to not having followad your advice any better...

To clarify: in the webdav rfc the etag purpose is described as

Contains the ETag header value (from Section 14.19 of [RFC2616]) as it would be returned by a GET without accept headers.

But webdav collections are not required to have a GET representation that correlates with the listing of the collection, http://www.webdav.org/specs/rfc4918.html#rfc.section.9.4:

The semantics of GET are unchanged when applied to a collection, since GET is defined as, "retrieve whatever information (in the form of an entity) is identified by the Request-URI" [RFC2616]. GET, when applied to a collection, may return the contents of an "index.html" resource, a human-readable view of the contents of the collection, or something else altogether. Hence, it is possible that the result of a GET on a collection will bear no correlation to the membership of the collection.

So we basically made up that semantic meaning on our own.

While getctag for calendar collections has meanwhile been deprecated in favor of the generic DAV:sync-token as per https://datatracker.ietf.org/doc/html/rfc6578, we will have to send getetag at least for legacy clients. For newer clients we could return getctag for collections.

The getctag aka collection tag is not to be confused with the [cTag property] returned in the graph api which is a content tag:

An eTag for the content of the item. This eTag is not changed if only the metadata is changed. Note: This property is not returned if the item is a folder. Read-only.

and then they clarify with this:

The eTag and cTag properties work differently on containers (folders). The cTag value is modified when content or metadata of any descendant of the folder is changed. The eTag value is only modified when the folder's properties are changed, except for properties that are derived from descendants (like childCount or lastModifiedDateTime).

@evert
Copy link

evert commented Jun 27, 2022

Thank you!!

@dragotin
Copy link
Contributor

dragotin commented Jun 27, 2022

well yes. I am kinda lost how that relates to the initial problem of this issue. I think @individual-it has a reasonable answer, so I am closing this one. Feel free to re-open.

@butonic
Copy link
Member

butonic commented Jun 27, 2022

@dragotin there is no place where we properly documented what an etag in owncloud world means. Without a common understanding of the concept we cannot discuss this topic. But I agree: etag propagation can always be asyncronous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants