Depth infinity: Make recursive PROPFIND calls cheaper #14531
@icewind1991 just another case for profiling WebDAV - please have a look - THX
Note that 1.8 has recursive PROPFIND disabled, so this issue only affects 1.7.x. Sabre also creates the XML reply in memory; maybe this could be streamed to the client instead.
@evert ?? THX
Congrats on finding the exact reason I didn't want this feature in the first place, and why it's now disabled by default ;). It basically enables a DDoS attack fairly easily: do a few of these PROPFINDs in parallel, and the entire server grinds to a halt. However, streaming XML is coming. This requires two things: 1. Completion and integration of the 'xml-rewrite2' branch. This is a project I've been working on on and off for about two years now, and it's actually getting pretty close to integration. Pull request here: https://github.com/fruux/sabre-dav/pull/602 This spawned a separate project, sabre/xml, which I'm also working on getting to a full release:
Most of the time creating this response is not actually spent writing XML; the bigger cost is that the entire WebDAV tree needs to be traversed on the server. Most of the work is in fetching every node, checking access control, etc. To properly enable streaming support, we need to be able to get the results as they come in and create the XML on the fly. This is scheduled for after the next major sabre/dav release, as PHP 5.5 is required for full generator support, and I don't want to do a half-baked version with Iterators first. However, none of this will really affect the total time spent (7:20), as the exact same processing still happens. The difference is lower memory usage and a much faster time to first byte. To reduce the total time spent, you should still do some proper profiling, as @DeepDiver1975 said.
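The generator-based streaming described above can be sketched in Python (PHP 5.5 generators work analogously). This is an illustrative sketch, not sabre/dav code; the flat `tree` list stands in for the real node traversal, which would fetch from the DB and check ACLs per node:

```python
import xml.sax.saxutils as sx

def iter_nodes(tree):
    """Depth-first traversal stand-in; in a real server each yield
    would involve a DB fetch and an access-control check."""
    for path, size in tree:
        yield path, size

def multistatus_chunks(tree):
    """Generator yielding the multistatus response piece by piece,
    so the first byte can be sent before the whole tree is traversed."""
    yield '<?xml version="1.0"?>\n<d:multistatus xmlns:d="DAV:">\n'
    for path, size in iter_nodes(tree):
        yield (
            '  <d:response><d:href>%s</d:href>'
            '<d:propstat><d:prop><d:getcontentlength>%d</d:getcontentlength>'
            '</d:prop><d:status>HTTP/1.1 200 OK</d:status></d:propstat>'
            '</d:response>\n' % (sx.escape(path), size)
        )
    yield '</d:multistatus>\n'

# Usage: stream chunks to the client instead of buffering the whole document.
tree = [('/docs/', 0), ('/docs/a.txt', 12), ('/docs/b.txt', 34)]
body = ''.join(multistatus_chunks(tree))
```

Note that, as said above, this only improves memory usage and time to first byte; the per-node work still happens either way.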
Why is recursive PROPFIND needed in the first place? Let me think:
Only if you talk to broken servers. An ETag shouldn't say anything about the contents of a collection, so you shouldn't rely on this behavior if you are trying to build a standards-compliant client. Instead, you can use the ctag or (better yet) the sync-collection REPORT.
Can we use our existing etags as ctags? 😉 (whatever that is, need to look it up)
Here is a result from a blackfire.io run I did when testing with Sabre 2.1; it was a PROPFIND with depth 1 on a folder with 69 entries: https://blackfire.io/profiles/6f6c0033-da1e-43d0-b886-b5eeeb62286c/graph
Ctag is also non-standard, but it's well defined: The problem with using the ETag is that it does have a specific definition, and that definition is incompatible with how you are using it. So while the ctag was meant for calendars, it makes sense for any collection. However, sync-tokens are even better: instead of doing a full PROPFIND every time, you literally ask for all changes since the last sync-token you received, which can make this incredibly fast.
That sounds like our etags. When the collection content changes (or any subcollection), the etag will change. So would it just be a matter of renaming "getetag" to "getctag"? Returning changes since the last sync means the server needs to remember the state it returned to every sync client so it can diff against it, which cannot be done with the current architecture/data model. It was suggested by @icewind1991 here: #4936 (speed up by keeping etag history)
Implementing this: http://tools.ietf.org/html/rfc6578 will basically solve any performance problem of this kind you have today. It will greatly reduce memory and CPU usage on both server and client. You literally just get a list of deletions and additions back.
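For reference, an RFC 6578 sync-collection REPORT request body has roughly this shape (the sync-token value here is a made-up example; a client's first sync sends an empty `<D:sync-token/>` element, and the 207 response returns the changed/removed hrefs plus a new token):

```xml
<?xml version="1.0" encoding="utf-8" ?>
<D:sync-collection xmlns:D="DAV:">
  <D:sync-token>http://example.com/ns/sync/1234</D:sync-token>
  <D:sync-level>1</D:sync-level>
  <D:prop>
    <D:getetag/>
  </D:prop>
</D:sync-collection>
```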
It's also in a different namespace.
That sounds very similar indeed. What you need from a database perspective:
It's pretty lightweight
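The changes log the database needs for sync-tokens can be sketched as a single append-only table. This is a minimal illustration using SQLite; the table and column names are made up, not ownCloud's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE changes (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as the sync-token
        path      TEXT NOT NULL,                      -- path of the changed file
        operation TEXT NOT NULL                       -- 'added', 'modified' or 'deleted'
    )""")

def log_change(path, operation):
    conn.execute("INSERT INTO changes (path, operation) VALUES (?, ?)",
                 (path, operation))

def changes_since(token):
    """Everything the client needs: what changed after its last token,
    plus the new token to remember for next time."""
    rows = conn.execute(
        "SELECT id, path, operation FROM changes WHERE id > ? ORDER BY id",
        (token,)).fetchall()
    new_token = rows[-1][0] if rows else token
    return new_token, [(p, op) for _, p, op in rows]

log_change("/docs/a.txt", "added")
log_change("/docs/b.txt", "added")
token, _ = changes_since(0)          # client syncs, remembers `token`
log_change("/docs/a.txt", "deleted")
new_token, delta = changes_since(token)
# delta now contains only the deletion that happened after `token`
```

With an index on `id` (implicit for the primary key), answering a sync request is a single cheap range query instead of a full tree walk.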
So does it mean that every file change (rename, delete, upload, download, even outside of WebDAV) needs to create a new entry there? Or should these be grouped? "The path of the file that was changed": the path can change if the file was moved, so we possibly need more info about moved stuff.
It's flexible, you can either do it per file or per group of changes. I saw a note about remote storage in one of the tickets as well. I can see that that's a little harder. But even there, I assume you must have some way to figure out whether a tree or subtree has changed within remote storage, even today, right? The change would be that you now have to figure out what has changed in a remote storage tree and log those entries as well. You would only have to do this when the sync-token is requested, and it gives you room for future optimization. A big difference for remote storage is that the logic to determine changes now happens on the server, not the client. Plus, down the road you get the option to handle changes in remote storage asynchronously.
Unfortunately the external storage code relies mostly on recursive scanning (no subtree magic). It checks the mtime of remote folders and compares with the local cache; if it has changed, it goes deeper. It can be slow. I suppose it should be possible to write this as a separate experimental app/Sabre plugin (if time permits) without being too intrusive.
I figured that was a possibility, but then at least you have the option to do this processing server-side, and you don't have to let clients do it. I assume you already need to do something similar today?
Sounds like a big rewrite 😄 I uploaded 1600 files to my local server (on master, with the new Sabre 2.1) and ran a PROPFIND with infinity through Blackfire. Here is the result: https://blackfire.io/profiles/488be46f-7890-4be2-8e1b-7098961a572e/graph There aren't many folders, just this:
The numbers are the number of files inside the respective folders. The result doesn't look too bad.
Note: this was an "allprops" call. I'll maybe redo another one that also requests the OC-specific props next week.
@danimo did your items contain shared files/folders?
I hope it got better with 8.1 😄 I remember @jturcotte did some tests, but I'm not sure if they were about recursive PROPFIND or whether they involved shared folders.
I've set Depth to 1 in all requests: https://github.com/owncloud/administration/blob/master/performance-tests-c%2B%2B/main.cpp#L69 And the test data was only one directory deep, so that wouldn't stress it much either.
Any new data from 8.1.3? 😄
let's move this on -> 9.0
Since the sync client is not using recursive PROPFIND anymore and the code was removed from the client (since it was still libneon based instead of Qt QNetworkAccessManager) I'm closing this. |
I just had a look at the MySQL log for a single, non-recursive PROPFIND.
Can this be correct? In particular, are the writing access operations to
When doing a PROPFIND, you are reading a list of entries in a folder. |
@icewind1991 correct me if I'm wrong
The only thing that comes to my mind that could go wrong is that a directory size might not equal the sum of its contained items (in case the dir size is somehow obtained first, then a file is changed in parallel, and the contained files are "propfound" afterwards; but that would be an odd implementation if it happened). In this case the result is inconsistent in itself. However, a PROPFIND should never "see" and report an inconsistent DB state, as this must be ensured by the write operations using appropriate locking.
My apologies, I used version 8.2.0. Now using 8.2.2, performance is much better (~factor 100). Probably due to the fact that there are no
Re-opening as per @hodyroff @michaelstingl @ogoffart @jturcotte @SergioBertolinSG
… adding @mrow4a to the discussion about performance…
Some data points: (no memory / XML streaming operations yet).
Found this PR about PROPFIND streaming: https://github.com/fruux/sabre-dav/pull/898
When implementing this, please also try to gzip the response stream. Right now in our recommended web server configs we disable gzip because some web servers mess with ETags then. But for the text-based PROPFIND response it still makes sense.
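Gzipping doesn't conflict with streaming: a compressor can be flushed after each chunk so data keeps flowing to the client. A minimal Python sketch (the chunk list is a stand-in for the streamed PROPFIND body):

```python
import zlib

def gzip_stream(chunks):
    """Compress an iterable of text chunks on the fly.
    wbits=31 selects the gzip container (header + trailer);
    Z_SYNC_FLUSH after each chunk keeps the stream flowing."""
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        data = comp.compress(chunk.encode("utf-8"))
        data += comp.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield comp.flush(zlib.Z_FINISH)  # writes the gzip trailer

chunks = ['<?xml version="1.0"?>', '<d:multistatus xmlns:d="DAV:">', '</d:multistatus>']
compressed = b"".join(gzip_stream(chunks))
```

Per-chunk sync flushes cost some compression ratio, but for a large text-heavy multistatus body the savings are still substantial.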
We have been using But I wonder about the resuming when downloading files.
Support as in Support Contract and active fixing = latest version only
bug with Infinity: #28341
Related: #38583
This issue has been automatically closed.
Testcase
curl -XPROPFIND https://user:pwd@server/owncloud/remote.php/webdav -H "Depth:infinity"
Actual results
On a well-equipped x86_64 machine it takes 7:20 minutes under heavy server load to list 5279 items (dirs/files).
Expected results
This should actually be fairly cheap, since all required information is available from the DB. The suspicion is that we could attain some serious speedup by investigating code paths and DB queries and optimizing them.
Motivation
Cheap recursive PROPFINDs would allow the client to move away from directory-by-directory discovery, which in turn would significantly speed up syncs, especially every sync run until the initial sync completes.