Depth infinity: Make recursive PROPFIND calls cheaper #14531
@icewind1991 just another case for profiling WebDAV - please have a look - THX
Note that 1.8 has recursive PROPFIND disabled, so this issue only affects 1.7.x. Sabre also creates the XML reply in memory; maybe this could be streamed to the client instead.
@evert ?? THX
Congrats on finding the exact reason I didn't want this feature in the first place, and why it's now disabled by default ;). It basically enables a DDoS attack fairly easily: do a few of these PROPFINDs in parallel, and the entire server grinds to a halt. However, streaming XML is coming. This requires two things: 1. Completion and integration of the 'xml-rewrite2' branch. This is a project I've been working on on and off for about two years now, and it's actually getting pretty close to integration. Pull request here: https://github.com/fruux/sabre-dav/pull/602 This spawned a separate project, sabre/xml, which I'm also working on getting to a full release:
Most of the time creating this response is not actually spent writing XML; the bigger cost is that the entire WebDAV tree needs to be traversed on the server. Most of the work is in fetching every node, checking access control, etc. To properly enable streaming support, we need to be able to get the results as they come in and create the XML on the fly. This is scheduled for after the next major sabre/dav release, as PHP 5.5 is required for full generator support, and I don't want to do a half-baked version with Iterators first. However, none of this will really affect the total time spent (7:20), as the exact same processing still happens. The difference is lower memory usage and a much faster time to first byte. To reduce the total time spent, you should still do some proper profiling, as @DeepDiver1975 said.
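The generator-based streaming described above can be sketched in Python (PHP 5.5 generators work analogously). This is an illustrative sketch, not sabre/dav code; the flat `tree` list stands in for the real node traversal, which would fetch from the DB and check ACLs per node:

```python
import xml.sax.saxutils as sx

def iter_nodes(tree):
    """Depth-first traversal stand-in; in a real server each yield
    would involve a DB fetch and an access-control check."""
    for path, size in tree:
        yield path, size

def multistatus_chunks(tree):
    """Generator yielding the multistatus response piece by piece,
    so the first byte can be sent before the whole tree is traversed."""
    yield '<?xml version="1.0"?>\n<d:multistatus xmlns:d="DAV:">\n'
    for path, size in iter_nodes(tree):
        yield (
            '  <d:response><d:href>%s</d:href>'
            '<d:propstat><d:prop><d:getcontentlength>%d</d:getcontentlength>'
            '</d:prop><d:status>HTTP/1.1 200 OK</d:status></d:propstat>'
            '</d:response>\n' % (sx.escape(path), size)
        )
    yield '</d:multistatus>\n'

# Usage: stream chunks to the client instead of buffering the whole document.
tree = [('/docs/', 0), ('/docs/a.txt', 12), ('/docs/b.txt', 34)]
body = ''.join(multistatus_chunks(tree))
```

Note that, as said above, this only improves memory usage and time to first byte; the per-node work still happens either way.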
Why is recursive PROPFIND needed in the first place? Let me think:
Only if you talk to broken servers. An ETag shouldn't say anything about the contents of a collection, so you shouldn't rely on this behavior if you are trying to build a standards-compliant client. Instead, you can use the ctag or (better yet) the sync-collection REPORT.
Can we use our existing etags as ctags? 😉 (whatever that is, need to look it up)
Here is a result from a blackfire.io run I did when testing with Sabre 2.1; it was a PROPFIND with depth 1 on a folder with 69 entries: https://blackfire.io/profiles/6f6c0033-da1e-43d0-b886-b5eeeb62286c/graph
Ctag is also non-standard, but it's well defined: The problem with using the ETag is that it does have a specific definition, and that definition is incompatible with how you are using it. So while the ctag was meant for calendars, it makes sense for any collection. However, sync-tokens are even better: instead of doing a full PROPFIND every time, you literally ask for all changes since the last sync-token you received, which can make this incredibly fast.
That sounds like our etags. When the collection content changes (or any subcollection), the etag will change. So would it just be a matter of renaming "getetag" to "getctag"? Returning changes since the last sync means the server needs to remember the state it returned to every sync client so it can diff against it, which cannot be done with the current architecture/data model. It was suggested by @icewind1991 here: #4936 (speed up by keeping etag history)
Implementing this: http://tools.ietf.org/html/rfc6578 will basically solve any performance problem of this kind you have today. It will greatly reduce memory and CPU usage on both server and client. You literally just get a list of deletions and additions back.
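For reference, an RFC 6578 sync-collection REPORT request body has roughly this shape (the sync-token value here is a made-up example; a client's first sync sends an empty `<D:sync-token/>` element, and the 207 response returns the changed/removed hrefs plus a new token):

```xml
<?xml version="1.0" encoding="utf-8" ?>
<D:sync-collection xmlns:D="DAV:">
  <D:sync-token>http://example.com/ns/sync/1234</D:sync-token>
  <D:sync-level>1</D:sync-level>
  <D:prop>
    <D:getetag/>
  </D:prop>
</D:sync-collection>
```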
It's also in a different namespace.
That sounds very similar indeed. What you need from a database perspective:
It's pretty lightweight
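The changes log the database needs for sync-tokens can be sketched as a single append-only table. This is a minimal illustration using SQLite; the table and column names are made up, not ownCloud's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE changes (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as the sync-token
        path      TEXT NOT NULL,                      -- path of the changed file
        operation TEXT NOT NULL                       -- 'added', 'modified' or 'deleted'
    )""")

def log_change(path, operation):
    conn.execute("INSERT INTO changes (path, operation) VALUES (?, ?)",
                 (path, operation))

def changes_since(token):
    """Everything the client needs: what changed after its last token,
    plus the new token to remember for next time."""
    rows = conn.execute(
        "SELECT id, path, operation FROM changes WHERE id > ? ORDER BY id",
        (token,)).fetchall()
    new_token = rows[-1][0] if rows else token
    return new_token, [(p, op) for _, p, op in rows]

log_change("/docs/a.txt", "added")
log_change("/docs/b.txt", "added")
token, _ = changes_since(0)          # client syncs, remembers `token`
log_change("/docs/a.txt", "deleted")
new_token, delta = changes_since(token)
# delta now contains only the deletion that happened after `token`
```

With an index on `id` (implicit for the primary key), answering a sync request is a single cheap range query instead of a full tree walk.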
So does it mean that every file change (rename, delete, upload, download, even outside of WebDAV) needs to create a new entry there? Or should these be grouped? "The path of the file that was changed": the path can change if the file was moved, so we possibly need more info about moved stuff.
It's flexible, you can either do it per file or per group of changes. I saw a note about remote storage in one of the tickets as well. I can see that that's a little harder. But even there, I assume you must have some way to figure out whether a tree or subtree has changed within remote storage, even today, right? The change would be that you now have to figure out what has changed in a remote storage tree and log those entries as well. You would only have to do this when the sync-token is requested, and it gives you room for future optimization. A big difference for remote storage is that the logic to determine changes now happens on the server, not the client. Plus, down the road you get the option to handle changes in remote storage asynchronously.
Unfortunately the external storage code relies mostly on recursive scanning (no subtree magic). It checks the mtime of remote folders and compares with the local cache; if it has changed, it goes deeper. It can be slow. I suppose it should be possible to write this as a separate experimental app/Sabre plugin (if time permits) without being too intrusive.
I figured that was a possibility, but then at least you have the option to do this processing server-side, and you don't have to let clients do it. I assume you already need to do something similar today?
Sounds like a big rewrite 😄 I uploaded 1600 files to my local server (on master, with the new Sabre 2.1) and ran a PROPFIND with infinity through Blackfire. Here is the result: https://blackfire.io/profiles/488be46f-7890-4be2-8e1b-7098961a572e/graph There aren't many folders, just this:
The numbers are the number of files inside the respective folders. The result doesn't look too bad.
Note: this was an "allprops" call. I'll maybe redo another one that also requests the OC-specific props next week.
@danimo did your items contain shared files/folders?
I hope it got better with 8.1 😄 I remember @jturcotte did some tests, but I'm not sure if they were about recursive PROPFIND or whether they involved shared folders.
I've set Depth to 1 in all requests: https://github.com/owncloud/administration/blob/master/performance-tests-c%2B%2B/main.cpp#L69 And the test data was only one directory deep, so that wouldn't stress it much either.
Any new data from 8.1.3? 😄
let's move this on -> 9.0
Since the sync client is not using recursive PROPFIND anymore and the code was removed from the client (since it was still libneon based instead of Qt QNetworkAccessManager) I'm closing this. |
I just had a look at the MySQL log for a single, non-recursive PROPFIND.
Can this be correct? In particular, are the writing access operations to
When doing a PROPFIND, you are reading a list of entries in a folder. |
@icewind1991 correct me if I'm wrong
The only thing that comes to my mind that could go wrong is that a directory size might not equal the sum of its contained items (in case the dir size is somehow obtained first, then a file is changed in parallel, and the contained files are "propfound" afterwards; but that would be an odd implementation if it happened). In this case the result is inconsistent in itself. However, a PROPFIND should never "see" and report an inconsistent DB state, as this must be ensured by the write operations using appropriate locking.
My apologies, I used version 8.2.0. Now using 8.2.2, performance is much better (~factor 100). Probably due to the fact that there are no
Re-opening as per @hodyroff @michaelstingl @ogoffart @jturcotte @SergioBertolinSG
… adding @mrow4a to the discussion about performance…
Some data points: (no memory / XML streaming operations yet).
Found this PR about PROPFIND streaming: https://github.com/fruux/sabre-dav/pull/898
When implementing this, please also try to gzip the response stream. Right now in our recommended web server configs we disable gzip because some web servers mess with ETags then. But for the text-based PROPFIND response it still makes sense.
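Gzipping doesn't conflict with streaming: a compressor can be flushed after each chunk so data keeps flowing to the client. A minimal Python sketch (the chunk list is a stand-in for the streamed PROPFIND body):

```python
import zlib

def gzip_stream(chunks):
    """Compress an iterable of text chunks on the fly.
    wbits=31 selects the gzip container (header + trailer);
    Z_SYNC_FLUSH after each chunk keeps the stream flowing."""
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        data = comp.compress(chunk.encode("utf-8"))
        data += comp.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield comp.flush(zlib.Z_FINISH)  # writes the gzip trailer

chunks = ['<?xml version="1.0"?>', '<d:multistatus xmlns:d="DAV:">', '</d:multistatus>']
compressed = b"".join(gzip_stream(chunks))
```

Per-chunk sync flushes cost some compression ratio, but for a large text-heavy multistatus body the savings are still substantial.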
We have been using But I wonder about the resuming when downloading files.
Support as in Support Contract and active fixing = latest version only
bug with Infinity: #28341
Related: #38583
This issue has been automatically closed.
Testcase
curl -XPROPFIND https://user:pwd@server/owncloud/remote.php/webdav -H "Depth:infinity"
Actual results
On a well-equipped x86_64 machine it takes 7:20 minutes under heavy server load to list 5279 items (dirs/files).
Expected results
This should actually be fairly cheap, since all required information is available from the DB. The suspicion is that we could attain some serious speedup by investigating code paths and DB queries and optimizing them.
Motivation
Cheap recursive PROPFINDs would allow the client to move away from directory-by-directory discovery, which in turn would significantly speed up syncs, especially every sync run until the initial sync completes.