
Chunked upload performance improvements #47682

Open
provokateurin opened this issue Sep 2, 2024 · 17 comments

@provokateurin
Member

Motivation

Chunked upload is useful in many cases, but also slower than uploading the entire file directly.
Overall it is slower because multiple network requests have to be made.
The speed of requests varies during their lifetime, as it is slower at the beginning and then flattens out at the maximum over time.
The smaller the requests and the more of them we make, the worse the performance penalty.
Thus to increase chunked upload speed the size of chunks should be increased.
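To make this reasoning concrete, here is a toy model of the effect (a sketch only; the bandwidth and per-request overhead figures are made-up parameters, not measurements):

```python
import math

def total_upload_time(file_mib, chunk_mib, bandwidth_mib_s, overhead_s):
    """Toy model: each chunk pays a fixed per-request overhead on top of
    the raw transfer time, so more (smaller) chunks cost more overall."""
    n_chunks = math.ceil(file_mib / chunk_mib)
    return n_chunks * overhead_s + file_mib / bandwidth_mib_s

# Hypothetical parameters: 1 GiB file, 200 MiB/s link, 0.4 s overhead/request
small = total_upload_time(1024, 10, 200, 0.4)   # 103 chunks
large = total_upload_time(1024, 100, 200, 0.4)  # 11 chunks
```

With these invented numbers the 10MiB configuration spends most of its time in per-request overhead, which matches the shape (though of course not the exact values) of the benchmark results.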

A single upload using cURL is the upper limit of the possible upload speed for any configuration and upload method.
A chunked upload with the chunk size equal to or greater than the file size represents the upper limit for chunked uploads as it only uploads a single chunk.
While reaching the former would be nice, only the latter is achievable (without general performance improvements in WebDAV regardless of the maximum chunk size) and thus represents the theoretical goal.

Testing methodology

Input

dd if=/dev/random of=1G.bin bs=1G count=1

Scenarios

All tests are running on a local instance using the PHP standalone web server with 10 workers and no extra apps enabled.
The machine has a Ryzen 7 5800X (8 cores, 16 threads), 48GB RAM and a 1TB Samsung 980 NVMe M.2 SSD.
Hardware should not be a bottleneck on this setup, and external networking cannot have an effect either.

1. cURL single upload

Take the Real timing value.

time curl -X PUT "http://localhost:8080/remote.php/webdav/1G.bin" -u admin:admin --upload-file 1G.bin

Runs:
5.412s
5.223s
5.100s
Average: 5.245s

Note: I once saw an outlier that only took about 4.7s, but this never happened again.

Chunked upload via browser

Open Firefox Devtools and filter network requests by dav/uploads.
Upload 1G.bin via web interface.
Take the Started timestamp of the first request (MKCOL) and the Downloaded timestamp of the last request (MOVE) and subtract them (see the Timings tab of each request).
This includes some constant overhead for the MKCOL and MOVE requests. That overhead is not relevant when comparing chunked upload timings against each other, since they all share it, but when comparing to the cURL scenario it accurately measures the overall time of the upload process.

According to https://firefox-source-docs.mozilla.org/devtools-user/network_monitor/throttling/index.html "Wi-Fi" throttling means a maximum speed of 15 Mbps.
Sadly this is the "fastest" speed one can select for throttling and there is no way to set a custom speed.
It should represent a worst-case, while most uploads are probably done with 2-3x that speed in the real world if the Nextcloud instance is not on the same network.

Adjusting the default maximum chunk size can be done in

$maxChunkSize = (int)Server::get(IConfig::class)->getAppValue('files', 'max_chunk_size', (string)(10 * 1024 * 1024));

2. Chunk size 10MiB (current default), unlimited bandwidth

Chunks: 103
Runs:
47.16s
47.65s
47.33s
Average: 47.38s

3. Chunk size 100MiB, unlimited bandwidth

Chunks: 11
Runs:
8.53s
8.64s
8.63s
Average: 8.6s

4. Chunk size 1024MiB, unlimited bandwidth

Chunks: 1
Runs:
6.37s
6.34s
6.34s
Average: 6.35s

5. Chunk size 10MiB (current default), throttled "Wi-Fi"

Chunks: 103
Runs:
551.40s
551.40s
551.40s
Average: 551.40s

6. Chunk size 100MiB, throttled "Wi-Fi"

Chunks: 11
Runs:
552.60s
549.60s
551.40s
Average: 551.2s

7. Chunk size 1024MiB, throttled "Wi-Fi"

Chunks: 1
Runs:
568.20s
555.60s
553.11s
Average: 558.97s

Conclusions

  1. Upload speed in Nextcloud is very consistent regardless of upload method. Great!

  2. Chunked upload in general takes about 21% longer in scenarios with unlimited bandwidth (scenarios 1 and 4). Whether this overhead can be eliminated easily is not clear, but at least there is no hard limitation, since both uploads go through WebDAV and thus use the same underlying stack (also see the other interesting findings section below).

  3. In the current default configuration with unlimited bandwidth, chunked upload takes 646% longer than the maximum speed (scenarios 2 and 4). Increasing the chunk size by 10x while keeping the bandwidth unlimited reduces this to only 35% longer than the maximum speed (scenarios 3 and 4). This is a 5.5x increase in total throughput (scenarios 2 and 3).

  4. In bandwidth-limited scenarios, increasing the chunk size has almost no positive (and no negative) effect (scenarios 5 and 6). This is expected, as the slow speed at the beginning of each chunk is a lot smaller in relation to the overall speed, or even exactly the same.

  5. Increasing the chunk size helps uploads on fast connections and has no speed downside on slow connections. However, slow networks can be correlated with unstable networks, so having fewer and larger chunks could result in a higher rate of aborted chunk uploads. This downside should be taken into consideration when choosing a new maximum chunk size.

  6. A new maximum chunk size still needs to be figured out by collecting more data for different chunk sizes. It needs to hit a sweet spot of maximum speed with minimum size to account for the aforementioned drawback on unstable networks (basically the point of diminishing returns). This investigation was only to prove that we can increase the chunked upload speed.
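For reference, the percentages above can be recomputed directly from the scenario averages:

```python
# Scenario averages in seconds (from the runs above)
curl_single = 5.245    # scenario 1
chunk_10mib = 47.38    # scenario 2
chunk_100mib = 8.6     # scenario 3
chunk_1024mib = 6.35   # scenario 4

def pct_longer(slow, fast):
    """How much longer `slow` takes than `fast`, as a rounded percentage."""
    return round((slow / fast - 1) * 100)

print(pct_longer(chunk_1024mib, curl_single))   # 21: overhead of chunking itself
print(pct_longer(chunk_10mib, chunk_1024mib))   # 646: 10MiB default vs single chunk
print(pct_longer(chunk_100mib, chunk_1024mib))  # 35: 100MiB chunks vs single chunk
print(round(chunk_10mib / chunk_100mib, 1))     # 5.5: throughput gain from 10x chunks
```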

Other interesting findings

While uploading with a single chunk and unlimited bandwidth, Firefox displayed that the request needed 637ms to send but had to wait 2.10s after that (reproducible). This might show that we have a bottleneck in processing the uploads on the backend side. Maybe it would be possible to stream the request data directly into the file, which should cut down the waiting time a lot. It should be possible to profile these requests and figure out where the time is spent.
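The streaming idea can be illustrated with a hypothetical sketch (Python standing in for Nextcloud's actual PHP code path): copying the body into the destination in fixed-size buffers keeps peak memory flat instead of buffering the whole request first.

```python
import io
import shutil

def stream_to_file(request_body, dest, bufsize=1024 * 1024):
    """Copy the request body to `dest` in bufsize pieces, so peak memory
    stays at bufsize instead of growing to the full upload size."""
    shutil.copyfileobj(request_body, dest, bufsize)

# Example with in-memory streams standing in for the request and target file
body = io.BytesIO(b"x" * (3 * 1024 * 1024))
target = io.BytesIO()
stream_to_file(body, target, 64 * 1024)
```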

For single chunks the MOVE request still takes quite some time. I assume this happens because it concatenates the chunks even when there is only one (which is slow because it has to read and write all the data). This case could be detected and the file moved without reading and writing it (which is not possible for all storages AFAICT; i.e., it needs to be on the same physical disk to take advantage of it). This only affects uploads where the file size is less than the maximum chunk size. Due to the current low limit it is not really noticeable, but with a higher maximum chunk size this would affect a lot more and bigger uploads and could lead to quite a performance improvement for those.

Upload ETA calculations are all over the place due to the varying speed of uploading chunks over their lifetime. They could be improved by taking the overall time a chunk needs to upload and multiplying it by the number of remaining chunks. This should be a lot more reliable, as every chunk should have similar upload characteristics. To smooth out the value it could take into account the last 3-5 chunks.
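That smoothing idea could be sketched like this (hypothetical helper, all names made up):

```python
from collections import deque

class ChunkEta:
    """Estimate remaining upload time from the durations of the last
    few completed chunks, as suggested above."""
    def __init__(self, window=5):
        self.durations = deque(maxlen=window)

    def record(self, seconds):
        """Remember how long the most recent chunk took to upload."""
        self.durations.append(seconds)

    def eta(self, chunks_remaining):
        """Average recent chunk duration times the chunks still to go."""
        if not self.durations:
            return None  # no data yet
        avg = sum(self.durations) / len(self.durations)
        return avg * chunks_remaining

est = ChunkEta(window=3)
for d in (4.0, 5.0, 6.0):
    est.record(d)
print(est.eta(10))  # 50.0: average of the last 3 chunks times 10 remaining
```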

@provokateurin provokateurin added 1. to develop Accepted and waiting to be taken care of feature: dav performance 🚀 labels Sep 2, 2024
@provokateurin provokateurin self-assigned this Sep 2, 2024
@sorbaugh sorbaugh added this to the Nextcloud 31 milestone Sep 2, 2024
@joshtrichards
Member

Good stuff.

A couple thoughts:

  • The Desktop client has already traveled the bumps from 10MB -> 1000MB -> 5000MB so there are some data points there. The main problem with the desktop client's approach at the moment is that we don't have back-off logic in place when the higher chunk size doesn't work. This creates problems when it runs up against web/proxy limits (default or otherwise)1. But that is fixable (I think).

  • To avoid unnecessary support issues, invalid bug reports, and a poor user experience: Increasing the default chunk size should probably account for widespread limitations2 (or) have a good dynamic/back-off logic to reconfigure if the end-to-end path doesn't seem to be working with the larger chunk size (or) be really well documented. Preferably either/both of the former two. :-)

  • We have variance in behavior among our various clients. Ideally we unify defaults and behavior, unless there's a compelling reason not to, of course.

  • A shared ability to set the chunk size either globally or on a per client type basis might be nice. Currently the server-side parameter is only applicable to the Web client. And the chunk size isn't even configurable in our Android client (for example).

Footnotes

  1. e.g. See "Connection closed" message when syncing files larger then +- 100Mb desktop#4278

  2. Reasons this happens: Apache LimitRequestBody defaults to 1GB these days (used to be unlimited by default), CloudFlare freebie level (100MB) which leads to lots of false bug reports/poor user experiences, and various other web/proxy body/transaction size limits.

@provokateurin
Member Author

provokateurin commented Sep 2, 2024

@joshtrichards thanks for your input! Highly appreciated having pointers to related topics :)

@provokateurin
Member Author

Hi, @nextcloud/desktop would it be possible to sit down with some of you at the Conference in two weeks and discuss this? I think we can figure out something together, solve the problems and unify the behavior across clients and the web.

@provokateurin
Member Author

Also @joshtrichards (and anyone else) feel free to join in case this takes place. You seem to know quite a lot of pain points of this problem, which would be really helpful to consider.

@camilasan
Member

@provokateurin Matthieu and I will be at the conference, could you put something on the calendar for this?
will you join us @joshtrichards? :)

@provokateurin
Member Author

I'll try to find a time slot (the calendar is really full huh) and get it added to the official agenda so everyone interested can join. To avoid collisions with any other event it probably has to be late afternoon.

@provokateurin
Member Author

provokateurin commented Sep 3, 2024

Ok, the meeting will take place Wednesday 18.9. 16:00 at Motion Lab. The room is still to be determined, but will be available in the public agenda and calendar on time.

@joshtrichards
Member

I wish I could join, but I'll not be in town for the conference.

@provokateurin
Member Author

I just realized I might have made a mistake. Instead of 10 workers, I think only 4 were available. Given that the chunked upload tries to upload 5 concurrent chunks, at least one of them might have been stalled the whole time.
I will take new measurements to confirm whether this was the case and affected the results. Overall it shouldn't change the picture though, as at most 20% of the performance was lost due to this error, which is a lot less than the overall performance loss observed with multiple concurrent chunks.

@provokateurin provokateurin added 2. developing Work in progress and removed 1. to develop Accepted and waiting to be taken care of labels Sep 9, 2024
@cfiehe

cfiehe commented Sep 11, 2024

During my analysis of #47856, I have found some interesting points regarding file upload optimization in Nextcloud:

One problem in case of local storage is that OCA\DAV\Connector\Sabre\Directory does not handle moveInto efficiently. It always falls back to Sabre’s default copy-delete handling instead of making use of a more efficient renaming way when source and target are located on the same storage.

Another problem is in View.php, where Nextcloud tells the storage to use rename instead of moveFromStorage only when source and target storage are represented by the same object ($storage1 === $storage2). This criterion is too strict in my opinion and will never be true for groupfolders, where an individual storage representation is used. The criterion also does not play well with all the storage wrappers and prevents a more efficient way to handle file renames on the same storage.

@provokateurin
Member Author

@cfiehe thanks for your insights! We will consider them in the discussion, although this issue is more about chunked upload performance itself. I only noticed similar issues and noted them down here so they can be further investigated in the future.

@provokateurin
Member Author

provokateurin commented Sep 19, 2024

Hi everyone,
we had a really good discussion about chunked upload performance.
I will try to summarize the outcome and ideas we had:

  1. The upload speed for the existing chunked upload methods can be improved by choosing an appropriate chunk size and we settled on 100MiB based on the benchmarking and the input @joshtrichards gave. This setting should be configurable by the admin in config.php and exposed to the clients and web via capabilities. The same goes for the number of parallel chunks.
  2. To fully saturate the upload bandwidth and reduce the number of connections, a continuous upload using HTTP Content-Range (https://racineennis.ca/2017/03/21/Resumable-file-transfers-with-content-range-header) can be implemented. This is not backwards compatible with the existing chunked upload method and needs to be implemented differently. The main benefit is that we don't manually control the chunk size by guessing a sensible value; instead we let the request upload as long as possible. At some point a timeout will occur, but then we can just continue where we stopped. This means only 1 connection per file upload is needed and we use the maximum time possible to upload, which minimizes the per-request overhead as much as possible. This also has benefits on unstable networks, as the chunk size is dynamic by definition, so even if your connection drops after every 1KiB, you will be able to upload large files.
  3. Assembly of single chunks can be optimized by just moving the chunk to be the file instead of reading the chunk to memory and writing it as the file. See comment below
  4. The assembly of the chunks is very time consuming in the upload process. As I already discovered earlier, it even happens when you only upload a single chunk, which can be optimized (just move the chunk/file instead of reading and writing it). Because assembly relies on an HTTP request that will also run into a timeout eventually, there is actually a hard limit on the maximum file size one can upload to a particular server, as any file that cannot be assembled within the timeout can never be uploaded. We had the idea to do the assembly lazily in the background, such that the final MOVE request would only indicate that the upload is done. There are two variants to implement this:
    i. Do the assembly in the background using a background job and display to the user that the file still needs to be processed. This needs adaptation in the UI and doesn't provide a great UX.
    ii. Same as i, but we display the file like any regular file to the user (including read&write) and keep a reference to the unassembled chunks as long as they have not been assembled by the background job yet. When reading the data of the file we just chain the streams of the chunks, such that they represent the entire file. Performance wise this could have some downsides, as the speed would go up and down at the chunk borders, but it provides a much better UX as the file is directly usable after the upload. This way of implementing it also has the benefit that it doesn't need changes in the UI and business logic on the web and clients.
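The stream chaining in option 4.ii could look roughly like this (a minimal sketch in Python; the real implementation would live in Nextcloud's PHP storage layer):

```python
import io

class ChainedChunks(io.RawIOBase):
    """Expose a list of ordered, unassembled chunk streams as one
    read-only file, so the upload is usable before assembly finishes."""
    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._i = 0

    def readable(self):
        return True

    def readinto(self, buf):
        while self._i < len(self._chunks):
            data = self._chunks[self._i].read(len(buf))
            if data:
                buf[:len(data)] = data
                return len(data)
            self._i += 1  # this chunk is exhausted, continue with the next
        return 0  # EOF once all chunks are drained

whole = ChainedChunks([io.BytesIO(b"first-"), io.BytesIO(b"second")])
print(whole.read())  # b'first-second'
```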

Items 1, 2, 3 and 4 can all be implemented independently of one another. 1 and 3 are very easy, 2 needs quite some work on the backend and client sides, and 4 needs more research/talking to experts to figure out if option 4.ii is possible.
The combination of all these ideas should lead to the best possible performance and UX which is of course a super nice goal to achieve.
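As a rough client-side sketch of the resume bookkeeping for item 2 (a hypothetical helper, not an existing Nextcloud API; header format per the HTTP Content-Range semantics):

```python
def content_range(offset, length, total):
    """Header value for resuming an interrupted upload at byte `offset`.
    Content-Range uses inclusive byte positions: bytes start-end/total."""
    if offset + length > total:
        raise ValueError("range exceeds file size")
    return f"bytes {offset}-{offset + length - 1}/{total}"

# Connection dropped after the first 1 MiB of a 4 MiB file: resume there
print(content_range(1024 * 1024, 3 * 1024 * 1024, 4 * 1024 * 1024))
```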

Thank you again to all the people who attended, the outcome is way better than what I had hoped for.

CC @sorbaugh @AndyScherzinger @tobiasKaminsky

@provokateurin
Member Author

Assembly of single chunks can be optimized by just moving the chunk to be the file instead of reading the chunk to memory and writing it as the file.

I implemented this, but it actually makes no difference, because the chunk is read into memory anyway. It even seemed to slow down the process by a few ms, maybe because it no longer used streams but handled the entire data directly.

@juliusknorr
Member

juliusknorr commented Oct 17, 2024

Assembly of single chunks can be optimized by just moving the chunk to be the file instead of reading the chunk to memory and writing it as the file.

I implemented this, but it actually makes no difference, because the chunk is read into memory anyway. It even seemed to slow down the process by a few ms, maybe because it no longer used streams but handled the entire data directly.

One additional limitation is on the PHP side (ref #43636 (comment)). I could imagine some performance gain if we used a POST for the upload of chunks, where PHP exposes the temporary file directly, but that doesn't seem to be possible with the current PUT approach. In the current state there is always a file written by PHP itself, which we then read into memory through fopen('php://input') as a stream and write again to the actual destination.

@provokateurin
Member Author

@juliushaertl thanks for the insight. I will keep it in mind for point 2, then it can be even faster 👍
