Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Upload speed drops while uploading 1k files in a folder #5061

Closed
michaelstingl opened this issue Nov 16, 2022 · 12 comments · Fixed by #5092
Closed

[Performance] Upload speed drops while uploading 1k files in a folder #5061

michaelstingl opened this issue Nov 16, 2022 · 12 comments · Fixed by #5092
Labels
Priority:p1-urgent Consider a hotfix release with only that fix Type:Bug

Comments

@michaelstingl
Copy link
Contributor

michaelstingl commented Nov 16, 2022

Describe the bug

During upload of 1k files to a directory, performance drops heavily. Upload 1k files to a new directory is initially faster, then drops again. All where done in a single sync run with the 3.0-prerelease desktop client.

Steps to reproduce

Steps to reproduce the behavior:

  1. Create 1000x small files in a folder for n in {0..999}; do dd if=/dev/urandom of=random$( printf %03d "$n" ).file bs=1 count=$(( RANDOM + 1024 )); done
  2. Duplicate this folder 10x
  3. Sync all this files in a single sync run with 3.0-prerelease desktop client

Expected behavior

Upload of 1k small files shouldn't slow down.

Actual behavior

During upload of 1k files to a directory, performance drops heavily.
Picture 2

Setup

oCIS Server:

https://ocis.owncloud.com/

Desktop Client

% /Applications/owncloud.app/Contents/MacOS/owncloud --version
ownCloud ownCloud 3.0.0.9105-daily20221115
https://github.com/owncloud/client/commit/4614194ca6295c00f21e573565c643e0bfc472b0
Libraries Qt 5.15.5, OpenSSL 1.1.1q  5 Jul 2022
Using virtual files plugin: suffix
osx-21.5.0

Desktop was started this way:

QT_LOGGING_RULES='*=false;sync.httplogger=true' \
/Applications/owncloud.app/Contents/MacOS/owncloud \
--logdebug --logflush --logfile ~/upload.log --showsettings

Desktop client was running on Mac mini in Hetzner datacenter. (1 Gbit/s)

Additional context

Desktop log uploaded here:

@butonic
Copy link
Member

butonic commented Nov 16, 2022

I can confirm this locally, uploading 1000 4Kb binary files with testpilot client 3.0.0.9006-beta1. no search service running.

2022-11-16T12:32:39Z INF access-log bytes=0 duration=22.028038 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU5LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy8zZTNhM2ZhYi1jMTIwLTRiM2UtYjNmZS1lY2M3Yjg0OGNmNzYifQ.SDjsWciU1bFeJuXeYoKv_TFuQlF6Cgnva_qR4UgY5rE proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008190 service=proxy status=204
2022-11-16T12:32:39Z INF access-log bytes=0 duration=23.812366 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU5LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy80YWY3YzJjMy1hMzJiLTQ1YWUtYTAzZC0xNDIyMThlZTM3OGMifQ.hV2hp6cnhQmXz-isItIt-Eq8gKcbmtNGzOTUPaJwUCc proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008189 service=proxy status=204
2022-11-16T12:32:39Z INF access-log bytes=0 duration=45.441521 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU4LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy84MjY4MmFiMC0zMDhhLTRmMDgtOGUwZS05NWEzYzRlZWFjNzUifQ.9akP3j5NtVK9DCMWgP6Nf9k1U9DHPVav_tuOfWSKcdU proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008186 service=proxy status=204
2022-11-16T12:32:39Z INF access-log bytes=0 duration=17.022728 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU5LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9lODBjYjY4Ny1hN2QxLTQ4NzYtYTQwNC01ZjA3YTYxNmEyMTMifQ._3k6vMaSuWzmUMFDcQDATUb9qQ1aleYVrncER1tk0jw proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008196 service=proxy status=204
2022-11-16T12:32:39Z INF access-log bytes=0 duration=27.138679 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU5LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy80MzZhMTkxZS0zNjMxLTRkMjEtODhkMS01YjZmZjE2NWFjMDgifQ.92krmsw4yR0Db8MDyyIdovMl5ymJWZRsrUzXoNSpL8E proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008195 service=proxy status=204
2022-11-16T12:32:39Z INF access-log bytes=0 duration=11.246171 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzU5LCJpYXQiOjE2Njg2MDE5NTksInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy81MDRhZjhkMi1mNDMyLTQxZDgtYWE2Yi00YzRlYWMwNTI4OTMifQ.eIhz6Y4OIUlZk8acDZHoYBa-1QnhMJ0v_WDRR_yrGUU proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-008201 service=proxy status=204

vs last 6 PATCH requests

2022-11-16T12:33:02Z INF access-log bytes=0 duration=62.488943 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy8wOTZjM2NlOC1jZmQ5LTQyNTAtYWJmMy05Yzc2ZmUxNmVkMGYifQ.1WwCCsN14w4R6WNMX8E1qys5BzIsk2Ck5SsgkqBWno8 proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011042 service=proxy status=204
2022-11-16T12:33:02Z INF access-log bytes=0 duration=64.663950 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9hM2JlOGU3MS0yMjE2LTQzMWEtYjk1NC0zYjQxM2E2ODBhZTgifQ.ZFhZR2PI2QgDeb-rNv7-XROkeFhuQQCVFtxcNUp_TQw proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011045 service=proxy status=204
2022-11-16T12:33:02Z INF access-log bytes=0 duration=65.938703 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy81OWU3MjBmOC04ZmZiLTQ2MzMtOGNhNS03NGM0NmQzZjcyZjcifQ.Ls-RTST8xY6Vn2G80eyAZKjdQZr1suLQQQ5_VrJFRMA proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011048 service=proxy status=204
2022-11-16T12:33:02Z INF access-log bytes=0 duration=57.070028 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9lN2EyNmVlZi05NjM1LTQ2YWItYmY5MC1kODMzN2I3ZDBhYTAifQ.GUs7r4CQuQMq0woxeIjGfhqWQfXxLk1VxnLOplGMs50 proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011052 service=proxy status=204
2022-11-16T12:33:02Z INF access-log bytes=0 duration=63.078671 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9jOWY5NmE1Ny0zYTExLTQ5NzMtYWFmMC0xZDU2MmFhMjhhNTUifQ.CZ6l7VJzTej-JbXZ_X6wnD7_HM92omEAnMqsh1zw0sw proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011051 service=proxy status=204
2022-11-16T12:33:02Z INF access-log bytes=0 duration=56.673964 method=PATCH path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY4Njg4MzgyLCJpYXQiOjE2Njg2MDE5ODIsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy8xNDY1NjQ0Ni0yMDU1LTQ2NDQtYWRjZC0xNGQ2YjMwNTgzMmEifQ.ha1JyMGoxIczn9AAfDeGHMm1H-4fr5ziJOpjWGuqAAE proto=HTTP/1.1 remote-addr=172.19.0.1 request-id=ocis/KG83wMnlrl-011057 service=proxy status=204

⛏️ 👀

@butonic
Copy link
Member

butonic commented Nov 16, 2022

we can even get a graph from jaeger when setting the search parameters right
image

@butonic
Copy link
Member

butonic commented Nov 16, 2022

unfortunately, no ListFolder is involved ...
⛏️ 👀

@butonic
Copy link
Member

butonic commented Nov 16, 2022

hmmm might be size aggregation 🤔

@butonic
Copy link
Member

butonic commented Nov 16, 2022

when commenting the propagation (which includes the treesize aggregation) we get
image

@butonic
Copy link
Member

butonic commented Nov 16, 2022

only size aggregation disabled:
image

@butonic
Copy link
Member

butonic commented Nov 16, 2022

only size aggregation, no etag propagation
image

@butonic
Copy link
Member

butonic commented Nov 16, 2022

to calculate the folder size we

  1. iterate over all filenames,
  2. stat them to determine if it is a file or dir
  3. read the treesize (dir) or the blobsize (file)

there might be a small optimization, but I have to check if it has a real impact.

meh jaeger is limited to 1500 traces

uploading 10k files:
image

I'll submit a PR that slightly improves calculating the folder size by using f.Readdir(0) instead of f.Readdirnames(0) and subsequent stats for each child.

@butonic
Copy link
Member

butonic commented Nov 16, 2022

that is just a small improvement ... AFAIR we made the upload finish async in euperimental ... so the problem should not arise there ...

in https://github.com/cs3org/reva/blob/experimental/pkg/storage/utils/decomposedfs/upload/upload.go#L262-L274

in line 264 we could make postprocessing async in a go routine. at least the size aggregation can be done async

@butonic
Copy link
Member

butonic commented Nov 16, 2022

other options to address the calculatetreesize performance:

lock parent

instead of summing up all children lock the parent, read the treesize, add the size of the uploaded file, write pack the treesize, unlock the parent. This is an improvement that can be done independend of the below async propagation, but special care needs to be taken to make sure we do not loose a file upload because the parent was locked and we did not wailt long enough. scary. anyway, we should have a cli tool to recalculate the treesize of a space anyway.

fully async upload with upload journal

on experimental we already have async postprocessing
we want to move the etag propagation and treesize aggregation to an async process. for this we need two things:

  1. introduce a Prefer: async header
  2. a journal that we can use to pick up propagation when the service crashes

we could use the tus upload info as a journal as it already keeps track of uploads, but we would have to defer deletion of the upload info until we finished propagation. currently, the file is deleted before we do the propagation.

@butonic
Copy link
Member

butonic commented Nov 17, 2022

before we can propagate the size diff we first have to be able to atomically calculate it based on the previous version and the current file. While looking into thet we found some corner cases and had to rewrite the finish upload code.
cs3org/reva#3473

@butonic
Copy link
Member

butonic commented Nov 21, 2022

Ok, I have a run of uploading 1k files that propagates the diff instead of calculating the treesize
image

so it roughly stays flat, but there are some spikes. I guess whenn some a lot of go coroutine figthing fo a lock on the parent while propagating...

trying to limit to requests < 20ms:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:p1-urgent Consider a hotfix release with only that fix Type:Bug
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants