[Performance] 2nd upload of 10k files much much slower than 1st upload #5062

Closed
michaelstingl opened this issue Nov 16, 2022 · 7 comments
Labels: Priority:p1-urgent (Consider a hotfix release with only that fix), Type:Bug

Comments

@michaelstingl (Contributor)

Describe the bug

When I upload the 10k files from #5061 a 2nd time (new sync run), the performance is horrible.

Steps to reproduce

Steps to reproduce the behavior:

  1. Create 1000 small files in a folder: for n in {0..999}; do dd if=/dev/urandom of=random$( printf %03d "$n" ).file bs=1 count=$(( RANDOM + 1024 )); done
  2. Duplicate this folder 10x (a scripted version of these steps follows the list)
  3. Sync all these files in a single sync run with the 3.0-prerelease desktop client
  4. Wait a bit
  5. Duplicate the folder, which triggers a new sync run
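
A scripted version of these steps, for convenience (a sketch, not part of the original report; folder names and the location of the client's sync folder are placeholders):

# create 1000 small files of random size (1 KiB .. ~33 KiB each)
mkdir -p 10k-test/set1/folder00 && cd 10k-test/set1/folder00
for n in {0..999}; do dd if=/dev/urandom of=random$( printf %03d "$n" ).file bs=1 count=$(( RANDOM + 1024 )) 2>/dev/null; done

# duplicate the folder so set1 contains 10,000 files (folder00 .. folder09)
cd ..
for i in {1..9}; do cp -R folder00 folder0$i; done

# move 10k-test into the client's sync folder and wait for the 1st run to finish,
# then (from inside the synced 10k-test folder) duplicate the whole set to
# trigger a 2nd sync run with another 10k files
cp -R set1 set2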

Expected behavior

The 2nd upload of 10k files shouldn't be slower than the first upload.

Actual behavior

The 2nd upload is horribly slow.

Picture 1

Setup

oCIS Server:

https://ocis.owncloud.com/

Desktop Client

% /Applications/owncloud.app/Contents/MacOS/owncloud --version
ownCloud ownCloud 3.0.0.9105-daily20221115
https://github.com/owncloud/client/commit/4614194ca6295c00f21e573565c643e0bfc472b0
Libraries Qt 5.15.5, OpenSSL 1.1.1q  5 Jul 2022
Using virtual files plugin: suffix
osx-21.5.0

The desktop client was started this way:

# all log categories off except the HTTP logger; write a flushed debug log to ~/upload.log
QT_LOGGING_RULES='*=false;sync.httplogger=true' \
/Applications/owncloud.app/Contents/MacOS/owncloud \
--logdebug --logflush --logfile ~/upload.log --showsettings

The desktop client was running on a Mac mini in a Hetzner datacenter (1 Gbit/s).

Additional context

Desktop log uploaded here:

micbar added this to the 2.0.0 General Availability milestone on Nov 16, 2022
micbar added the Priority:p1-urgent (Consider a hotfix release with only that fix) and GA-Blocker labels on Nov 16, 2022
@butonic (Member) commented Nov 16, 2022

We need to double check that the search service is the culprit:

We deliberately made the search service discover changes in a space by starting at the root and descending into changed directories: https://github.com/owncloud/ocis/pull/4777/files

We also increased the debounce time to wait at least a second before starting to index the space: #4964

Finally, we made the search service only run one indexer per space: #5001

At some point we have to index. I do not see an easy way to give requests by the search service less priority.

AFAICT the read requests and the update requests are fighting for locks. I think we can mitigate a lot of syscalls by caching the node metadata in memory and using a single stat call to invalidate the cache. This would be similar to how the jsoncs3 provider caches the data. I did start moving all metadata properties into the node struct in cs3org/reva#3261, but it bitrotted as the GA became more pressing.

@aduffeck might have an opinion on what currently happens in the code.

IMO the search service is using ListContainer to read directories while the client is uploading files. Every upload propagates to the root. The decomposedfs is reading nodes by id ... all the nodes are in the same space ... spread out over several subdirs ... 🤔
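
One rough way to observe this read/write contention is to time Depth:1 directory listings (PROPFIND at the WebDAV layer, which becomes ListContainer in the storage provider) while uploads run against the same folder. This is a sketch, not something done in this issue: server URL, folder and credentials are placeholders, and it assumes a test instance with basic auth enabled (e.g. PROXY_ENABLE_BASIC_AUTH=true) and the demo users.

BASE="https://ocis.example.com/remote.php/webdav/contention-test"   # placeholder instance + folder
AUTH="einstein:relativity"                                          # ocis demo account

curl -s -u "$AUTH" -X MKCOL "$BASE"

# writer: upload the generated random*.file files in the background
# (run from one of the folders created in the reproduction steps)
( for f in random*.file; do curl -s -u "$AUTH" -T "$f" "$BASE/" > /dev/null; done ) &

# reader: time a Depth:1 listing of the same folder while the uploads are running
for i in {1..50}; do
  curl -s -o /dev/null -u "$AUTH" -X PROPFIND -H "Depth: 1" \
       -w "PROPFIND %{http_code} %{time_total}s\n" "$BASE"
  sleep 0.2
done
wait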

@michaelstingl (Contributor, issue author)

  • or redo these tests without starting the search service

Happy to re-test, but I'd need help with a simpler way to generate such graphs. Dashboard? Script that parses the desktop log files?

@butonic (Member) commented Nov 24, 2022

To reproduce the graphs in Jaeger on v2.0.0-rc.1, I uploaded 1k files and paused/unpaused the sync in the middle with the testclient:

[Jaeger screenshot: request durations, v2.0.0-rc.1]

Limiting to requests <200ms:

[Jaeger screenshot: requests <200ms, v2.0.0-rc.1]

I then did this again with the search service disabled:

[Jaeger screenshot: search service disabled]

And then using ocis master (with search enabled):

[Jaeger screenshot: ocis master, search enabled]

As you can see there are more outliers, but overall it is not nearly as bad as in v2.0.0-rc.1.

Limiting to only requests <20ms:

[Jaeger screenshot: requests <20ms, ocis master]
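
A possible answer to the earlier question about generating such graphs without clicking through the Jaeger UI: the Jaeger query service exposes the data the UI uses over HTTP, so span durations can be pulled with curl and jq. This is a sketch; the address, service name and limits below are assumptions about a typical tracing setup, not part of this deployment.

JAEGER="http://localhost:16686"   # placeholder query-service address
SERVICE="storage-users"           # placeholder; list candidates via /api/services

curl -s "$JAEGER/api/services" | jq -r '.data[]'

# operation name and duration (microseconds) of recent spans, slowest last
curl -s "$JAEGER/api/traces?service=$SERVICE&limit=200&lookback=1h" \
  | jq -r '.data[].spans[] | "\(.duration)\t\(.operationName)"' | sort -n | tail -20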

@micbar (Contributor) commented Nov 24, 2022

@dragotin @michaelstingl @tbsbdr @hodyroff I think this is enough for GA.

Let's unblock this.

@dragotin (Contributor)

What I see is: with latest master, in the second run most of the requests are still very fast; only a few break out, up to 500 msec.
If that interpretation is correct, I think that is an awesome result.

@tbsbdr (Contributor) commented Nov 24, 2022

Let's unblock this.

yes, results look GA ready 👍

@micbar (Contributor) commented Nov 25, 2022

Ok, closing. We will improve metadata handling in the decomposedFS further.

micbar closed this as completed on Nov 25, 2022