Optimize PROPFIND for non-shared/shared by reducing the number of que… #27284
@mrow4a, thanks for your PR! By analyzing the history of the files in this pull request, we identified @PVince81, @DeepDiver1975 and @phisch to be potential reviewers.
lib/public/Share/IManager.php
 * @param Node $path
 * @param bool $reshares
 * @return IShare[]
 * @since 9.0.0
This should be @since 10.0.0.
Nice step forward, this is going in the right direction I think.
Next step is to take an array of nodes so you can do WHERE file_source IN (...) (using array_chunk with blocks of 1000).
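A minimal PHP sketch of the suggested batching, assuming the database call is abstracted as a callable that runs one WHERE file_source IN (...) query for a chunk of IDs. The function name fetchSharesForNodes, the callable and the fake table are hypothetical, not ownCloud API. It shows the query count dropping from one per node to one per chunk of 1000, and keeps the chunking inside the helper so callers never have to deal with IN() size limits.

```php
<?php
/**
 * Hypothetical helper: fetch share rows for many node IDs using one
 * "WHERE file_source IN (...)" query per chunk instead of one query per node.
 *
 * @param int[]    $nodeIds    file IDs of the children being listed
 * @param callable $queryChunk stand-in for the real DB call; receives an int[]
 *                             chunk and returns the matching share rows
 * @param int      $chunkSize  max number of values placed in a single IN()
 * @return array[] all matching share rows
 */
function fetchSharesForNodes(array $nodeIds, callable $queryChunk, int $chunkSize = 1000): array
{
    $rows = [];
    // Chunking happens here, so callers never need to care about IN() limits.
    foreach (array_chunk($nodeIds, $chunkSize) as $chunk) {
        foreach ($queryChunk($chunk) as $row) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// Usage with a fake table in which only node 7 has a share.
$table = [['file_source' => 7, 'share_type' => 0]];
$queries = 0;
$fakeQuery = function (array $ids) use ($table, &$queries): array {
    $queries++;
    return array_values(array_filter(
        $table,
        fn (array $row): bool => in_array($row['file_source'], $ids, true)
    ));
};

$rows = fetchSharesForNodes(range(1, 2500), $fakeQuery);
echo $queries, "\n";     // 3 queries (chunks of 1000, 1000 and 500) instead of 2500
echo count($rows), "\n"; // 1
```

An empty ID list produces no chunks and therefore no query at all, which also avoids emitting an empty IN (), something most databases reject.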
@@ -564,6 +564,16 @@ public function deleteFromSelf(IShare $share, $recipient) {
/**
 * @inheritdoc
 */
public function getAllSharesBy($userId, $shareTypes, $node, $reshares) {
Let's directly add $limit and $offset to avoid having to change the interface again in the future.
I don't think this function needs any offsets; as the name states, it returns ALL :>
ALL meaning "the whole result set contains all the shares of that user" but it doesn't mean we shouldn't provide a way to paginate over said result set.
It's the same semantics as saying "get all users" on an install that has 10000 users: we need a way to paginate, because returning all 10000 entries at once would be slow, especially for the UI parts.
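A sketch of what paginating over an "all" result set could look like from the caller's side, assuming a hypothetical fetch callable that takes $limit and $offset. The merged getAllSharesBy signature does not have these parameters; iterateAll and $fetchPage are illustrative names only. The consumer still sees every share, but each individual request stays bounded.

```php
<?php
/**
 * Walks an "all results" set page by page. $fetchPage is a hypothetical
 * stand-in for a method that accepts $limit and $offset.
 */
function iterateAll(callable $fetchPage, int $pageSize): \Generator
{
    $offset = 0;
    while (true) {
        $page = $fetchPage($pageSize, $offset);
        foreach ($page as $item) {
            yield $item;
        }
        // A short (or empty) page means the result set is exhausted.
        if (count($page) < $pageSize) {
            return;
        }
        $offset += $pageSize;
    }
}

// Usage: 10000 fake "shares", fetched 500 at a time.
$allShares = range(1, 10000);
$calls = 0;
$fetchPage = function (int $limit, int $offset) use ($allShares, &$calls): array {
    $calls++;
    return array_slice($allShares, $offset, $limit);
};

$seen = 0;
foreach (iterateAll($fetchPage, 500) as $share) {
    $seen++;
}
echo $seen, " items in ", $calls, " calls\n"; // 10000 items in 21 calls
```

Stopping on a short page costs one extra, empty request when the total is an exact multiple of the page size: here 20 full pages plus one empty one.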
$allShares = $this->shareManager->getAllSharesBy(
    $this->userId,
    $requestedShareTypes,
    $node
Next step is taking an array of nodes?
yep
💥
Thomas, I have just been doing a proof of concept; as discussed, I will now adjust the unit tests :> I wanted to see how much we can gain.
cf89ea6 to c34c554
foreach ($allShares as $share) {
    $shareType = $share->getShareType();
    if (in_array($shareType, $requestedShareTypes)) {
        $shareTypes[] = $shareType;
@PVince81 we need uniqueness here
yes we do
OK, @PVince81 and I did it efficiently using a hashmap and array_keys; I will push it together with the unit tests.
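A minimal sketch of the hashmap approach, assuming share types arrive as plain integers standing in for the share type constants (the function name uniqueShareTypes is hypothetical). Using each type as an array key makes duplicates collapse automatically, and array_keys() turns the map back into a list; flipping the requested types once also replaces the in_array() scan with a key lookup.

```php
<?php
/**
 * Returns the distinct share types among $shareTypesOfShares that were
 * requested. Using the type as an array key de-duplicates in O(1) per share,
 * instead of calling in_array() on a growing result list.
 *
 * @param int[] $shareTypesOfShares one entry per share (duplicates expected)
 * @param int[] $requestedShareTypes
 * @return int[]
 */
function uniqueShareTypes(array $shareTypesOfShares, array $requestedShareTypes): array
{
    // Flip the requested list once so membership checks are key lookups too.
    $requested = array_flip($requestedShareTypes);
    $seen = [];
    foreach ($shareTypesOfShares as $type) {
        if (isset($requested[$type])) {
            $seen[$type] = true; // a repeated type just overwrites the same key
        }
    }
    return array_keys($seen);
}

// Usage: three shares of type 0, one of type 3, one unrequested type 1.
$types = uniqueShareTypes([0, 0, 3, 1, 0], [0, 3]);
print_r($types); // duplicates collapse to [0, 3]
```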
c34c554 to 7d100cd
@SergioBertolinSG @PVince81 do we have behat tests for this? If not, we need to add them BEFORE this change - THX
For already obtained nodes through
Case 1: reduced from 104 to 59 queries and 15% shorter execution time.
Case 2
If all checks pass, please switch to review @DeepDiver1975 @PVince81. Please also check with the customer to "tick" the last check from the first post.
@DeepDiver1975 Yes, please test it; later I will write more unit tests to check more cases, e.g. IN() chunking.
What are the approximate steps here?
Steps:
The share types are unique, so if the user has two outgoing user shares, the type "user" only appears once. This field is used by the web UI to display the correct share status icon in each file row.
Great progress!
@@ -104,34 +104,31 @@ public function initialize(\Sabre\DAV\Server $server) {
}

/**
 * Return a list of share types for outgoing shares
 * Update cachedShareTypes for specific nodeIDs
The method is called get, not update. I suggest you keep it a get method and transparently handle the cache. The caller doesn't need to know about the caching.
Basically a construct like:
function getSomething($key) {
    if (!isset($this->cacheSomething[$key])) {
        $this->cacheSomething[$key] = $this->getTheExpensiveThing($key);
    }
    return $this->cacheSomething[$key];
}
This is what we're doing in a lot of the OC code so let's keep this pattern.
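The same pattern, extended to a batch of node IDs as the directory listing needs, could look like the sketch below. It assumes a hypothetical loader callable standing in for the share query; the class name ShareTypesCache is illustrative. The getter always returns the share types for every requested ID, loads only the IDs not yet cached in a single call, and caches nodes without shares as empty arrays so they are not queried again.

```php
<?php
/**
 * Hypothetical batched variant of the "get with transparent cache" pattern:
 * the caller asks for share types of some node IDs and always gets them back;
 * only IDs not yet cached are loaded, in a single loader call.
 */
class ShareTypesCache
{
    /** @var array<int, int[]> node ID => share types */
    private $cache = [];
    /** @var callable int[] node IDs => array<int, int[]> */
    private $loader;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    /**
     * @param int[] $nodeIds
     * @return array<int, int[]> share types for every requested node ID
     */
    public function getNodesShareTypes(array $nodeIds): array
    {
        $missing = array_values(array_filter(
            $nodeIds,
            fn (int $id): bool => !isset($this->cache[$id])
        ));
        if ($missing !== []) {
            $loaded = ($this->loader)($missing);
            foreach ($missing as $id) {
                // Nodes without shares are cached as [] so they are not re-queried.
                $this->cache[$id] = $loaded[$id] ?? [];
            }
        }
        $result = [];
        foreach ($nodeIds as $id) {
            $result[$id] = $this->cache[$id];
        }
        return $result;
    }
}

// Usage: pretend only node 7 is shared (types 0 and 3).
$loads = 0;
$cache = new ShareTypesCache(function (array $ids) use (&$loads): array {
    $loads++;
    return in_array(7, $ids, true) ? [7 => [0, 3]] : [];
});
$cache->getNodesShareTypes([5, 6, 7]);      // one loader call for the whole listing
$single = $cache->getNodesShareTypes([7]);  // served from the cache, no new call
```

The caller never touches the cache array directly, which is the point raised further down about using the return value instead of reading cachedShareTypes.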
foreach ($allShares as $share) {
    $currentNodeID = $share->getNodeId();
    $currentShareType = $share->getShareType();
    $this->cachedShareTypes[$currentNodeID][$currentShareType] = true;
Do we really need caching here? Is this function called repeatedly?
// Put node ID into an array and initialize cache for it
$nodeId = intval($childNode->getId());
array_push($nodeIdsArray, $nodeId);
We usually use $nodeIdsArray[] = $nodeId; not sure if this makes a difference performance-wise.
Should I change it? Isn't it equivalent?
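For appending a single element the two forms produce the same array; the PHP manual recommends $array[] = in that case because it avoids the overhead of a function call. array_push() is mainly useful when appending several values at once. A quick check:

```php
<?php
$viaPush = [];
$viaBracket = [];
foreach ([10, 20, 30] as $nodeId) {
    array_push($viaPush, $nodeId); // function call per element
    $viaBracket[] = $nodeId;       // language construct, no call overhead
}
var_dump($viaPush === $viaBracket); // bool(true)
```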
@@ -90,6 +90,18 @@ public function moveShare(IShare $share, $recipientId);
 * Get shares shared by (initiated) by the provided user.
Adjust this to say "Get ALL shares".
}

// Cache share-types obtaining them from DB
$this->getNodesShareTypes($nodeIdsArray);
A get function that doesn't return anything will likely confuse other developers. Please make that function return something.
$nodeId = $this->userFolder->get($sabreNode->getPath())->getId();
$this->cachedShareTypes[$nodeId] = [];
$this->getNodesShareTypes([$nodeId]);
$shareTypesHash = $this->cachedShareTypes[$nodeId];
Don't use the cache directly here and rely on the function doing that for you.
$shareTypesHash = $this->getNodesShareTypes([$nodeId]);
);
}

$qb->andWhere($qb->expr()->in('file_source', $qb->createParameter('file_source_ids')));
We will need chunking here with array_chunk (because some databases have a max limit for their IN operator...). See how it's done here: https://github.com/owncloud/core/blob/v9.1.4/lib/private/Share20/DefaultShareProvider.php#L930
I did the chunking in a different part of the code, in the Manager; should I change it?
What if another part of the code, written by a future dev, calls getAllSharesBy? It is better to do the chunking inside this method so that API consumers don't each have to think about chunking (which would likely be duplicated code, if not forgotten altogether).
@SergioBertolinSG I will give you a full script today to test it, which initializes users/shares, profiles this, etc.
@PVince81 there are already tests about this:
OK Vincent, I will apply the code style you requested, and let's change the label to review so they can test it.
@SergioBertolinSG thanks for the info, that's perfect.
538f78b to 4ce4db2
@PVince81 please review the fixes and decide whether to change the label to Review and get it tested on a real-sized DB. @SergioBertolinSG @SamuAlfageme @DeepDiver1975 is everything OK with Jenkins?
Rebase onto master to get rid of that JS failure.
4ce4db2 to b0a63a6
6150e57 to dd620b6
Now the test failures are related to your changes.
dd620b6 to 331b82e
We should probably also smashbox this PR against all tests, especially share related.
@DeepDiver1975 @PVince81 Please change the flag to review to indicate the PR is ready.
@DeepDiver1975 @PVince81 OK, flag changed; I just need to fulfil the last tick:
I'm not yet sure if all changes to the unit tests are valid... I need to have a closer look.
@@ -529,9 +529,20 @@ public function testGetAllSharedByWithReshares() {
    ->setNode($node);
$this->provider->create($share2);

for($i = 0; $i < 200; $i++) {
    $receiver = strval($i)."user2@server.com";
Is it relevant? It is just a test.
well - as a student you are allowed to learn even from tests 😉
@@ -529,9 +529,20 @@ public function testGetAllSharedByWithReshares() {
    ->setNode($node);
$this->provider->create($share2);

for($i = 0; $i < 200; $i++) {
Why do you create 200 shares here?
To test chunking of Node IDs.
$orX = $qb->expr()->orX();
$nodeIdsChunks = array_chunk($nodeIDs, 100);
Why 100? It should be 1000.
I think in other parts of the code we do 100
$nodeIdsChunks = array_chunk($nodeIDs, 100);
foreach ($nodeIdsChunks as $nodeIdsChunk) {
    $qb->select('*')
In theory you don't need to rebuild the whole query in this loop; you only need to re-set the value of the IN operator.
You can leave this as is for now.
Nope, the andWhere calls will add duplicate WHERE conditions to the query builder.
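Both points can hold at once, depending on where andWhere() is called. Doctrine-style query builders append a new predicate on every andWhere() call, while re-setting a named parameter replaces its value. The tiny builder below is a hypothetical stand-in (not ownCloud's query builder; the class and table names are illustrative) that contrasts calling andWhere() on a shared builder inside the chunk loop with adding the IN() predicate once and only re-setting the parameter per chunk.

```php
<?php
/** Hypothetical, minimal stand-in for a query builder (illustration only). */
class MiniQueryBuilder
{
    private $where = [];
    private $params = [];

    public function andWhere(string $predicate): self
    {
        $this->where[] = $predicate; // appends; never replaces earlier predicates
        return $this;
    }

    public function setParameter(string $name, array $value): self
    {
        $this->params[$name] = $value; // replaces the previous value
        return $this;
    }

    public function getSQL(): string
    {
        return 'SELECT * FROM share WHERE ' . implode(' AND ', $this->where);
    }

    public function getParameters(): array
    {
        return $this->params;
    }
}

$chunks = array_chunk(range(1, 250), 100); // 3 chunks: 100, 100, 50 IDs

// Pitfall: one shared builder, andWhere() inside the loop.
$bad = new MiniQueryBuilder();
foreach ($chunks as $chunk) {
    $bad->andWhere('file_source IN (:ids)')->setParameter('ids', $chunk);
}
echo $bad->getSQL(), "\n"; // the IN() predicate now appears three times

// Build once, then only re-set the parameter for each chunk.
$good = (new MiniQueryBuilder())->andWhere('file_source IN (:ids)');
$executed = [];
foreach ($chunks as $chunk) {
    $good->setParameter('ids', $chunk);
    $executed[] = [$good->getSQL(), count($good->getParameters()['ids'])];
}
```

Creating a fresh builder per chunk, as the code under review does, also avoids the duplication; building once and re-setting the parameter simply skips reconstructing the query each time.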
Code looks good to me now 👍 According to @SergioBertolinSG we already have integration tests for the "oc:share-types" property, so since the tests passed I think this is ready to merge. It seems obvious to me that the performance will be significantly better, and we want this in 10.0 anyway. Perf testing can be done with tomorrow's daily build after this is merged. To be discussed: backporting. @DeepDiver1975 any objections?
General conclusion here: if you get questions from customers -> we have function
@mrow4a I think
 *
 * @param \OCP\Files\Node $node file node
 * @param IShare[] array containing shares
please fix phpdoc
 *
 * @return int[] array of share types
 * @param int[] array of folder/file nodeIDs
fix PHPDoc
@@ -462,6 +462,59 @@ public function move(\OCP\Share\IShare $share, $recipient) {
/**
 * @inheritdoc
 */
public function getAllSharesBy($userId, $shareTypes, $nodeIDs, $reshares) {
The interface method looks like this (Node $node in the interface vs. $nodeIDs in the implementation), please correct:
/**
* Get all shares by the given user for specified shareTypes array
*
* @param string $userId
* @param int[] $shareTypes
* @param Node|null $node
* @param bool $reshares Also get the shares where $user is the owner instead of just the shares where $user is the initiator
* @return \OCP\Share\IShare[]
* @since 10.0.0
*/
public function getAllSharesBy($userId, $shareTypes, $node, $reshares);
c29cbfc to 274fe9b
274fe9b to ca8e114
Rebased. Now Jenkins, I know it's Friday, but still please do your job!
…ries
For already obtained nodes through `$folderNode->getDirectoryListing()`, query all share-types using IN() clause, injecting nodeID predicates directly into the query.
Full coverage with unit tests and fixes.
ca8e114 to 28ff6fe
Backporting would have been nice, but there are interface additions which could break things. @DeepDiver1975 @mrow4a thoughts?
I consider backporting dark magic, but if it is possible, why not :>
Note: this code is in
It looks like there is no "backport to stable10" because the code was already in master at the point where the stable10 branch was created.
Hello guys, after exams I am back at work with a lot of new energy, knowledge and ideas. Let's start the fun with this PR.
The idea here concerns the behavior of SharedPlugin, which I found while reviewing our code with Blackfire, e.g.:
This is an example query when you have 100 NON-SHARED files. It drew my attention to the fact that we query the sharing table even though there are no shares there, and made me wonder why we don't just do 1 query or so...
I looked into what causes that many share_table accesses and started optimizing the code: