Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Openstack Swift files download #18883

Merged
merged 1 commit into from
Apr 30, 2020
Merged

Optimize Openstack Swift files download #18883

merged 1 commit into from
Apr 30, 2020

Conversation

adrb
Copy link

@adrb adrb commented Jan 14, 2020

fix #14116

It improves efficiency when downloading files from Swift storage.
Before, files were downloaded and then pushed back to user.
That behaevior causes all kinds of performance problems.

Now, files are streamed directly to user.

@adrb
Copy link
Author

adrb commented Jan 14, 2020

I see that some tests failed but it doesn't seems that it was caused by patch, for e.g. that one

@kesselb kesselb added 3. to review Waiting for reviews bug enhancement and removed bug labels Jan 14, 2020
@kesselb
Copy link
Contributor

kesselb commented Jan 14, 2020

How does this optimization work? ;) I don't have a setup with openstack for testing. Out of curiosity: What happens with the patch below?

Index: lib/private/Files/ObjectStore/Swift.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- lib/private/Files/ObjectStore/Swift.php	(revision 092a1fb90f57639a15adc5a8e565b53debe26a8a)
+++ lib/private/Files/ObjectStore/Swift.php	(date 1579009971686)
@@ -107,12 +107,12 @@
 			}
 		}
 		$objectContent->rewind();
+//
+//		$stream = $objectContent->detach();
+//		// save the object content in the context of the stream to prevent it being gc'd until the stream is closed
+//		stream_context_set_option($stream, 'swift', 'content', $objectContent);
 
-		$stream = $objectContent->detach();
-		// save the object content in the context of the stream to prevent it being gc'd until the stream is closed
-		stream_context_set_option($stream, 'swift', 'content', $objectContent);
-
-		return RetryWrapper::wrap($stream);
+		return RetryWrapper::wrap($objectContent);
 	}
 
 	/**

@adrb
Copy link
Author

adrb commented Jan 14, 2020

Now it works pretty much the same as for S3. It setup authorization headers and then stream Swift file with fopen.

I don't think that patch you quoted will cause any difference since it will still waits for download to complete before returning from readObject function

@llebout
Copy link

llebout commented Jan 14, 2020

What about upload as well?

@icewind1991
Copy link
Member

The old code should already do streaming downloads from how I understand the openstack sdk/guzzle

@llebout
Copy link

llebout commented Jan 14, 2020

The old code should already do streaming downloads from how I understand the openstack sdk/guzzle

I could experience that it did not. I had to wait for the file to be downloaded fully on the server to be served a download in my browser.

@adrb
Copy link
Author

adrb commented Jan 14, 2020

What about upload as well?

I didn't touch that yet. But plan is to migrate my installation to Swift and upload also needs to handle large objects. However I already saw that at moment it don't, so I will probably take care of it too

The old code should already do streaming downloads from how I understand the openstack sdk/guzzle

I could experience that it did not. I had to wait for the file to be downloaded fully on the server to be served a download in my browser.

That's correct, It streams already downloaded files, which results in excessive disk space and I/O consumption. On top of that, waiting for multi gigabyte file results in connection timeout which is devastating, because what users do next? Hit the refresh button, and keep in mind that earlier file is still being downloaded ;)

@maniac777
Copy link

maniac777 commented Jan 14, 2020

@kesselb regarding your patch it does not change nature of the issue as file is downloaded few lines of code earlier.

public function readObject($urn) {
		try {
			$object = $this->getContainer()->getObject($urn);
			// we need to keep a reference to objectContent or
			// the stream will be closed before we can do anything with it
			$objectContent = $object->download();

here exacly to be precise - in download().

@icewind1991: unfortunetly the old code calls download() function from php-opencloud library. Just for conveniece I'm pasting it below. I believe that file is downloaded due to calling execute(). It blocks and saves entire file in temp dir.

    /**
     * This call will perform a `GET` HTTP request for the given object and return back its content in the form of a
     * Guzzle Stream object. Downloading an object will transfer all of the content for an object, and is therefore
     * distinct from fetching its metadata (a `HEAD` request). The body of an object is not fetched by default to
     * improve performance when handling large objects.
     *
     * @param array $data {@see \OpenStack\ObjectStore\v1\Api::getObject}
     */
    public function download(array $data = []): StreamInterface
    {
        $data += ['name' => $this->name, 'containerName' => $this->containerName];
        /** @var ResponseInterface $response */
        $response = $this->execute($this->api->getObject(), $data);
        $this->populateHeaders($response);
        return $response->getBody();
    }

@kesselb
Copy link
Contributor

kesselb commented Jan 14, 2020

regarding your patch it does not change nature of the issue as file is downloaded few lines of code earlier.

OK ;) I think we should keep the BadResponseError handling. That can still happen. You probably have to call $object->retrieve to trigger it.

@adrb
Copy link
Author

adrb commented Jan 15, 2020

OK ;) I think we should keep the BadResponseError handling. That can still happen. You probably have to call $object->retrieve to trigger it.

Even after successful HEAD request from $object->retrieve, actual reading with fopen can fail, so i'm not sure if it is worth it

@icewind1991
Copy link
Member

Ok, fair enough, I'm ok with this approach, thanks for the thorough explanation.

you should be able to remove the call to getObject I think to reduce overhead, since publicUri can be constructed from the bucket and object name.

Also a line or 2 documentation for getHeaders would be appreciated :)

@kesselb
Copy link
Contributor

kesselb commented Jan 15, 2020

Even after successful HEAD request from $object->retrieve, actual reading with fopen can fail, so i'm not sure if it is worth it

try {
return $this->objectStore->readObject($this->getURN($stat['fileid']));
} catch (NotFoundException $e) {
$this->logger->logException($e, [
'app' => 'objectstore',
'message' => 'Could not get object ' . $this->getURN($stat['fileid']) . ' for file ' . $path,
]);
throw $e;
} catch (\Exception $ex) {
$this->logger->logException($ex, [
'app' => 'objectstore',
'message' => 'Could not get object ' . $this->getURN($stat['fileid']) . ' for file ' . $path,
]);
return false;
}

A NotFoundException is handled different. #12500 introduced it. Probably @icewind1991 is able to tell if we still need to handle 404 different.

@icewind1991
Copy link
Member

For error handling you'll need to check the result from fopen and use $http_response_header to throw the relevant exceptions or return false.

@icewind1991
Copy link
Member

Since the cachedToken is not always set when constructing the swift client, some extra logic is needed to set it after authenticating, something like

https://gist.github.com/icewind1991/3f3088c17917c783ceb21d002e966727

@kesselb
Copy link
Contributor

kesselb commented Jan 15, 2020

Thanks @icewind1991 👍

I would also prefer to not add the getHeaders method to SwiftFactory. Of course it aligns with S3ObjectTrait but it's misleading IMO. A method to fetch the cached token from SwiftFactory should be enough. We can still add a getHeaders method later if needed.

@adrb
Copy link
Author

adrb commented Jan 15, 2020

Thank you for your input, pushed second version, I hope that meets your expectations.

@icewind1991

you should be able to remove the call to getObject I think to reduce overhead, since publicUri can be constructed from the bucket and object name.

Object store url is not the same as in swiftFactory::params['url'], and I didn't found simple solution for doing that. Second thing, it can be changed in further Swift versions.

@kesselb

I would also prefer to not add the getHeaders method to SwiftFactory. Of course it aligns with S3ObjectTrait but it's misleading IMO. A method to fetch the cached token from SwiftFactory should be enough. We can still add a getHeaders method later if needed.

Since constructing $cacheKey is not easy thing, we need some kind of helper method. But I agree that getHeaders wasn't the best solution.

@icewind1991
Copy link
Member

Looks good otherwise

thanks for the work 👍

@icewind1991 icewind1991 added this to the Nextcloud 19 milestone Jan 15, 2020
]
);

} catch (BadResponseException $e) {
if ($e->getResponse()->getStatusCode() === 404) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check getResponse for null.

Copy link
Contributor

@kesselb kesselb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍 Great contribution 🎉

@rullzer rullzer mentioned this pull request Apr 4, 2020
80 tasks
@rullzer rullzer mentioned this pull request Apr 9, 2020
59 tasks
@skjnldsv
Copy link
Member

Ping! 🔔

This was referenced Apr 15, 2020
@rullzer rullzer mentioned this pull request Apr 23, 2020
11 tasks
Copy link
Member

@rullzer rullzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐘

@rullzer
Copy link
Member

rullzer commented Apr 24, 2020

@adrb could you rebase this on latest master so the full CI can run one more time and then we get it in

Improves efficiency when downloading files from Swift storage.
Before, files were downloaded and then pushed back to user.
That behaevior causes all kinds of performance problems.

Now, files are streamed directly to user.

Signed-off-by: Adrian Brzezinski <adrian.brzezinski@eo.pl>
@adrb
Copy link
Author

adrb commented Apr 27, 2020

Could you also review changes from #18955? It's complementary to this one

@rullzer rullzer merged commit a1c1b35 into nextcloud:master Apr 30, 2020
@welcome
Copy link

welcome bot commented Apr 30, 2020

Thanks for your first pull request and welcome to the community! Feel free to keep them coming! If you are looking for issues to tackle then have a look at this selection: https://github.com/nextcloud/server/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22

@kesselb
Copy link
Contributor

kesselb commented Apr 30, 2020

Thanks @adrb 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Openstack Swift large file download
7 participants