
If a file download fails with 503, don't abort the whole sync #5088

Closed
guruz opened this issue Jul 28, 2016 · 28 comments
Labels: Discussion, ReadyToTest (QA, please validate the fix/enhancement), type:bug

Comments

@guruz
Contributor

guruz commented Jul 28, 2016

The failing file's download fails with 503 Service Unavailable, and the two other concurrently syncing files get Operation Cancelled.

We could actually continue to sync the other 2 files.

HOWEVER, what would this mean if all the other files (more than the 3 currently syncing) also lead to a 503? We probably wouldn't want to end up in an endless 503 sync.
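
A minimal sketch of that idea (hypothetical names, not the client's actual code): treat each per-file 503 as a soft error and keep propagating the other files, but abort the run once a per-sync error budget is exhausted, so an all-503 server can't produce an endless sync:

#include <cstdio>

// Hypothetical per-sync-run error budget: individual 503s are soft errors
// until too many of them accumulate, then the whole run aborts.
class ServiceUnavailableBudget {
public:
    explicit ServiceUnavailableBudget(int maxErrors) : m_max(maxErrors) {}

    // Call once for every download that failed with 503.
    // Returns true when the sync run should now be aborted.
    bool recordError() { return ++m_count >= m_max; }

private:
    int m_max;
    int m_count = 0;
};

int main()
{
    ServiceUnavailableBudget budget(3); // tolerates up to two 503s per run
    for (int file = 0; file < 10; ++file) {
        const bool got503 = true; // pretend every download fails with 503
        if (got503 && budget.recordError()) {
            std::printf("aborting sync: too many 503s\n");
            return 1;
        }
        // otherwise: mark this file as errored and continue with the next
    }
    return 0;
}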

@guruz guruz added this to the 2.3.0 milestone Jul 28, 2016
@dragotin
Contributor

Where does the 503 come from?

The defined behavior for 503 is: if the 503 is returned for a PROPFIND on the root, it means the server is in maintenance, and the whole sync goes offline.

If the 503 is returned for an operation on an individual file or (more important) folder, it means that the folder is not currently available, and it is IGNORED. That behavior is important for external storages that can go away because of their own problems. The server returns 503 for them in that case, to prevent the client from removing all of their contents in the directory.
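
Roughly, that rule could be expressed like this (a sketch with made-up names, not the actual client code):

#include <string>

enum class Http503Action {
    GoOffline,    // 503 on a PROPFIND of the sync root: server maintenance
    IgnoreFolder, // 503 on an individual file/folder: storage is away
};

// Hypothetical classifier for the rule described above: only a 503 on the
// root PROPFIND means maintenance; any other 503 means "leave that subtree
// alone" -- and in particular, never delete its local contents.
Http503Action classify503(const std::string &method, const std::string &path)
{
    if (method == "PROPFIND" && (path == "/" || path.empty()))
        return Http503Action::GoOffline;
    return Http503Action::IgnoreFolder;
}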

@guruz
Contributor Author

guruz commented Jul 28, 2016

@dragotin See bug subject, file download :-)

@guruz guruz removed this from the 2.3.0 milestone Aug 24, 2016
@guruz
Contributor Author

guruz commented Sep 26, 2016

Related #5187

@moscicki
Contributor

Yes, we also see this and it affects our production services. One GET 503 will block the entire synchronization for all unrelated files as well.

@guruz
Contributor Author

guruz commented Nov 16, 2018

@moscicki (semi-related) How long do your files usually stay in the 503 state?

@guruz guruz added the type:bug label Nov 16, 2018
@guruz guruz added this to the 2.6.0 milestone Nov 16, 2018
@ckamm
Contributor

ckamm commented Nov 21, 2018

Mmh, looks like the client needs more careful backoff then. The other request around 503s was "please give the server some breathing room when you see it" (for the overload or maintenance case). I guess we could keep going after the first one and only abort when a certain number of errors have accumulated.

@moscicki If I understand you correctly, you're using 503s at the resource level and they can be persistent? Is this related to 503 being overloaded with the "503 storage unavailable" meaning?

@guruz
Contributor Author

guruz commented Nov 21, 2018

> I guess we could keep going after the first one and only abort when a certain number of errors have accumulated.

But what's the number?
Indeed, maybe it would be better for @moscicki to mark the whole directory as "503 storage unavailable", as this won't abort the sync?

@moscicki
Contributor

moscicki commented Nov 21, 2018 via email

@guruz
Contributor Author

guruz commented Nov 22, 2018

@butonic @DeepDiver1975 @PVince81 @ogoffart Didn't we somewhere say we could support sending an (optional) Retry-After header on a 503 GET, in case we think the file will come back up soon and the whole sync should not be aborted?

@PVince81
Contributor

AFAIK the server returns 503 either when the whole server is unavailable ("Service not available") or when the underlying external storage is temporarily unavailable ("Storage not available").

We have some logic that detects "unknown errors" in the storage and maps them to "Storage not available", so it is possible that in some specific, yet undiscovered scenarios, legitimate storage-specific errors get mapped there even though they should actually be mapped to other errors.

As for "Retry-After", we could add that in the StorageNotAvailableException on Sabre level.

@guruz
Contributor Author

guruz commented Nov 23, 2018

@moscicki Would this be OK for you? CERNBox could send Retry-After, then we treat the 503 differently?

@moscicki
Contributor

So if I understand correctly: if a file GET returns 503 + Retry-After, the sync would not be aborted, and the GET would be retried after the specified amount of time? Without a Retry-After header, the sync would be aborted as it is now.

Is this interpretation correct?
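
On the client side, the flow described here could look roughly like this (a sketch assuming a Qt network stack like the client's; the outcome enum and the rescheduling are made up):

#include <QNetworkReply>
#include <QNetworkRequest>

// Hypothetical outcomes for a finished GET under the proposed scheme.
enum class GetOutcome { Ok, RetryLater, AbortSync };

// Sketch: 503 + Retry-After means "reschedule this file, keep syncing the
// rest"; a bare 503 keeps today's behavior and aborts the sync.
GetOutcome handleFinishedGet(QNetworkReply *reply, int *retryDelaySecs)
{
    const int status =
        reply->attribute(QNetworkRequest::HttpStatusCodeAttribute).toInt();
    if (status != 503)
        return GetOutcome::Ok; // other statuses are handled elsewhere

    if (reply->hasRawHeader("Retry-After")) {
        // Retry-After may be delay-seconds or an HTTP-date; this sketch
        // only handles the delay-seconds form.
        bool ok = false;
        const int secs = reply->rawHeader("Retry-After").toInt(&ok);
        *retryDelaySecs = ok ? secs : 60; // fall back to a default delay
        return GetOutcome::RetryLater;
    }
    return GetOutcome::AbortSync;
}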

@moscicki
Contributor

Ping.

Also, is there anything in what we're discussing here that contradicts issue #3932?

ogoffart added a commit that referenced this issue Nov 29, 2018
ogoffart added a commit that referenced this issue Nov 29, 2018
ogoffart added a commit that referenced this issue Dec 17, 2018
@ogoffart ogoffart added the ReadyToTest QA, please validate the fix/enhancement label Dec 17, 2018
@guruz
Contributor Author

guruz commented Dec 17, 2018

It looks like the Retry-After suggestion is indeed #3932 (CC @butonic).

@ogoffart has implemented something else though in https://github.com/owncloud/client/pull/6906/files#diff-7e6f8d2579afb992a2e4ed25a154935bR78 ... so as long as the 503 doesn't look like the maintenance-mode 503, the sync will go on.
@ogoffart when will the next sync be triggered?

@labkode
Contributor

labkode commented Jul 19, 2019

@guruz what is the status of this?

@michaelstingl
Contributor

michaelstingl commented Jul 22, 2019

> @guruz what is the status of this?

@labkode It should be improved in the upcoming 2.6.0. Please verify with the 2.6.0-alpha1 and provide feedback:
https://github.com/owncloud/client/releases/tag/v2.6.0-alpha1

@ckamm ckamm assigned ogoffart and unassigned ckamm Jul 23, 2019
@HanaGemela
Contributor

HanaGemela commented Jul 29, 2019

Why did I get the message twice? Logs shared with @ckamm and @guruz.

[screenshot]

Also, the Not synced protocol shows two entries, one without a file name.

[screenshot]

@HanaGemela
Contributor

Not really sure how to reproduce, but sometimes the message looks like this. Logs shared.

[screenshot: 2019-07-30 at 10 56 49]

[screenshot: 2019-07-30 at 10 56 18]

@HanaGemela
Contributor

I've tried with the server in maintenance mode. While in maintenance mode, I unchecked one folder in the selective sync list and disabled VFS for another account. After turning maintenance mode off, the client crashed: 48a065f1-c5e8-4541-b427-9355b200dd62
Not sure whether it is related to this issue. I found the maintenance-mode scenario in the top-level issue. In case it is not related, I'll raise a new issue.

@ckamm
Contributor

ckamm commented Jul 30, 2019

@HanaGemela Thanks for the logs. EDIT: Made a PR against the error duplication.

The crash is unrelated and worth its own issue if you can reproduce it. EDIT: The crash on sentry looks like an unspecific error in the Qt network stack unfortunately: https://sentry.io/organizations/owncloud/issues/1131537840/events/48a065f1c5e84541b4279355b200dd62/?project=79001&statsPeriod=14d&utc=true

ckamm added a commit that referenced this issue Jul 30, 2019
Previously fatal error texts were duplicated: Once they entered the
SyncResult via the SyncFileItem and once via syncError().

syncError is intended for folder-wide sync issues that are not pinned
to particular files. Thus that duplicated path is removed.

For #5088
ckamm added a commit that referenced this issue Jul 31, 2019
Previously fatal error texts were duplicated: Once they entered the
SyncResult via the SyncFileItem and once via syncError().

syncError is intended for folder-wide sync issues that are not pinned
to particular files. Thus that duplicated path is removed.

For #5088
@HanaGemela
Contributor

@ckamm Great, now I see the error message only once.

Still some issues here:

  1. Sometimes the error is e:http:response_header, as already described above
  2. When syncing a folder that is unavailable, the rest of the files are not synced
  3. When the folder becomes available again, the client doesn't recover; the folder is still shown as unavailable. I believe this is a server issue though

@HanaGemela HanaGemela removed the ReadyToTest QA, please validate the fix/enhancement label Aug 6, 2019
@ckamm
Contributor

ckamm commented Aug 7, 2019

@HanaGemela

  1. "Undefined variable: http_response_header" is an error message the server sends. I don't know why it's intermittent.
  2. A 503 during discovery should mean "ignore that folder" and the sync should continue with other files. A 503 during propagation is normally a normal error where other files will be synced after. The exception is if the 503 reply contains ">Sabre\DAV\Exception\ServiceUnavailable<" because then we assume maintenance mode has been switched on and abort the sync.
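
A sketch of that fatality rule (simplified; the real client's error handling is more involved than a plain substring check):

#include <QByteArray>

// Sketch: a 503 is only fatal ("server in maintenance, abort the sync")
// when the SabreDAV error body names the ServiceUnavailable exception.
// Any other 503 stays a per-item error and the remaining files keep
// syncing.
bool is503Fatal(int httpStatus, const QByteArray &replyBody)
{
    if (httpStatus != 503)
        return false;
    return replyBody.contains(">Sabre\\DAV\\Exception\\ServiceUnavailable<");
}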

@HanaGemela
Contributor

@ckamm then it's broken: the sync should continue, but it doesn't.

@ckamm
Contributor

ckamm commented Aug 7, 2019

@HanaGemela How do you set up the 503 reply for your tests? I need it to reproduce your specific case.

@HanaGemela
Contributor

@ckamm Start two servers:

docker run -e OWNCLOUD_SHARING_FEDERATION_ALLOW_HTTP_FALLBACK=true -p 8080:8080 owncloud/server
docker run -e OWNCLOUD_SHARING_FEDERATION_ALLOW_HTTP_FALLBACK=true -p 8081:8080 owncloud/server

Then:

  1. User1 shares a folder with User2 from another server (federated share)
  2. Turn off the first server 
  3. Check the account of User2

@ckamm
Contributor

ckamm commented Aug 12, 2019

Thanks, that helped. An unavailable federated share gets reported as:

<?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Sabre\DAV\Exception\ServiceUnavailable</s:exception>
  <s:message>Storage is temporarily not available</s:message>
</d:error>

but the error handling code assumes this ServiceUnavailable means maintenance mode and considers it a fatal error.

ckamm added a commit that referenced this issue Aug 12, 2019
This is an unreliable workaround. The real fix will need to be deferred
to another release.

For #5088
@ckamm
Contributor

ckamm commented Aug 12, 2019

@HanaGemela I've added a PR for an unreliable workaround in 2.6.0; a reliable fix will need to be deferred to 2.6.1. (It currently relies on the s:message string, which is a user-facing, translatable string; a correct solution would trigger a follow-up request to status.php to check for maintenance mode.)
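
That follow-up check could be sketched like this (hypothetical helper, blocking for brevity; status.php on an ownCloud server reports a "maintenance" flag in its JSON):

#include <QEventLoop>
#include <QJsonDocument>
#include <QJsonObject>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QObject>
#include <QUrl>

// Sketch: after an ambiguous 503, ask status.php whether the server is
// really in maintenance mode before declaring the whole sync fatal.
bool serverInMaintenanceMode(QNetworkAccessManager &nam, const QUrl &serverUrl)
{
    QUrl statusUrl = serverUrl;
    statusUrl.setPath(serverUrl.path() + "/status.php");

    QNetworkReply *reply = nam.get(QNetworkRequest(statusUrl));
    QEventLoop loop; // real client code would handle this asynchronously
    QObject::connect(reply, &QNetworkReply::finished, &loop, &QEventLoop::quit);
    loop.exec();

    const QJsonObject status = QJsonDocument::fromJson(reply->readAll()).object();
    reply->deleteLater();
    // status.php returns {"maintenance": true} while maintenance mode is on.
    return status.value("maintenance").toBool(false);
}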

@ckamm ckamm added ReadyToTest QA, please validate the fix/enhancement and removed PR available labels Aug 22, 2019
ckamm added a commit that referenced this issue Aug 22, 2019
This is an unreliable workaround. The real fix will need to be deferred
to another release.

For #5088
@HanaGemela
Contributor

Works as expected on 2.6.0 RC1 (build 12411), macOS 10.14.6.
