Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Never ever abort sync runs / check that the connection is indeed broken before aborting on a timeout #5859

Closed
michaelstingl opened this issue Jun 26, 2017 · 25 comments
Assignees
Labels
Milestone

Comments

@michaelstingl
Copy link
Contributor

Some server side errors cause the client to abort the current sync run. If the ownCloud admin or user is not able to fix the error, EVERY sync run will abort at the same position, and following files that don't have an error will NEVER sync because of this.

@SamuAlfageme We discussed this a time ago. Please link existing issues here or open new ones that could could cause this behaviour. Please also elaborate on the server improvements.

@guruz
Copy link
Contributor

guruz commented Jun 26, 2017

Please report only if this happens with latest client version as this is something which changed a lot back and forth :-)
(CC @ckamm @ogoffart )

@guruz guruz added this to the 2.4.0 milestone Jun 26, 2017
@SamuAlfageme
Copy link
Contributor

SamuAlfageme commented Jun 26, 2017

WIP

Sync aborted - on server/network error

Sync aborted - on client/protocol error

Related Fixes

@ckamm
Copy link
Contributor

ckamm commented Jun 28, 2017

Incomplete list of potential sync run aborts during propagation stage, by looking at FatalError in the code

  • can't write to database
  • network error > NoError and <= UnknownProxyError
  • network error 503
  • timeout on download
  • critical disk space error on download

@ogoffart
Copy link
Contributor

I believe the current behavior is the correct one: We only abort the sync run if we experience fatal error (network down, server on maintainnence, ...)

@michaelstingl Do you have any concrete example of error that causes the sync to abort while it should not?

@michaelstingl michaelstingl modified the milestones: 2.4.0, 2.5.0 Jul 28, 2017
@cdamken
Copy link
Contributor

cdamken commented Aug 1, 2017

I believe the current behavior is the correct one: We only abort the sync run if we experience fatal error (network down, the server on maintenance, ...)

Wenn a user has some shares and/or has some external storages and/or federated shares and/or some inconsistency in the oc_filecache because the database has a problem (in some cases is normally fixed with occ files:scan to the user, the user that shares, etc) the synclient just stops on the error and does not sync the new files and folders (even if they are in a different path)

IMO, the expected behavior, the sync-client should mark those files, and try to upload the new files (because those files are probably needed for someone else) and at the end send a summary the main errors.

Like:
files x, y, z in folder w could not be downloaded because the server reported: connection timeout. ->Contact your administrator
files a, b, c could not be uploaded because chunks weren't received correctly, we will try to upload later. -> if the problem persist please contact your administrator.
files m,n,o were successfully uploaded. -> Synced after the errors occurred.

@ogoffart
Copy link
Contributor

ogoffart commented Aug 1, 2017

files x, y, z in folder w could not be downloaded because the server reported: connection timeout. ->Contact your administrator

The problem is that "Connection timeout" is also the error we get when the network is down.
Maybe when we get a "connection timeout", we should run the connection validator once more and only report a FatalError if the connection is indeed broken. otherwise mark it as a NormalError. That's some work and it may not play well with the parallelism, but that's an option.

@cdamken
Copy link
Contributor

cdamken commented Aug 1, 2017

The client get the error that is delivered from the client server, but as you said, is not necessarily disconnected, only for some files or folder that could have problems finishing the download/upload

@guruz guruz changed the title Never ever abort sync runs Never ever abort sync runs / be resilient to flaky network connection or wifi Aug 3, 2017
@michaelstingl
Copy link
Contributor Author

@SamuAlfageme Should we add #4622 (comment) to this meta-issue?

The current user experience is bad: in this case the client shows 403 error and aborts current sync run. The it restarts a new run which shows 403 and aborts. And so on.

@michaelstingl
Copy link
Contributor Author

michaelstingl commented Sep 25, 2017

@SamuAlfageme Another one More: #6049 #6050 – could you verify?

@michaelstingl
Copy link
Contributor Author

@guruz my initial focus wasn't about client side problems like flaky wifi, more about error states on the server side that abort the sync run every time at the same position, so that later files will never be synced (like in @cdamken 's example)

At ownCloud Conference 2017 a user even proposed to randomize the order to get a bigger chance that later files get synced, but I of course, I would prefer to fix the root cause of the problem.

@SamuAlfageme
Copy link
Contributor

@michaelstingl

@SamuAlfageme Another one More: #6049 #6050 – could you verify?

Those are not sync aborts but crashes. Will look into them to see what introduced 'em.

@michaelstingl
Copy link
Contributor Author

Those are not sync aborts but crashes.

Pain is the same: Later files will never get synced. oC desktop sync client need better strategies to deal with oC server or infrastructure fuckup.

@guruz
Copy link
Contributor

guruz commented Sep 25, 2017

At ownCloud Conference 2017 a user even proposed to randomize the order to get a bigger chance that later files get synced

This is something for @mrow4a ;-) SCNR

oC desktop sync client need better strategies to deal with oC server or infrastructure fuckup.

But if the local storage or the network connection go away then there is only so much you can do.. :-)

In general we all agree with the goal of making the discovery less costly and therefore earlier going to sync. 2.4 should already be better in this.

@michaelstingl
Copy link
Contributor Author

But if the local storage or the network connection go away then there is only so much you can do.. :-)

@guruz In the cases above it's not about local storage and network connection. It's about mounted storages that the oC server can't access temporarily or permanently. My expectation is, that oC client continues sync with the next available file/folder. And it seems we have enough cases where this isn't the case.

@ogoffart ogoffart changed the title Never ever abort sync runs / be resilient to flaky network connection or wifi Never ever abort sync runs / check that the connection is indeed broken before aborting on a timeout Sep 26, 2017
@ogoffart
Copy link
Contributor

The solution is what i write in
#5859 (comment)

When a propagator jobs end in a connection timeout, we assume it is because the network is down and we abort the sync. But before aborting the sync, we should do a connection validator to check if the network is down or if it is a problem with that file.

@mrow4a
Copy link
Contributor

mrow4a commented Sep 26, 2017

@ogoffar while it make sense for single file, when whole folder gets unavailable it is a signal that sth is wrong with the server. It could mean it is flaky network drive (user should manually remove it in server UI and client should halt for that period not to allow moving files across folders and materialize the state on faulty server) or federated share (which might lead to even worse because multiple users do changes and patent node of folder is down or faulty). What do you think ?

@michaelstingl
Copy link
Contributor Author

Looks like another one: #6435

@SamuAlfageme
Copy link
Contributor

@michaelstingl
Copy link
Contributor Author

Another candidate: #6826

ogoffart added a commit that referenced this issue Nov 29, 2018
ogoffart added a commit that referenced this issue Nov 29, 2018
ogoffart added a commit that referenced this issue Dec 17, 2018
@ogoffart ogoffart added the ReadyToTest QA, please validate the fix/enhancement label Dec 17, 2018
@michaelstingl michaelstingl added p3-medium Normal priority and removed p2-high labels May 29, 2019
@HanaGemela
Copy link
Contributor

@ogoffart @michaelstingl what exactly is ready to test here? Only abortion on error 503?

@ogoffart
Copy link
Contributor

ogoffart commented Jun 4, 2019

Right, when the server returns an error 503 for a file, (but the server is not in maintainence mode) it should still sync all other files.
I don't know exactly how to get error 503 on the server, this is usually because of a problem in the storage backend.
However, when the server is in maintainence mode, the error is also 503, but in this case, the sync needs to stop. (because otherwise all the files would be 503)

Maybe we can just close this issue.

@HanaGemela
Copy link
Contributor

Closing this aggregating issue. All individual issues will be/were tested separately. 503 error will be tested in #5088

@michaelstingl
Copy link
Contributor Author

Closing this aggregating issue. All individual issues will be/were tested separately.

👍 😍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants