Do Discovery and Propagation in parallel. #7589

Open
ogoffart opened this issue Nov 13, 2019 · 5 comments

@ogoffart
Contributor

Right now, the sync algorithm first does a discovery step to find what changed across the whole sync folder, and only then propagates files.
As a result, it can take a very long time before the first file even starts downloading or uploading, and restarting the sync means redoing the discovery step from scratch.
To solve this, we could start propagating (downloading / uploading) the files of a folder as soon as that folder has been discovered.

How:

There might be different approaches, but I guess the easiest would be to merge the discovery's ProcessDirectoryJob and the PropagateDirectory into a single operation.
They could be kept separate and still run in parallel, but then it might be harder to synchronize them.
Note that in any case, all removals would still be done at the end.
Currently, only directory removals are done at the end, but under the new approach, file removals would also have to be done at the end.
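A minimal C++ sketch of the merged approach, under loudly hypothetical assumptions: the sync tree is an in-memory map (`Tree`), each `Entry` already carries the action discovery decided on, and "propagating" only appends to an event list. `processDirectory` stands in for a merged ProcessDirectoryJob + PropagateDirectory; none of these names are the client's real API. It only illustrates the ordering: a folder's transfers start right after that folder is discovered, while every removal, file or directory, is deferred to the end.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical result of discovering one entry; the real client derives
// this by comparing local files, remote files and the sync journal.
enum class Action { None, Download, Upload, Remove };

struct Entry {
    std::string name;
    bool isDirectory;
    Action action;
};

// Hypothetical in-memory sync tree: directory path -> entries in it.
using Tree = std::map<std::string, std::vector<Entry>>;

// Hypothetical merged "discover + propagate" job for one directory.
void processDirectory(const Tree &tree, const std::string &path,
                      std::vector<std::string> &events,
                      std::vector<std::string> &deferredRemovals) {
    events.push_back("discover " + path);
    auto it = tree.find(path);
    if (it == tree.end())
        return;  // empty or unknown directory: nothing to do
    for (const Entry &entry : it->second) {
        const std::string child = path + "/" + entry.name;
        switch (entry.action) {
        case Action::Download:
            events.push_back("download " + child);  // start right away
            break;
        case Action::Upload:
            events.push_back("upload " + child);  // start right away
            break;
        case Action::Remove:
            deferredRemovals.push_back(child);  // files and directories alike
            continue;  // do not descend into something being removed
        case Action::None:
            break;
        }
        if (entry.isDirectory)
            processDirectory(tree, child, events, deferredRemovals);
    }
}

// Runs the whole sync and returns the order in which operations happened.
std::vector<std::string> syncTree(const Tree &tree) {
    assert(tree.count("root") == 1);  // the hypothetical tree is rooted at "root"
    std::vector<std::string> events;
    std::vector<std::string> deferredRemovals;
    processDirectory(tree, "root", events, deferredRemovals);
    // All removals run only after every transfer has been done.
    for (const std::string &path : deferredRemovals)
        events.push_back("remove " + path);
    return events;
}
```

For a root with `a.txt` to download, `old.txt` to remove and a `docs` subfolder containing `b.txt` to upload, the resulting order is: discover root, download root/a.txt, discover root/docs, upload root/docs/b.txt, remove root/old.txt.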

Problems:

Some features will stop working. Notably:

  • The progress indication: it will be impossible to give an accurate progress indication for the sync, since we have no idea how much work there is in total.

Non-problems

Something that will not be a problem is the detection of moves/renames. We do not need the whole sync tree in memory for that: the new discovery algorithm already detects moves quite well without it.

Alternative

We may choose not to do that at all and cache the result of the discovery in a new table: #2976
This would solve the problem that restarting the first sync before discovery has finished restarts it from zero.
It would, however, not solve the memory usage problem, which the first approach does solve, and it would still (I think) be slower than the parallelization.
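A sketch of the caching alternative, assuming a hypothetical `DiscoveryCache` map in place of the new table from #2976 (whose actual schema is not described here) and a hypothetical `Listing` map in place of real server requests. After an interruption, a restart only scans directories that are not cached yet. Note that the cache still ends up holding the entire discovery result, consistent with the point above that this alternative does not reduce memory usage.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed discovery table (#2976):
// directory path -> changed entries found in it.
using DiscoveryCache = std::map<std::string, std::vector<std::string>>;

// Hypothetical stand-in for listing the server: path -> changed entries.
using Listing = std::map<std::string, std::vector<std::string>>;

// Discovers the given directories in order, skipping the ones already in
// the cache. 'abortAfter' simulates the client being stopped after that
// many scans (-1 means never). Returns the number of directories scanned.
int discover(const std::vector<std::string> &dirs, const Listing &remote,
             DiscoveryCache &cache, int abortAfter) {
    assert(abortAfter >= -1);
    int scanned = 0;
    for (const std::string &dir : dirs) {
        if (cache.count(dir) != 0)
            continue;  // found before the restart: no need to scan again
        if (scanned == abortAfter)
            return scanned;  // simulated interruption
        std::vector<std::string> &entries = cache[dir];  // creates an empty entry
        auto it = remote.find(dir);
        if (it != remote.end())
            entries = it->second;
        ++scanned;
    }
    return scanned;
}
```

With three directories and an interruption after two scans, the restarted call scans only the third one instead of starting over.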

ogoffart added this to the 2.7.0 milestone Nov 13, 2019
@mmattel
Contributor

mmattel commented Nov 13, 2019

The progress indication

Is it possible to dynamically change not only the current value but also the target value (i.e. the 100% reference value)? If so, you would have a fully dynamic progress bar fed by two dynamic streams: the number of items processed so far, and the target number, which changes as discovery runs in parallel. If this is explained to the user, it should not be confusing.

@ogoffart
Contributor Author

Is it possible to dynamically not only change the current value but also change the target value

Yes, that's possible, but that means that the progress might actually go backward.
(Start at 0%, then reach 50% then back to 25% then 85% then 10% then 90% and finally 100%, as files are discovered and transferred)
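A small C++ model of such a progress bar, assuming the target is simply the number of items discovered so far (`DynamicProgress` and its members are hypothetical names, not client code). Because the denominator grows whenever discovery finds more files, the percentage can drop: 1 of 2 files transferred is 50%, but once 2 more files are discovered it is 1 of 4, i.e. 25%.

```cpp
#include <cassert>

// Hypothetical progress model for parallel discovery and propagation:
// the 100% reference grows as discovery finds more items.
struct DynamicProgress {
    long long discovered = 0;   // current target (the 100% reference)
    long long transferred = 0;  // items propagated so far

    void onDiscovered(long long count) { discovered += count; }

    void onTransferred(long long count) {
        transferred += count;
        assert(transferred <= discovered);  // only discovered items can be transferred
    }

    // Integer percentage; 0 while nothing has been discovered yet.
    int percent() const {
        if (discovered == 0)
            return 0;
        return static_cast<int>(100 * transferred / discovered);
    }
};
```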

@mmattel
Contributor

mmattel commented Nov 13, 2019

What about showing something like "x of y items to process" during the sync?
Both values would be updated dynamically.
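A sketch of such an "x of y items" label, using a hypothetical `itemsLabel` helper and a hypothetical "(still discovering)" suffix. Unlike a percentage, the first number never decreases; only the total grows while discovery is still running.

```cpp
#include <cassert>
#include <string>

// Hypothetical status text: "<done> of <total> items". 'done' only grows,
// while 'total' may still grow as long as discovery is running.
std::string itemsLabel(long long done, long long total, bool discoveryFinished) {
    assert(done >= 0 && done <= total);
    std::string label =
        std::to_string(done) + " of " + std::to_string(total) + " items";
    if (!discoveryFinished)
        label += " (still discovering)";  // hypothetical hint that the total may grow
    return label;
}
```

For example, `itemsLabel(3, 10, false)` returns `"3 of 10 items (still discovering)"`, and `itemsLabel(10, 10, true)` returns `"10 of 10 items"`.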

@michaelstingl
Contributor

Yes, that's possible, but that means that the progress might actually go backward.
(Start at 0%, then reach 50% then back to 25% then 85% then 10% then 90% and finally 100%, as files are discovered and transferred)

Maybe worth a try, if users can see that the total number of discovered files is increasing too.

@ckamm
Contributor

ckamm commented Nov 15, 2019

This looks sensible to me.

The progress indication may be a reason to keep discovery and propagation separate, even if propagation would now start before discovery finishes. That way, an accurate progress indication is possible as soon as discovery finishes, exactly like it is right now.
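A single-threaded C++ sketch of that idea, assuming a hypothetical `Pipeline` in which discovery and propagation remain separate steps that communicate through a queue and are interleaved by one loop (similar in spirit to an event loop; this is not the client's actual job scheduling). Propagation starts right after the first directory is discovered, yet `percent()` only reports a value once discovery has finished and the total is final; before that it returns -1.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical pipeline: discovery and propagation stay separate, talk
// through a queue, and are interleaved by one loop instead of running
// one after the other.
struct Pipeline {
    // Hypothetical input: the items discovery will find, per directory.
    std::vector<std::vector<std::string>> directories;
    std::size_t nextDirectory = 0;
    std::deque<std::string> pending;  // discovered, not yet propagated
    long long discoveredTotal = 0;
    long long propagated = 0;

    bool discoveryFinished() const { return nextDirectory == directories.size(); }

    // Discovers one directory and queues its items for propagation.
    void discoveryStep() {
        if (discoveryFinished())
            return;
        for (const std::string &item : directories[nextDirectory])
            pending.push_back(item);
        discoveredTotal += static_cast<long long>(directories[nextDirectory].size());
        ++nextDirectory;
    }

    // Propagates (uploads or downloads) at most one queued item.
    void propagationStep() {
        if (pending.empty())
            return;
        pending.pop_front();
        ++propagated;
        assert(propagated <= discoveredTotal);
    }

    // Percentage once the total is final, as today; -1 while still unknown.
    int percent() const {
        if (!discoveryFinished())
            return -1;
        if (discoveredTotal == 0)
            return 100;
        return static_cast<int>(100 * propagated / discoveredTotal);
    }

    // One discovery step and one propagation step per loop iteration.
    void run() {
        while (!discoveryFinished() || !pending.empty()) {
            discoveryStep();
            propagationStep();
        }
    }
};
```

With two directories holding `a.txt`, `b.txt` and `c.txt`, one file is already propagated after the first discovery step and one propagation step while `percent()` still returns -1; after the second discovery step it reports 33.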

TheOneRing modified the milestones: 2.7.0, 2.9.0 Mar 17, 2021
TheOneRing modified the milestones: 2.11, 2.12 May 4, 2022
TheOneRing modified the milestones: 4.0, Backlog Jan 26, 2023

5 participants