-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPT+RF of zarr downloads: do not wait for full files listing + compute %done from total zarr size #1443
Conversation
…ating downloads Delay assignment of file_qty until it is known, but otherwise proceed to initiating downloads
Since maxsize is dynamically computed as we go through the files. The idea, I guess, was that it would grow rapidly before actual downloads commense but it is not the case, so we endup with done% being always close to 100% since we get those reports on final downloads completed close to when individual files are downloaded. So this should close #1407 . But for total zarr file to be used, we needed to account also for skipped files. I added reporting of sizes for skipped files as well. It seems there is no negative side effect on regular files download. So now for the %done of zarr we might be getting to 100% of original size having downloaded nothing. But IMHO it is ok since user does not care as much of how many "subparts" are downloaded, but rather to have adequate progress report back. There also could be side effects if -e skip and we skip download of some updated files which would be smaller than the local ones so altogether we would get over 100% total at the end.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1443 +/- ##
==========================================
+ Coverage 88.61% 88.63% +0.01%
==========================================
Files 77 77
Lines 10563 10572 +9
==========================================
+ Hits 9360 9370 +10
+ Misses 1203 1202 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
… the modified code logic
e6d71bd
to
b17aa63
Compare
final_out: dict | None = None | ||
with interleave( | ||
[pairing(p, gen) for p, gen in download_gens.items()], | ||
downloads_gen(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interleave()
evaluates/consumes the iterable in its entirety before returning, so you haven't accomplished anything by making downloads_gen()
into a generator.
In order to add downloads to interleave()
as they're yielded by asset.iterfiles()
, you'll need to:
- Replace the call to
interleave()
with a directly-constructedInterleaver
instance - In separate thread, call the
Interleaver
'ssubmit()
method whenever a new file is yielded byasset.iterfiles()
, and then call itsfinalize()
method at end of iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you be so kind to approach this or otherwise address this issue the way you like it while also ideally reducing that delay / not demanding full listing to arrive to start the download?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or may be we could merge this (as resolves the issue) and then improve on top of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the error handling right for the solution I suggested would be very difficult, so we should probably just merge the current version for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's proceed! Do you want me to create a dedicated issue or you keep this on your plate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a dedicated issue.
Co-authored-by: John T. Wodder II <jwodder@users.noreply.github.com>
Thank you @yarikoptic and @jwodder. |
🚀 PR was released in |
Individual commits have more description.
Closes: #1407 . @kabilar could you please check if works for you ok'ish?
edit 1: I think initial OPT commit is not entirely complete -- somewhere in the sandwich of all the generators there is still waiting for an iterable to exhaust itself.