_list_oids_traverse is much slower than _list_oids. #178

Closed
daavoo opened this issue Jan 4, 2023 · 4 comments
Labels: performance (improvement over resource / time consuming tasks)

Comments

daavoo (Contributor) commented Jan 4, 2023:

I have no context on the introduction of _list_oids_traverse, but in every scenario and at every remote size I have tested, it is slower than plain _list_oids.

This has a negative performance impact on many operations; I reported the dvc gc case in a separate issue.

I am guessing the idea of list traverse was introduced before the adoption of fsspec as the backend.
The reality today is that the combination of ThreadPoolExecutor and the single fsspec async loop makes the operation significantly slower than a single _list_oids call.
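
Roughly, the two code paths compare like this (a minimal sketch, not the actual dvc-objects implementation; `fs` stands in for an fsspec filesystem and the 2-hex-digit prefix layout is assumed):

```python
from concurrent.futures import ThreadPoolExecutor

def _list_oids(fs, path):
    # Plain listing: one recursive find(), batched internally by
    # fsspec on its single asyncio event loop.
    yield from fs.find(path)

def _list_oids_traverse(fs, path, jobs=16):
    # Traverse listing: one find() per 2-hex-digit prefix, fanned out
    # over a thread pool. With an async fsspec backend every worker
    # still ends up waiting on the same event loop, so the threads
    # mostly add scheduling overhead rather than real concurrency.
    prefixes = [f"{i:02x}" for i in range(256)]
    with ThreadPoolExecutor(max_workers=jobs) as executor:
        for listing in executor.map(lambda p: fs.find(f"{path}/{p}"), prefixes):
            yield from listing
```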

dtrifiro added the performance (improvement over resource / time consuming tasks) label on Jan 4, 2023
dberenbaum commented:

@daavoo Do you have results to share for different scenarios?

pmrowla (Contributor) commented Jan 5, 2023:

For context on traverse/no-traverse:

https://iterative.ai/blog/dvc-vs-rclone
iterative/dvc#3488
iterative/dvc#3501


The main premise behind the optimization should still be valid even with fsspec/asyncio. For push/fetch/status it's better for us to estimate the size of the remote based on the size of a single prefix, and then determine whether it's faster to finish listing the full remote or to make individual exists calls.
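
A rough sketch of that heuristic (names and threshold are illustrative, not the real dvc-objects API):

```python
def estimate_remote_size(fs, path, sample_prefix="00"):
    # Listing a single 2-hex-digit prefix samples roughly 1/256
    # of the remote, so scale the count back up.
    return len(fs.find(f"{path}/{sample_prefix}")) * 256

def choose_strategy(fs, path, local_oids, traverse_weight=8):
    # If the remote looks much bigger than the set of objects we need
    # to check, individual exists() calls win; otherwise it is cheaper
    # to finish listing the remote and intersect the two sets.
    if estimate_remote_size(fs, path) > traverse_weight * len(local_oids):
        return "object-by-object exists() queries"
    return "full remote listing"
```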

For gc the size estimation isn't relevant, since we have to list the full remote regardless, but the original implementation also parallelized the full listing by prefix, which was a valid gc optimization. This parallelization (via the ThreadPoolExecutor) is probably what is broken now, since with fsspec it all gets done in the single async thread instead.

Fixing this really comes down to getting rid of our ThreadPoolExecutor usage in favor of letting fsspec handle the parallelization with asyncio instead.
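
Something along these lines, for example (a sketch only, assuming an async fsspec filesystem exposing the coroutine `_find()`; a caller would drive it via asyncio.run or fsspec's sync wrappers):

```python
import asyncio

async def _list_oids_traverse_async(fs, path):
    # All 256 per-prefix listings are scheduled on fsspec's own event
    # loop and connection pool; no ThreadPoolExecutor involved.
    prefixes = [f"{i:02x}" for i in range(256)]
    listings = await asyncio.gather(
        *(fs._find(f"{path}/{prefix}") for prefix in prefixes)
    )
    return [oid for listing in listings for oid in listing]
```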

daavoo (Author, Contributor) commented Jan 5, 2023:

> @daavoo Do you have results to share for different scenarios?

I have realized that I was incorrectly "micro-benchmarking" _list_oids 🤦

I was benchmarking the full GC and thought that I was only including the _list_oids change but, in reality, I was also including the removal of _expand_path in the underlying filesystem.

I quickly reran for 50k, 100k, 150k, and 200k objects. _list_oids_traverse does start to be faster at 150k.
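
A harness of this general shape could reproduce the comparison (hypothetical sketch; the actual numbers above came from timing full dvc gc runs):

```python
import time

def time_listing(list_fn, fs, path, repeat=3):
    # Take the best of a few runs to smooth out network jitter;
    # drain the generator so the listing actually happens.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        count = sum(1 for _ in list_fn(fs, path))
        best = min(best, time.perf_counter() - start)
    return count, best
```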

> getting rid of our ThreadPoolExecutor usage in favor of letting fsspec handle the parallelization w/asyncio instead.

I think this still applies, though.

dberenbaum commented:

@daavoo Do you have those times and comparisons to the AWS CLI? It would be helpful to get an idea of whether it's 1.5x slower or 10x slower.

skshetry closed this as not planned on Mar 6, 2024