_list_oids_traverse is much slower than _list_oids. #178

Closed
daavoo opened this issue Jan 4, 2023 · 4 comments
Labels: performance (improvement over resource / time consuming tasks)

Comments

daavoo (Contributor) commented Jan 4, 2023:

I have no context on the introduction of _list_oids_traverse, but in every scenario and at every remote size I have tested, it is slower than plain _list_oids.

This has a negative performance impact on many operations; I reported the dvc gc case in a separate issue.

I am guessing the idea of list traverse was introduced before the adoption of fsspec as the backend.
The reality today is that the combination of ThreadPoolExecutor and the single fsspec async loop makes the operation significantly slower than a single _list_oids call.
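
Roughly, the two code paths compare like this (a minimal sketch, not the actual dvc-objects implementation; `fs` stands in for an fsspec filesystem and the 2-hex-digit prefix layout is assumed):

```python
from concurrent.futures import ThreadPoolExecutor

def _list_oids(fs, path):
    # Plain listing: one recursive find(), batched internally by
    # fsspec on its single asyncio event loop.
    yield from fs.find(path)

def _list_oids_traverse(fs, path, jobs=16):
    # Traverse listing: one find() per 2-hex-digit prefix, fanned out
    # over a thread pool. With an async fsspec backend every worker
    # still ends up waiting on the same event loop, so the threads
    # mostly add scheduling overhead rather than real concurrency.
    prefixes = [f"{i:02x}" for i in range(256)]
    with ThreadPoolExecutor(max_workers=jobs) as executor:
        for listing in executor.map(lambda p: fs.find(f"{path}/{p}"), prefixes):
            yield from listing
```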

dtrifiro added the performance (improvement over resource / time consuming tasks) label on Jan 4, 2023
dberenbaum commented:

@daavoo Do you have results to share for different scenarios?

pmrowla (Contributor) commented Jan 5, 2023:

For context on traverse/no-traverse:

https://iterative.ai/blog/dvc-vs-rclone
iterative/dvc#3488
iterative/dvc#3501


The main premise behind the optimization should still be valid even with fsspec/asyncio. For push/fetch/status it's better for us to estimate the size of the remote based on the size of a single prefix, and then determine whether it's faster to finish listing the full remote or to make individual exists calls.
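
A rough sketch of that heuristic (names and threshold are illustrative, not the real dvc-objects API):

```python
def estimate_remote_size(fs, path, sample_prefix="00"):
    # Listing a single 2-hex-digit prefix samples roughly 1/256
    # of the remote, so scale the count back up.
    return len(fs.find(f"{path}/{sample_prefix}")) * 256

def choose_strategy(fs, path, local_oids, traverse_weight=8):
    # If the remote looks much bigger than the set of objects we need
    # to check, individual exists() calls win; otherwise it is cheaper
    # to finish listing the remote and intersect the two sets.
    if estimate_remote_size(fs, path) > traverse_weight * len(local_oids):
        return "object-by-object exists() queries"
    return "full remote listing"
```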

For gc the size estimation isn't relevant, since we have to list the full remote regardless, but the original implementation also parallelized the full listing by prefix, which was a valid gc optimization. This parallelization (via the ThreadPoolExecutor) is probably what is broken now, since with fsspec it all gets done in the single async thread instead.

Fixing this really comes down to getting rid of our ThreadPoolExecutor usage in favor of letting fsspec handle the parallelization with asyncio instead.
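
Something along these lines, for example (a sketch only, assuming an async fsspec filesystem exposing the coroutine `_find()`; a caller would drive it via asyncio.run or fsspec's sync wrappers):

```python
import asyncio

async def _list_oids_traverse_async(fs, path):
    # All 256 per-prefix listings are scheduled on fsspec's own event
    # loop and connection pool; no ThreadPoolExecutor involved.
    prefixes = [f"{i:02x}" for i in range(256)]
    listings = await asyncio.gather(
        *(fs._find(f"{path}/{prefix}") for prefix in prefixes)
    )
    return [oid for listing in listings for oid in listing]
```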

daavoo (Author, Contributor) commented Jan 5, 2023:

> @daavoo Do you have results to share for different scenarios?

I have realized that I was incorrectly "micro-benchmarking" _list_oids 🤦

I was benchmarking the full GC and thought that I was only including the _list_oids change but, in reality, I was also including the removal of _expand_path in the underlying filesystem.

I quickly reran for 50k, 100k, 150k, and 200k objects. _list_oids_traverse does start to be faster at 150k.
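
A harness of this general shape could reproduce the comparison (hypothetical sketch; the actual numbers above came from timing full dvc gc runs):

```python
import time

def time_listing(list_fn, fs, path, repeat=3):
    # Take the best of a few runs to smooth out network jitter;
    # drain the generator so the listing actually happens.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        count = sum(1 for _ in list_fn(fs, path))
        best = min(best, time.perf_counter() - start)
    return count, best
```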

> getting rid of our ThreadPoolExecutor usage in favor of letting fsspec handle the parallelization w/asyncio instead.

I think this still applies, though.

dberenbaum commented:

@daavoo Do you have those times and comparisons to the AWS CLI? It would be helpful to get an idea of whether it's 1.5x slower or 10x slower.

skshetry closed this as not planned on Mar 6, 2024