Currently, the optimizer step traverses the full dask graph in no particular order:
https://github.com/maurosilber/pipeline/blob/0cac8b8954b4def43e593040dced79807ac37f3a/pipeline/storage.py#L50-L58
It would be better to traverse it starting from the requested key(s), following through with their dependencies. A task does not need to be checked if all its dependents are already stored and will be loaded.
For instance, consider the following graph, `a -> b -> c`, where `b` is already stored.

If we request `c`, which is not stored, then we would need to check `b`, which is stored and hence loaded. Then, we don't need to check `a`, which is simply removed from the graph.

We could adapt the `dask.cull` implementation.
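The traversal described above could look roughly like the following. This is only a sketch under stated assumptions, not the current `pipeline` code nor dask's own cull implementation: the function name `cull_stored`, the `dependencies` mapping (key to the set of keys it depends on), and the `is_stored` predicate are all hypothetical names introduced for illustration.

```python
def cull_stored(dependencies, requested, is_stored):
    """Walk the graph from the requested keys towards their dependencies.

    `dependencies` maps each key to the set of keys it depends on.
    `is_stored` is a hypothetical predicate that checks the storage backend.
    Returns (to_compute, to_load); keys in neither set can be dropped.
    """
    to_compute, to_load = set(), set()
    stack = list(requested)
    while stack:
        key = stack.pop()
        if key in to_compute or key in to_load:
            continue  # already visited, so storage is checked at most once per key
        if is_stored(key):
            # Stored keys are loaded, so their dependencies are never visited.
            to_load.add(key)
        else:
            to_compute.add(key)
            stack.extend(dependencies[key])
    return to_compute, to_load


# The example from this issue: a -> b -> c, where only b is stored.
dependencies = {"a": set(), "b": {"a"}, "c": {"b"}}
checked = []  # records which keys were checked against storage


def is_stored(key):
    checked.append(key)
    return key == "b"


to_compute, to_load = cull_stored(dependencies, ["c"], is_stored)
```

With this sketch, requesting `c` checks only `c` and `b`; `a` is never checked and does not appear in either result set, so it can be removed from the graph.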