Hi there!
I'm trying to evaluate graphchain for our use case, and I'm not sure I fully understand the model. I'd appreciate help clarifying, if possible!
We have some relatively simple dask computations, backed by climate-science zarr datastores, from which we eventually want to compute location-specific metrics. At any given time/compute step, only a small subset of these metrics will be needed. Ideally, we'd like to persist and cache whatever subset of metrics we've already computed, but we don't want to precompute and cache metrics for the entire world.
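For concreteness, the shape of what we're doing is roughly the sketch below (the store path, variable name, and the metric itself are all made up, just to illustrate the pattern):

```python
import xarray as xr

# Hypothetical zarr store and variable name, purely illustrative.
ds = xr.open_zarr("s3://our-bucket/climate.zarr")  # lazy, dask-backed arrays

# Only a small spatial subset is ever needed for a given request.
region = ds["tas"].sel(lat=slice(40, 50), lon=slice(-10, 5))

# A location-specific metric, still lazy at this point.
metric = (region - region.mean("time")).max("time")

result = metric.compute()  # it's results like this we'd like cached, per subset we actually touch
```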
As I understand it, graphchain examines the dask compute graph, identifies nodes as caching opportunities, and persists them. I think this means that if a subset operation happens late in the compute graph, dask pushes that subset op back up the graph, and graphchain then starts from the already-subsetted computation graph?
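In toy form, my mental model is something like the graph below (not our real pipeline; I'm also assuming `graphchain.get(dsk, key)` is the drop-in replacement for `dask.get` here):

```python
import graphchain

def load_global():
    # Stand-in for opening the full global datastore.
    return {"europe": 1.0, "africa": 2.0}

def subset(data, region):
    # The late subsetting step I mentioned above.
    return data[region]

def metric(value):
    return value * 10

dsk = {
    "global": (load_global,),
    "regional": (subset, "global", "europe"),
    "metric": (metric, "regional"),
}

# If graphchain hashes and caches per task key, I'd expect "regional" and "metric"
# to be persisted for this particular subset only -- that's what I'm trying to confirm.
result = graphchain.get(dsk, "metric")
```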
Graphchain then persists this computed node, by default as a `joblib` dump, but you can write custom serializers that do different things if it sees a dask dataframe. So... what happens if I use `to_zarr` in a custom serializer? Are the dataframes we're talking about here already subsetted, or would calling `to_zarr` on them force them and compute the entire globe's worth of metrics? The latter can't be the case; `to_parquet` would do the same... so does the cache key then include the chunk being persisted? And are those chunks the same as zarr chunks, even if only by accident because our input is a zarr datastore?

I get the feeling that I'm not quite understanding something here. I'll be running some experiments of my own to try to figure out what graphchain is doing, but I figured I might as well ask the question too!
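In code terms, the custom serializer I have in mind looks roughly like the following. I'm guessing at how graphchain would hand the value and a cache location to the hook, so the function signatures and how they'd get registered are assumptions on my part, not the documented API:

```python
import os

import joblib
import xarray as xr

def serialize(value, path):
    # Hypothetical hook: is `value` here the already-subsetted result for this task,
    # or would writing it out force a compute of the whole globe's worth of data?
    if isinstance(value, xr.Dataset):
        value.to_zarr(f"{path}.zarr", mode="w")
    else:
        joblib.dump(value, f"{path}.joblib")  # fall back to a joblib dump like the default

def deserialize(path):
    # Hypothetical counterpart: reopen the cached zarr store lazily on a cache hit.
    zarr_path = f"{path}.zarr"
    if os.path.isdir(zarr_path):
        return xr.open_zarr(zarr_path)
    return joblib.load(f"{path}.joblib")
```

If the value handed to the hook is still a lazy, dask-backed object, I'd expect `to_zarr` to only materialise the subset that's actually in the graph, but that's exactly the part I'm unsure about.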