Investigate if it is possible to avoid reading all coordinate chunks when opening a dataset with xarray #34

Closed
abarciauskas-bgse opened this issue Jul 28, 2023 · 2 comments

@abarciauskas-bgse
Contributor

Right now, xarray's open_zarr and open_dataset are significantly slower when coordinates are chunked, because every coordinate chunk results in a separate request to S3.
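
To make the cost concrete, here is a minimal sketch of how the number of requests grows with coordinate chunking. It assumes one stored object (and so one GET) per chunk, with no sharding or caching; the array length and chunk sizes are hypothetical and not taken from any dataset in this issue.

```python
import math


def chunk_count(shape, chunks):
    """Number of chunks (stored objects) needed to cover an array of `shape`."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


# Hypothetical daily time coordinate covering 36500 steps.
time_shape = (36500,)

# Chunked coordinate: one request per chunk when the coordinate is read in full.
requests_chunked = chunk_count(time_shape, (365,))

# Unchunked coordinate: a single chunk spans the whole array.
requests_unchunked = chunk_count(time_shape, time_shape)

print(f"chunked: {requests_chunked} requests, unchunked: {requests_unchunked} request")
```

Since coordinates are read in full when the dataset is opened, the chunked layout pays that request count up front for every chunked coordinate.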

Is it possible to do either of the following?

  1. Create a datastore where the coordinates are not chunked.
  2. Open a dataset that has chunked coordinates without fetching all of the coordinate chunks.

Note: I tried decode_coords=False and the same slowdown occurs.

Related:
pydata/xarray#6633
pydata/xarray#7368
https://discourse.pangeo.io/t/puzzling-s3-xarray-open-zarr-latency/1074/11

From @maxrjones:

It is possible to have a case in which the data are chunked along a dimension but the coordinates are not chunked. This is what we did for the CMIP6-downscaling pyramids, so that the coordinates are fetched with one request while only specific chunks of the data are fetched, e.g.,

import zarr
store = zarr.open("s3://carbonplan-cmip6/flow-outputs/results/0.1.9/pyramid/01df7816c64b3999/0/", mode="r")
print(f'tasmin chunks: {store["tasmin"].chunks}')
print(f'time chunks: {store["time"].chunks}')

tasmin chunks: (25, 128, 128)
time chunks: (1020,)
@abarciauskas-bgse
Contributor Author

I've opened an issue in pangeo-forge about this: pangeo-forge/pangeo-forge-recipes#554

For now, I will probably use this notebook to produce Zarr stores with unchunked coordinates: https://github.com/developmentseed/tile-benchmarking/blob/feat/dont_chunk_coordinates/profiling/cmip6_zarr/rechunking.ipynb

@abarciauskas-bgse
Contributor Author

Closing this for now, as the issue is with pangeo-forge and is being worked on in pangeo-forge/pangeo-forge-recipes#556
