Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the cloud storage credentials when opening a datatree from ZarrStore (#9197) #9198

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vlevasseur073
Copy link

Copy link

welcome bot commented Jul 1, 2024

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@shoyer
Copy link
Member

shoyer commented Jul 16, 2024

This look good, but I'm concerned that it might break again in the future if we don't add any tests.

Is there any way we could test that storage_options gets passed on properly to Zarr? Maybe with a local directory store?

@vlevasseur073
Copy link
Author

Hi Stephan,
Indeed I am not really sure how to efficiently set tests on this. It should probably rather focus on the ZarrStore rather than this backend higher-level API, since here we are only passing this kwarg storage_options to the store. What is always a bit difficult is to get the correct structure of this storage_options to correctly pass the credentials. This could be very carefully documented, but it depends on the backend.
Another possibility could be to have more like an integration test where we simply try to open a sample zarr product from a private cloud storage. This requires to have such a storage and to add the credentials in the CI environment variables.
Please tell me if this helps, or if I can investigate anything.

@TomNicholas
Copy link
Member

Is this made redundant by all the extra kwargs that were passed through by #9377? That PR also passes through storage_options.

@TomNicholas
Copy link
Member

Actually this PR is still needed because

ds = open_dataset(filename_or_obj, group=parent, engine="zarr", **kwargs)
doesn't pass through storage_options.

The fix here ideally would refactor to use _iter_groups to build a full dict to pass to DataTree.from_dict.

@TomNicholas TomNicholas added the topic-zarr Related to zarr storage library label Sep 3, 2024
@TomNicholas
Copy link
Member

@vlevasseur073 please try this with main - I'm 80% sure we have already implemented what you need!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion needs work topic-backends topic-DataTree Related to the implementation of a DataTree class topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Opening a datatree from S3 bucket with zarr store
5 participants