Sharding array chunks across hashed sub-directories #115
See the extended conversation starting at https://gitter.im/zarr-developers/community?at=601d471fc83ec358be27944f

Quick summary: @d-v-b points out that, at least for S3, the nested storage strategy that has been put in place suffices to achieve the "5500 GET requests per second per prefix in a bucket" described under https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html, i.e. my hope with …
@joshmoore thanks for the pointers! I agree, nested storage solves many of these issues. In fact, we have already been using it in some cases to solve exactly this problem. My main concern is that it only works well if your arrays have many chunks along multiple dimensions. But it's not uncommon to have most or all of your chunks along a single dimension, e.g., a bunch of images stacked only along the "time" dimension. In these cases, you would still end up with either a very large number of sequential sub-directories or a very large number of sequential filenames, depending on dimension order. Hashing seems like a more comprehensive fix.
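To make that concrete, here is a minimal sketch (the array name, chunk counts, and separators are illustrative, not taken from any particular dataset) of how the flat and nested key layouts look when all chunks lie along a single "time" dimension: nesting just moves the problem from one directory full of sequential filenames to one directory full of sequential sub-directories.

```python
# Hypothetical image stack: 100,000 time steps, one chunk per step,
# and a single chunk along the y and x dimensions.
n_time_chunks = 100_000

# Flat layout ("." separator): every chunk lands in the same directory.
flat_keys = [f"images/{t}.0.0" for t in range(n_time_chunks)]
# -> images/0.0.0, images/1.0.0, ..., images/99999.0.0

# Nested layout ("/" separator): one sub-directory per time index instead.
nested_keys = [f"images/{t}/0/0" for t in range(n_time_chunks)]
# -> images/0/0/0, images/1/0/0, ..., images/99999/0/0
```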
As a point of reference, it looks like Neuroglancer & TensorStore align chunk filenames via a "compressed Morton code":
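For readers unfamiliar with the scheme, here is a minimal sketch of the idea; the bit ordering and edge cases are illustrative and may not match the exact Neuroglancer/TensorStore encoding. The per-dimension chunk indices are bit-interleaved, but a dimension stops contributing bits once its grid size no longer needs them, so the code stays compact.

```python
def compressed_morton_code(chunk_coords, grid_shape):
    """Interleave the bits of per-dimension chunk indices, skipping bit
    positions a dimension's grid size does not need (illustrative sketch)."""
    # Bits required to represent the largest chunk index in each dimension.
    bits = [(s - 1).bit_length() for s in grid_shape]
    code, out_bit = 0, 0
    for j in range(max(bits, default=0)):      # bit position within each index
        for d, c in enumerate(chunk_coords):   # dimensions in a fixed order
            if j < bits[d]:                    # this dimension still has bits left
                code |= ((c >> j) & 1) << out_bit
                out_bit += 1
    return code

# e.g. a (4, 4, 1024) chunk grid uses 2 + 2 + 10 = 14 bits per code:
# compressed_morton_code((1, 2, 3), (4, 4, 1024)) packs all three indices into one int.
```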
This has some similarities with the proposal in issue #82.
Consider the case where we want to concurrently store and read many array chunks (e.g., millions). This is inherently pretty reasonable with many distributed storage systems, but not with Zarr's default keys for chunks, which have the form `{array_name}/{i}.{j}.{k}`: if we store a 10 TB array as a million 10 MB files in a single directory with sequential names, it would violate all of these guidelines!
It seems like a better strategy would be to store array chunks (and possibly other Zarr keys) across multiple sub-directories, where the sub-directory name is some apparently random but deterministic function of the keys, e.g., of the form `{array_name}/{hash_value}/{i}.{j}.{k}` or `{hash_value}/{array_name}/{i}.{j}.{k}`, where `hash_value` is produced by applying any reasonable hash function to the original key `{array_name}/{i}.{j}.{k}`.

The right number of hash buckets would depend on the performance characteristics of the underlying storage system. But the key feature is that the random prefixes/directory names make it easier to shard load, and avoid the typical performance bottleneck of reading/writing a bunch of nearby keys at the same time.
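A minimal sketch of what such a key transformation could look like; the hash function, bucket count, and array name here are illustrative choices, not a proposed standard:

```python
import hashlib

def hashed_chunk_key(array_name, chunk_index, num_buckets=256):
    """Prefix the original chunk key with a deterministic hash bucket.

    Illustrative sketch only: md5 and 256 buckets are arbitrary; any stable
    hash and bucket count agreed on by readers and writers would do.
    """
    original = f"{array_name}/{'.'.join(str(i) for i in chunk_index)}"
    digest = hashlib.md5(original.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return f"{bucket:02x}/{original}"

# hashed_chunk_key("temperature", (12, 3, 400))
# -> "<two-hex-digit bucket>/temperature/12.3.400"
```

Because the prefix is derived from the key itself, readers can recompute it without any directory listing, and writes to adjacent chunks scatter across buckets instead of hammering a single prefix.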
Ideally the specific hashing function / naming scheme (including the number of buckets) would be stored as part of the Zarr metadata in a standard way, so as to facilitate reading/writing data with different implementations.
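For illustration only, and emphatically not part of any Zarr spec, such a metadata entry might look something like the following (every field name here is invented):

```python
# Hypothetical addition to an array's metadata describing the naming scheme:
hashed_layout_metadata = {
    "chunk_key_hashing": {
        "scheme": "hashed-prefix",                # which naming scheme is in use
        "hash_function": "md5",                   # how the prefix is derived
        "num_buckets": 256,                       # how many top-level prefixes exist
        "prefix_position": "before_array_name",   # {hash}/{array}/{i}.{j}.{k}
    }
}
```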
Any thoughts? Has anyone considered this sort of solution, or encountered these scaling challenges in the wild? I don't quite have a strong need for this feature yet, but I imagine I may soon.