# car fsspec reference file system output #2
That's a very neat idea! Do you want to give it a try? -- Otherwise I might give it a try myself, as I've been playing around with building reference files for grib data a few days ago.
I may get there, but I would not be surprised if you beat me to it :-)
😬 ... there might be an issue though: the 2 MiB block size limit, which is necessary to protect users from being DoS-ed by bad peers on IPFS. TL;DR: you can only verify the correctness of a block after computing the hash of the entire thing, so if someone sends you 100 GB, you'd have to accept it all only to throw it away after recognizing that it's just garbage. Anyways, as a consequence, one might have to split zarr chunks up into smaller blocks, e.g. using the Flexible Byte Layout (FBL), if they are to be transferred via IPFS. But as far as I understand the ReferenceFileSystem spec, it's not possible to assemble an fsspec file from multiple parts of a (CAR) file, which would be required to translate FBL into refs. As long as zarr chunks stay below the 1 MiB soft limit or the 2 MiB hard limit, FBLs wouldn't be required and the ReferenceFileSystem should be simple to build. Probably this isn't too bad though, as #1 could provide a nice way of packing the small 1 MiB chunks into larger objects again, which would be pretty close to zarr sharding (zarr-developers/zarr-python#877).
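To make the ReferenceFileSystem constraint concrete, here is a minimal sketch of a kerchunk-style reference mapping (the CAR file name, offsets and lengths are made-up placeholders): each key resolves either to inline content or to exactly one `[url, offset, length]` range, so a chunk split across several blocks can't be expressed as a single key.

```python
# Hedged sketch of a kerchunk-style reference file (version 1 of the spec);
# file name, offsets and lengths below are placeholders for illustration.
import json

refs = {
    "version": 1,
    "refs": {
        # small metadata objects can be inlined as strings ...
        ".zgroup": '{"zarr_format": 2}',
        # ... while each zarr chunk maps to exactly ONE contiguous byte
        # range inside some file: [url, offset, length]
        "temperature/0.0": ["example.car", 1024, 524288],
        "temperature/0.1": ["example.car", 525312, 524288],
    },
}

with open("refs.json", "w") as f:
    json.dump(refs, f, indent=2)
```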
Yes, the block size limit is a constraint. For my use cases, chunks that are 256x256 or 64x64x64 in size would be OK. Yes, maybe we need sharding support in zarr before this could be used generally.
A sidenote that I mentioned to @thewtex: I think the second prototype from @jstriebel (zarr-developers/zarr-python#947), since it doesn't touch the store itself, may be testable with this store?!
@joshmoore, I'd say yes and no 😬 ... Technically, it should work on this store: zarr would pass "shards" to this store instead of "chunks", but to the store they would look just the same. However, problems would arise indirectly due to the increased size of shards compared to chunks, for at least two (related) reasons:

### verifying correctness

This store builds up a Merkle-DAG from the pieces (shards / chunks) provided to the store. Usually (i.e. without any inlining or splitting of objects), each of these pieces would be hashed as one thing and put into the DAG. If the correctness of the data is to be verified, the entire piece has to be downloaded and the hash has to be computed again. So if a reader only wants to obtain a smaller "chunk" from a larger "shard", the reader can't verify the correctness of that chunk. The same problem would arise with any form of Merkle tree, as discussed in zarr-developers/zarr-python#392.

### obtaining data from untrusted sources

If data is to be received from untrusted sources (as is common in p2p networks), verification of the pieces is a lot more important. For this reason, p2p networks usually place a hard upper limit on the size of individually hashed blocks. In the case of bitswap (the IPFS network protocol), this is 2 MB. Larger blocks would have to be encoded by splitting them up, e.g. using the Flexible Byte Layout mentioned above.

### what could we do?

The main issue here is that hashes must be computed on a tree of smaller pieces in order to be able to verify parts of the array. This looks a lot like the "compression problem": we also want to compress only small chunks of the array in order to be able to read smaller parts, and we then pack many of those parts into a larger shard so that we can store multiple pieces together. If the shard were arranged as a "little Merkle tree" itself, we would be able to verify the correctness of individual chunks without having to download the entire shard.

### option 1: hashing on chunk-level

This would require the hashes / CIDs / checksums of each "chunk" to be listed within some toplevel object, which could be the shard index. The toplevel object would then have to be hashed itself, such that there is one single hash value for the entire shard, which can be used to verify the correctness of all the chunk-hashes. However, we probably don't want to include the offset positions of the chunks within the shard as part of the toplevel hash, because then the hash would change whenever the order of chunks within the shard is modified. Thus, a "verifiable shard" might conceptually look like:
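(a rough, hypothetical sketch of such a layout; keys, hashes and offsets are placeholders, not an actual format)

```python
# Rough sketch of a "verifiable shard" as described above; everything here
# is a placeholder, not a real on-disk or on-wire format.
verifiable_shard = {
    # hashed toplevel object: its hash/CID identifies the whole shard and
    # covers all chunk hashes, so every chunk is individually verifiable
    "toplevel": {
        "chunk_hashes": {
            "0.0": "sha256-of-chunk-0.0",
            "0.1": "sha256-of-chunk-0.1",
        },
    },
    # index kept OUTSIDE the toplevel hash: chunks can be re-ordered or
    # re-packed without changing the shard's identity
    "index": {
        "sha256-of-chunk-0.0": (0, 524288),       # (offset, length)
        "sha256-of-chunk-0.1": (524288, 524288),
    },
    # the concatenated raw chunk bytes
    "blocks": b"...",
}
```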
A possible (maybe not ideal) format for such a "verifiable shard" would be something like CARv2. It's a bit redundant for this particular use case, but maybe not too much...

### option 2: hashing on arbitrary slices of the shard

Another option would be to slice up the shards again on arbitrary boundaries, build the same kind of Merkle tree as above on top of the slices, and store that in a similar format as the "verifiable shard" above. That's about what the Flexible Byte Layout does. An advantage of this would be that it works for arbitrary chunk sizes (even if they are larger than any limit of any transport mechanism). The disadvantage would be that there's quite a lot of duplicated work, and I'd guess that "double chunking" turns out to be quite bad for performance.

I don't yet have a clear picture of which "route to sharding" would be best, but I can't really see a way around hashing on chunk-level if we want any kind of checksumming.
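As a minimal, self-contained illustration of the chunk-level hashing idea (this is not the store's actual code; names and data are invented), a reader holding only the shard's toplevel hash can verify one chunk without touching the others:

```python
# Minimal sketch of chunk-level hashing / verification; not actual code
# from this store, just an illustration of the idea.
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


chunks = {"0.0": b"first chunk bytes", "0.1": b"second chunk bytes"}

# toplevel object: only the chunk hashes (no offsets), serialized canonically
toplevel = {key: digest(data) for key, data in chunks.items()}
toplevel_bytes = json.dumps(toplevel, sort_keys=True).encode()
shard_hash = digest(toplevel_bytes)  # the single hash identifying the shard

# a reader who trusts `shard_hash` first verifies the (small) toplevel ...
assert digest(toplevel_bytes) == shard_hash
# ... and can then verify any single chunk without downloading the rest:
assert digest(chunks["0.1"]) == toplevel["0.1"]
```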
I completely agree on this, and I arrived at this conclusion as well! I am going to take a stab at implementing a SharedStore. 💭
Tried this brilliantness! Looks to be very close, but I get an error?
Reproducible locally with:
😬 that's unfortunate... I couldn't exactly reproduce your notebook
oi! Also, it seems we need the IPFS implementations to catch up so they can cat dag-cbor :-(
yes, that's the big issue with the direct IPLD approach... However, it's planned (ipfs/kubo#8640) for the next release of go-ipfs to support raw blocks and CAR export from gateways. This would likely simplify the use of generic IPLD structures over gateways. The other option would probably be to build additional unixfs trees on top of the IPLD raw leaves (#3). Most of the data would not have to be duplicated, but the data would still be available in both worlds... I'm really torn between "zarr on unixfs on IPLD" and "zarr on IPLD": the former seems to be more compatible, whereas the latter seems to be more elegant (and may also be more useful as a StorageTransformer for other storage backends). Anyways, to go forward on this issue, you could either try installing multiformats from git, or we could write our own keys in the ...
Great! -- raw-blocks + fsspec on gateways could be a big performance bump!
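A rough sketch of what fetching a single raw block over a gateway could look like from Python (the Accept header follows the direction of ipfs/kubo#8640; the exact interface may end up different, and the CID is a placeholder):

```python
# Hedged sketch: fetching one raw, hash-addressed block over an HTTP
# gateway; the exact header/parameter may differ from what kubo ships,
# and the CID below is a placeholder.
import requests

cid = "bafy..."  # placeholder CID of a single IPLD block / zarr chunk
resp = requests.get(
    f"https://ipfs.io/ipfs/{cid}",
    headers={"Accept": "application/vnd.ipld.raw"},
    timeout=30,
)
resp.raise_for_status()
block = resp.content  # raw bytes that can be re-hashed and checked against the CID
```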
Worked! 🎉 🌮
I feel like I should applaud but I'm a bit lost as to what for :) Are you guys up for a pres., demo, or blog at some point? cc: @MSanKeys963
Hey @joshmoore, thanks for getting back! There's probably not too much to applaud for yet, it's still work in progress. Probably later on, one might want to use (indexed) CARv2 instead and a CARv2-fs on top. Conceptually that's the same as using the reference-fs, however the references would already be built in and wouldn't have to be carried around separately.
Hmmm..... so what's not working? 😄
Ok, so it's currently possible to create one large CAR of zarr-on-IPLD objects using
On its own, it's more or less a complicated way of storing a zarr into a zip store (or any other single-file backend), but it helps in exploring :-) What's not yet implemented, but has been discussed a bit here (probably out of scope of this issue) would be:
Hi, I've been working on a similar issue behind the scenes. I was searching for a way to
I basically combined all the great work you did and added a few extra lines to build `car_referencer`:

```bash
car_referencer -c "carfiles.*.car" -p preffs.parquet -r ROOT-HASH
```

```python
import xarray as xr

ds = xr.open_zarr("preffs:preffs.parquet")
```

Maybe it is helpful for you as well. Feedback welcome.
@d70-t what do you think about outputting an fsspec ReferenceFileSystem representation:
https://fsspec.github.io/kerchunk/spec.html
for the resulting car files? The motivation is to store the car files locally or on AWS S3, e.g., and access them in Python while removing any need for an IPFS server or gateway.
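A minimal sketch of the workflow this would enable, assuming a kerchunk-style reference file (here hypothetically named refs.json) has been generated for CAR files stored on S3:

```python
# Hedged sketch: opening zarr data through a reference file that points at
# byte ranges inside CAR files on S3 -- no IPFS daemon or gateway involved.
# "refs.json" and the S3 options are placeholders.
import fsspec
import xarray as xr

mapper = fsspec.get_mapper(
    "reference://",
    fo="refs.json",                # kerchunk-style reference file
    remote_protocol="s3",          # protocol of the referenced CAR files
    remote_options={"anon": True},
)
ds = xr.open_zarr(mapper, consolidated=False)
```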