-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: automatic client side CAR chunking for large data #588
Conversation
Deploying with Cloudflare Pages
|
3b24c29
to
e39ae0c
Compare
packages/client/src/token.js
Outdated
} | ||
|
||
const { root: metadataJsonCid } = await pack({ | ||
// @ts-ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment why ignore is needed here
packages/client/src/token.js
Outdated
const { root: metadataJsonCid } = await pack({ | ||
// @ts-ignore | ||
input: [ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be more straight forward ?
[{ path: 'metadata.json', content: JSON.stringify(data) }]
Which should be type compatible as per
https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L21-L33
https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L50-L57
And in node would remove need for web-stream polyfill etc..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: We probably need to factor this pattern into simple API because we keep repeating it bunch e.g. https://github.com/ipfs-shipyard/nft.storage/blob/f8b56d1577f5abc1fb640bcd19f8b3e4ade03e85/packages/niftysave/src/car.js#L4-L26
packages/client/src/token.js
Outdated
}, | ||
], | ||
blockstore, | ||
wrapWithDirectory: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: If we don't wrap it in dir isn't providing a path pointless ?
packages/client/src/token.js
Outdated
codec: dagCbor, | ||
hasher: sha256, | ||
}) | ||
await blockstore.put(block.cid, block.bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really confused on what is happening to this block, I see you write it to the block store but then I don't see you creating a car from it or passing it outside so would not this block just vanish ?
If this does the right thing can you write some comments to explain so the reader can have an easier time with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have figured it out now. It seems that blockstore isn't really optional in which case I understand that block doesn't vanish but rather ends up in the store.
Yet I would much rather change the interface as per other comments so that this function produces result as opposed to mutates things passed into it.
packages/client/src/lib.js
Outdated
@@ -222,7 +222,7 @@ class NFTStorage { | |||
validateERC1155(input) | |||
const blockstore = new Blockstore() | |||
try { | |||
const token = await Token.encode(input, blockstore) | |||
const token = await Token.Token.fromTokenInput(input, { blockstore }) | |||
onRootCidReady && onRootCidReady(token.ipnft) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I is not the change introduced here, but I think it is a mistake to expose options like onRootCidReady, onStoredChunk, maxRetries
in high level API.
I would suggest handling such use case with more low level API e.g. NFTStorage.encodeCar(input):Promise<{ cid:CID, read():BlockstoreCarReader }>
so that:
-
User that needs to do something with a root ASAP, can do so as follows
const car = await NFTStorage.encodeNFT({ ... }) console.log(car.cid) await NFTStorage.storeCar(service, car.read(), { maxRetries: retryLimit })
-
User can control the flow not just observe it
const car = await NFTStorage.encodeNFT({ ... }) if (shouldIupload(car.cid)) { await await NFTStorage.storeCar(service, car, options) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I'm clear, you're saying storeBlob
, storeDirectory
and store
are high level and storeCar
is low level - right?
I'm happy to not expose onRootCidReady, onStoredChunk, maxRetries
from those high level APIs but retain onStoredChunk, maxRetries
as options to storeCar
.
In this world we'd have the following static functions to encode data as CARs that all return { cid: CID, car: BlockstoreCarReader }
:
encodeNFT
- encodes an object to a CARencodeBlob
- encodes a blob to a CARencodeDirectory
- encodes a directory of files to a CAR
So then you could use any one of those static functions to do the two step:
const { cid, car } = await NFTStorage.encodeNFT({ ... })
await NFTStorage.storeCar(service, car, options)
The existing storeBlob
, storeDirectory
and store
methods will call the appropriate static functions.
Does that sound reasonable? I hope so - I'm going to update the PR accordingly.
packages/client/src/lib.js
Outdated
@@ -222,7 +222,7 @@ class NFTStorage { | |||
validateERC1155(input) | |||
const blockstore = new Blockstore() | |||
try { | |||
const token = await Token.encode(input, blockstore) | |||
const token = await Token.Token.fromTokenInput(input, { blockstore }) | |||
onRootCidReady && onRootCidReady(token.ipnft) | |||
const car = new BlockstoreCarReader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find control flow very confusing, contract seems to be:
Token.fromTokenInput
must be passblockstore
(even though it is optional argument)Token.fromTokenInput
will give you back object withipnft
and ensure that corresponding block is in blockstore.- Caller than can use that blockstore to create a car with
ipfnt
block in it.
I suggest changing interface for Token.fromTokenInput
(I would even rename it to Token.encodeAsCar(input)
) so that it would just return object with car
property representing a BlockstoreCarReader
instead.
Or if we want to make sure root
handling is possible ahead of BlockstoreCarReader
instantiation than car
property can become a read()
method instead.
packages/client/src/token.js
Outdated
* @template {API.TokenInput} T | ||
* @param {API.Encoded<T, [[Blob, Blob]]>} input | ||
* @param {object} [options] | ||
* @param {Blockstore} [options.blockstore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is declared optional but it really must be provided.
* @param {Blockstore} [options.blockstore] | |
* @param {Blockstore} options.blockstore |
packages/client/src/token.js
Outdated
* @returns {Promise<API.Token<T>>} | ||
*/ | ||
static async fromTokenInput(input, options = {}) { | ||
const blockstore = options.blockstore || new Blockstore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per other comments this is not optional in current implementation
const blockstore = options.blockstore || new Blockstore() | |
const blockstore = options.blockstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey sorry for the delay for some reason it was not showing up in github notifications so I had to remember to do this.
I think this revision looks great, only problem is bunch of functions switched from T
input for metadata to API.Encoded<T, [[Blob, Blob]]>
which I think is the mistake & will also cause API docs to become confusing as it will no longer be clear what users needs to pass.
If this change was motivated by trouble with type checker, please instead do localized ts-ignore
and feel free to assign me a followup issue to resolve it.
I marked this as "request changes" but I think once types are changed back this can land without another review around.
Thanks again for taking over this
fb2c33c
to
a42760e
Compare
storeBlob
,storeDirectory
andstore
now construct CAR files and then callstoreCar
(whichPOST
s to/upload
). This automatically gives them CAR chunking capability.It adds the following static functions:
After encoding your CAR and obtaining the root CID you can call:
🚨 There are trade offs here:
We're always sending CAR files so our
type
field is going to always beCar
for uploads from the JS client.To mitigate this we could simply inspect the root node of the CAR, from this we can assertain the type of the data. We should talk about our
type
field and what it means. Right now we've mapped it to the method of upload...lets resolve this and submit a separate PR to fix.FYI, I'm going to be implementing #355 soon so will be inspecting the root node of the CAR anyway for validation purposes.
The
files
property is not set for theMultipart
orNft
type uploads (since it's being uploaded as a CAR).I actually can't see any requests to the
/status/:cid
API (literally 0 - everyone uses/check/:cid
) in the cloudflare logs which is the only place this data is exposed to users. I don't think folks will miss it. However if we inspect the root node we can get a shallow directory listing to put here.For
Nft
types we can't really set it anymore. There's also a regression right now where we're not setting it anyway. As an aside I'm not sure how much value it has since the data is just a list of file names, without paths within the object that is created...resolves #220