
feat!: automatic client side CAR chunking for large data #588

Merged: 23 commits, Nov 16, 2021

Conversation

@alanshaw (Contributor) commented Oct 12, 2021

storeBlob, storeDirectory and store now construct CAR files and then call storeCar (which POSTs to /upload). This automatically gives them CAR chunking capability.

It adds the following static functions:

NFTStorage.encodeBlob(blob: Blob): Promise<{ cid: CID, car: CarReader }>

NFTStorage.encodeDirectory(files: Blob[]): Promise<{ cid: CID, car: CarReader }>

NFTStorage.encodeNFT<T extends TokenInput>(input: T): Promise<{ cid: CID, token: Token<T>, car: CarReader }>

After encoding your CAR and obtaining the root CID you can call:

await client.storeCar(car)
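As an aside, the batching idea behind chunked CAR upload can be sketched without the library at all. This is a toy, dependency-free illustration of size-bounded batching; `chunkBlocks` and the byte sizes are assumptions for the sketch, not the client's actual internals:

```javascript
// Toy sketch of size-bounded chunking: group blocks into batches so each
// batch stays under a target byte size. The real client batches CAR blocks;
// plain Uint8Arrays stand in for them here.
function chunkBlocks(blocks, targetSize) {
  const batches = []
  let current = []
  let currentSize = 0
  for (const block of blocks) {
    // Start a new batch when adding this block would overflow the target
    // (a block bigger than the target still gets a batch of its own).
    if (currentSize + block.length > targetSize && current.length > 0) {
      batches.push(current)
      current = []
      currentSize = 0
    }
    current.push(block)
    currentSize += block.length
  }
  if (current.length > 0) batches.push(current)
  return batches
}

// Five 40-byte blocks with a 100-byte target split into batches of 2, 2 and 1.
const blocks = Array.from({ length: 5 }, () => new Uint8Array(40))
const batches = chunkBlocks(blocks, 100)
console.log(batches.map((b) => b.length)) // [ 2, 2, 1 ]
```

Each batch would then be POSTed as its own CAR, all sharing the same root CID.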

🚨 There are trade-offs here:

  1. We're always sending CAR files, so our type field is always going to be Car for uploads from the JS client.

    To mitigate this we could simply inspect the root node of the CAR; from this we can ascertain the type of the data. We should talk about our type field and what it means. Right now we've mapped it to the method of upload... let's resolve this and submit a separate PR to fix it.

    FYI, I'm going to be implementing #355 soon so will be inspecting the root node of the CAR anyway for validation purposes.

  2. The files property is not set for the Multipart or Nft type uploads (since it's being uploaded as a CAR).

    I actually can't see any requests to the /status/:cid API (literally 0 - everyone uses /check/:cid) in the cloudflare logs which is the only place this data is exposed to users. I don't think folks will miss it. However if we inspect the root node we can get a shallow directory listing to put here.

    For Nft types we can't really set it anymore. There's also a regression right now where we're not setting it anyway. As an aside I'm not sure how much value it has since the data is just a list of file names, without paths within the object that is created...
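Both trade-offs mention inspecting the root node as a mitigation. A toy sketch of what such a classifier could look like follows; `classifyRoot` and these node shapes are hypothetical stand-ins, not real dag-pb or dag-cbor decoding:

```javascript
// Hypothetical sketch of the "inspect the root node" idea: classify an upload
// from the shape of its decoded root rather than from the upload method.
function classifyRoot(rootNode) {
  const links = rootNode.links || []
  if (links.length > 0 && links.every((l) => l.name)) {
    // Named links look like a directory; their names would also yield the
    // shallow file listing mentioned above for the files property.
    return 'Directory'
  }
  if (rootNode.data !== undefined && links.length === 0) {
    return 'Blob' // a single leaf/file node
  }
  return 'Nft' // e.g. a dag-cbor token object
}

const root = { links: [{ name: 'cat.png' }, { name: 'metadata.json' }] }
console.log(classifyRoot(root)) // Directory
```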

resolves #220

@alanshaw alanshaw changed the title feat: automatic CAR chunking for storeBlob and storeDirectory feat: automatic client side CAR chunking for large data Oct 12, 2021
@cloudflare-workers-and-pages bot commented Oct 12, 2021

Deploying with Cloudflare Pages

Latest commit: f2a211d
Status: ✅  Deploy successful!
Preview URL: https://0d6e791d.nft-storage-1at.pages.dev


@alanshaw alanshaw force-pushed the feat/auto-car-chunking-for-big-data branch from 3b24c29 to e39ae0c on October 15, 2021
@alanshaw alanshaw marked this pull request as ready for review October 19, 2021 10:07
}

const { root: metadataJsonCid } = await pack({
// @ts-ignore
Contributor

Can you please add a comment explaining why the ignore is needed here?

const { root: metadataJsonCid } = await pack({
// @ts-ignore
input: [
{
Contributor

Would this be more straightforward?

[{ path: 'metadata.json', content: JSON.stringify(data) }]

which should be type-compatible as per:

https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L21-L33
https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L50-L57

And in Node it would remove the need for the web-stream polyfill, etc.

Contributor

Aside: we probably need to factor this pattern into a simple API because we keep repeating it in a bunch of places, e.g. https://github.com/ipfs-shipyard/nft.storage/blob/f8b56d1577f5abc1fb640bcd19f8b3e4ade03e85/packages/niftysave/src/car.js#L4-L26

},
],
blockstore,
wrapWithDirectory: false,
Contributor

Question: if we don't wrap it in a dir, isn't providing a path pointless?

codec: dagCbor,
hasher: sha256,
})
await blockstore.put(block.cid, block.bytes)
Contributor

I am really confused about what is happening to this block. I see you write it to the blockstore, but then I don't see you creating a CAR from it or passing it outside, so wouldn't this block just vanish?

If this does the right thing, can you write some comments to explain it so the reader has an easier time with this?

Contributor

I have figured it out now. It seems that blockstore isn't really optional, in which case I understand that the block doesn't vanish but rather ends up in the store.

Yet I would much rather change the interface as per the other comments, so that this function produces a result as opposed to mutating things passed into it.
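The result-vs-mutation distinction being requested here can be sketched with toy types; nothing below is the actual Token interface, and the Map-backed store is a hypothetical stand-in for the blockstore:

```javascript
// Style 1: mutate a store the caller passes in. The caller has to know the
// side effect happened in order to make use of it.
function encodeInto(input, store) {
  store.set('root', JSON.stringify(input))
  return { cid: 'root' }
}

// Style 2 (the one being proposed): own the store internally and return it
// as part of the result, so the function produces a value rather than
// mutating its arguments.
function encode(input) {
  const store = new Map()
  store.set('root', JSON.stringify(input))
  return { cid: 'root', blocks: store }
}

const shared = new Map()
encodeInto({ name: 'nft' }, shared) // caller-owned store is mutated
console.log(shared.size) // 1

const { cid, blocks } = encode({ name: 'nft' })
console.log(cid, blocks.size) // root 1
```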

@@ -222,7 +222,7 @@ class NFTStorage {
validateERC1155(input)
const blockstore = new Blockstore()
try {
const token = await Token.encode(input, blockstore)
const token = await Token.Token.fromTokenInput(input, { blockstore })
onRootCidReady && onRootCidReady(token.ipnft)
Contributor

It is not the change introduced here, but I think it is a mistake to expose options like onRootCidReady, onStoredChunk, maxRetries in the high level API.

I would suggest handling such use case with more low level API e.g. NFTStorage.encodeCar(input):Promise<{ cid:CID, read():BlockstoreCarReader }> so that:

  1. A user that needs to do something with the root ASAP can do so as follows:

    const car = await NFTStorage.encodeNFT({ ... })
    console.log(car.cid)
    await NFTStorage.storeCar(service, car.read(), { maxRetries: retryLimit })
  2. A user can control the flow, not just observe it:

     const car = await NFTStorage.encodeNFT({ ... })
     if (shouldIupload(car.cid)) {
       await NFTStorage.storeCar(service, car, options)
     }

@alanshaw (Contributor, Author) commented Nov 9, 2021

OK, so I'm clear: you're saying storeBlob, storeDirectory and store are high level and storeCar is low level, right?

I'm happy to not expose onRootCidReady, onStoredChunk, maxRetries from those high level APIs but retain onStoredChunk, maxRetries as options to storeCar.

In this world we'd have the following static functions to encode data as CARs that all return { cid: CID, car: BlockstoreCarReader }:

  • encodeNFT - encodes an object to a CAR
  • encodeBlob - encodes a blob to a CAR
  • encodeDirectory - encodes a directory of files to a CAR

So then you could use any one of those static functions to do the two step:

const { cid, car } = await NFTStorage.encodeNFT({ ... })
await NFTStorage.storeCar(service, car, options)

The existing storeBlob, storeDirectory and store methods will call the appropriate static functions.

Does that sound reasonable? I hope so - I'm going to update the PR accordingly.

@@ -222,7 +222,7 @@ class NFTStorage {
validateERC1155(input)
const blockstore = new Blockstore()
try {
const token = await Token.encode(input, blockstore)
const token = await Token.Token.fromTokenInput(input, { blockstore })
onRootCidReady && onRootCidReady(token.ipnft)
const car = new BlockstoreCarReader(
Contributor

I find the control flow very confusing; the contract seems to be:

  1. Token.fromTokenInput must be passed a blockstore (even though it is an optional argument).
  2. Token.fromTokenInput will give you back an object with ipnft and ensure that the corresponding block is in the blockstore.
  3. The caller can then use that blockstore to create a CAR with the ipnft block in it.

I suggest changing the interface for Token.fromTokenInput (I would even rename it to Token.encodeAsCar(input)) so that it would just return an object with a car property representing a BlockstoreCarReader instead.

Or, if we want to make sure root handling is possible ahead of BlockstoreCarReader instantiation, the car property can become a read() method instead.

* @template {API.TokenInput} T
* @param {API.Encoded<T, [[Blob, Blob]]>} input
* @param {object} [options]
* @param {Blockstore} [options.blockstore]
Contributor

This is declared optional but it really must be provided.

Suggested change
* @param {Blockstore} [options.blockstore]
* @param {Blockstore} options.blockstore

* @returns {Promise<API.Token<T>>}
*/
static async fromTokenInput(input, options = {}) {
const blockstore = options.blockstore || new Blockstore()
Contributor

As per the other comments, this is not optional in the current implementation.

Suggested change
const blockstore = options.blockstore || new Blockstore()
const blockstore = options.blockstore

@alanshaw alanshaw requested a review from Gozala November 9, 2021 16:00
@Gozala (Contributor) left a comment

Hey, sorry for the delay; for some reason it was not showing up in GitHub notifications, so I had to remember to do this.

I think this revision looks great. The only problem is that a bunch of functions switched from T input for metadata to API.Encoded<T, [[Blob, Blob]]>, which I think is a mistake and will also cause the API docs to become confusing, as it will no longer be clear what users need to pass.

If this change was motivated by trouble with the type checker, please instead do a localized ts-ignore and feel free to assign me a followup issue to resolve it.

I marked this as "request changes", but I think once the types are changed back this can land without another review round.

Thanks again for taking this over!

packages/client/src/lib.js (outdated; resolved)
packages/client/src/lib.js (outdated; resolved)
packages/client/src/lib.js (outdated; resolved)
@alanshaw (Contributor, Author)

@alanshaw reminder to update #730 when this merges.

@alanshaw alanshaw changed the title feat: automatic client side CAR chunking for large data feat!: automatic client side CAR chunking for large data Nov 16, 2021
@alanshaw alanshaw force-pushed the feat/auto-car-chunking-for-big-data branch from fb2c33c to a42760e on November 16, 2021

Successfully merging this pull request may close these issues.

Automatic client side CAR chunking for large data