
feat!: automatic client side CAR chunking for large data #588

Merged: 23 commits, Nov 16, 2021

Conversation

@alanshaw (Contributor) commented Oct 12, 2021

storeBlob, storeDirectory and store now construct CAR files and then call storeCar (which POSTs to /upload). This automatically gives them CAR chunking capability.

It adds the following static functions:

NFTStorage.encodeBlob(blob: Blob): Promise<{ cid: CID, car: CarReader }>

NFTStorage.encodeDirectory(files: Blob[]): Promise<{ cid: CID, car: CarReader }>

NFTStorage.encodeNFT<T extends TokenInput>(input: T): Promise<{ cid: CID, token: Token<T>, car: CarReader }>

After encoding your CAR and obtaining the root CID you can call:

await client.storeCar(car)
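As an aside, the batching idea behind chunked CAR upload can be sketched without the library at all. This is a toy, dependency-free illustration of size-bounded batching; `chunkBlocks` and the byte sizes are assumptions for the sketch, not the client's actual internals:

```javascript
// Toy sketch of size-bounded chunking: group blocks into batches so each
// batch stays under a target byte size. The real client batches CAR blocks;
// plain Uint8Arrays stand in for them here.
function chunkBlocks(blocks, targetSize) {
  const batches = []
  let current = []
  let currentSize = 0
  for (const block of blocks) {
    // Start a new batch when adding this block would overflow the target
    // (a block bigger than the target still gets a batch of its own).
    if (currentSize + block.length > targetSize && current.length > 0) {
      batches.push(current)
      current = []
      currentSize = 0
    }
    current.push(block)
    currentSize += block.length
  }
  if (current.length > 0) batches.push(current)
  return batches
}

// Five 40-byte blocks with a 100-byte target split into batches of 2, 2 and 1.
const blocks = Array.from({ length: 5 }, () => new Uint8Array(40))
const batches = chunkBlocks(blocks, 100)
console.log(batches.map((b) => b.length)) // [ 2, 2, 1 ]
```

Each batch would then be POSTed as its own CAR, all sharing the same root CID.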

🚨 There are trade-offs here:

  1. We're always sending CAR files, so our type field is always going to be Car for uploads from the JS client.

    To mitigate this we could simply inspect the root node of the CAR; from this we can ascertain the type of the data. We should talk about our type field and what it means. Right now we've mapped it to the method of upload... let's resolve this and submit a separate PR to fix it.

    FYI, I'm going to be implementing #355 soon so will be inspecting the root node of the CAR anyway for validation purposes.

  2. The files property is not set for the Multipart or Nft type uploads (since it's being uploaded as a CAR).

    I actually can't see any requests to the /status/:cid API (literally 0 - everyone uses /check/:cid) in the cloudflare logs which is the only place this data is exposed to users. I don't think folks will miss it. However if we inspect the root node we can get a shallow directory listing to put here.

    For Nft types we can't really set it anymore. There's also a regression right now where we're not setting it anyway. As an aside I'm not sure how much value it has since the data is just a list of file names, without paths within the object that is created...
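Both trade-offs mention inspecting the root node as a mitigation. A toy sketch of what such a classifier could look like follows; `classifyRoot` and these node shapes are hypothetical stand-ins, not real dag-pb or dag-cbor decoding:

```javascript
// Hypothetical sketch of the "inspect the root node" idea: classify an upload
// from the shape of its decoded root rather than from the upload method.
function classifyRoot(rootNode) {
  const links = rootNode.links || []
  if (links.length > 0 && links.every((l) => l.name)) {
    // Named links look like a directory; their names would also yield the
    // shallow file listing mentioned above for the files property.
    return 'Directory'
  }
  if (rootNode.data !== undefined && links.length === 0) {
    return 'Blob' // a single leaf/file node
  }
  return 'Nft' // e.g. a dag-cbor token object
}

const root = { links: [{ name: 'cat.png' }, { name: 'metadata.json' }] }
console.log(classifyRoot(root)) // Directory
```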

resolves #220

@alanshaw alanshaw changed the title feat: automatic CAR chunking for storeBlob and storeDirectory feat: automatic client side CAR chunking for large data Oct 12, 2021
@cloudflare-workers-and-pages bot commented Oct 12, 2021

Deploying with Cloudflare Pages

Latest commit: f2a211d
Status: ✅  Deploy successful!
Preview URL: https://0d6e791d.nft-storage-1at.pages.dev


@alanshaw alanshaw force-pushed the feat/auto-car-chunking-for-big-data branch from 3b24c29 to e39ae0c on October 15, 2021
@alanshaw alanshaw marked this pull request as ready for review October 19, 2021 10:07
}

const { root: metadataJsonCid } = await pack({
// @ts-ignore
Contributor

Can you please add a comment explaining why the ignore is needed here?

const { root: metadataJsonCid } = await pack({
// @ts-ignore
input: [
{
Contributor

Would this be more straightforward?

[{ path: 'metadata.json', content: JSON.stringify(data) }]

which should be type-compatible as per:

https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L21-L33
https://github.com/ipfs/js-ipfs/blob/eba5fe6832858107b3e1ae02c99de674622f12b4/packages/ipfs-core-types/src/utils.ts#L50-L57

And in Node it would remove the need for the web-stream polyfill, etc.

Contributor

Aside: we probably need to factor this pattern into a simple API because we keep repeating it in a bunch of places, e.g. https://github.com/ipfs-shipyard/nft.storage/blob/f8b56d1577f5abc1fb640bcd19f8b3e4ade03e85/packages/niftysave/src/car.js#L4-L26

},
],
blockstore,
wrapWithDirectory: false,
Contributor

Question: if we don't wrap it in a dir, isn't providing a path pointless?

codec: dagCbor,
hasher: sha256,
})
await blockstore.put(block.cid, block.bytes)
Contributor

I am really confused about what is happening to this block. I see you write it to the blockstore, but then I don't see you creating a CAR from it or passing it outside, so wouldn't this block just vanish?

If this does the right thing, can you write some comments to explain it so the reader has an easier time with this?

Contributor

I have figured it out now. It seems that blockstore isn't really optional, in which case I understand that the block doesn't vanish but rather ends up in the store.

Yet I would much rather change the interface as per the other comments, so that this function produces a result as opposed to mutating things passed into it.
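The result-vs-mutation distinction being requested here can be sketched with toy types; nothing below is the actual Token interface, and the Map-backed store is a hypothetical stand-in for the blockstore:

```javascript
// Style 1: mutate a store the caller passes in. The caller has to know the
// side effect happened in order to make use of it.
function encodeInto(input, store) {
  store.set('root', JSON.stringify(input))
  return { cid: 'root' }
}

// Style 2 (the one being proposed): own the store internally and return it
// as part of the result, so the function produces a value rather than
// mutating its arguments.
function encode(input) {
  const store = new Map()
  store.set('root', JSON.stringify(input))
  return { cid: 'root', blocks: store }
}

const shared = new Map()
encodeInto({ name: 'nft' }, shared) // caller-owned store is mutated
console.log(shared.size) // 1

const { cid, blocks } = encode({ name: 'nft' })
console.log(cid, blocks.size) // root 1
```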

@@ -222,7 +222,7 @@ class NFTStorage {
validateERC1155(input)
const blockstore = new Blockstore()
try {
const token = await Token.encode(input, blockstore)
const token = await Token.Token.fromTokenInput(input, { blockstore })
onRootCidReady && onRootCidReady(token.ipnft)
Contributor

It is not the change introduced here, but I think it is a mistake to expose options like onRootCidReady, onStoredChunk, maxRetries in the high level API.

I would suggest handling such use case with more low level API e.g. NFTStorage.encodeCar(input):Promise<{ cid:CID, read():BlockstoreCarReader }> so that:

  1. A user that needs to do something with the root ASAP can do so as follows:

    const car = await NFTStorage.encodeNFT({ ... })
    console.log(car.cid)
    await NFTStorage.storeCar(service, car.read(), { maxRetries: retryLimit })
  2. A user can control the flow, not just observe it:

     const car = await NFTStorage.encodeNFT({ ... })
     if (shouldIupload(car.cid)) {
       await NFTStorage.storeCar(service, car, options)
     }

@alanshaw (Contributor, Author) commented Nov 9, 2021

OK, so I'm clear: you're saying storeBlob, storeDirectory and store are high level and storeCar is low level, right?

I'm happy to not expose onRootCidReady, onStoredChunk, maxRetries from those high level APIs but retain onStoredChunk, maxRetries as options to storeCar.

In this world we'd have the following static functions to encode data as CARs that all return { cid: CID, car: BlockstoreCarReader }:

  • encodeNFT - encodes an object to a CAR
  • encodeBlob - encodes a blob to a CAR
  • encodeDirectory - encodes a directory of files to a CAR

So then you could use any one of those static functions to do the two step:

const { cid, car } = await NFTStorage.encodeNFT({ ... })
await NFTStorage.storeCar(service, car, options)

The existing storeBlob, storeDirectory and store methods will call the appropriate static functions.

Does that sound reasonable? I hope so - I'm going to update the PR accordingly.

@@ -222,7 +222,7 @@ class NFTStorage {
validateERC1155(input)
const blockstore = new Blockstore()
try {
const token = await Token.encode(input, blockstore)
const token = await Token.Token.fromTokenInput(input, { blockstore })
onRootCidReady && onRootCidReady(token.ipnft)
const car = new BlockstoreCarReader(
Contributor

I find the control flow very confusing; the contract seems to be:

  1. Token.fromTokenInput must be passed a blockstore (even though it is an optional argument).
  2. Token.fromTokenInput will give you back an object with ipnft and ensure that the corresponding block is in the blockstore.
  3. The caller can then use that blockstore to create a CAR with the ipnft block in it.

I suggest changing the interface for Token.fromTokenInput (I would even rename it to Token.encodeAsCar(input)) so that it would just return an object with a car property representing a BlockstoreCarReader instead.

Or, if we want to make sure root handling is possible ahead of BlockstoreCarReader instantiation, the car property can become a read() method instead.

* @template {API.TokenInput} T
* @param {API.Encoded<T, [[Blob, Blob]]>} input
* @param {object} [options]
* @param {Blockstore} [options.blockstore]
Contributor

This is declared optional but it really must be provided.

Suggested change
* @param {Blockstore} [options.blockstore]
* @param {Blockstore} options.blockstore

* @returns {Promise<API.Token<T>>}
*/
static async fromTokenInput(input, options = {}) {
const blockstore = options.blockstore || new Blockstore()
Contributor

As per the other comments, this is not optional in the current implementation.

Suggested change
const blockstore = options.blockstore || new Blockstore()
const blockstore = options.blockstore

@alanshaw alanshaw requested a review from Gozala November 9, 2021 16:00
@Gozala (Contributor) left a comment

Hey, sorry for the delay; for some reason it was not showing up in GitHub notifications, so I had to remember to do this.

I think this revision looks great. The only problem is that a bunch of functions switched from T input for metadata to API.Encoded<T, [[Blob, Blob]]>, which I think is a mistake and will also cause the API docs to become confusing, as it will no longer be clear what users need to pass.

If this change was motivated by trouble with the type checker, please instead do a localized ts-ignore and feel free to assign me a followup issue to resolve it.

I marked this as "request changes", but I think once the types are changed back this can land without another review round.

Thanks again for taking this over!

packages/client/src/lib.js (outdated; resolved)
packages/client/src/lib.js (outdated; resolved)
packages/client/src/lib.js (outdated; resolved)
@alanshaw (Contributor, Author)

@alanshaw reminder to update #730 when this merges.

@alanshaw alanshaw changed the title feat: automatic client side CAR chunking for large data feat!: automatic client side CAR chunking for large data Nov 16, 2021
@alanshaw alanshaw force-pushed the feat/auto-car-chunking-for-big-data branch from fb2c33c to a42760e on November 16, 2021

Successfully merging this pull request may close these issues.

Automatic client side CAR chunking for large data