feat: add normalise input function #5

achingbrain · 2019-08-31T07:49:17Z

This PR follows on from this comment and allows an input normalisation function to be shared between ipfs and the http client.

All supported types are tested, at the moment they are (from the test output):

  normalise-input
    String
      ✓ String
      ✓ { path: '', content: String }
      ✓ Iterable<{ path: '', content: String }
      ✓ AsyncIterable<{ path: '', content: String }
      ✓ PullStream<{ path: '', content: String }
    Buffer
      ✓ Buffer
      ✓ Iterable<Buffer>
      ✓ AsyncIterable<Buffer>
      ✓ PullStream<Buffer>
      ✓ { path: '', content: Buffer }
      ✓ { path: '', content: Iterable<Buffer> }
      ✓ { path: '', content: AsyncIterable<Buffer> }
      ✓ { path: '', content: PullStream<Buffer> }
      ✓ Iterable<{ path: '', content: Buffer }
      ✓ AsyncIterable<{ path: '', content: Buffer }
      ✓ PullStream<{ path: '', content: Buffer }
      ✓ Iterable<{ path: '', content: Iterable<Buffer> }>
      ✓ Iterable<{ path: '', content: AsyncIterable<Buffer> }>
      ✓ AsyncIterable<{ path: '', content: Iterable<Buffer> }>
      ✓ AsyncIterable<{ path: '', content: AsyncIterable<Buffer> }>
      ✓ PullStream<{ path: '', content: PullStream<Buffer> }>
    Blob
      ✓ Blob
      ✓ { path: '', content: Blob }
      ✓ Iterable<{ path: '', content: Blob }
      ✓ AsyncIterable<{ path: '', content: Blob }
      ✓ PullStream<{ path: '', content: Blob }
    Iterable<Number>
      ✓ Iterable<Number>
      ✓ { path: '', content: Iterable<Number> }
      ✓ Iterable<{ path: '', content: Iterable<Number> }
      ✓ AsyncIterable<{ path: '', content: Iterable<Number> }
      ✓ PullStream<{ path: '', content: Iterable<Number> }
    TypedArray
      ✓ TypedArray
      ✓ Iterable<TypedArray>
      ✓ AsyncIterable<TypedArray>
      ✓ PullStream<TypedArray>
      ✓ { path: '', content: TypedArray }
      ✓ { path: '', content: Iterable<TypedArray> }
      ✓ { path: '', content: AsyncIterable<TypedArray> }
      ✓ { path: '', content: PullStream<TypedArray> }
      ✓ Iterable<{ path: '', content: TypedArray }
      ✓ AsyncIterable<{ path: '', content: TypedArray }
      ✓ PullStream<{ path: '', content: TypedArray }
      ✓ Iterable<{ path: '', content: Iterable<TypedArray> }>
      ✓ Iterable<{ path: '', content: AsyncIterable<TypedArray> }>
      ✓ AsyncIterable<{ path: '', content: Iterable<TypedArray> }>
      ✓ AsyncIterable<{ path: '', content: AsyncIterable<TypedArray> }>
      ✓ PullStream<{ path: '', content: PullStream<TypedArray> }>

Can you think of any others? Could add tests for browser-centric typed arrays I guess?

Some notes:

We will now support ipfs.adding a bunch of new types for content such as strings, arrays full of numbers for a slightly more human-friendly API.
I didn't include Pull Streams because in the brave new future the calling code should transform them before passing it to the core ipfs.add.
I've tried to avoid Buffer.from unless its absolutely necessary because it involves copying buffers which is slow.
Detecting Iterable<Number> involves peeking at the iterable and is a little fragile as it assumes the iterable is full of the same type
Maybe at some point in the future we should revisit this and convert everything to BufferLists to avoid the Buffer.from copy problem

I haven't integrated this with ipfs yet, just putting this up for early feedback.

Allows input normalisation function to be shared between ipfs and the http client.

alanshaw · 2019-08-31T12:21:05Z

src/files/normalise-input.js

+function toFileObject (input) {
+  return {
+    path: input.path || '',
+    content: toAsyncIterable(input.content || input)


Both path and content are optional - i.e. you can pass { path: '/path/to/dir' } so this won't work in this case.

alanshaw · 2019-08-31T12:27:18Z

src/files/normalise-input.js

+
+  // Iterator<?>
+  if (input[Symbol.iterator]) {
+    if (!isNaN(input[0])) {


I'm not sure this is the right check here? When would you pass an [NaN]?

Suggested change

if (!isNaN(input[0])) {

if (Number.isInteger(input[0])) {

You've just determined this object is iterable, but that does not mean it's an array and you can access values like input[0]. Strictly speaking you should input[Symbol.iterator]().next().value

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪. Number.isInteger is probably better in this case..

I'd hoped to avoid calling .next() on the iterator in case it consumes a value that otherwise wouldn't be re-iterated - e.g. more of a .peek()-type mechanism. Could probably wrap the iterator in another iterator(!) that does something a bit more sensible.

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪.

😉 ...and other values are are not NaN. isNaN is just the worst. For the benefit of anyone else reading this the following non-numbers would also make it through!:

!isNaN(null) // true !isNaN('123') // true !isNaN(' ') // true !isNaN([]) // true !isNaN(new Date) // true

alanshaw · 2019-08-31T12:34:59Z

src/files/normalise-input.js

+  }
+
+  throw new Error('Unexpected input: ' + typeof chunk)
+}


After the test for isBytes(chunk) you could just remove the other tests and return Buffer.from(chunk) since it'll already throw if it doesn't get one of the things you're checking for.

alanshaw · 2019-08-31T12:47:24Z

src/files/normalise-input.js

+  if (input[Symbol.asyncIterator]) {
+    return (async function * () { // eslint-disable-line require-await
+      for await (const chunk of input) {
+        yield toFileObject(chunk)


Does not support AsyncIterable<Bytes>? I think fs.createReadStream works because the stream it creates has a path property and it is assumed to be a file object earlier in the code, but not all AsyncIterable<Bytes>'s will necessarily have this property.

alanshaw · 2019-08-31T12:51:47Z

src/files/normalise-input.js

+ * AsyncIterable<{ path, content: Blob }>
+ * AsyncIterable<{ path, content: Iterable<Buffer> }>
+ * AsyncIterable<{ path, content: AsyncIterable<Buffer> }>
+ * ```


No support for pull streams?

I was hoping to not include it here. My thinking was that all ipfs.add* methods could route to ipfs._addAsyncIterator, and it would be down to those methods to convert their arguments to something this method accepts. E.g. ipfs.addPullStream would do the pull-stream source to async iterator conversion, ipfs.addFromFs would convert glob-matched paths to async iterators of file objects with paths & content readable streams, etc.

I'm working on integrating that with ipfs and ipfs-http-client so we'll see how realistic that is..

That sounds nice, but you can call ipfs.add({ content: PullStreamSource }) so I think it's necessary, unless we want to duplicate a lot of the normalise work in both places.

alanshaw · 2019-08-31T13:11:19Z

src/files/normalise-input.js

+ * Transform one of:
+ *
+ * ```
+ * Buffer|ArrayBuffer|TypedArray


For the sake of simplicity and understanding maybe we should setup an alias for this:

Suggested change

* Buffer|ArrayBuffer|TypedArray

* Bytes (Buffer|ArrayBuffer|TypedArray|Iterable<Number>)

* Bloby (Blob|File)

When I see Bytes I think that they are part of a set of bytes for a single file (albeit there may only be one), when I see Bloby I think that this is a whole file.

alanshaw · 2019-08-31T13:17:15Z

src/files/normalise-input.js

+      })()
+    }
+
+    // Iterable<Buffer>


It's werid that Iterable<Buffer> means multiple files but AsyncIterable<Buffer> is a single file...

As I mentioned in an earlier comment I reckon we should interpret sets of Bytes as a single file and sets of Blobys as multiple files.

Yeah - multiple Buffers should be treated as one file, multiple Blobs are multiple files. That's the intention anyway, I guess the comment isn't clear.

Realistically you'd need { path, content } objects for multiple Blobs as they have no path themselves and the importer will fail complaining of multiple roots.

Alternatively we could widen it to accept HTML File objects which have a .name property and use that as the path? The user would still need to pass the wrapWithDirectory arg to the importer though because I think the .name property is supposed to just be the filename, not the path to the file within a directory structure as with the { path, content } object.

Urgh, what fun.

alanshaw · 2019-09-01T13:16:46Z

In general though I'm super happy to see a normalise function that actually defines and deals with all the cases properly. 🚀

alanshaw · 2019-09-03T09:01:35Z

src/files/normalise-input.js

+  return typeof obj === 'object' && (obj.path || obj.content)
+}
+
+function blobToAsyncGenerator (blob) {


Extract as a module blob-to-it?

or just mkdir iterables && touch blob-to-it.js in this repo for less overhead

* feat: support pull streams This PR updates the `normaliseInput` function to accept pull streams. I've also made the following changes: 1. Update the docs for supported inputs * `Buffer|ArrayBuffer|TypedArray` is aliased as `Bytes` * `Blob|File` is aliased as `Bloby` * Added info for what a input "means" i.e. causes single/multiple files to be added 1. Peek the first item of an (async) iterator properly 1. Move file object check below `input[Symbol.asyncIterator]` check because Node.js streams have a path property that will false positive the `isFileObject` check 1. Fix `toFileObject` to allow objects with no `content` property 1. Simplify `toBuffer` to remove checks that `Buffer.from` already does License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai> * fix: tests License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>

hugomrdias · 2019-09-03T16:58:48Z

src/files/normalise-input.js

+async function * readBlob (blob, options) {
+  options = options || {}
+
+  const reader = new global.FileReader()


Suggested change

const reader = new global.FileReader()

const reader = new globalThis.FileReader()

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

hugomrdias · 2019-09-03T16:59:49Z

src/files/normalise-input.js

+}
+
+function isBloby (obj) {
+  return typeof Blob !== 'undefined' && obj instanceof global.Blob


Suggested change

return typeof Blob !== 'undefined' && obj instanceof global.Blob

return typeof globalThis.Blob !== 'undefined' && obj instanceof globalThis.Blob

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

hugomrdias · 2019-09-03T17:06:48Z

src/files/normalise-input.js

+  }
+
+  // PullStream<?>
+  if (typeof input === 'function') {


https://github.com/dominictarr/is-pull-stream/blob/master/index.js

hugomrdias · 2019-09-03T17:08:31Z

test/files/normalise-input.spec.js

+let BLOB
+
+if (supportsFileReader) {
+  BLOB = new global.Blob([


Suggested change

BLOB = new global.Blob([

BLOB = new globalThis.Blob([

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>

hugomrdias · 2019-09-04T08:26:12Z

is this ready to be merged ?

achingbrain · 2019-09-04T08:49:14Z

achingbrain added the status/in-progress In progress label Aug 31, 2019

achingbrain requested review from alanshaw and hugomrdias August 31, 2019 07:49

achingbrain force-pushed the normalise-input branch 3 times, most recently from 103e999 to d46ba26 Compare August 31, 2019 09:20

feat: consolidate ipfs.add input normalisation

e51ff72

Allows input normalisation function to be shared between ipfs and the http client.

achingbrain force-pushed the normalise-input branch from d46ba26 to e51ff72 Compare August 31, 2019 09:31

alanshaw requested changes Aug 31, 2019

View reviewed changes

achingbrain mentioned this pull request Sep 1, 2019

feat: support blobs, file objects, iterators and async iterators #6

Closed

alanshaw reviewed Sep 3, 2019

View reviewed changes

alanshaw approved these changes Sep 3, 2019

View reviewed changes

alanshaw mentioned this pull request Sep 3, 2019

refactor: fetch add ipfs-inactive/js-ipfs-http-client#1087

Merged

3 tasks

hugomrdias suggested changes Sep 3, 2019

View reviewed changes

fix: review comments

2113f0b

License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>

alanshaw requested a review from hugomrdias September 3, 2019 20:05

hugomrdias approved these changes Sep 4, 2019

View reviewed changes

hugomrdias merged commit b22b8de into master Sep 4, 2019

hugomrdias deleted the normalise-input branch September 4, 2019 09:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add normalise input function #5

feat: add normalise input function #5

achingbrain commented Aug 31, 2019 •

edited by alanshaw

Loading

alanshaw Aug 31, 2019

alanshaw Aug 31, 2019

alanshaw Aug 31, 2019

achingbrain Sep 1, 2019 •

edited

Loading

alanshaw Sep 1, 2019

alanshaw Sep 1, 2019

alanshaw Aug 31, 2019

alanshaw Aug 31, 2019

alanshaw Aug 31, 2019

achingbrain Sep 1, 2019

alanshaw Sep 1, 2019

alanshaw Aug 31, 2019

alanshaw Aug 31, 2019

achingbrain Aug 31, 2019

alanshaw commented Sep 1, 2019

alanshaw Sep 3, 2019

hugomrdias Sep 3, 2019

hugomrdias Sep 3, 2019

hugomrdias Sep 3, 2019

hugomrdias Sep 3, 2019

hugomrdias Sep 3, 2019

hugomrdias commented Sep 4, 2019

achingbrain commented Sep 4, 2019

	* Buffer\|ArrayBuffer\|TypedArray
	* Bytes (Buffer\|ArrayBuffer\|TypedArray\|Iterable<Number>)
	* Bloby (Blob\|File)

	const reader = new global.FileReader()
	const reader = new globalThis.FileReader()

	return typeof Blob !== 'undefined' && obj instanceof global.Blob
	return typeof globalThis.Blob !== 'undefined' && obj instanceof globalThis.Blob

feat: add normalise input function #5

feat: add normalise input function #5

Conversation

achingbrain commented Aug 31, 2019 • edited by alanshaw Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

achingbrain Sep 1, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alanshaw commented Sep 1, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hugomrdias commented Sep 4, 2019

achingbrain commented Sep 4, 2019

achingbrain commented Aug 31, 2019 •

edited by alanshaw

Loading

achingbrain Sep 1, 2019 •

edited

Loading