-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add normalise input function #5
Conversation
103e999
to
d46ba26
Compare
Allows input normalisation function to be shared between ipfs and the http client.
d46ba26
to
e51ff72
Compare
src/files/normalise-input.js
Outdated
function toFileObject (input) { | ||
return { | ||
path: input.path || '', | ||
content: toAsyncIterable(input.content || input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both path
and content
are optional - i.e. you can pass { path: '/path/to/dir' }
so this won't work in this case.
src/files/normalise-input.js
Outdated
|
||
// Iterator<?> | ||
if (input[Symbol.iterator]) { | ||
if (!isNaN(input[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right check here? When would you pass an [NaN]
?
if (!isNaN(input[0])) { | |
if (Number.isInteger(input[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've just determined this object is iterable, but that does not mean it's an array and you can access values like input[0]
. Strictly speaking you should input[Symbol.iterator]().next().value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪. Number.isInteger
is probably better in this case..
I'd hoped to avoid calling .next()
on the iterator in case it consumes a value that otherwise wouldn't be re-iterated - e.g. more of a .peek()
-type mechanism. Could probably wrap the iterator in another iterator(!) that does something a bit more sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it-peek
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪.
😉 ...and other values are are not NaN
. isNaN
is just the worst. For the benefit of anyone else reading this the following non-numbers would also make it through!:
!isNaN(null) // true
!isNaN('123') // true
!isNaN(' ') // true
!isNaN([]) // true
!isNaN(new Date) // true
} | ||
|
||
throw new Error('Unexpected input: ' + typeof chunk) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the test for isBytes(chunk)
you could just remove the other tests and return Buffer.from(chunk)
since it'll already throw if it doesn't get one of the things you're checking for.
src/files/normalise-input.js
Outdated
if (input[Symbol.asyncIterator]) { | ||
return (async function * () { // eslint-disable-line require-await | ||
for await (const chunk of input) { | ||
yield toFileObject(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not support AsyncIterable<Bytes>
? I think fs.createReadStream
works because the stream it creates has a path
property and it is assumed to be a file object earlier in the code, but not all AsyncIterable<Bytes>
's will necessarily have this property.
* AsyncIterable<{ path, content: Blob }> | ||
* AsyncIterable<{ path, content: Iterable<Buffer> }> | ||
* AsyncIterable<{ path, content: AsyncIterable<Buffer> }> | ||
* ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No support for pull streams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to not include it here. My thinking was that all ipfs.add*
methods could route to ipfs._addAsyncIterator
, and it would be down to those methods to convert their arguments to something this method accepts. E.g. ipfs.addPullStream
would do the pull-stream
source to async iterator conversion, ipfs.addFromFs
would convert glob-matched paths to async iterators of file objects with paths & content readable streams, etc.
I'm working on integrating that with ipfs
and ipfs-http-client
so we'll see how realistic that is..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds nice, but you can call ipfs.add({ content: PullStreamSource })
so I think it's necessary, unless we want to duplicate a lot of the normalise work in both places.
src/files/normalise-input.js
Outdated
* Transform one of: | ||
* | ||
* ``` | ||
* Buffer|ArrayBuffer|TypedArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of simplicity and understanding maybe we should setup an alias for this:
* Buffer|ArrayBuffer|TypedArray | |
* Bytes (Buffer|ArrayBuffer|TypedArray|Iterable<Number>) | |
* Bloby (Blob|File) |
When I see Bytes
I think that they are part of a set of bytes for a single file (albeit there may only be one), when I see Bloby
I think that this is a whole file.
src/files/normalise-input.js
Outdated
})() | ||
} | ||
|
||
// Iterable<Buffer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's werid that Iterable<Buffer>
means multiple files but AsyncIterable<Buffer>
is a single file...
As I mentioned in an earlier comment I reckon we should interpret sets of Bytes
as a single file and sets of Blobys
as multiple files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - multiple Buffers should be treated as one file, multiple Blobs are multiple files. That's the intention anyway, I guess the comment isn't clear.
Realistically you'd need { path, content }
objects for multiple Blobs as they have no path themselves and the importer will fail complaining of multiple roots.
Alternatively we could widen it to accept HTML File objects which have a .name
property and use that as the path? The user would still need to pass the wrapWithDirectory
arg to the importer though because I think the .name
property is supposed to just be the filename, not the path to the file within a directory structure as with the { path, content }
object.
Urgh, what fun.
In general though I'm super happy to see a normalise function that actually defines and deals with all the cases properly. 🚀 |
return typeof obj === 'object' && (obj.path || obj.content) | ||
} | ||
|
||
function blobToAsyncGenerator (blob) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract as a module blob-to-it
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just mkdir iterables && touch blob-to-it.js
in this repo for less overhead
* feat: support pull streams This PR updates the `normaliseInput` function to accept pull streams. I've also made the following changes: 1. Update the docs for supported inputs * `Buffer|ArrayBuffer|TypedArray` is aliased as `Bytes` * `Blob|File` is aliased as `Bloby` * Added info for what a input "means" i.e. causes single/multiple files to be added 1. Peek the first item of an (async) iterator properly 1. Move file object check below `input[Symbol.asyncIterator]` check because Node.js streams have a path property that will false positive the `isFileObject` check 1. Fix `toFileObject` to allow objects with no `content` property 1. Simplify `toBuffer` to remove checks that `Buffer.from` already does License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai> * fix: tests License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
src/files/normalise-input.js
Outdated
async function * readBlob (blob, options) { | ||
options = options || {} | ||
|
||
const reader = new global.FileReader() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const reader = new global.FileReader() | |
const reader = new globalThis.FileReader() |
and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top
src/files/normalise-input.js
Outdated
} | ||
|
||
function isBloby (obj) { | ||
return typeof Blob !== 'undefined' && obj instanceof global.Blob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return typeof Blob !== 'undefined' && obj instanceof global.Blob | |
return typeof globalThis.Blob !== 'undefined' && obj instanceof globalThis.Blob |
and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top
src/files/normalise-input.js
Outdated
} | ||
|
||
// PullStream<?> | ||
if (typeof input === 'function') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/files/normalise-input.spec.js
Outdated
let BLOB | ||
|
||
if (supportsFileReader) { | ||
BLOB = new global.Blob([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLOB = new global.Blob([ | |
BLOB = new globalThis.Blob([ |
and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top
License: MIT Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
is this ready to be merged ? |
This PR follows on from this comment and allows an input normalisation function to be shared between ipfs and the http client.
All supported types are tested, at the moment they are (from the test output):
Can you think of any others? Could add tests for browser-centric typed arrays I guess?
Some notes:
ipfs.add
ing a bunch of new types for content such as strings, arrays full of numbers for a slightly more human-friendly API.ipfs.add
.Buffer.from
unless its absolutely necessary because it involves copying buffers which is slow.Iterable<Number>
involves peeking at the iterable and is a little fragile as it assumes the iterable is full of the same typeBuffer.from
copy problemI haven't integrated this with ipfs yet, just putting this up for early feedback.