Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Backwards compatibility of ipfs.add(urlSource(url)) #3195

Closed
Gozala opened this issue Jul 24, 2020 · 2 comments · Fixed by ipfs/js-ipfs-utils#53
Closed

Backwards compatibility of ipfs.add(urlSource(url)) #3195

Gozala opened this issue Jul 24, 2020 · 2 comments · Fixed by ipfs/js-ipfs-utils#53
Assignees
Labels

Comments

@Gozala
Copy link
Contributor

Gozala commented Jul 24, 2020

This came up in #3081 (comment) and me and @achingbrain agreed to continue discussion on the separate channel (this issue)

Problem

  1. ipfs.add is supposed to take a single file input (right now it will happily accept many, but that is going to change in the future).
  2. ursSource returns multiple files (see: https://github.com/ipfs/js-ipfs-utils/blob/master/src/files/url-source.js#L5)

Options to address this:

  1. Make ipfs.add support multiple files.
  2. Make ursSource return singe FileObject.
  3. Embrace native data types (E.g. DOM Response object)

My personal opinions are as follows

  • I really don't like the option 1. It would prevent us from taking advantage of add / addAll separation to reduce complexity.
  • Option 2. seems like a good compromise
    • Looking into urlSource implementation, there is no reason it for it to return AsyncIterable<FileObject> instead it could return { path: string, * content() { yield * (await new Http().get(url, options)).iterator() } }.
    • What's bad about urlSource is that it will cause buffering in browser because of streaming.
      • This could probably addressed by introducing another code path for browser which would return { path: sting, content: AsyncIterable<Blob> }
  • Option 3. Makes most sense to me, but it's a larger breaking change
    • urlSource essentially returns an async task (not the result) that add will run and consume results
      • This seems wrong to me add is not a scheduler for running tasks, it should take results instead.
    • Standard Response represents result of fetching remote data.
    • Conveniently it allows consumer to choose betwen blob and stream consumption strategies.
    • However it would require await fetch(url) on user end, but that makes sense (add isn't a task scheduler).
      • If we add should take on a task scheduler role, then it could accept standard Request instances.

With all that said I think best compromise would be to make ursSource return FileObject and I can take on that as a followup to #3081 that would enable tests that use urlSource now.

@Gozala Gozala added the need/triage Needs initial labeling and prioritization label Jul 24, 2020
@lidel
Copy link
Member

lidel commented Jul 27, 2020

Unsure what would be the generic use case for urlSource in browser context ("paste URL here to import it to IPFS" page for re-hosting stuff?), but for what its worth IPFS Companion would most likely switch to urlSource for importing images/videos/pages via context menu on pages. For this specific use case, buffering should not be a problem, because the data is already in the browser cache.

I agree the best compromise for now is Option 2.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 29, 2020

Unsure what would be the generic use case for urlSource in browser context ("paste URL here to import it to IPFS" page for re-hosting stuff?)

Not sure what you mean by generic use case, but https://github.com/inkandswitch/xcrpt add-on we did crawl the page to archive it into IPFS. However as I pointed out in the description (quoting below)

  • urlSource essentially returns an async task (not the result) that add will run and consume results
    • This seems wrong to me add is not a scheduler for running tasks, it should take results instead.

That makes urlSource(url) inadequate for that, because:

  1. Some URLs are inaccessible
  • Some resources can't be fetched (e.g. images) due to of CORS headers, even though they are in the cache.
  • Some URLs may refer to not found resources (404).
  1. To workaround some CSP issues you need to do some fetching in the add-on context as opposed to page context and pass things back and forth.

All this to suggest that ipfs.add / ipfs.addAll can't possibly recover from such errors, but user space code can. Which is why I find urlSource to be inadequate for most production use cases.

but for what its worth IPFS Companion would most likely switch to urlSource for importing images/videos/pages via context menu on pages. For this specific use case, buffering should not be a problem, because the data is already in the browser cache.

  1. You will likely face problems outlined in the section above, which would make urlSource to be a poor choice.
  2. If that is the case we should consider urlSource to use different code path in browser. Or else you'd fall of happy path as per Embracing web native FormData / File where possible #3029, because instead of passing blobs pointing to the in-memory resource you'd effectively be reading it via stream and then buffering it back into blob.

@jacobheun jacobheun added status/in-progress In progress and removed need/triage Needs initial labeling and prioritization labels Aug 20, 2020
achingbrain added a commit to ipfs/js-ipfs-utils that referenced this issue Jan 15, 2021
Changes the return type of this `urlSource` from `AsyncIterable<{ path: string, content: AsyncIterable<Uint8Array> }>` to `{ path: string, content: AsyncIterable<Uint8Array> }` removing one layer of asynchronicity and making return be a single file as opposed to collection of files that always happen to contain one file.

fixes: ipfs/js-ipfs#3195

Co-authored-by: achingbrain <alex@achingbrain.net>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants