Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

ipfs.add async Iterable not truly asynchronous. Does not function as intended. #2838

Closed
vaultec81 opened this issue Feb 19, 2020 · 18 comments
Closed
Labels
env:browser exp/expert Having worked on the specific codebase is important P1 High: Likely tackled by core team if no one steps up pkg:http-client Issues related only to ipfs-http-client status/deferred Conscious decision to pause or backlog

Comments

@vaultec81
Copy link

Hello
I have noticed after doing some experimentation that ipfs-http-client requires the input async iterable to be completed in order for the request to proceed. There is no output until the input async iterable is completed. Meaning the code functions no different then if an array was specified instead.
However, if using js-ipfs in process daemon ipfs.add will function as intended. Returning CIDs through the output async iterable as the input async iterable is processed.
In my use case I am reading the output, and continuing to pass input as it is processed.
Example code

//Source with no definite end.
for await (var out of ipfs.add(source)) {
    console.log(out.cid.string);
    //Never gets executed
}
@achingbrain
Copy link
Member

When you see this behaviour are you running against a remote go or js daemon?

@vaultec81
Copy link
Author

@achingbrain go-ipfs. I have confirmed it works once async iterable is finished.

@achingbrain
Copy link
Member

achingbrain commented Feb 19, 2020

Is the behaviour the same with a js-ipfs daemon? (I'm assuming when you said "if using js-ipfs ipfs.add will function as intended" you meant an in-process node rather than a remote one).

@vaultec81
Copy link
Author

The behavior is different (working as intended) with an in process js-ipfs daemon. Sorry I did not make that clear.

@vaultec81
Copy link
Author

Doing some further analysis reveals no request is actually sent to HTTP API Port 5001 for go-ipfs daemon. I believe the issue lies here https://github.com/ipfs/js-ipfs-http-client/blob/e9ce88d2523663b9730a0ffa28c6791f45976549/src/add/form-data.js#L15-L61

@alanshaw
Copy link
Member

@vaultec81 are you running your code in the browser or in Node.js? If in the browser, ensure you've setup CORS correctly on your daemon. If in Node.js then what version are you running? There's an issue with ky in Node.js v13 that could be the same problem as this. If you're using v13 please try again with Node.js v12.

@vaultec81
Copy link
Author

@alanshaw I am running node v12.

@alanshaw
Copy link
Member

Ah, I just re-read your issue and yes this is a problem with how FormData works with the fetch API. File contents may be streamed but FormData needs to know the full list of files before it can be passed to the fetch API.

@hugomrdias
Copy link
Member

yet there is no support for streaming Requests in browsers yet, only streaming Responses.
Either we implement chunking support or we wait for browsers to implement streaming.

@vaultec81
Copy link
Author

@hugomrdias I don't see it being too hard to implement streamed uploads. it does require some changes to the ipfs http api specifications. I don't if its possible to do streamed multipart uploads. Potentially a websocket connection for /api/v0/add could be used along side the normal mechanism.

Either we implement chunking support or we wait for browsers to implement streaming.

This isn't specifically a browser issue.

@hugomrdias
Copy link
Member

browsers can not stream requests using HTTP, no way around it.

@vaultec81
Copy link
Author

@hugomrdias Can you please go into more detail what you mean by "stream requests"

@hugomrdias
Copy link
Member

one can only send a blob/buffer from browsers using HTTP, there is no way to stream bytes from the browser.

@vaultec81
Copy link
Author

@hugomrdias I don't see it being a big deal implementing chunked request multipart uploads, Or better yet websocket which can upload data in a streamed fashion. Although websocket is not exactly HTTP, it is widely supported by the majority of web browsers, and common web infrastructure.

@hugomrdias
Copy link
Member

its not that's why i changed this from bug to feature.
we are totally open to PRs that implement chunked uploads. want to give it a go ?

@vaultec81
Copy link
Author

@hugomrdias I believe it should still be treated as if its a bug as it causes functionality differences between js-ipfs in process daemon, and js-http-client. I am open to the possibly of submitting a PR. However, I believe it should be up to the IPFS team to develop a universal solution.

@achingbrain achingbrain transferred this issue from ipfs-inactive/js-ipfs-http-client Mar 9, 2020
@achingbrain achingbrain added the pkg:http-client Issues related only to ipfs-http-client label Mar 9, 2020
@achingbrain
Copy link
Member

The problem here is that in order for ipfs.add to be streaming in the browser, browsers need to support streaming request bodies.

The documented way to do that is to pass a ReadbleStream as the request body, the problem is that no browser has implemented this - see the Send ReadableStream in request body line of the MDN docs.

It doesn't help that the MDN docs talk about it like it's something you can use today.

I think we have a few options:

  1. Ditch RPC over HTTP and do it over websockets
  2. Chunked uploads as per [WIP] feat: support chunked add requests #1540
  3. Copy some of the IPFS core logic into the client, e.g. get it to use the unixfs-importer and orchestrate the various other API calls the core internals make when importing files

My thoughts are not fully-formed on the matter but I think my preference would be for 3 though it'll have an affect on the bundle size.

@bluelovers
Copy link
Contributor

maybe some thing like this
https://javascript.info/resume-upload

@achingbrain achingbrain added the P1 High: Likely tackled by core team if no one steps up label Jun 25, 2020
@jacobheun jacobheun added status/ready Ready to be worked exp/expert Having worked on the specific codebase is important labels Jul 2, 2020
@autonome autonome removed the status/ready Ready to be worked label Jul 2, 2020
@autonome autonome added the status/deferred Conscious decision to pause or backlog label Jul 2, 2020
SgtPooki referenced this issue in ipfs/js-kubo-rpc-client Aug 18, 2022
Adds a server running a gRPC endpoint over websockets running on port 5003, a `ipfs-grpc-client` module to access the server and a `ipfs-client` module that uses the gRPC client with HTTP fallback.

This is to solve shortcomings and limitations of the existing HTTP API and addresses the concerns raised in the 'Streaming HTTP APIs and errors, y u no work?' session we had at IPFS team week in NYC.

## Key points

1. Enables full duplex communication with a remote node

When making an HTTP request in the browser, a [FormData][] object must be created. In order to add all the values to the FormData object, an incoming stream must be consumed in its entirety before the first byte is sent to the server.

This means you cannot start processing a response before the request has been sent, so you cannot have full-duplex communication between client and server over HTTP. This seems unlikely to change in the near future.

With a websocket transport for gRPC-web, individual messages can be sent backwards and forwards by the client or the server enabling full-duplex communication.  This is essential for things like progress events from `ipfs.add` in the short term, and exposing the full stream capabilities of libp2p via remote client in the long term.

2. Enables streaming errors

The existing HTTP API sends errors as HTTP trailers.  No browser supports HTTP trailers so when a stream encounters an error, from the client's point of view the stream just stops with no possibility of finding out what happened.

This can also mask intended behaviour cause users to incorrectly interpret the API. For example if you specify a timeout to a DHT query and that timeout is reached, in the browser the stream ends without an error and you take away the results you've received thinking all is well but on the CLI the same operation results in a non-zero exit code.

A websocket transport has no restrictions here, since full-duplex communication is possible, errors can be received at any time.

3. Listens on websockets with no HTTP fallback

gRPC-web exists and is a way of exposing a gRPC service over HTTP.  Whereas gRPC supports four modes (unary, e.g. one request object and one response object, client streaming, server streaming and bidirectional streaming), gRPC-web only supports [unary and server streaming](https://github.com/grpc/grpc-web#wire-format-mode).  This is due to limitations of the web platform mentioned above and doesn't give us anything over our existing HTTP API.

The gRPC-web team are evaluating several options for client and bidirectional streaming, all of which require new capabilities to be added to browsers and none of which will be available in a reasonable time frame.

Notably they have [no plans to use websockets](https://github.com/grpc/grpc-web/blob/master/doc/streaming-roadmap.md#issues-with-websockets) as a transport, even though it solves the problems we have today.

The team from [improbable](https://improbable.io/) maintain a [gRPC-web-websockets bridge](https://github.com/improbable-eng/grpc-web) which the client added by this PR is compatible with.  Their bridge also has a go implementation of a [reverse proxy](https://github.com/improbable-eng/grpc-web/tree/master/go/grpcwebproxy) for use with gRPC servers to turn them into gRPC-web servers with an optional websocket transport.

My proposal is to embrace the use of websockets to solve our problems right now, then move to whatever streaming primitive the gRPC-web team settle on in the years to come.

As implemented there's only websockets here and no HTTP fallback as the existing HTTP API works fine for unary operations so there's little to be gained by blocking this work on reimplementing the whole of the HTTP API in gRPC-web, and the client can pick and choose which API it'll use per-call.

By running the websocket server on a different port to the existing HTTP API it gives us room to add gRPC-web fallback for the API if we find that useful.

4. Has protobuf definitions for all requests/responses

See the [ipfs-grpc-protocol](https://github.com/ipfs/js-ipfs/tree/feat/add-grpc-server-and-client/packages/ipfs-grpc-protocol) module, which contains definitions for API requests/reponses.

They've been ported from the existing API and will need some checking.

The [ipfs-grpc-server/README.md](https://github.com/ipfs/js-ipfs/blob/feat/add-grpc-server-and-client/packages/ipfs-grpc-server/README.md) has a rundown of the websocket communication protocol that was ported from [improbable-eng/grpc-web](https://github.com/improbable-eng/grpc-web).

5. Options as metadata

When making a request, metadata is sent during the preamble - these take the form of a string identical to HTTP headers as the initial websocket message - I've used this mechanism to send the options for a given invocation.

Notably these are not defined as a protocol buffer, just an unspecified list of simple key/value pairs - maybe they should be to ensure compatibility between implementations?

This will be trivial in the implementation in the PR as it contains a server implementation too but to do it in go will require patching or forking the improbable gRPC proxy.

6. Errors as metadata

Similar to the existing HTTP API, message trailers are used to send errors.  Four fields are used to re-construct the error on the client side:

| Field | Notes | 
| ----- | ----- |
| grpc-status  | 0 for success, 1+ for error |
| grpc-message | An error message |
| grpc-stack   | A stack trace with `\n` delimited lines |
| grpc-code    | A string code such as `'ERROR_BAD_INPUT'` that may be used for i18n translations to show a message to the user in their own language |

Similar to options these fields are unspecified, if a convention is not enough, perhaps they should be specified as a protobuf and the trailer sent as binary?

7. Streams

When sending data as part of an `ipfs.add`, we send repeated messages that contain a path, a content buffer and an index.  The index is used to differentiate between streams - path cannot be used as it could be empty.  Only the first supplied `path` is respected for a given index. On the server separate input streams are created for each file being added.  A file stream is considered closed when an unset or empty content buffer is received.  Ultimately this will allow us to apply backpressure on a per-file basis and read from different file streams in parallel and asymmetrically based on the available server capacity.

8. Performance

Observed performance pegs gRPC-web over websockets as similar to the HTTP Client with pretty much zero optimisation work performed

9. Security

Browsers require TLS for all use of websocket connections to localhost. They do not require it for the loopback address, however, which this PR uses, though loopback means the traffic will not leave the local machine.

The incoming requests start as HTTP requests so have a referer header and user agent so would follow the same restrictions as the existing HTTP API.

Fixes #2519
Fixes #2838
Fixes #2943
Fixes #2854
Fixes #2864

[FormData]: https://developer.mozilla.org/en-US/docs/Web/API/FormData
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
env:browser exp/expert Having worked on the specific codebase is important P1 High: Likely tackled by core team if no one steps up pkg:http-client Issues related only to ipfs-http-client status/deferred Conscious decision to pause or backlog
Projects
None yet
Development

No branches or pull requests

7 participants