Proposal: Gateway++ Phase 1 #100

Closed

Conversation

Contributor

@mikeal mikeal commented Apr 21, 2021

No description provided.

In [nft.storage](http://nft.storage) we have the following high priority needs:

- Add the Pinning API to ipfs-cluster.
- Add transactional CAR file uploads to the Pinning API.

Member

Mind elaborating on how this endpoint should work and what it should look like?
How different would it be from /api/v0/dag/import?

"Transactional" is mentioned multiple times in this proposal, but I feel it's used to describe a specific behavior in a specific use case.

Contributor

My read: it should basically be /api/v0/dag/import, but nicely integrated into the Pinning API. Send it a blob, receive back a success/fail message, and transactional in the sense that it's all imported or not. If the CAR has a problem partway through, then bail on all of it. Details on how to send this binary blob should hopefully be resolved this week in the binary API discussion: is multipart/form-data appropriate here? If we're doing this fresh for the Pinning API, then we have an opportunity to try out an alternative approach that we might choose for a v0 binary solution.
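
One way to read "all imported or not", as a minimal sketch rather than anything the Pinning API specifies: stage every block, verify it, and commit only once the whole set has passed. The `import_blocks` helper, the in-memory `store`, and the bare SHA-256 hex digest standing in for a real CID are all assumptions made for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    # Stand-in for a CID: a bare SHA-256 hex digest (assumption, not a real CID).
    return hashlib.sha256(data).hexdigest()


def import_blocks(store: dict, blocks: list) -> bool:
    """All-or-nothing import of (claimed_digest, data) pairs into `store`."""
    staged = {}
    for claimed, data in blocks:
        if digest(data) != claimed:
            # One bad block aborts the whole import; `store` is left untouched.
            return False
        staged[claimed] = data
    store.update(staged)  # commit only after every block has verified
    return True


store = {}
good = [(digest(b"a"), b"a"), (digest(b"b"), b"b")]
bad = [(digest(b"c"), b"c"), ("not-the-right-digest", b"d")]
ok_good = import_blocks(store, good)  # both blocks verify, so both are committed
ok_bad = import_blocks(store, bad)    # second block fails, so nothing is committed
```

The valid block in `bad` is not kept either, which is the "bail on all of it" behavior described above.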

Cluster is adding support for CAR files on the /add endpoint (which otherwise mimics api/v0/add).

Some choices cluster (or I) made:

  • It's a POST multipart. Even if cluster just accepts a single CAR part with a single root, multipart is how we usually upload things on the web, and it is flexible enough to do other things (like normal adding).
  • The CAR must have a single root. The cluster API is constrained to pinning one thing, so CARs must have a single root; otherwise multiple roots would have to be wrapped in a single CID. I see the Pinning API also does not have a "multiple pin" endpoint, so this may be a reasonable limitation in the Pinning API as well.
  • I did not add a new endpoint because there is significant overlap between adding CARs and adding files normally: replication factors, pin options, stream channels, pin sharding, etc. If a Pinning API add endpoint is added for CARs, I think it might be expanded in the future to do normal unixfs-adding or raw block-adding.
  • Cluster added a format=<car/unixfs> query option to the /add endpoint to control how things are supposed to be added (choosing a DAG Formatter, which, given an input, produces ipld.Nodes as output).
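
The choices above (POST multipart, one CAR part, `format=car` on `/add`) could look roughly like the request built below. This is a hedged sketch that only constructs the request and never sends it; the base URL, the form field name `file`, and the file name `data.car` are assumptions, not taken from the cluster documentation.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_car_add_request(base_url: str, car_bytes: bytes) -> Request:
    """Build (but do not send) a multipart POST to a cluster-style /add endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Field name and file name are illustrative assumptions.
        b'Content-Disposition: form-data; name="file"; filename="data.car"\r\n',
        b"Content-Type: application/octet-stream\r\n\r\n",
        car_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # format=car is the query option described in the comment above.
    url = f"{base_url}/add?{urlencode({'format': 'car'})}"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical local address; the bytes are a placeholder, not a valid CAR.
req = build_car_add_request("http://127.0.0.1:9094", b"placeholder CAR bytes")
```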

Contributor

IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint. @lidel may be able to fill in the blanks here. I'm having trouble finding the slides at the moment, but this talk from Juan (and the slides in the background) https://youtu.be/Pcv8Bt4HMVU?t=912 setting the background for the pinning API discussion differentiates between the different types of operations that might need to be provided.

Using CAR files as a mechanism for pushing data is wasteful in that it ignores the existence of duplicate data at the endpoint. For example, adding a 10kB file to a 100MB directory now requires uploading 100MB of data. Making CAR file uploads "first class citizens" and the recommended way people interact with our stack is IMO a mistake.

Contributor

If there's actually high demand for HTTP-only environments having a standardized ingestion format that's not /api/v0/import then providing something here seems reasonable.

However, IMO we should have tooling in place that points people down a more correct path (i.e. a libp2p node that spins up a single WSS connection to the endpoint from the pinning API and sends the data over Bitswap/GraphSync).

Additionally, it might be nice if we could allow people to be more efficient by being able to ask the pinning service "which blocks in this CAR file manifest do you already have?" and then only uploading a CAR with the delta of missing blocks. Since this is an optimization it can be done later if it's a pain.
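
The delta idea can be sketched as a set difference over the CAR manifest. This is only an illustration: the `missing_blocks` helper is hypothetical, the CID strings are placeholders, and no existing pinning service endpoint for the "which blocks do you already have?" query is implied.

```python
def missing_blocks(manifest: list, remote_has: set) -> list:
    """Return the manifest CIDs the service lacks, preserving manifest order."""
    return [cid for cid in manifest if cid not in remote_has]


# Adding one new file to an existing directory: the directory root changes,
# the new file is new, and the old files are already on the service.
manifest = ["cid-new-root", "cid-new-file", "cid-old-file-1", "cid-old-file-2"]
remote_has = {"cid-old-file-1", "cid-old-file-2"}  # as reported by the service (assumed)
to_upload = missing_blocks(manifest, remote_has)
```

Only the blocks in `to_upload` would need to go into the second, smaller CAR.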

Contributor

@anorth anorth Apr 23, 2021

Let me add that we should also work towards software like @aschmahmann describes that would support a graphsync upload, outside this proposal, but that would be more friction for the immediate needs.

Contributor

@hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?
So the only thing left in this proposal is the pinning API to cluster? (plus the doc+deploy items below)
Or is there more to do with CAR uploads?

Contributor Author

at the very least i’d expect the CAR upload feature to need to be updated to accept and validate a token in the same way the Pinning API does after the Pinning API lands in cluster.

> @hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?

The item as written is not done. Cluster added CAR-file import to its own REST API which is different than the official Pinning API (which it does not have). When the Pinning API knows how it wants to support CAR file import, it should be easy to re-use the importer that cluster includes now, along with the rest of the Pinning API and the token-based authentication.

Member

Adding "DAG import" endpoint to Pinning Service API is being picked up in ipfs/pinning-services-api-spec#73 (comment) – would appreciate feedback.


#### Alternatives

There are alternative approaches to building thin clients. The proposals around changing/improving the RPC API could be designed for this purpose,

Member

What specific requirements do you have for those "thin clients"?

I've been discussing "thin clients" with mobile browser and IoT vendors and most of their needs could be accomplished by regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway.

Sounds like the only additional piece here is remote pinning. Perhaps we could identify common needs and spec out a variant of our stack tailored for thin clients? Mobile browsers would really like having this mode as a pre-built preset.

Contributor

> regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway

So what's left in a "regular IPFS node" when you strip out these bits? This description sounds just like what this proposal wants but without the notion of being a "regular IPFS node". But that probably comes back to the problems we have of "IPFS node" being something different for everyone! Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Symmetric use of CAR for import and export would certainly be worth exploring as part of this proposal.

Member

> So what's left in a "regular IPFS node" when you strip out these bits?

Integrity guarantees provided by content addressing (data can be fetched in a trustless manner) and the ability to use IPLD for advanced data structures.

> Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Yes, we are planning to add DAG import/export directly to gateway endpoints (/ipfs/, /ipns/). There is a longer discussion in ipfs/in-web-browsers#170, but the tl;dr idea is:

  • Improve the concept of a writable gateway to support DAG import via HTTP PUT /ipfs/{cid}
  • IPNS publishing could be as easy as HTTP PUT /ipns/{libp2p-key}
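
As a rough sketch of what the proposed writable-gateway import might look like from a client, the snippet below builds (without sending) an HTTP PUT to `/ipfs/{cid}`. This endpoint is only a proposal in the comment above, so everything here is an assumption: the gateway URL, the placeholder CID, and the choice of `application/vnd.ipld.car` as the content type.

```python
from urllib.request import Request


def build_dag_put(gateway: str, cid: str, car_bytes: bytes) -> Request:
    """Build (but do not send) a PUT /ipfs/{cid} request carrying a CAR body."""
    return Request(
        f"{gateway}/ipfs/{cid}",
        data=car_bytes,
        # Content type is an assumption; the proposal does not fix one.
        headers={"Content-Type": "application/vnd.ipld.car"},
        method="PUT",
    )


# Hypothetical gateway and placeholder CID; the body is not a valid CAR.
req = build_dag_put("https://gateway.example", "bafy-example-cid", b"placeholder CAR bytes")
```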

Contributor

@aschmahmann aschmahmann Apr 21, 2021

How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

As mentioned in some of my other comments (https://github.com/protocol/web3-dev-team/pull/100/files#r617675012, https://github.com/protocol/web3-dev-team/pull/100/files#r617641097, https://github.com/protocol/web3-dev-team/pull/100/files#r617641520), we can efficiently use libp2p to transfer IPLD data between two peers because all the transports we support have bidirectional streaming; otherwise we lose efficiency by being unidirectional.

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node

I think it is less about resources and more about "deployment style", more specifically about preferring stateless-ness where possible.

A libp2p node is an active unit, requiring an actively running process, servicing of periodic protocol chatter, etc.

An HTTP client interface on the other hand is completely and utterly "dumb". You could drive such an "http-only ipfs-node" from a bash script, which is decidedly not possible today.

Contributor Author

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

Serverless (Lambda), Cloudflare Workers, and mobile devices.

Pretty much all the highest growth application environments have trouble with long running processes and connections and prefer or require a stateless protocol.

Contributor

> have trouble with long running processes and connections and prefer or require a stateless protocol.

Is a long running HTTP upload exempt from this? If not then spinning up a temporary libp2p node shouldn't be very different.

Contributor Author

HTTP upload is not exempt; we can’t push too much data at once. We’re going to have to break up large files by encoding in the client and doing uploads under 100 MB to get around CF Worker limits.
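
A minimal sketch of the client-side splitting described above: greedily group encoded blocks into batches whose total size stays within a cap. The `batch_blocks` helper and the exact cap are assumptions; in practice the cap would be set somewhat below the platform limit to leave room for CAR headers and HTTP framing.

```python
LIMIT = 100 * 1024 * 1024  # illustrative cap in bytes, not an official figure


def batch_blocks(sizes: list, limit: int = LIMIT) -> list:
    """Group block indices into batches whose summed size is at most `limit`."""
    batches, current, total = [], [], 0
    for index, size in enumerate(sizes):
        if size > limit:
            raise ValueError(f"block {index} alone exceeds the limit")
        if current and total + size > limit:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, total = [], 0
        current.append(index)
        total += size
    if current:
        batches.append(current)
    return batches


# Small numbers for readability: four blocks, cap of 100.
example = batch_blocks([40, 40, 40, 10], limit=100)
```

Each batch would then be written out as its own CAR and uploaded in a separate request.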

Contributor

@aschmahmann aschmahmann May 2, 2021

Cloudflare Workers seem to have support for WebSockets https://blog.cloudflare.com/introducing-websockets-in-workers/ so using libp2p shouldn't be a problem there.

Contributor Author

It has an interface for being a websocket service but there’s no client in CF workers.


### Content Routing for Large Providers

Gateways and large providers need to be directly peered since large providers have too much content to provide in the DHT.

Member

Did we explore whether a provider strategy that only announces file root blocks improves things for big providers?

Most of the data is unixfs, and most of the announced blocks could be skipped. Only file roots matter in practice.
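
To make the saving concrete, here is a small sketch comparing how many provider records are announced with and without a roots-only strategy. The `announcements` helper and the placeholder CIDs are hypothetical; the example assumes a file split into four chunks under one root.

```python
def announcements(files: list, roots_only: bool) -> list:
    """files is a list of (root_cid, [chunk_cids]); return the CIDs to announce."""
    out = []
    for root, chunks in files:
        out.append(root)
        if not roots_only:
            out.extend(chunks)  # announcing everything includes every chunk
    return out


files = [("root-1", ["c1", "c2", "c3", "c4"]), ("root-2", ["c5", "c6", "c7", "c8"])]
all_records = announcements(files, roots_only=False)
root_records = announcements(files, roots_only=True)
```

In this toy case the announced set shrinks from ten CIDs to two, which is a constant-factor saving per file.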

Contributor Author

It’ll help, but we’re throwing incremental improvements at an exponential problem. nft.storage will have too many CIDs to keep in the DHT by the end of the month, even with roots only and the improvements Adin made that haven’t been released.

We’ve come to the same conclusion that other large providers like Pinata came to: we can’t support the DHT with this much content.

But this is going to work out because content discovery has always been about more than just the DHT. We should work on a protocol for a federation of large providers to use and continue to improve the DHT for a larger network of more nodes with smaller amounts of content per node.

Member

Ack. I wonder if we could leverage DNS hints here.
We discussed having websites and gateways announce their own addrs so that clients can preconnect and skip the DHT step: ipfs/kubo#6516


In [nft.storage](http://nft.storage) we have the following high priority needs:

- Add the Pinning API to ipfs-cluster.

Member

👍 for adding Pinning Service API to ipfs-cluster – this will not only help with NFTs, but enable people to self-host pinning infra with ease and use it with ipfs-webui v2.12.0+ and soon ipfs-desktop and Brave.

What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

Contributor

@olizilla olizilla Apr 27, 2021

Good point. It would be nice if Cluster did support the pinning service API, but adding files and pinning dags to a remote cluster is already well supported. We make use of this in adding websites to cluster from CI which is a great example of a constrained environment.

Contributor

How much of that can be replicated in a browser if you had the auth available? Is ipfs-cluster-ctl just a simple wrapper around the REST API + some UnixFS slurping?

Contributor

yes! ipfs-cluster-ctl is a wrapper around the cluster REST api. The auth is basic, so with some https, you're good.

ipfs-cluster-ctl is an HTTP API client to the REST API endpoint with full feature-parity that always works with the HTTP API as offered by a cluster peer on the same version. Anything that ipfs-cluster-ctl can do is supported by the REST API.

https://cluster.ipfs.io/documentation/reference/api/

> What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

And swapping Pinning Service providers as needed. But yeah, I don't see it as a blocker. The regular REST API can do the needed things.

> And swapping Pinning Service providers as needed.

But that doesn't actually provide anything in this scenario as the Pinning Service API doesn't allow for pushing of data, so any supporting Pinning Service Provider won't be able to accept the CAR files unless we extend the pinning services API to include the ability to push.

Not opposed to specifying a Pushing API that services can implement. Actually I feel like this issue is more about pushing data directly and not about pinning at all so it is confusing to try and suggest that the pinning API is useful in solving this problem.


While this maps well to where web developers are today, it's not a "pure p2p" approach to solving problems. We're beefing up the ability to rely on large IPFS nodes that end up
being federated rather than fully decentralized.

Contributor

The proposed approach results in wasted bandwidth (upload from the user and download from the service provider) when some of the data already exists on the service provider.

This pushes developers away from working with modifiable/appendable data structures which is something we have otherwise been encouraging.

Contributor Author

@mikeal mikeal Apr 21, 2021

See my comment above regarding “wasted bandwidth.”

We don’t get to determine what data structures people use. NFT developers are already trying to use IPFS and we are not meeting their needs. We may wish they had done something different but we’re past the point of being able to determine their pattern of use.


## Problem Statement

For reading data, the IPFS Gateway is already serving these users quite well. Not only does it allow them to read data from the IPFS network without running a full node, they are able to integrate with existing HTTP caching infrastructure to improve performance.

Contributor

quite well ... if the data is unixfs

Contributor Author

the gateway supports block reads https://github.com/ipfs/go-ipfs/blob/master/docs/gateway.md#read-only-api so you can get the data for non-unixfs. for most of the stuff we built in the IPLD team we just used block read/write interfaces, the DAG API was never quite the right fit.

if you’re working with really long chains you’ll need something like Graphsync, or we could go down the GraphQL route like i had in the future section before I pulled it :)

Contributor

Or #1 :-)

Contributor

@anorth anorth left a comment

I like this proposal, and the direction it's pointing.

IMO this would be much aided by a better name that describes what this proposal actually contains. You can still use "Gateway++" to label a larger direction, but I think it inhibits understanding of the immediate goal and use case. Consider linking to other gateway and API-related proposals to paint the bigger picture.


- Add the Pinning API to ipfs-cluster.
- Add transactional CAR file uploads to the Pinning API.

It would be great to have ipfs/kubo#6129 as an addition to the HTTP API. It would allow users not using an IPFS node to assess the correct behaviour of the gateway(++).

Contributor

rvagg commented Apr 27, 2021

@aschmahmann @lidel I'd like to focus in on this comment for a bit:

> IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint

And the video link @aschmahmann helpfully provided that has this line in it: https://www.youtube.com/watch?v=Pcv8Bt4HMVU&t=912s

Juan mentions a "GitHub thread" in there that touches on this, so there must be more context here. There's also a slide a little later in the video titled "Interface Wishes" that continues the point although Juan doesn't discuss the detail in that slide.

In the Pinning API we have the notion of "provider hints" and I'm wondering, in the framing of Juan's talk, what the distinction might be between "go and find it over there and pin it" vs "here it is, just pin this" (i.e. the provider hint is essentially just "it's right here!"). I imagine the thinking of the Pinning API evolved somewhat from that discussion so it shouldn't necessarily hold us back, but I don't want to miss a key distinction if there is one in here that we're not seeing clearly.
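
For reference, the "go and find it over there and pin it" case can be sketched as a pin request whose `origins` list carries the provider hints. The snippet only builds the request and never sends it; the service URL, token, CID, and multiaddr are all placeholders chosen for illustration.

```python
import json
from urllib.request import Request


def build_pin_request(service_url: str, token: str, cid: str, origins: list) -> Request:
    """Build (but do not send) a POST /pins request carrying provider hints."""
    payload = {"cid": cid, "origins": origins}
    return Request(
        f"{service_url}/pins",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_pin_request(
    "https://pinning.example/api",  # hypothetical service URL
    "example-token",                # hypothetical access token
    "bafy-example-cid",             # placeholder, not a real CID
    ["/ip4/203.0.113.7/tcp/4001/p2p/peer-id-placeholder"],  # placeholder multiaddr
)
```

The "here it is, just pin this" case has no counterpart in this request shape, since the body carries only a CID and hints about where to find the data, never the data itself.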

Contributor

rvagg commented Apr 27, 2021

Maybe that context comes from this thread: ipfs/notes#378 (comment)

Although the concerns that @lanzafame expressed there seem to be more about DAG construction and formats, which is dealt with by just using a CAR - i.e. we offload DAG construction to the user, entirely, and just take pure IPLD blocks. If you want UnixFS then make it yourself and upload it.

Contributor

rvagg commented Apr 27, 2021

Do we actually have a server-side implementation of the Pinning API anywhere or did we just make a spec for the ecosystem pinning services, plus the ability to codegen clients from it? i.e. if we tackle this, we’re going to be implementing the Pinning API from scratch aren’t we, not just merging some things that already exist.

@olizilla
Contributor

> Do we actually have a server-side implementation of the Pinning API anywhere or did we just make a spec for the ecosystem pinning services

I think both are truthy https://github.com/ipfs-shipyard/rb-pinning-service-api

Contributor

momack2 commented Jun 14, 2021

Is this still in progress @mikeal ?

Contributor Author

mikeal commented Jun 14, 2021

@olizilla this is all either done or being superseded by a yet-to-be-written filecoin.storage doc, are we good to close it out?

Contributor Author

mikeal commented Jun 21, 2021

this is being superseded by filecoin.storage

@mikeal mikeal closed this Jun 21, 2021
Labels
Nitro NFT Free for All