Proposal: Gateway++ Phase 1 #100

mikeal · 2021-04-21T00:09:01Z

No description provided.

lidel · 2021-04-21T01:34:54Z

proposals/gateway-plusplus-phase1.md

+In [nft.storage](http://nft.storage) we have the following high priority needs:
+
+- Add the Pinning API to ipfs-cluster.
+- Add transactional CAR file uploads to the Pinning API.


Mind elaborating on how this endpoint should work / look like?
How different it would be from /api/v0/dag/import ?

transactional is being mentioned multiple times in this proposal, but I feel it's used to describe specific behavior in specific use case.

My read: it should basically be /api/v0/dag/import, but nicely integrated into the Pinning API. Send it a blob, receive back a success/fail message, and transactional in the sense that it's all imported or not. If the CAR has a problem part way through, then bail on all of it. Details on how to send this blob of binary should hopefully be resolved this week in the binary API discussion: is multipart/form-data appropriate here? If we're doing this fresh for the Pinning API then we have an opportunity to try out an alternative approach that we might choose for a v0 binary solution.
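To make the "all imported or not" reading concrete, here is a minimal sketch of the behavior, not a proposed implementation. It assumes blocks arrive as (SHA-256 hex digest, bytes) pairs standing in for the CID/data sections of a CAR, and `import_all_or_nothing` and the dict-backed `store` are hypothetical names. A real importer would decode CIDs and honor their multihash.

```python
import hashlib


def import_all_or_nothing(blocks, store):
    """Commit every block to `store`, or none of them.

    `blocks` is an iterable of (expected_sha256_hex, data) pairs; this is a
    simplification of CAR sections, assumed here for illustration only.
    """
    staged = {}
    for digest, data in blocks:
        if hashlib.sha256(data).hexdigest() != digest:
            # A bad block anywhere aborts the import; `store` is untouched.
            raise ValueError(f"block {digest[:12]} failed verification")
        staged[digest] = data
    store.update(staged)  # only reached when every block verified
    return list(staged)


good = b"hello"
store = {}
import_all_or_nothing([(hashlib.sha256(good).hexdigest(), good)], store)

try:
    import_all_or_nothing(
        [(hashlib.sha256(b"a").hexdigest(), b"a"), ("00" * 32, b"corrupt")],
        store,
    )
except ValueError:
    pass  # the valid block "a" was staged but never committed
```

Staging in memory is only workable for small uploads; a service would more likely stage to temporary storage and promote on success.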

Cluster is adding support for CAR files on the /add endpoint (which otherwise mimics api/v0/add).

Some choices cluster (or I) made:

It's a POST multipart - even if cluster just accepts a single CAR part with a single root, multipart is how we usually upload things in the web and it is flexible enough to do other things (like normal adding).

CAR must have a single root. The cluster API is constrained by being able to Pin one thing, so CARs must have a single root, or otherwise multiple roots would have to be wrapped in a single CID. I see the pinning API also does not have a "multiple pin" endpoint so this may be a reasonable limitation also in the pinning API.

I did not add a new endpoint because there is significant overlap between adding CARs and adding files normally: replication factors, pin options, stream channels, pin sharding etc. If a Pinning API add endpoint is added for CARs, I think it might be expanded in the future to do normal unixfs-adding, or raw block-adding.

Cluster added a format=<car/unixfs> query option to the /add endpoint to control how things are supposed to be added (choosing a DAG Formatter, which given an input produces ipld.Nodes as output).
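For readers who want to picture the request shape described above, the following sketch builds (but does not send) a multipart POST to an /add endpoint with `format=car`. The base URL, the part name `file`, and the filename are assumptions made for illustration; only the POST multipart shape and the `format` query option come from the comment above.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_car_add_request(base_url, car_bytes, filename="upload.car"):
    """Build, without sending, a multipart POST carrying a single CAR part."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Part name and filename are illustrative assumptions.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        car_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{base_url}/add?{urlencode({'format': 'car'})}"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")


# Assumed local cluster REST endpoint; adjust to your deployment.
req = build_car_add_request("http://127.0.0.1:9094", b"\x00car-bytes")
```

Passing `req` to `urllib.request.urlopen` would send it; authentication and the remaining /add options (replication factors, pin options) are left out of the sketch.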

IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint. @lidel may be able to fill in the blanks here. I'm having trouble finding the slides at the moment, but this talk from Juan (and the slides in the background) https://youtu.be/Pcv8Bt4HMVU?t=912 setting the background for the pinning API discussion differentiates between the different types of operations that might need to be provided.

Using CAR files as a mechanism for pushing data is wasteful in that it ignores the existence of duplicate data at the endpoint. For example, adding a 10kB file to a 100MB directory now requires uploading 100MB of data. Making CAR file uploads "first class citizens" and the recommended way people interact with our stack is IMO a mistake.

If there's actually high demand for HTTP-only environments having a standardized ingestion format that's not /api/v0/dag/import then providing something here seems reasonable.

However, IMO we should have tooling in place that points people down a more correct path (i.e. a libp2p node that spins up a single WSS connection to the endpoint from the pinning API and sends the data over Bitswap/GraphSync).

Additionally, it might be nice if we could allow people to be more efficient by being able to ask the pinning service "which blocks in this CAR file manifest do you already have?" and then only uploading a CAR with the delta of missing blocks. Since this is an optimization it can be done later if it's a pain.
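A rough sketch of the delta idea, with toy numbers echoing the 10kB-into-100MB example above. No such "which blocks do you have?" query exists in the Pinning API today; `missing_blocks`, the string CIDs, and the sizes are all hypothetical.

```python
def missing_blocks(manifest, remote_has):
    """Return the CIDs from `manifest` the service lacks, preserving order."""
    have = set(remote_has)
    return [cid for cid in manifest if cid not in have]


# A directory already held by the service (100 blocks of 1 MB), plus one new
# 10 kB file and the directory node that changed as a result.
sizes = {"dir-root-v2": 4_000, "new-file": 10_000}
sizes.update({f"old-{i}": 1_000_000 for i in range(100)})

manifest = list(sizes)
remote_has = [cid for cid in manifest if cid.startswith("old-")]

delta = missing_blocks(manifest, remote_has)
full_upload = sum(sizes.values())                 # bytes if the whole CAR is sent
delta_upload = sum(sizes[cid] for cid in delta)   # bytes if only the delta is sent
```

In this toy case the delta CAR carries 14,000 bytes of block data instead of roughly 100 MB, at the cost of one extra round trip to fetch the "have" list.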

Let me add that we should also work towards software like @aschmahmann describes that would support a graphsync upload, outside this proposal, but that would be more friction for the immediate needs.

@hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?
So the only thing left in this proposal is the pinning API to cluster? (plus the doc+deploy items below)
Or is there more to do with CAR uploads?

at the very least i’d expect the CAR upload feature to need to be updated to accept and validate a token in the same way the Pinning API does after the Pinning API lands in cluster.

> @hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?

The item as written is not done. Cluster added CAR-file import to its own REST API which is different than the official Pinning API (which it does not have). When the Pinning API knows how it wants to support CAR file import, it should be easy to re-use the importer that cluster includes now, along with the rest of the Pinning API and the token-based authentication.

Adding "DAG import" endpoint to Pinning Service API is being picked up in ipfs/pinning-services-api-spec#73 (comment) – would appreciate feedback.

lidel · 2021-04-21T01:41:11Z

proposals/gateway-plusplus-phase1.md

+
+#### Alternatives
+
+There are alternative approaches to building thin clients. The proposals around changing/improving the RPC API could be designed for this purpose,


What specific requirements do you have for those "thin clients"?

I've been discussing "thin clients" with mobile browser and IoT vendors and most of their needs could be accomplished by regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway.

Sounds like the only additional piece here is remote pinning. Perhaps we could identify common needs and spec out a variant of our stack tailored for thin clients? Mobile browsers would really like having this mode as a pre-built preset.

> regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway

So what's left in a "regular IPFS node" when you strip out these bits? This description sounds just like what this proposal wants but without the notion of being a "regular IPFS node". But that probably comes back to the problems we have of "IPFS node" being something different for everyone! Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Symmetric use of CAR for import and export would certainly be worth exploring as part of this proposal.

> So what's left in a "regular IPFS node" when you strip out these bits?

Integrity guarantees provided by content addressing (data can be fetched in a trustless manner) and the ability to use IPLD for advanced data structures.

> Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Yes, we are planning to add DAG import/export directly to gateway endpoints (/ipfs/, /ipns/). Longer discussion in ipfs/in-web-browsers#170, but the tl;dr idea is:

- Improve the concept of a writable gateway to support DAG import via HTTP PUT /ipfs/{cid}
- IPNS publishing could be as easy as HTTP PUT /ipns/{libp2p-key}
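As a sketch of what a client call against such a writable gateway might look like, assuming the PUT /ipfs/{cid} shape from the idea above: the gateway host, the placeholder CID string, and the `application/vnd.ipld.car` content type are assumptions for illustration, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_dag_put(gateway, cid, car_bytes):
    """Build, without sending, a PUT of a CAR to a writable gateway path."""
    return Request(
        f"{gateway}/ipfs/{cid}",
        data=car_bytes,
        # Content type is an assumption; the thread does not specify one.
        headers={"Content-Type": "application/vnd.ipld.car"},
        method="PUT",
    )


# "bafyexamplecid" is a placeholder, not a valid CID.
req = build_dag_put("https://gateway.example", "bafyexamplecid", b"\x00car-bytes")
```

Because the CID is in the path, a gateway could reject the upload when the CAR root does not match it, which fits the single-root constraint discussed earlier in the thread.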

How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

As mentioned in some of my other comments (https://github.com/protocol/web3-dev-team/pull/100/files#r617675012, https://github.com/protocol/web3-dev-team/pull/100/files#r617641097, https://github.com/protocol/web3-dev-team/pull/100/files#r617641520) we can efficiently use libp2p to transfer IPLD data between two peers as all the transports we support have bidirectional streaming, otherwise we lose efficiency by being unidirectional.

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node

I think it is less about resources and more about "deployment style", more specifically about preferring stateless-ness where possible.

A libp2p node is an active unit, requiring an actively running process, servicing of periodic protocol chatter, etc.

An HTTP client interface on the other hand is completely and utterly "dumb". You could drive such an "http-only ipfs-node" from a bash script, which is decidedly not possible today.

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

Serverless (Lambda), Cloudflare Workers, and mobile devices.

Pretty much all the highest growth application environments have trouble with long running processes and connections and prefer or require a stateless protocol.

> have trouble with long running processes and connections and prefer or require a stateless protocol.

Is a long running HTTP upload exempt from this? If not then spinning up a temporary libp2p node shouldn't be very different.

HTTP upload is not exempt, we can’t push too much data at once. We’re going to have to break up large files by encoding in the client and doing uploads under 100mb to get around CF Worker limits.
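The client-side splitting described here could look roughly like the sketch below: already-encoded blocks are grouped into batches that each stay under a byte limit, and each batch would then be written out as its own CAR upload. `batch_blocks` is a hypothetical helper, the limit in the example is tiny for readability, and CAR header and framing overhead is ignored.

```python
def batch_blocks(blocks, limit):
    """Group (cid, data) pairs into batches whose payload stays <= limit bytes."""
    batches, current, size = [], [], 0
    for cid, data in blocks:
        if len(data) > limit:
            raise ValueError(f"block {cid} alone exceeds the upload limit")
        if current and size + len(data) > limit:
            # Adding this block would overflow the batch; start a new one.
            batches.append(current)
            current, size = [], 0
        current.append((cid, data))
        size += len(data)
    if current:
        batches.append(current)
    return batches


blocks = [(f"cid-{i}", bytes(40)) for i in range(5)]  # five 40-byte blocks
batches = batch_blocks(blocks, limit=100)             # stand-in for the real limit
```

Splitting this way means no single upload contains the whole DAG, so the "all or nothing" property would have to hold per batch, with the final pin issued once every batch has landed.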

Cloudflare Workers seem to have support for WebSockets https://blog.cloudflare.com/introducing-websockets-in-workers/ so using libp2p shouldn't be a problem there.

It has an interface for being a websocket service but there’s no client in CF workers.

lidel · 2021-04-21T01:47:13Z

proposals/gateway-plusplus-phase1.md

+
+### Content Routing for Large Providers
+
+Gateways and large providers need to be directly peered since large providers have too much content to provide in the DHT.


Did we explore whether a provider strategy that only announces file root blocks would improve things for big providers?

Most of the data is unixfs, and most of the announced blocks could be skipped. Only file roots matter in practice.
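A small sketch of the roots-only idea, under a simplifying assumption: a "root" is treated as any block that no other block links to. A real strategy would need to recognize UnixFS file roots even when a directory links to them. `roots_only` and the string CIDs are hypothetical.

```python
def roots_only(links):
    """Return the CIDs that no other block links to.

    `links` maps each block's CID to the CIDs it references.
    """
    children = {child for kids in links.values() for child in kids}
    return sorted(cid for cid in links if cid not in children)


links = {
    "file-a": ["a-chunk-1", "a-chunk-2"],
    "a-chunk-1": [],
    "a-chunk-2": [],
    "file-b": ["b-chunk-1"],
    "b-chunk-1": [],
}
announce = roots_only(links)  # 2 provider records instead of 5
```

The saving grows with file size, since chunk blocks dominate the block count for large files while each file contributes a single root.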

It’ll help, but we’re throwing incremental improvements at an exponential problem. nft.storage will have too many CIDs to keep in the DHT by the end of the month even with roots only and the improvements Adin made that haven’t been released.

We’ve come to the same conclusion other large providers like Pinata came to, we can’t support the DHT with this much content.

But this is going to work out because content discovery has always been about more than just the DHT. We should work on a protocol for a federation of large providers to use and continue to improve the DHT for a larger network of more nodes with smaller amounts of content per node.

Ack. I wonder if we could leverage DNS hints here.
We discussed having websites and gateways announce their own addrs to enable clients to preconnect and skip the DHT step: ipfs/kubo#6516

lidel · 2021-04-21T02:03:06Z

proposals/gateway-plusplus-phase1.md

+
+In [nft.storage](http://nft.storage) we have the following high priority needs:
+
+- Add the Pinning API to ipfs-cluster.


👍 for adding Pinning Service API to ipfs-cluster – this will not only help with NFTs, but enable people to self-host pinning infra with ease and use it with ipfs-webui v2.12.0+ and soon ipfs-desktop and Brave.

What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

Good point. It would be nice if Cluster did support the pinning service API, but adding files and pinning dags to a remote cluster is already well supported. We make use of this in adding websites to cluster from CI which is a great example of a constrained environment.

How much of that can be replicated in a browser if you had the auth available? Is ipfs-cluster-ctl just a simple wrapper around the REST API + some UnixFS slurping?

yes! ipfs-cluster-ctl is a wrapper around the cluster REST api. The auth is basic, so with some https, you're good.

ipfs-cluster-ctl is an HTTP API client to the REST API endpoint with full feature-parity that always works with the HTTP API as offered by a cluster peer on the same version. Anything that ipfs-cluster-ctl can do is supported by the REST API.

https://cluster.ipfs.io/documentation/reference/api/

> What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

And swapping Pinning Service providers as needed. But yeah, I don't see it as a blocker. The regular REST API can do the needed things.

> And swapping Pinning Service providers as needed.

But that doesn't actually provide anything in this scenario as the Pinning Service API doesn't allow for pushing of data, so any supporting Pinning Service Provider won't be able to accept the CAR files unless we extend the pinning services API to include the ability to push.

Not opposed to specifying a Pushing API that services can implement. Actually I feel like this issue is more about pushing data directly and not about pinning at all so it is confusing to try and suggest that the pinning API is useful in solving this problem.

aschmahmann · 2021-04-21T15:51:43Z

proposals/gateway-plusplus-phase1.md

+
+While this maps well to where web developers are today, it's not a "pure p2p" approach to solving problems. We're beefing up the ability to rely on large IPFS nodes that end up
+being federated rather than fully decentralized.
+


The proposed approach results in wasted bandwidth (upload from the user and download from the service provider) when some of the data already exists on the service provider.

This pushes developers away from working with modifiable/appendable data structures which is something we have otherwise been encouraging.

See my comment above regarding “wasted bandwidth.”

We don’t get to determine what data structures people use. NFT developers are already trying to use IPFS and we are not meeting their needs. We may wish they had done something different but we’re past the point of being able to determine their pattern of use.

anorth · 2021-04-23T00:06:09Z

proposals/gateway-plusplus-phase1.md

+
+## Problem Statement
+
+For reading data, the IPFS Gateway is already serving these users quite well. Not only does it allow them to read data from the IPFS network without running a full node, they are able to integrate with existing HTTP caching infrastructure to improve performance.


quite well ... if the data is unixfs

the gateway supports block reads https://github.com/ipfs/go-ipfs/blob/master/docs/gateway.md#read-only-api so you can get the data for non-unixfs. for most of the stuff we built in the IPLD team we just used block read/write interfaces, the DAG API was never quite the right fit.

if you’re working with really long chains you’ll need something like Graphsync, or we could go down the GraphQL route like i had in the future section before I pulled it :)

anorth

I like this proposal, and the direction it's pointing.

IMO this would be much aided by a better name, describing what this proposal actually contains. You can still use "Gateway++" to label a larger direction, but I think it inhibits understanding of the immediate goal and use case. Consider linking to other gateway and API-related proposals to paint the bigger picture.

thibmeu · 2021-04-26T16:19:04Z

proposals/gateway-plusplus-phase1.md

+
+- Add the Pinning API to ipfs-cluster.
+- Add transactional CAR file uploads to the Pinning API.
+


It would be great to have ipfs/kubo#6129 as an addition to the HTTP API. It would allow users not using an IPFS node to assess the correct behaviour of the gateway(++).

rvagg · 2021-04-27T06:55:46Z

@aschmahmann @lidel I'd like to focus in on this comment for a bit:

> IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint

And the video link @aschmahmann helpfully provided that has this line in it: https://www.youtube.com/watch?v=Pcv8Bt4HMVU&t=912s

Juan mentions a "GitHub thread" in there that touches on this, so there must be more context here. There's also a slide a little later in the video titled "Interface Wishes" that continues the point although Juan doesn't discuss the detail in that slide.

In the Pinning API we have the notion of "provider hints" and I'm wondering, in the framing of Juan's talk, what the distinction might be between "go and find it over there and pin it" vs "here it is, just pin this" (i.e. the provider hint is essentially just "it's right here!"). I imagine the thinking of the Pinning API evolved somewhat from that discussion so it shouldn't necessarily hold us back, but I don't want to miss a key distinction if there is one in here that we're not seeing clearly.
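For reference while weighing that distinction, this is roughly what the "go and find it over there and pin it" request looks like: a pin object carrying the CID plus `origins` hints, sent with a bearer token. The sketch only builds the request; the service URL, token, CID string, and multiaddr are placeholders, and `build_pin_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_pin_request(service_url, token, cid, origins=(), name=None):
    """Build, without sending, a POST /pins request with provider hints."""
    body = {"cid": cid, "origins": list(origins)}
    if name:
        body["name"] = name
    return Request(
        f"{service_url}/pins",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders for illustration.
req = build_pin_request(
    "https://pinning.example",
    "secret-token",
    "bafyexamplecid",
    origins=["/ip4/203.0.113.5/tcp/4001/p2p/QmExamplePeerId"],
    name="example",
)
```

Note that the request body carries no block data at all, only a pointer to where the data can be fetched; the "here it is, just pin this" case is exactly what this shape cannot express today.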

rvagg · 2021-04-27T07:07:00Z

Maybe that context comes from this thread: ipfs/notes#378 (comment)

Although the concerns that @lanzafame expressed there seem to be more about DAG construction and formats, which is dealt with by just using a CAR - i.e. we offload DAG construction to the user, entirely, and just take pure IPLD blocks. If you want UnixFS then make it yourself and upload it.

rvagg · 2021-04-27T07:16:55Z

Do we actually have a server-side implementation of the Pinning API anywhere or did we just make a spec for the ecosystem pinning services, plus the ability to codegen clients from it? i.e. if we tackle this, we’re going to be implementing the Pinning API from scratch aren’t we, not just merging some things that already exist.


olizilla · 2021-04-27T08:33:49Z

> Do we actually have a server-side implementation of the Pinning API anywhere or did we just make a spec for the ecosystem pinning services

I think both are truthy https://github.com/ipfs-shipyard/rb-pinning-service-api

momack2 · 2021-06-14T21:08:13Z

Is this still in progress @mikeal ?

mikeal · 2021-06-14T22:50:23Z

@olizilla this is all either done or being superseded by a yet-to-be-written filecoin.storage doc, are we good to close it out?

mikeal · 2021-06-21T18:41:34Z

this is being superseded by filecoin.storage

Create gateway-plusplus-phase1.md

4f896c2

lidel reviewed Apr 21, 2021


aschmahmann reviewed Apr 21, 2021


fix: cutting future opportunities to clarify the scope of the proposal

0ec83f2

anorth reviewed Apr 23, 2021


thibmeu reviewed Apr 26, 2021


lanzafame reviewed Apr 27, 2021


jacobheun added the Nitro NFT Free for All label Apr 27, 2021

lidel mentioned this pull request May 11, 2021

Providing content with a pin request ipfs/pinning-services-api-spec#73

Open

BigLep assigned olizilla May 26, 2021

BigLep mentioned this pull request May 26, 2021

Proposal: IPFS Gateway HTTP API #1

Closed

mikeal closed this Jun 21, 2021

