IPIP-412: Signaling Block Order in CARs on HTTP Gateways #412

Merged: 15 commits, Aug 4, 2023

src/http-gateways/path-gateway.md (6 changes: 1 addition & 5 deletions)
@@ -595,11 +595,7 @@ The following response types require an explicit opt-in, can only be requested w
- Raw Block (`?format=raw`)
- Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw).
- CAR (`?format=car`)
- A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and Section 5 (CAR Responses) at :cite[trustless-gateway].
- TAR (`?format=tar`)
- Deserialized UnixFS files and directories as a TAR file or a stream, see :cite[ipip-0288].
- IPNS Record

src/http-gateways/trustless-gateway.md (214 changes: 176 additions & 38 deletions)
@@ -13,6 +13,10 @@ editors:
- name: Henrique Dias
github: hacdias
url: https://hacdias.com/
xref:
- url
- path-gateway
- ipip-0412
tags: ['httpGateways', 'lowLevelHttpGateways']
order: 1
---
@@ -25,31 +29,31 @@ The minimal implementation means:

- response type is always fully verifiable: client can choose between a raw block and a CAR stream
- no UnixFS/IPLD deserialization
- for raw blocks:
- data is requested by CID, only supported path is `/ipfs/{cid}`
- no path traversal or recursive resolution
- for CAR files:
- the pathing behavior is identical to :cite[path-gateway]

# HTTP API

A subset of "HTTP API" of :cite[path-gateway].

## `GET /ipfs/{cid}[/{path}][?{params}]`

Downloads verifiable, content-addressed data for the specified **immutable** content path.

Optional `path` is permitted for requests that specify CAR format (`?format=car` or `Accept: application/vnd.ipld.car`).

For block requests (`?format=raw` or `Accept: application/vnd.ipld.raw`), only `GET /ipfs/{cid}[?{params}]` is supported.

## `HEAD /ipfs/{cid}[/{path}][?{params}]`

Same as GET, but does not return any payload.

## `GET /ipns/{key}[?{params}]`

Downloads data at the specified IPNS Key. Verifiable :cite[ipns-record] can be requested via `?format=ipns-record` or `Accept: application/vnd.ipfs.ipns-record`.
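
The path rules above can be sketched from the client side. This is a minimal Python illustration, not part of the specification. The helper name `build_request`, the `kind` labels and the gateway URL are hypothetical. The sketch only builds the URL and `Accept` header; it sends nothing. The media types are the ones named in this section.

```python
from typing import Dict, Tuple
from urllib.parse import quote

RAW = "application/vnd.ipld.raw"
CAR = "application/vnd.ipld.car"
IPNS_RECORD = "application/vnd.ipfs.ipns-record"


def build_request(gateway: str, kind: str, name: str, path: str = "") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a trustless gateway GET request.

    `kind` is "raw", "car" or "ipns-record" (labels invented for this sketch);
    `name` is a CID for /ipfs/ requests or an IPNS Key for /ipns/ requests.
    """
    base = gateway.rstrip("/")
    path = path.strip("/")
    if kind == "raw":
        # Block requests only support GET /ipfs/{cid}: no path traversal.
        if path:
            raise ValueError("a path is not allowed for raw block requests")
        return f"{base}/ipfs/{name}", {"Accept": RAW}
    if kind == "car":
        # CAR requests may carry an optional content path after the root CID.
        suffix = "/" + quote(path) if path else ""
        return f"{base}/ipfs/{name}{suffix}", {"Accept": CAR}
    if kind == "ipns-record":
        return f"{base}/ipns/{name}", {"Accept": IPNS_RECORD}
    raise ValueError(f"unsupported kind: {kind!r}")
```

For example, a CAR request for `dir/file.txt` under a root CID targets `/ipfs/{cid}/dir/file.txt` with `Accept: application/vnd.ipld.car`. The same path on a raw block request is rejected.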

## `HEAD /ipns/{key}[?{params}]`

@@ -63,17 +67,26 @@ Same as in :cite[path-gateway], but with limited number of supported response ty

### `Accept` (request header)

A Client SHOULD send this HTTP header to leverage content type negotiation
based on section 12.5.1 of :cite[rfc9110].

Below response types MUST be supported:

- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)
- Requests a single, verifiable raw block to be returned.

Below response types SHOULD be supported:

- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
- Disables IPLD/IPFS deserialization and requests a verifiable CAR stream to be
returned. Implementations MAY support optional CAR content type parameters
(:cite[ipip-0412]) and the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request).

- [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record)
- A verifiable :cite[ipns-record] (multicodec `0x0300`).

A Gateway SHOULD return HTTP 400 Bad Request when running in strict trustless
mode (no deserialized responses) and `Accept` header is missing.
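
The sketch below is a minimal illustration of these rules. It assumes a gateway in strict trustless mode that looks only at the `Accept` header; the `?format=` query parameter is not handled. It takes the first supported type in the order listed and ignores `q` weights for brevity. The name `pick_response_type` is hypothetical, and so is answering 406 when no supported type is offered: that choice is not a requirement of this section.

```python
from typing import Optional, Tuple

SUPPORTED_TYPES = (
    "application/vnd.ipld.raw",
    "application/vnd.ipld.car",
    "application/vnd.ipfs.ipns-record",
)


def pick_response_type(accept: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, media type) for a gateway in strict trustless mode."""
    if accept is None or not accept.strip():
        # No deserialized fallback exists, so a missing Accept header is a 400.
        return 400, None
    for entry in accept.split(","):
        # Compare only the bare media type; parameters like "; version=1" are dropped.
        media_type = entry.split(";", 1)[0].strip().lower()
        if media_type in SUPPORTED_TYPES:
            return 200, media_type
    # Assumption: nothing acceptable was offered; this sketch answers 406.
    return 406, None
```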

## Request Query Parameters

@@ -113,7 +126,7 @@ When the terminating entity at the end of the specified content path:
specified byte range of that entity.

- When dealing with a sharded UnixFS file (`dag-pb`, `0x70`) and a non-zero
`from` value, the UnixFS data and `blocksizes` determine the
corresponding starting block for a given `from` offset.

- cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON
@@ -150,14 +163,14 @@ that includes enough blocks for the client to understand why the requested
returned:

- If the requested `entity-bytes` resolves to a range that partially falls
outside the entity's byte range, the response MUST include the subset of
blocks within the entity's bytes.
- This allows clients to request valid ranges of the entity without needing
to know its total size beforehand, and it does not require the Gateway to
buffer the entire entity before returning the response.

- If the requested `entity-bytes` resolves to a zero-length range or falls
fully outside the entity's bytes, the response is equivalent to
`dag-scope=block`.
- This allows the client to produce a meaningful error (e.g., in the case of UnixFS,
leverage `Data.blocksizes` information present in the root `dag-pb` block).
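
The `from`/`blocksizes` rule and the range cases above can be sketched as follows. The sketch assumes `entity-bytes` has already been resolved to non-negative, inclusive `start`/`end` offsets. It also assumes a flat, single-level UnixFS file. Real files may have intermediate nodes, and the gateway still sends the root block that carries `blocksizes`. `leaves_for_range` is a hypothetical helper.

```python
from typing import List


def leaves_for_range(blocksizes: List[int], start: int, end: int) -> List[int]:
    """Indices of leaf blocks overlapping the inclusive byte range [start, end].

    `blocksizes` stands in for `Data.blocksizes` of a root dag-pb node whose
    links point straight at the leaves (a single-level DAG is assumed).
    An empty result means the response degrades to `dag-scope=block`.
    """
    total = sum(blocksizes)
    # Partial overlap: clamp to the entity so only blocks inside it are returned.
    end = min(end, total - 1)
    if start > end:
        # Zero-length or fully outside the entity.
        return []
    indices = []
    offset = 0
    for i, size in enumerate(blocksizes):
        first, last = offset, offset + size - 1
        if size > 0 and last >= start and first <= end:
            indices.append(i)
        offset += size
    return indices
```

With `blocksizes = [256, 256, 256, 100]`, a range of `200:300` touches leaves 0 and 1. A range starting at byte 800 and ending past the 868-byte entity is clamped to leaf 3.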
@@ -180,69 +193,194 @@ Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gatew

MUST be returned and include additional format-specific parameters when possible.

If a CAR stream was requested:
- the response MUST include the parameter specifying CAR version. For example:
`Content-Type: application/vnd.ipld.car; version=1`
- the response SHOULD include additional content type parameters, as noted in
[CAR format signaling in Response](#car-format-signaling-in-response).

### `Content-Disposition` (response header)

MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser.

## Response Payload

# Block Responses (application/vnd.ipld.raw)

Opaque bytes matching the requested block CID
([application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)).

The Body hash MUST match the Multihash from the requested CID.
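
Checking a block response means hashing the body with the function named in the CID's multihash and comparing digests. The Python sketch below is an illustration, not a reference implementation. It parses only base32 (`b`-prefixed) CIDv1 strings and supports only the sha2-256 (`0x12`) and identity (`0x00`) multihash codes. A real client would use a multiformats library. The helper names are hypothetical.

```python
import base64
import hashlib
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint (multiformats style); return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def verify_raw_block(cid: str, body: bytes) -> bool:
    """Return True if `body` matches the multihash inside a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only parses base32 ('b') CIDv1 strings")
    b32 = cid[1:].upper()
    raw = base64.b32decode(b32 + "=" * (-len(b32) % 8))  # restore stripped padding
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("expected a CIDv1")
    _codec, pos = read_varint(raw, pos)  # e.g. 0x55 (raw); not needed for hashing
    mh_code, pos = read_varint(raw, pos)
    mh_len, pos = read_varint(raw, pos)
    digest = raw[pos:pos + mh_len]
    if mh_code == 0x12:  # sha2-256
        return hashlib.sha256(body).digest() == digest
    if mh_code == 0x00:  # identity: the digest is the data itself
        return body == digest
    raise ValueError(f"unsupported multihash code 0x{mh_code:02x}")
```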

# CAR Responses (application/vnd.ipld.car)

A CAR stream for the requested
[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)
content type (with optional `order` and `dups` params), path and optional
`dag-scope` and `entity-bytes` URL parameters.

## CAR version

Value returned in
[`CarV1Header.version`](https://ipld.io/specs/transport/car/carv1/#header)
field MUST match the `version` parameter returned in `Content-Type` header.
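
The sketch below illustrates this check in three steps:

- It reads the varint-prefixed CARv1 header.
- It decodes the header with a deliberately tiny DAG-CBOR reader that handles only unsigned integers, byte and text strings, arrays, maps and tags.
- It compares `version` against the `version` parameter of `Content-Type`.

All names are hypothetical. A production client would use an existing CAR/DAG-CBOR library.

```python
from typing import Any, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint (multiformats style); return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def decode_cbor(buf: bytes, pos: int) -> Tuple[Any, int]:
    """Decode one CBOR item; supports only the types a CARv1 header uses."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 bytes follow
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("reserved or indefinite-length encodings are not valid DAG-CBOR")
    if major == 0:  # unsigned integer
        return arg, pos
    if major in (2, 3):  # byte string or UTF-8 text string
        data = buf[pos:pos + arg]
        return (data if major == 2 else data.decode("utf-8")), pos + arg
    if major == 4:  # array
        items = []
        for _ in range(arg):
            item, pos = decode_cbor(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map
        result = {}
        for _ in range(arg):
            key, pos = decode_cbor(buf, pos)
            value, pos = decode_cbor(buf, pos)
            result[key] = value
        return result, pos
    if major == 6:  # tag, e.g. 42 for CIDs: keep only the tagged value
        return decode_cbor(buf, pos)
    raise ValueError(f"unsupported CBOR major type {major}")


def car_header_version(stream: bytes) -> int:
    """Read CarV1Header.version from the start of a CAR stream."""
    header_len, pos = read_varint(stream, 0)
    header, _ = decode_cbor(stream[pos:pos + header_len], 0)
    return header["version"]


def version_matches(content_type: str, stream: bytes) -> bool:
    """Compare the Content-Type `version` parameter with the CAR header."""
    params = {}
    for part in content_type.split(";")[1:]:
        key, _, value = part.partition("=")
        params[key.strip().lower()] = value.strip()
    return params.get("version") == str(car_header_version(stream))
```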

## CAR roots

The behavior associated with the
[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field
is not currently specified.

The lack of a standard here means a client MUST assume different Gateways could return a different value.

A Client SHOULD ignore this field.

:::issue

As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved item within the CARv1 specification](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#unresolved-items).

:::

## CAR `order` (content type parameter)

The `order` parameter allows clients to specify the desired block order in the
response. It supports the following values:

- `dfs`: [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search)
order, enables streaming responses with minimal memory usage.
- `unk` (or missing): Unknown order, which serves as the implicit default when the `order`
parameter is unspecified. In this case, the client cannot make any assumptions
about the block order: blocks may arrive in a random order or be a result of
a custom DAG traversal algorithm.

A Gateway SHOULD always return an explicit `order` in the CAR `Content-Type` response header.

A Gateway MAY omit `order` in the CAR response if no order was explicitly requested
by the client and the default order is unknown.

A Client MUST assume implicit `order=unk` when `order` is missing, unknown, or empty.

## CAR `dups` (content type parameter)

The `dups` parameter specifies whether duplicate blocks (the same block
occurring multiple times in the requested DAG) will be present in the CAR
response. This is useful when a deterministic block order is used.

It accepts two values:
- `y`: Duplicate blocks MUST be sent every time they occur during the DAG walk.
- `n`: Duplicate blocks MUST be sent only once.

When set to `y`, light clients are able to discard blocks after
reading them, removing the need for caching in-memory or on-disk.

Setting to `n` allows for more efficient data transfer of certain types of
data, but introduces additional resource cost on the receiving end, as each
block needs to be kept around in case its CID appears again.
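
The memory trade-off between `dups=y` and `dups=n` shows up when a client checks an `order=dfs` stream as it arrives. The sketch below uses a toy Python model with these simplifications:

- Each block is reduced to its CID and its already-decoded links.
- Decoding and hash verification are omitted.
- Links are assumed to be walked in their listed order.

`check_dfs_stream` is a hypothetical name. With `dups=y` the only state is the stack of CIDs still expected. With `dups=n` the client must also remember every CID it has seen, because a repeated link will not be sent again.

```python
from typing import Iterable, List, Set, Tuple

# Toy block: (cid, links decoded from the block data).
Block = Tuple[str, List[str]]


def check_dfs_stream(root: str, stream: Iterable[Block], dups: bool) -> List[str]:
    """Check that blocks arrive in order=dfs; return the CIDs in arrival order."""
    expected = [root]  # stack of CIDs still to be delivered; top is next
    seen: Set[str] = set()  # only needed when dups=n
    received: List[str] = []
    blocks = iter(stream)
    while expected:
        want = expected.pop()
        if not dups and want in seen:
            continue  # dups=n: already delivered once, the gateway skips it
        cid, links = next(blocks, (None, []))
        if cid is None:
            raise ValueError(f"stream ended early, expected {want}")
        if cid != want:
            raise ValueError(f"expected {want}, got {cid}: not in DFS order")
        received.append(cid)
        if not dups:
            seen.add(cid)
        # Push links in reverse so the first link is visited next.
        expected.extend(reversed(links))
    return received
```

For a DAG where `root` links to `a` and `b`, and both link to `c`, the expected sequence is `root, a, c, b, c` with `dups=y`. With `dups=n` it is `root, a, c, b`.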

If the `dups` parameter is absent from the `Accept` request header, the
behavior is unspecified. In such cases, a Gateway should respond with `dups=n`
if it has control over the duplicate status, or without the `dups` parameter if it
does not.
Defaulting to the inclusion of duplicate blocks (`dups=y`) SHOULD only be
implemented by Gateway systems that exclusively support `dups=y` and do not
support any other behavior.

A Client MUST NOT assume any implicit behavior when `dups` is missing.

If the `dups` parameter is absent from the `Content-Type` response header, the
behavior is unspecified, and the CAR response includes an arbitrary list of
blocks. In this unknown state, the client MUST assume duplicates are not sent,
but also MUST ignore duplicates and other unexpected blocks if they are present.

A Gateway MUST always return `dups` in `Content-Type` response header
when the duplicate status is known at the time of processing the request.
A Gateway SHOULD NOT return `dups` if determining the duplicate status is not
possible at the time of processing the request.

A Gateway MUST NOT include virtual blocks identified by identity CIDs
(multihash with `0x00` code) in CAR responses. This exclusion applies regardless
of their presence in the DAG or the value assigned to the `dups` parameter, as
the raw data is already present in the parent block that links to the identity
CID.
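
On the gateway side, the same walk decides what gets written. The sketch below yields CIDs in `order=dfs`, honors `dups`, and never emits identity CIDs. It uses a toy DAG: a dict from CID to links, walked in listed order. A caller-supplied `is_identity` predicate stands in for a check of the multihash code `0x00`. `dfs_blocks` and both inputs are hypothetical. One further assumption of this sketch: links inside an identity block's inlined data are still walked, even though the identity block itself is not written.

```python
from typing import Callable, Dict, Iterator, List, Set


def dfs_blocks(
    dag: Dict[str, List[str]],
    root: str,
    dups: bool,
    is_identity: Callable[[str], bool],
) -> Iterator[str]:
    """Yield CIDs in the order their blocks would be written for order=dfs."""
    stack = [root]
    sent: Set[str] = set()
    while stack:
        cid = stack.pop()
        links = dag.get(cid, [])
        if is_identity(cid):
            # Never written, whatever `dups` says: the data is inlined in the
            # parent's link. Assumption: links inside it are still walked.
            stack.extend(reversed(links))
            continue
        if not dups and cid in sent:
            continue  # dups=n: each block is written at most once
        sent.add(cid)
        yield cid
        stack.extend(reversed(links))  # reversed so the first link is visited next
```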

## CAR format parameters and determinism

The default header and block order in a CAR format is not specified by IPLD specifications.

Clients MUST NOT assume that CAR responses are deterministic (byte-for-byte identical) across different gateways.

Clients MUST NOT assume that CAR includes CIDs and their blocks in the same order across different gateways.

Clients MUST assume block order and duplicate status only if `Content-Type` returned with CAR responses includes optional `order` or `dups` parameters, as specified by :cite[ipip-0412].

A Gateway SHOULD support some aspects of determinism by implementing content type negotiation and signaling via `Accept` and `Content-Type` headers.

:::issue

In controlled environments, clients MAY choose to rely on implicit and
undocumented CAR determinism, subject to the agreement of the following
conditions between the client and the gateway:
- CAR version
- content of [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field
- order of blocks (`order` from :cite[ipip-0412])
- status of duplicate blocks (`dups` from :cite[ipip-0412])

Note that this is undocumented behavior and MUST NOT be used on public networks.

:::

### CAR format signaling in Request

Content type negotiation is based on section 12.5.1 of :cite[rfc9110].

Clients MAY indicate their preferred block order by sending an `Accept` header in
the HTTP request. The `Accept` header format is as follows:

```
Accept: application/vnd.ipld.car; version=1; order=dfs; dups=y
```

In the future, when more orders or parameters exist, clients will be able to
specify a list of preferences, for example:

```
Accept: application/vnd.ipld.car;order=foo, application/vnd.ipld.car;order=dfs;dups=y;q=0.5
```

The above example is a list of preferences: the client would prefer the
hypothetical `order=foo`, but if it is not available, it would accept
`order=dfs` with `dups=y` instead (lower priority indicated via the `q` parameter,
as noted in :cite[rfc9110]).
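
One way to read such a header is to split it into entries and rank them by `q`, which defaults to 1. Entries with equal `q` keep the client's listed order. This Python sketch of `parse_accept` (a hypothetical helper) does not handle quoted parameter values or wildcards such as `*/*`.

```python
from typing import Dict, List, Tuple

Preference = Tuple[str, Dict[str, str], float]


def parse_accept(header: str) -> List[Preference]:
    """Split an Accept header into (media type, params, q), highest q first."""
    prefs: List[Preference] = []
    for entry in header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        if not parts[0]:
            continue  # skip empty entries, e.g. from a trailing comma
        media_type, params, q = parts[0].lower(), {}, 1.0
        for part in parts[1:]:
            key, _, value = part.partition("=")
            key, value = key.strip().lower(), value.strip()
            if key == "q":
                q = float(value)
            elif key:
                params[key] = value
        prefs.append((media_type, params, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(prefs, key=lambda pref: -pref[2])
```

Applied to the example above, the first preference is `order=foo` with `q=1.0` and the second is `order=dfs;dups=y` with `q=0.5`.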

### CAR format signaling in Response

The Trustless Gateway MUST always respond with a `Content-Type` header that includes
information about all supported and known parameters, even if the client did not
specify them in the request.

The `Content-Type` header format is as follows:

```
Content-Type: application/vnd.ipld.car;version=1;order=dfs;dups=n
```

Gateway implementations SHOULD decide on the implicit default ordering or
other parameters, and use them in responses when the client did not explicitly
specify any matching preference.

A Gateway MAY choose to implement only some parameters and return HTTP
400 Bad Request or 406 Not Acceptable when a client requests a response with an
unsupported content type variant.
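
The sketch below shows how a gateway might apply these rules to one requested variant. It fills in its own defaults, rejects what it does not support, and always echoes every known parameter. The capability table and the defaults (`order=dfs`, `dups=n`) describe a hypothetical gateway, and `car_content_type` is an invented name. Answering 406 rather than 400, including for unrecognized parameter names, is a choice made for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical capabilities and defaults of one gateway implementation.
SUPPORTED_PARAMS = {"version": {"1"}, "order": {"dfs", "unk"}, "dups": {"y", "n"}}
DEFAULT_PARAMS = {"version": "1", "order": "dfs", "dups": "n"}


def car_content_type(requested: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, Content-Type) for a CAR request with the given media type parameters."""
    chosen = dict(DEFAULT_PARAMS)
    for key, value in requested.items():
        if key == "q":
            continue  # quality weight, not a content type parameter
        if value not in SUPPORTED_PARAMS.get(key, set()):
            # Unsupported variant (or unknown parameter name): 406 here, 400 is also allowed.
            return 406, None
        chosen[key] = value
    # Always signal every known parameter, even those the client did not send.
    params = ";".join(f"{key}={chosen[key]}" for key in ("version", "order", "dups"))
    return 200, f"application/vnd.ipld.car;{params}"
```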

A Client MUST verify the `Content-Type` returned with a CAR response before
processing the payload, as a legacy gateway may not support optional content
type parameters like `order` and `dups` and may return plain
`application/vnd.ipld.car`.
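
The client-side rules for missing or unknown parameters fit in a few lines. The hypothetical `interpret_car_content_type` helper below does three things:

- It rejects non-CAR responses.
- It maps a missing, empty or unrecognized `order` to `unk`.
- It reports `dups` as `None` when the duplicate status is unknown.

The set of known `order` values is an assumption matching the values defined above.

```python
from typing import Dict, Optional

KNOWN_ORDERS = {"dfs", "unk"}  # order values defined above


def interpret_car_content_type(content_type: str) -> Dict[str, Optional[str]]:
    """Derive what a client may assume about a CAR response from its Content-Type."""
    media_type, *parts = [p.strip() for p in content_type.split(";")]
    if media_type.lower() != "application/vnd.ipld.car":
        raise ValueError(f"not a CAR response: {media_type!r}")
    params: Dict[str, str] = {}
    for part in parts:
        key, _, value = part.partition("=")
        params[key.strip().lower()] = value.strip()
    order = params.get("order", "")
    dups = params.get("dups")
    return {
        "version": params.get("version"),
        # Missing, empty or unknown order: assume implicit order=unk.
        "order": order if order in KNOWN_ORDERS else "unk",
        # None = duplicate status unknown: expect no duplicates, but tolerate
        # duplicates and other unexpected blocks if they show up.
        "dups": dups if dups in ("y", "n") else None,
    }
```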

# IPNS Record Responses (application/vnd.ipfs.ipns-record)

Opaque bytes matching the [Signed IPNS Record](https://specs.ipfs.tech/ipns/ipns-record/#ipns-record)
for the requested [IPNS Name](https://specs.ipfs.tech/ipns/ipns-record/#ipns-name)
returned as [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record).

A Client MUST confirm that the record signature matches the `libp2p-key` from the requested IPNS Name.

A Client MUST [perform additional record verification according to the IPNS specification](https://specs.ipfs.tech/ipns/ipns-record/#record-verification).