From 63ea4ff8984d536e891627687ea3794ad8366a11 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Mon, 15 May 2023 14:33:10 +0200 Subject: [PATCH] IPIP-412: Signaling Block Order in CARs on Gateways First draft based on various prior art and recent discussions cited in the header front matter. --- src/http-gateways/trustless-gateway.md | 2 +- src/ipips/ipip-0412.md | 240 +++++++++++++++++++++++++ 2 files changed, 241 insertions(+), 1 deletion(-) create mode 100644 src/ipips/ipip-0412.md diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 6d090b119..72e4ac3d5 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -63,7 +63,7 @@ mode and `Accept` header is missing Below response types MUST to be supported: - [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – requests a single, verifiable raw block to be returned -- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned +- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned, implementations MAY support optional parameters (:cite[ipip-0412]) - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] (multicodec `0x0300`). # HTTP Response diff --git a/src/ipips/ipip-0412.md b/src/ipips/ipip-0412.md new file mode 100644 index 000000000..16ab2960a --- /dev/null +++ b/src/ipips/ipip-0412.md @@ -0,0 +1,240 @@ +--- +title: "IPIP-0412: Signaling Block Order in CARs on HTTP Gateways" +date: 2023-05-15 +ipip: proposal +editors: + - name: Marcin Rataj + github: lidel + url: https://lidel.org/ + - name: Jorropo + github: Jorropo +relatedIssues: + - https://github.com/ipfs/specs/issues/348 + - https://github.com/ipfs/specs/pull/330 + - https://github.com/ipfs/specs/pull/402 + - https://github.com/ipfs/specs/pull/412 +order: 412 +tags: ['ipips'] +--- + +## Summary + +Adds support for additional, optional content type options that allow the +client and server to signal or negotiate a specific block order in the returned +CAR. + +## Motivation + +We want to make it easier to build light-clients for IPFS. We want them to have +low memory footprints on arbitrary sized files. The main pain point preventing +this is the fact that CAR ordering isn't specified. + +This require to keeping some kind of reference either on disk, or in memory to +previously seen blocks for two reasons. + +1. Blocks can arrive out of order, meaning when a block is consumed (data is + red and returned to the consumer) and when it's received might not match. +1. Blocks can be reused multiple times, this is handy for cases when you plan + to cache on disk but not at all when you want to process a stream with use & + forget policy. + +What we really want is for the gateway to help us a bit, and give us blocks in +a useful order. + +The existing Trustless Gateway specification does not provide a mechanism for +negotiating the order of blocks in CAR responses. + +This IPIP aims to improve the status quo. + +## Detailed design + +CAR content type +([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) +already supports `version` parameter, which allows gateway to indicate which +CAR flavour is returned with the response. + +The proposed solution introduces two new parameters for the content type headers +in HTTP requests and responses: `order` and `dups`. + +The `order` parameter allows the client to indicate its preference for a +specific block order in the CAR response, and the `dups` parameter specifies +whether duplicate blocks are allowed in the response. + +### Signaling in Request + +Content type negotiation is based on section 12.5.1 of :cite[rfc9110]. + +Clients MAY indicate their preferred block order by sending an `Accept` header in +the HTTP request. The `Accept` header format is as follows: + +``` +Accept: application/vnd.ipld.car; version=1; order=dfs; dups=y +``` + +In the future, when more orders or parameters exist, clients will be able to +specify a list of preferences, for example: + +``` +Accept: application/vnd.ipld.car;order=foo, application/vnd.ipld.car;order=dfs;dups=y;q=0.5 +``` + +The above example is a list of preferences, the client would really like to use +the hypothetical `order=foo` however if this isn't available it would accept +`order=dfs` with `dups=y` instead (lower priority indicated via `q` parameter, +as noted in :cite[rfc9110]). + +#### `order` CAR content type parameter + +The `order` parameter accepts the following values: + +- `dfs`: [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search) + order, allows for streaming responses with minimal memory usage +- `rnd`: Unknown (random) order, the implicit default when `order` parameter is missing. + +#### `dups` CAR content type parameter + +The `dups` parameter specifies whether duplicate blocks (the same block +occuring multiple times in the requested DAG) will be present in the CAR +response. + +It accepts two values: +- `y`: duplicate blocks are allowed +- `n`: duplicates are not allowed + +When allowed (`y`), light clients are able to discard blocks after +reading them, removing the need for caching in-memory or on-disk. + + + +### Signaling in Response + +The Trustless Gateway MUST always respond with a `Content-Type` header that includes +information about all supported/known parameters, even if the client did not +specify them in the request. + +The `Content-Type` header format is as follows: + +``` +Content-Type: application/vnd.ipld.car;version=1;order=dfs;dups=y +``` + + +Gateway implementations are free to decide on the implicit default ordering or +other parameters, and use it in responses when client did not explicitly +specify, or requested unsupported or unknown query parameter. + +Implementations MAY choose to implement only some of the parameters. + +## Design rationale + +The proposed specification change aims to address the limitations of the +existing Trustless Gateway specification by introducing a mechanism for +negotiating the block order in CAR responses. + +By allowing clients to indicate their preferred block order, Trustless Gateways +can cache CAR responses for popular content, resulting in improved performance +and reduced network load. Clients benefit from more efficient data handling by +deserializing blocks as they arrive, + +We reuse exiting HTTP content type negotiation, and the CAR content type, which +already had the optional `version` parameter. + +### User benefit + +The proposed specification change brings several benefits to end users: + +1. Improved Performance: Gateways can decide on their implicit default ordering + and cache CAR responses for popular content. In turn, clients can benefit + from strong `Etag` in ordered (deterministic) responses. This reduces the + response time for subsequent requests, resulting in faster content retrieval + for users. + +2. Reduced Memory Usage: Clients no longer need to buffer the entire CAR + response in memory until the deserialization of the requested entity is + finished. With the ability to deserialize blocks as they arrive, users can + conserve memory resources, especially when dealing with large CAR responses. + +3. Efficient Data Handling: By discarding blocks as soon as the CID is + validated and data is deserialized, clients can efficiently process the data + in real-time. This is particularly useful for light clients, IoT devices, + mobile web browsers, and other streaming applications where immediate access + to the data is required. + +4. Customizable Ordering: Clients can indicate their preferred block order in the + `Accept` header, allowing them to prioritize specific ordering strategies that + align with their use cases. This flexibility enhances the user experience + and empowers users to optimize content retrieval according to their needs. + +### Compatibility + +The proposed specification change is backward compatible with existing client +and server implementations. + +Trustless Gateways that do not support the negotiation of block order in CAR +responses will continue to function as before, providing their existing default +behavior, and the clients will be able to detect it by inspecting the +`Content-Type` header present in HTTP response. + +Clients that do not send the `Accept` header or do not recognize the `order` +and `dups` parameters in the `Content-Type` header will receive and process CAR +responses as they did before: buffering/caching all blocks until done with the +final deserialization. + +Existing implementations can choose to adopt the new specification and +implement support for the negotiation of block order incrementally. This allows +for a smooth transition and ensures compatibility with both new and old +clients. + +### Security + +The proposed specification change does not introduce any negative security +implications beyond those already present in the existing Trustless Gateway +specification. It focuses on enhancing performance and data handling without +affecting the underlying security model of IPFS. + +Light clients with support for `order` and `dups` CAR content type parameters +will be able to detect malicious response faster, reducing risks of +memory-based DoS attacks from malicious gateways. + +### Alternatives + +Several alternative approaches were considered before arriving at the proposed solution: + +1. Implicit Server-Side Configuration: Instead of negotiating the block order, + in the CAR response, the Trustless Gateway could have a server-side + configuration that specifies the default order. However, this approach would + limit the flexibility for clients, requiring them to have prior knowledge + about order supported by each gateway. + +2. Fixed Block Order: Another option was to enforce a fixed block order in the + CAR responses. However, this approach would not cater to the varying needs + and preferences of different clients and use cases, and is not backward + compatible with the existing Trustless Gateways which return CAR responses + with Weak `Etag` and unspecified block order. + +3. Separate `X-` HTTP Header: Introduction of a separate HTTP reader was + rejected because we try to use HTTP semantics where possible, and gateways + already use HTTP content type negotiation for CAR `version` and reusing it + saves a few bytes in each round-trip. Also, :cite[rfc6648] advises against + use of `X-` and similar constructs in new protocols. + +The proposed solution of negotiating the block order through headers si +future-proof, allows for flexibility, interoperability, and customization while +maintaining compatibility with existing implementations. + +## Test fixtures + +Implementation compliance can be determined by testing the negotiation process +between clients and Trustless Gateways using various combinations of `order` and +`dups` parameters. + +TODO: +1. a CAR with blocks for a small file in DFS order +2. a CAR with blocks for a small file with one block appearing twice + + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).