From 0ee8e17ded2148d3c7c46d9628c363b6114263b1 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Tue, 23 Aug 2022 05:37:28 -0400
Subject: [PATCH 01/27] ipfs: Copy Template

---
 rfcs/0000-ipfs.md | 58 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 rfcs/0000-ipfs.md

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
new file mode 100644
index 000000000..43569cf57
--- /dev/null
+++ b/rfcs/0000-ipfs.md
@@ -0,0 +1,58 @@
+---
+feature: (fill me in with a unique ident, my_awesome_feature)
+start-date: (fill me in with today's date, YYYY-MM-DD)
+author: (name of the main author)
+co-authors: (find a buddy later to help out with the RFC)
+shepherd-team: (names, to be nominated and accepted by RFC steering committee)
+shepherd-leader: (name to be appointed by RFC steering committee)
+related-issues: (will contain links to implementation PRs)
+---
+
+# Summary
+[summary]: #summary
+
+One paragraph explanation of the feature.
+
+# Motivation
+[motivation]: #motivation
+
+Why are we doing this? What use cases does it support? What is the expected
+outcome?
+
+# Detailed design
+[design]: #detailed-design
+
+This is the core, normative part of the RFC. Explain the design in enough
+detail for somebody familiar with the ecosystem to understand, and implement.
+This should get into specifics and corner-cases. Yet, this section should also
+be terse, avoiding redundancy even at the cost of clarity.
+
+# Examples and Interactions
+[examples-and-interactions]: #examples-and-interactions
+
+This section illustrates the detailed design. This section should clarify all
+confusion the reader has from the previous sections. It is especially important
+to counterbalance the desired terseness of the detailed design; if you feel
+your detailed design is rudely short, consider making this section longer
+instead.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Why should we *not* do this?
+
+# Alternatives
+[alternatives]: #alternatives
+
+What other designs have been considered? What is the impact of not doing this?
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+What parts of the design are still TBD or unknowns?
+
+# Future work
+[future]: #future-work
+
+What future work, if any, would be implied or impacted by this feature
+without being directly part of the work?

From ae6ca2d72b72aa15e23fc1581c195f3ba8653bc4 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Tue, 23 Aug 2022 07:24:44 -0400
Subject: [PATCH 02/27] ipfs: Start drafting

---
 rfcs/0000-ipfs.md | 146 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 137 insertions(+), 9 deletions(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index 43569cf57..b51d6c217 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -1,7 +1,7 @@
 ---
-feature: (fill me in with a unique ident, my_awesome_feature)
+feature: ipfs
 start-date: (fill me in with today's date, YYYY-MM-DD)
-author: (name of the main author)
+author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems)
 co-authors: (find a buddy later to help out with the RFC)
 shepherd-team: (names, to be nominated and accepted by RFC steering committee)
 shepherd-leader: (name to be appointed by RFC steering committee)
@@ -11,21 +11,149 @@ related-issues: (will contain links to implementation PRs)
 # Summary
 [summary]: #summary
 
-One paragraph explanation of the feature.
+Integrate Nix with IPFS, in phases of increasing sophistication.
+This follows the work done and described in https://github.com/obsidiansystems/ipfs-nix-guide/
 
 # Motivation
 [motivation]: #motivation
 
-Why are we doing this? What use cases does it support? What is the expected
-outcome?
+## Binary distribution
+
+Currently distributing Nix binaries takes a lot of bandwidth and storage.
+This is a barrier to being a Nix user in areas of slower internet --- which includes the vast majority of the world's population at this time.
+This is also a barrier to users running their own caches.
+
+Content-addressing opens up a *huge* design space of solutions to get around such problems.
+IPFS explores many of those solutions.
+
+## Source distribution and archival
+
+A goal of the Nix ecosystem is to package software in a way that never bitrots.
+Getting in the way of that, however, is the fact source code frequently goes off-line.
+The Software Heritage archive is the best in the world, and a natural partner in this effort.
+
+Unfortunately, as https://www.tweag.io/blog/2020-06-18-software-heritage/ describes at the end, a major challenge is the way nix content-addresses software.
+First of all, Nix hashes sources in bespoke ways that no other project will adopt.
+Second of all, tarballs instead of the underlying files leaking non-normative details (compression, odd perms, etc.).
+
+We should natively support git file hashing, which Git repos and Software Heritage both support.
+This will completely obliterate these issues.
+
+IPFS also supports git hashing, and so we also provide a good way for people and intuitions to "pin" the sources they need, especially if those sources include private ones SWH won't have.
+Finally, per (Obsidian's bridging work)https://github.com/obsidiansystems/go-ipfs-swh-plugin/ ,
+
+## Not just IPFS
+
+Many of the IPFS-specific logic could in fact live in a plugin if this is desired.
+However, we still need to adjust core abstractions of Nix store layer (as described below) to interface with IPFS in the best possible way.
+Those same adjustments would allow Nix to work better with *any* content-addressing system, so alternatives networks/projects to IPFS can also be just as easily experimented with.
+
+As always with my work, the manta (from Scheme) to follow is
+
+> *x* should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.
+
+A ton of misc features have been added to Nix since 2.0, and we are very careful to not increase total ad-hoc complexity more than necessary.
+
+## Build adoption through seamless interop
+
+This last argument is more strategic than technical.
+
+A lot of people in this community would like to see Nix be used more widely, but as much as we all wish otherwise, the fact remains that there is some tension between making nix *better* and making it *more accessible*.
+
+Nix is very foreign from the "bad conventional" way things are done, and making Nix better can sometimes involve making it even more foreign.
+We don't want to steepen the learning curve or make it "seem more weird".
+
+On the other hand, making Nix more accessible by making it more like tools users are already use-to can obscure or chip-away at Nix's benefits.
+We don't want to "pander" in ways that will make Nix faddish but ultimately undermine it's popular over the long haul (see Docker the company's woes).
+
+One way to get around this tension to me is rather than pushing Nix towards the rest of the world, pushing the rest of the world towards us.
+Like-minded projects emphasizing content-addressing are our *natural* partners, and we should work with them to promote Nix-*agnostic* standards that further our values and mission.
 
 # Detailed design
 [design]: #detailed-design
 
-This is the core, normative part of the RFC. Explain the design in enough
-detail for somebody familiar with the ecosystem to understand, and implement.
-This should get into specifics and corner-cases. Yet, this section should also
-be terse, avoiding redundancy even at the cost of clarity.
+Each item can be done separately provided its dependent items are also done.
+
+## Augmented `narinfo`
+
+- **Purpose**: Binary distribution
+
+*This is taken from [RFC PR 122](https://github.com/lucasew/rfcs/blob/binary-cache-ipfs/rfcs/0122-binary-cache-ipfs.md), which was abandoned by its author.*
+
+The purpose of this is a "hybrid" store where the narinfo metadata is still severed via HTTPS, but the data itself is served via IPFS.
+
+Today, a narinfo looks like this:
+
+```
+StorePath: /nix/store/gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10
+URL: nar/0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb.nar.xz
+Compression: xz
+FileHash: sha256:0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb
+FileSize: 40360
+NarHash: sha256:1ddv0iqq47j0awyw7a8dmm8bz71c6ifrliq53kmmsfzjxf3rwvb8
+NarSize: 197528
+References: 7gx4kiv5m0i7d7qkixq2cwzbr10lvxwc-glibc-2.27 gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10
+Deriver: 5sj6fdfym58sdaf3r5p87v4l8sj2zlvn-hello-2.10.drv
+Sig: cache.nixos.org-1:K0thQEG60rzAK8ZS9f1whb7eRlIshlMDJAm7xvX1oF284H+PTqlicv/wGW6BIj+wWWONHvUZ2MYc+KDArekjDA==
+```
+
+This RFC proposes new key-value pairs that in this example would be:
+
+```
+IpfsCid: Qmf8NfV2hnq44RoQw9vxmSpGYTwAovA8FUCxeCJCqmXeNN
+IpfsEncoding: {"method":"wrapped-nar","chunking":{"leaf-format":"raw","strategy":"fixed-size"},"layout":"balanced","max-width":174}
+```
+
+Just as today, the `NarHash` and `NarSize` remain the *normative* way to lock down the store object the `narinfo` file describes.
+Conversely, The `URL`, `FileHash` and `FileSize` by contrast are *informational*, describing not what the store object *is*, but *how to get it*.
+
+The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to get the store object:
+
+- `IpfsCid`: Native content address for IPFS.
+
+- `IpfsEncoding`: Enough info to deterministically rebuild the IPFS representation from a non-IPFS copy of the store object.
+
+  For now, `IpfsEncoding` will only support `unixfs-nar`, which works as follows:
+
+  The NAR is itself wrapped in IPFS's [UnixFS](https://github.com/ipfs/specs/blob/main/UNIXFS.md).
+  This other format can be extracted from the CID (which is conceptually a pair of encoding metadata and a hash).
+  For now, only IPFS's "unixfs" is supported.
+  `chunking`, `layout`, and `max-size` are tuning parameters for unixfs [described in the UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#importing).
+
+  "UNIXFS" is not used directly because it doesn't support the "executable bit** Nix does on files.
+  NAR archive are not used directly because IPFS doesn't support arbitrary large objects.
+
+## Git file hashing
+
+- **Purpose**: Source distribution and archival
+
+## Content address or store path in Store interface
+
+- **Purpose**: Source distribution and archival
+
+## Git fetching for `buitins.fetch`
+
+- **Purpose**: Source distribution and archival
+- **Depends on**: Git file hashing, Content address or store path in Store interface
+
+## NAR info or content address normative in `ValidPathInfo`
+
+- **Purpose**: Source distribution and archival
+
+## IPFS Narinfo
+
+- **Purpose**: Binary distribution
+- **Depends on**: Augmented `narinfo`
+
+## Wrapped git objects with references
+
+- **Purpose**: Binary distribution
+- **Depends on**: Git file hashing
+
+## IPLD Derivations
+
+- **Purpose**: Build plan distribution
+- **Depends on**: Wrapped git objects with references
 
 # Examples and Interactions
 [examples-and-interactions]: #examples-and-interactions

From 5d74bc67caa5c5eaa589e705f1068c05d3635d93 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Tue, 23 Aug 2022 22:17:12 -0400
Subject: [PATCH 03/27] ipfs: Finish draft

---
 rfcs/0000-ipfs.md | 129 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 109 insertions(+), 20 deletions(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index b51d6c217..f90781109 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -12,7 +12,7 @@ related-issues: (will contain links to implementation PRs)
 [summary]: #summary
 
 Integrate Nix with IPFS, in phases of increasing sophistication.
-This follows the work done and described in https://github.com/obsidiansystems/ipfs-nix-guide/
+This follows the work done and described in https://github.com/obsidiansystems/ipfs-nix-guide/ .
 
 # Motivation
 [motivation]: #motivation
@@ -40,7 +40,7 @@ We should natively support git file hashing, which Git repos and Software Herita
 This will completely obliterate these issues.
 
 IPFS also supports git hashing, and so we also provide a good way for people and intuitions to "pin" the sources they need, especially if those sources include private ones SWH won't have.
-Finally, per (Obsidian's bridging work)https://github.com/obsidiansystems/go-ipfs-swh-plugin/ ,
+Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin),
 
 ## Not just IPFS
 
@@ -123,64 +123,153 @@ The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to g
   "UNIXFS" is not used directly because it doesn't support the "executable bit** Nix does on files.
   NAR archive are not used directly because IPFS doesn't support arbitrary large objects.
 
+## IPFS Narinfo and "stateful" IPFS Store
+
+- **Purpose**: Binary distribution
+- **Depends on**: Augmented `narinfo`
+
+Instead of a "hybrid" store, where the narinfo index is served with HTTP but the data itself is served with IPFS, we can do an all-IPFS store with the data itself and mutable index stored in IPFS.
+The Narinfo instead of being encoded the legacy line-oriented text format can be IPFS's native DAG-CBOR IPLD codec, which is like JSON + content address links (but stored as CBOR).
+This allows Narinfos to reference each other and be nicely structured so the index is legible from Nix-agnostic IPFS tools and recursive pinning comes for free.
+
+Read-only is easier, since IPFS data is immutable but "writable" stores are supported by simple printing back a new CID for the new store root after some modifications, or modifying a mutable IPNS reference.
+IPNS is historically slow, but the update is automatic.
+Printing out a new CID for the index root allows the store administrator to update an out-of-bound mutable reference, but this cannot be automated because Nix doesn't know what the out-of-band method is.
+
 ## Git file hashing
 
 - **Purpose**: Source distribution and archival
 
-## Content address or store path in Store interface
+In addition to the various forms of content-addressing Nix supports today ("text", "fixed" with either "flat" or "nar" serialization of file system objects), Nix should support Git hashing.
+This support entails two basic things:
 
-- **Purpose**: Source distribution and archival
+ - Content addresses are used to compute store paths.
+ - Content addresses are used to verify store object integrity.
+
+Git hashing would not support references (since references in Nix's sense are not a Git concept), but that is not an issue for the intended use-case of exchanging source code.
 
-## Git fetching for `buitins.fetch`
+## Git file hashing for `buitins.fetch*`
 
 - **Purpose**: Source distribution and archival
-- **Depends on**: Git file hashing, Content address or store path in Store interface
+- **Depends on**: Git file hashing,
+
+The builtin fetchers can also be made to work with git file hashing just as they support the other types.
+In addition, Git repo fetching can leverage this better to than the other formats since the data in git repos is already content-addressed in this way.
+
+## Content address or store path in Store interface
+
+- **Purpose**: All distribution
+
+Modify many store interface methods that today take store paths to instead accept *either* a store path or a content address.
+
+For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine.
+But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them.
+They can, however, understand content addresses.
+And from such a content address we can always produce a store path again, so there is no loss of functionality with existing stores.
 
 ## NAR info or content address normative in `ValidPathInfo`
 
 - **Purpose**: Source distribution and archival
+- **Depends on**: Content address or store path in Store interface,
 
-## IPFS Narinfo
+As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object.
+But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself.
+Relax the abstract `ValidPathInfo` type to merely require that *one of* `NarHash` and `NarSize` or `CA` be defined.
 
-- **Purpose**: Binary distribution
-- **Depends on**: Augmented `narinfo`
+Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost.
+Only new nix-agnostic store types would take advantage of these new, relaxed rules.
+
+## "stateless" IPFS store
+
+- **Purpose**: Source distribution and archival
+- **Depends on**: NAR info or content address normative in `ValidPathInfo`
+
+Use the above functionality to create a "stateless" IPFS store.
+Opaque store path lookups always fail, but when the key is the new content address type, we can translate the key itself into a CID that we can look up.
+
+Unlike the previous two flavours of IPFS store, this one is stateless in that there is no need for an index at all.
+Only content-addressed data is looked up, and it doesn't need any nar-info metadata before the data is all there.
+
+We need the previous step for querying without fetching any data.
+In that case since there is no narinfo index we're looking up, we don't get any additional metadata back.
+But the content address key a successful query used is enough to create a bare-bones `ValidPathInfo` with a `CA` field, which with the enough step is valid.
+
+(A bare-bones `ValidPathInfo` might sound sub-par, but for plain old content-addressed data it is fine.
+Most of the other metadata in `ValidPathInfo` is really just for input-addressed derivation outputs, and is thus obviated by CA derivation trust maps which contain the same data but more naturally.)
 
 ## Wrapped git objects with references
 
 - **Purpose**: Binary distribution
 - **Depends on**: Git file hashing
 
+Merkelized formats like git file hashing are better than NAR because that allow for very natural deduplication and minimal transfers.
+This is the same benefit we get today with Nix within a closure of multiple store objects, now also *within a single store object*.
+But git has no notion of Nix-style references, so plain git hashing is only suitable for leaf store objects without references (like source code).
+
+However, we can use IPLD to wrap git-hashed data with a reference set, and "has self reference" bit.
+This easily creates a new content addressing scheme which handles all "shapes" of store objects.
+This gives is a nice way to thus share arbitrary nix store data (provided it is content-addressed) over IPFS.
+
+Like with "IPFS Narinfo", this format is also very easy to understand with nix-agnostic native IPFS tools.
+This is because, once again, the reference graph is made native to IPFS not done indirectly with store path strings which must be looked up.
+
+An interesting corollary to note:
+Content addressing today is "shallow", in that references are arbitrary store paths.
+With this form on content addressing, references are instead CIDs (native IPFS references) to other obligatorily content-addressed data.
+This means the content addressing is "deep", such that any such content-addressed store object always has a content-addressed closure.
+At the cost of interop with existing derivation outputs, this make such data easier to manage because there are fewer trust issues and degrees of freedom in general for something to go wrong.
+
 ## IPLD Derivations
 
 - **Purpose**: Build plan distribution
-- **Depends on**: Wrapped git objects with references
+- **Depends on**: Wrapped git objects with references,
+  IPFS as substitutor
+
+Natively represent derivations in IPFS, again with the same benefits of leverage the native graph representations.
+
+This is a culmination of all the futures so far.
+The derivations must be CA derivations (floating or fixed).
+They must also produce wrapped git objects with references, though they can also depend on regular unwrapped git file hashed store objects.
+
+The derivations and their outputs are thus all fully IPFS native, leveraging the IPFS graph and trust vs plain old data separation for the high standard of interoperability.
 
 # Examples and Interactions
 [examples-and-interactions]: #examples-and-interactions
 
-This section illustrates the detailed design. This section should clarify all
-confusion the reader has from the previous sections. It is especially important
-to counterbalance the desired terseness of the detailed design; if you feel
-your detailed design is rudely short, consider making this section longer
-instead.
+We encourage anyone interested to check our tutorial in https://github.com/obsidiansystems/ipfs-nix-guide/ which demonstrates the above functionality.
+Note at the time of writing this guide uses our original 2020 fork of Nix.
 
 # Drawbacks
 [drawbacks]: #drawbacks
 
-Why should we *not* do this?
+The main cost is more complexity to the store layer.
+For two reason we think this is not so bad:
+
+1. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model.
+   This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above:
+   Instead of extending the model wit new features, we are reflaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same.
+
+2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it.
+   With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication.
+   RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too.
+
+   Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control.
+   It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity.
+
+   We plan on more formally proposing this next.
 
 # Alternatives
 [alternatives]: #alternatives
 
-What other designs have been considered? What is the impact of not doing this?
+The dependency graph of steps can be sliced to save some for future work.
+For now they are all written together, but during the RFC meetings we will decide which steps (if any) to ratify now, and which steps to save for later.
 
 # Unresolved questions
 [unresolved]: #unresolved-questions
 
-What parts of the design are still TBD or unknowns?
+Per the above, deciding which steps to leave as future work.
 
 # Future work
 [future]: #future-work
 
-What future work, if any, would be implied or impacted by this feature
-without being directly part of the work?
+Chiefly, any steps which we don't wish to commit to initially; to be decided as described above.

From c858e40e1266c19e59b8717987def7c2f5d7f670 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Tue, 23 Aug 2022 22:20:49 -0400
Subject: [PATCH 04/27] ipfs: Expand discussion of managing complexity

---
 rfcs/0000-ipfs.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index f90781109..8221b84ab 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -243,18 +243,24 @@ Note at the time of writing this guide uses our original 2020 fork of Nix.
 [drawbacks]: #drawbacks
 
 The main cost is more complexity to the store layer.
-For two reason we think this is not so bad:
+For a few reason we think this is not so bad.
 
-1. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model.
+Most importantly is the division of the work into a dependency graph of steps.
+This allows us to slowly try IPFS out and not commit to more change than we want to up front.
+
+Even if we do end up adopting everything though, we thing for the following two reasons the complexity can still be kept manageable:
+
+2. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model.
    This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above:
    Instead of extending the model wit new features, we are reflaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same.
 
-2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it.
+3. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it.
    With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication.
    RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too.
 
    Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control.
    It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity.
+   That frees up "complexity budget" for project like this.
 
    We plan on more formally proposing this next.
 

From 3fa874ea314202024734383e502af6d2ce94a4a7 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Sat, 27 Aug 2022 22:34:21 -0400
Subject: [PATCH 05/27] ipfs: Fix typos

Thanks!
---
 rfcs/0000-ipfs.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index 8221b84ab..0edfcf2e2 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -39,7 +39,7 @@ Second of all, tarballs instead of the underlying files leaking non-normative de
 We should natively support git file hashing, which Git repos and Software Heritage both support.
 This will completely obliterate these issues.
 
-IPFS also supports git hashing, and so we also provide a good way for people and intuitions to "pin" the sources they need, especially if those sources include private ones SWH won't have.
+IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have.
 Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin),
 
 ## Not just IPFS
@@ -48,7 +48,7 @@ Many of the IPFS-specific logic could in fact live in a plugin if this is desire
 However, we still need to adjust core abstractions of Nix store layer (as described below) to interface with IPFS in the best possible way.
 Those same adjustments would allow Nix to work better with *any* content-addressing system, so alternatives networks/projects to IPFS can also be just as easily experimented with.
 
-As always with my work, the manta (from Scheme) to follow is
+As always with my work, the mantra (from Scheme) to follow is
 
 > *x* should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.
 
@@ -64,7 +64,7 @@ Nix is very foreign from the "bad conventional" way things are done, and making
 We don't want to steepen the learning curve or make it "seem more weird".
 
 On the other hand, making Nix more accessible by making it more like tools users are already use-to can obscure or chip-away at Nix's benefits.
-We don't want to "pander" in ways that will make Nix faddish but ultimately undermine it's popular over the long haul (see Docker the company's woes).
+We don't want to "pander" in ways that will make Nix faddish but ultimately undermine it's popularity over the long haul (see Docker the company's woes).
 
 One way to get around this tension to me is rather than pushing Nix towards the rest of the world, pushing the rest of the world towards us.
 Like-minded projects emphasizing content-addressing are our *natural* partners, and we should work with them to promote Nix-*agnostic* standards that further our values and mission.
@@ -227,7 +227,7 @@ At the cost of interop with existing derivation outputs, this make such data eas
 
 Natively represent derivations in IPFS, again with the same benefits of leverage the native graph representations.
 
-This is a culmination of all the futures so far.
+This is a culmination of all the features so far.
 The derivations must be CA derivations (floating or fixed).
 They must also produce wrapped git objects with references, though they can also depend on regular unwrapped git file hashed store objects.
 
@@ -248,7 +248,7 @@ For a few reason we think this is not so bad.
 Most importantly is the division of the work into a dependency graph of steps.
 This allows us to slowly try IPFS out and not commit to more change than we want to up front.
 
-Even if we do end up adopting everything though, we thing for the following two reasons the complexity can still be kept manageable:
+Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable:
 
 2. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model.
    This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above:

From 2f25a1a193d378f0e8b4e4185eb6be47862874a9 Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Sat, 27 Aug 2022 22:39:24 -0400
Subject: [PATCH 06/27] ipfs: Fix more typos

Thanks!
---
 rfcs/0000-ipfs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index 0edfcf2e2..89542c855 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -252,7 +252,7 @@ Even if we do end up adopting everything though, we think for the following two
 
 2. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model.
    This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above:
-   Instead of extending the model wit new features, we are reflaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same.
+   Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same.
 
 3. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it.
    With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication.

From 56ad43fbfd4650ce694b0d79a6c72189a994b5ca Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Sat, 27 Aug 2022 22:44:22 -0400
Subject: [PATCH 07/27] ipfs: FInish motivation on source distribution and archival

---
 rfcs/0000-ipfs.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/rfcs/0000-ipfs.md b/rfcs/0000-ipfs.md
index 89542c855..bde60ab78 100644
--- a/rfcs/0000-ipfs.md
+++ b/rfcs/0000-ipfs.md
@@ -40,7 +40,10 @@ We should natively support git file hashing, which Git repos and Software Herita
 This will completely obliterate these issues.
 
 IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have.
-Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin),
+Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix.
+Data can be directly downloaded from SWH via HTTPS, or indirectly via IPFS, which can act as a CDN to not put as much load on SWH's servers.
+ +Overall, are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. ## Not just IPFS From 2290ece07d4fc24fb69a86d584c6cbfcd94e589c Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sat, 27 Aug 2022 23:20:17 -0400 Subject: [PATCH 08/27] ipfs: Rename now that we have number --- rfcs/{0000-ipfs.md => 0133-ipfs.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{0000-ipfs.md => 0133-ipfs.md} (100%) diff --git a/rfcs/0000-ipfs.md b/rfcs/0133-ipfs.md similarity index 100% rename from rfcs/0000-ipfs.md rename to rfcs/0133-ipfs.md From 69b0c461782d587a7455c4a653d72161f5fe09ff Mon Sep 17 00:00:00 2001 From: John Ericson Date: Mon, 29 Aug 2022 10:47:45 -0400 Subject: [PATCH 09/27] Apply suggestions from code review Thanks! Co-authored-by: Kevin Cox --- rfcs/0133-ipfs.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index bde60ab78..51d13643e 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -124,7 +124,7 @@ The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to g `chunking`, `layout`, and `max-size` are tuning parameters for unixfs [described in the UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#importing). "UNIXFS" is not used directly because it doesn't support the "executable bit** Nix does on files. - NAR archive are not used directly because IPFS doesn't support arbitrary large objects. + NAR archive are not used directly because IPFS doesn't support arbitrarily large objects. ## IPFS Narinfo and "stateful" IPFS Store @@ -177,7 +177,7 @@ And from such a content address we can always produce a store path again, so the As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. 
-Relax the abstract `ValidPathInfo` type to merely require that *one of* `NarHash` and `NarSize` or `CA` be defined. +Relax the abstract `ValidPathInfo` type to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined. Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. Only new nix-agnostic store types would take advantage of these new, relaxed rules. From d7c3a839c67c593f9fc040e70ab7fef71dc4bae7 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Thu, 8 Sep 2022 11:06:04 -0400 Subject: [PATCH 10/27] Fix typos Thanks! Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com> --- rfcs/0133-ipfs.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index 51d13643e..b61c8fb67 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -83,7 +83,7 @@ Each item can be done separately provided its dependent items are also done. *This is taken from [RFC PR 122](https://github.com/lucasew/rfcs/blob/binary-cache-ipfs/rfcs/0122-binary-cache-ipfs.md), which was abandoned by its author.* -The purpose of this is a "hybrid" store where the narinfo metadata is still severed via HTTPS, but the data itself is served via IPFS. +The purpose of this is a "hybrid" store where the narinfo metadata is still served via HTTPS, but the data itself is served via IPFS. Today, a narinfo looks like this: @@ -121,9 +121,9 @@ The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to g The NAR is itself wrapped in IPFS's [UnixFS](https://github.com/ipfs/specs/blob/main/UNIXFS.md). This other format can be extracted from the CID (which is conceptually a pair of encoding metadata and a hash). For now, only IPFS's "unixfs" is supported. 
- `chunking`, `layout`, and `max-size` are tuning parameters for unixfs [described in the UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#importing). + `chunking`, `layout`, and `max-width` are tuning parameters for unixfs [described in the UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#importing). - "UNIXFS" is not used directly because it doesn't support the "executable bit** Nix does on files. + "UNIXFS" is not used directly because it doesn't support the "executable bit" Nix does on files. NAR archive are not used directly because IPFS doesn't support arbitrarily large objects. ## IPFS Narinfo and "stateful" IPFS Store From a790811bf55797713aa2a5d78c579aeb4328d31c Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 14 Dec 2022 09:40:59 -0500 Subject: [PATCH 11/27] 133: Add shepherd team! Co-authored-by: Eelco Dolstra --- rfcs/0133-ipfs.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index b61c8fb67..afa6459bb 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -3,8 +3,8 @@ feature: ipfs start-date: (fill me in with today's date, YYYY-MM-DD) author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems) co-authors: (find a buddy later to help out with the RFC) -shepherd-team: (names, to be nominated and accepted by RFC steering committee) -shepherd-leader: (name to be appointed by RFC steering committee) +shepherd-team: edolstra, kevincox, gador, @mjoseph-nixpkgs +shepherd-leader: mjoseph-nixpkgs related-issues: (will contain links to implementation PRs) --- From f134f8c646ff9eb7e07675596063ade7bfab97cc Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 1 Feb 2023 13:04:38 -0500 Subject: [PATCH 12/27] 133: Fix shepherds list mjoseph -> amjoseph --- rfcs/0133-ipfs.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index afa6459bb..b2113b51f 100644 --- 
a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -3,8 +3,8 @@ feature: ipfs start-date: (fill me in with today's date, YYYY-MM-DD) author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems) co-authors: (find a buddy later to help out with the RFC) -shepherd-team: edolstra, kevincox, gador, @mjoseph-nixpkgs -shepherd-leader: mjoseph-nixpkgs +shepherd-team: edolstra, kevincox, gador, @amjoseph-nixpkgs +shepherd-leader: amjoseph-nixpkgs related-issues: (will contain links to implementation PRs) --- From fd494d185ad01b7ceebee931f0eebc7ac67efd0d Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 1 Feb 2023 17:27:08 -0500 Subject: [PATCH 13/27] 133: Move non-`git` steps to future work --- rfcs/0133-ipfs.md | 162 ++++++++++++++++++++++++---------------------- 1 file changed, 83 insertions(+), 79 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index b2113b51f..1fd4947c0 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -40,7 +40,7 @@ We should natively support git file hashing, which Git repos and Software Herita This will completely obliterate these issues. IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have. -Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix. +Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix. Data can be directly downloaded from SWH via HTTPS, or indirectly via IPFS, which can act as a CDN to not put as much load on SWH's servers. Overall, are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. 
@@ -76,6 +76,88 @@ Like-minded projects emphasizing content-addressing are our *natural* partners, [design]: #detailed-design Each item can be done separately provided its dependent items are also done. +These are the items we wish to commit to at this time. +(See [Future Work](#future-work) for other steps left for later.) + +## Git file hashing + +- **Purpose**: Source distribution and archival + +In addition to the various forms of content-addressing Nix supports today ("text", "fixed" with either "flat" or "nar" serialization of file system objects), Nix should support Git hashing. +This support entails two basic things: + + - Content addresses are used to compute store paths. + - Content addresses are used to verify store object integrity. + +Git hashing would not support references (since references in Nix's sense are not a Git concept), but that is not an issue for the intended use-case of exchanging source code. + +## Git file hashing for `buitins.fetch*` + +- **Purpose**: Source distribution and archival +- **Depends on**: Git file hashing, + +The builtin fetchers can also be made to work with git file hashing just as they support the other types. +In addition, Git repo fetching can leverage this better to than the other formats since the data in git repos is already content-addressed in this way. + +## Content address or store path in Store interface + +- **Purpose**: All distribution + +Modify many store interface methods that today take store paths to instead accept *either* a store path or a content address. + +For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine. +But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them. +They can, however, understand content addresses. 
+And from such a content address we can always produce a store path again, so there is no loss of functionality with existing stores. + +# Examples and Interactions +[examples-and-interactions]: #examples-and-interactions + +We encourage anyone interested to check our tutorial in https://github.com/obsidiansystems/ipfs-nix-guide/ which demonstrates the above functionality. +Note at the time of writing this guide uses our original 2020 fork of Nix. + +# Drawbacks +[drawbacks]: #drawbacks + +The main cost is more complexity to the store layer. +For a few reason we think this is not so bad. + +Most importantly is the division of the work into a dependency graph of steps. +This allows us to slowly try IPFS out and not commit to more change than we want to up front. + +Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable: + +2. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model. + This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above: + Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same. + +3. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. + With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication. + RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too. 
+ + Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control. + It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity. + That frees up "complexity budget" for project like this. + + We plan on more formally proposing this next. + +# Alternatives +[alternatives]: #alternatives + +The dependency graph of steps can be sliced to save some for future work. +For now they are all written together, but during the RFC meetings we will decide which steps (if any) to ratify now, and which steps to save for later. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None at this time. + +# Future work +[future]: #future-work + +Each item can be done separately provided its dependent items are also done. +These are the items we do *not* wish to commit to at this time, but leave for later. +(See [Detailed Design](#detailed-design) for other steps left for later.) ## Augmented `narinfo` @@ -139,37 +221,6 @@ Read-only is easier, since IPFS data is immutable but "writable" stores are supp IPNS is historically slow, but the update is automatic. Printing out a new CID for the index root allows the store administrator to update an out-of-bound mutable reference, but this cannot be automated because Nix doesn't know what the out-of-band method is. -## Git file hashing - -- **Purpose**: Source distribution and archival - -In addition to the various forms of content-addressing Nix supports today ("text", "fixed" with either "flat" or "nar" serialization of file system objects), Nix should support Git hashing. -This support entails two basic things: - - - Content addresses are used to compute store paths. - - Content addresses are used to verify store object integrity. 
- -Git hashing would not support references (since references in Nix's sense are not a Git concept), but that is not an issue for the intended use-case of exchanging source code. - -## Git file hashing for `buitins.fetch*` - -- **Purpose**: Source distribution and archival -- **Depends on**: Git file hashing, - -The builtin fetchers can also be made to work with git file hashing just as they support the other types. -In addition, Git repo fetching can leverage this better to than the other formats since the data in git repos is already content-addressed in this way. - -## Content address or store path in Store interface - -- **Purpose**: All distribution - -Modify many store interface methods that today take store paths to instead accept *either* a store path or a content address. - -For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine. -But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them. -They can, however, understand content addresses. -And from such a content address we can always produce a store path again, so there is no loss of functionality with existing stores. - ## NAR info or content address normative in `ValidPathInfo` - **Purpose**: Source distribution and archival @@ -235,50 +286,3 @@ The derivations must be CA derivations (floating or fixed). They must also produce wrapped git objects with references, though they can also depend on regular unwrapped git file hashed store objects. The derivations and their outputs are thus all fully IPFS native, leveraging the IPFS graph and trust vs plain old data separation for the high standard of interoperability. - -# Examples and Interactions -[examples-and-interactions]: #examples-and-interactions - -We encourage anyone interested to check our tutorial in https://github.com/obsidiansystems/ipfs-nix-guide/ which demonstrates the above functionality. 
-Note at the time of writing this guide uses our original 2020 fork of Nix. - -# Drawbacks -[drawbacks]: #drawbacks - -The main cost is more complexity to the store layer. -For a few reason we think this is not so bad. - -Most importantly is the division of the work into a dependency graph of steps. -This allows us to slowly try IPFS out and not commit to more change than we want to up front. - -Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable: - -2. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model. - This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above: - Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same. - -3. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. - With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication. - RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too. - - Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control. - It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity. - That frees up "complexity budget" for project like this. - - We plan on more formally proposing this next. 
- -# Alternatives -[alternatives]: #alternatives - -The dependency graph of steps can be sliced to save some for future work. -For now they are all written together, but during the RFC meetings we will decide which steps (if any) to ratify now, and which steps to save for later. - -# Unresolved questions -[unresolved]: #unresolved-questions - -Per the above, deciding which steps to leave as future work. - -# Future work -[future]: #future-work - -Chiefly, any steps which we don't wish to commit to initially; to be decided as described above. From d3b531366f304e1bf729bfc0466a7ce601073ab8 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Feb 2023 00:15:04 -0500 Subject: [PATCH 14/27] 133: Move one more section out of future work --- rfcs/0133-ipfs.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index 1fd4947c0..9b0a7aa72 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -110,6 +110,18 @@ But for Nix-agnostic stores, store paths are rather useless as a key type becaus They can, however, understand content addresses. And from such a content address we can always produce a store path again, so there is no loss of functionality with existing stores. +## NAR info or content address normative in `ValidPathInfo` + +- **Purpose**: Source distribution and archival +- **Depends on**: Content address or store path in Store interface, + +As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. +But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. +Relax the abstract `ValidPathInfo` type to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined. 
+ +Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. +Only new nix-agnostic store types would take advantage of these new, relaxed rules. + # Examples and Interactions [examples-and-interactions]: #examples-and-interactions @@ -221,18 +233,6 @@ Read-only is easier, since IPFS data is immutable but "writable" stores are supp IPNS is historically slow, but the update is automatic. Printing out a new CID for the index root allows the store administrator to update an out-of-bound mutable reference, but this cannot be automated because Nix doesn't know what the out-of-band method is. -## NAR info or content address normative in `ValidPathInfo` - -- **Purpose**: Source distribution and archival -- **Depends on**: Content address or store path in Store interface, - -As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. -But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. -Relax the abstract `ValidPathInfo` type to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined. - -Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. -Only new nix-agnostic store types would take advantage of these new, relaxed rules. 
- ## "stateless" IPFS store - **Purpose**: Source distribution and archival From 65755649dc4cfc46c9781902fa90800db5c8acd4 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Feb 2023 01:05:49 -0500 Subject: [PATCH 15/27] 133: Move IPFS-specific motivation to future work too --- rfcs/0133-ipfs.md | 65 +++++++++++++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 25 deletions(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index 9b0a7aa72..51e1c7943 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -11,7 +11,10 @@ related-issues: (will contain links to implementation PRs) # Summary [summary]: #summary -Integrate Nix with IPFS, in phases of increasing sophistication. +Integrate Git hashing with Nix. + +Nix should support content-addressed store objects using git blob + tree hashing, and Nix-unaware remote stores that serve git objects. + This follows the work done and described in https://github.com/obsidiansystems/ipfs-nix-guide/ . # Motivation @@ -24,7 +27,8 @@ This is a barrier to being a Nix user in areas of slower internet --- which incl This is also a barrier to users running their own caches. Content-addressing opens up a *huge* design space of solutions to get around such problems. -IPFS explores many of those solutions. + +The first steps proposed below do *not* tackle this problem directly, but it lays the ground-work for future experiments in this direction. ## Source distribution and archival @@ -39,24 +43,8 @@ Second of all, tarballs instead of the underlying files leaking non-normative de We should natively support git file hashing, which Git repos and Software Heritage both support. This will completely obliterate these issues. -IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have. 
-Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix. -Data can be directly downloaded from SWH via HTTPS, or indirectly via IPFS, which can act as a CDN to not put as much load on SWH's servers. - Overall, are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. -## Not just IPFS - -Many of the IPFS-specific logic could in fact live in a plugin if this is desired. -However, we still need to adjust core abstractions of Nix store layer (as described below) to interface with IPFS in the best possible way. -Those same adjustments would allow Nix to work better with *any* content-addressing system, so alternatives networks/projects to IPFS can also be just as easily experimented with. - -As always with my work, the mantra (from Scheme) to follow is - -> *x* should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. - -A ton of misc features have been added to Nix since 2.0, and we are very careful to not increase total ad-hoc complexity more than necessary. - ## Build adoption through seamless interop This last argument is more strategic than technical. @@ -135,7 +123,7 @@ The main cost is more complexity to the store layer. For a few reason we think this is not so bad. Most importantly is the division of the work into a dependency graph of steps. -This allows us to slowly try IPFS out and not commit to more change than we want to up front. +This allows us to slowly try out things like IPFS that leverage git hashing, and not commit to more change than we want to up front. Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable: @@ -164,14 +152,41 @@ For now they are all written together, but during the RFC meetings we will decid None at this time. 
-# Future work +# Future work --- IPFS [future]: #future-work +## Motivation + +### Binary Distribution + +IPFS is a potential solution leveraging content addressing for a P2P CDN. +Out of the box, it matches Nix's data model extremely well. + +### Source distribution and archival + +IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have. +Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix. +Data can be directly downloaded from SWH via HTTPS, or indirectly via IPFS, which can act as a CDN to not put as much load on SWH's servers. + +### Not just IPFS + +Many of the IPFS-specific logic could in fact live in a plugin if this is desired. +However, we still need to adjust core abstractions of Nix store layer (as described below) to interface with IPFS in the best possible way. +Those same adjustments would allow Nix to work better with *any* content-addressing system, so alternatives networks/projects to IPFS can also be just as easily experimented with. + +As always with my work, the mantra (from Scheme) to follow is + +> *x* should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. + +A ton of misc features have been added to Nix since 2.0, and we are very careful to not increase total ad-hoc complexity more than necessary. + +## Detailed Design + Each item can be done separately provided its dependent items are also done. These are the items we do *not* wish to commit to at this time, but leave for later. (See [Detailed Design](#detailed-design) for other steps left for later.)
-## Augmented `narinfo` +### Augmented `narinfo` - **Purpose**: Binary distribution @@ -220,7 +235,7 @@ The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to g "UNIXFS" is not used directly because it doesn't support the "executable bit" Nix does on files. NAR archive are not used directly because IPFS doesn't support arbitrarily large objects. -## IPFS Narinfo and "stateful" IPFS Store +### IPFS Narinfo and "stateful" IPFS Store - **Purpose**: Binary distribution - **Depends on**: Augmented `narinfo` @@ -233,7 +248,7 @@ Read-only is easier, since IPFS data is immutable but "writable" stores are supp IPNS is historically slow, but the update is automatic. Printing out a new CID for the index root allows the store administrator to update an out-of-bound mutable reference, but this cannot be automated because Nix doesn't know what the out-of-band method is. -## "stateless" IPFS store +### "stateless" IPFS store - **Purpose**: Source distribution and archival - **Depends on**: NAR info or content address normative in `ValidPathInfo` @@ -251,7 +266,7 @@ But the content address key a successful query used is enough to create a bare-b (A bare-bones `ValidPathInfo` might sound sub-par, but for plain old content-addressed data it is fine. Most of the other metadata in `ValidPathInfo` is really just for input-addressed derivation outputs, and is thus obviated by CA derivation trust maps which contain the same data but more naturally.) -## Wrapped git objects with references +### Wrapped git objects with references - **Purpose**: Binary distribution - **Depends on**: Git file hashing @@ -273,7 +288,7 @@ With this form on content addressing, references are instead CIDs (native IPFS r This means the content addressing is "deep", such that any such content-addressed store object always has a content-addressed closure. 
At the cost of interop with existing derivation outputs, this make such data easier to manage because there are fewer trust issues and degrees of freedom in general for something to go wrong. -## IPLD Derivations +### IPLD Derivations - **Purpose**: Build plan distribution - **Depends on**: Wrapped git objects with references, From 2e04424b4eb10803dbfd0076118609f1abcb5bf8 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Feb 2023 01:07:45 -0500 Subject: [PATCH 16/27] 133: Rename feature in light of changes --- rfcs/0133-ipfs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-ipfs.md index 51e1c7943..b31e239ad 100644 --- a/rfcs/0133-ipfs.md +++ b/rfcs/0133-ipfs.md @@ -1,5 +1,5 @@ --- -feature: ipfs +feature: git-hashing start-date: (fill me in with today's date, YYYY-MM-DD) author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems) co-authors: (find a buddy later to help out with the RFC) From 5a68ea0912982c7d767699fa3ae19e5a57532c52 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Feb 2023 01:08:41 -0500 Subject: [PATCH 17/27] 133: Rename RFC in light of changes --- rfcs/{0133-ipfs.md => 0133-git-hashing.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{0133-ipfs.md => 0133-git-hashing.md} (100%) diff --git a/rfcs/0133-ipfs.md b/rfcs/0133-git-hashing.md similarity index 100% rename from rfcs/0133-ipfs.md rename to rfcs/0133-git-hashing.md From 17de8dd12428e32004d22b2fb8510391f2e817b3 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 15 Feb 2023 15:54:13 -0500 Subject: [PATCH 18/27] 133: Discuss the downside of git's file system model being different --- rfcs/0133-git-hashing.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index b31e239ad..63c891a7a 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -119,6 +119,8 @@ Note at the time of writing this 
guide uses our original 2020 fork of Nix. # Drawbacks [drawbacks]: #drawbacks +## Complexity + The main cost is more complexity to the store layer. For a few reason we think this is not so bad. @@ -141,6 +143,18 @@ Even if we do end up adopting everything though, we think for the following two We plan on more formally proposing this next. +## Git and Nix's file system data models do not entirely coincide + +Nix puts the permission info of a file (executable bit for now) with that file, whereas Git puts it with the name and hash in the directory. +The practical effect of this discrepancy is that a root file (as opposed to directory) in Nix has permission info, but does not in Git. + +If we are trying to convert existing Nix data into Git, this is a problem. +Assuming we treat "no permission bits" as meaning "non-executable", we will have a partial conversion that will fail on executable bare files. +Tricks like always wrapping everything in a directory get around this, but then we have to be careful the directory is exactly as expected when "unwrapping" in the other direction. + +For now, we only focus on ingesting data *from* Git *to* Nix, and this side-steps the issue. +That conversion is total (though not surjective), and so there is no problem for now. # Alternatives [alternatives]: #alternatives From e2641c975edbf6c41696f0a907ef90ba0ad02d66 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Thu, 22 Jun 2023 13:38:58 -0400 Subject: [PATCH 19/27] Split future work, clean up Nix-agnostic stores section --- rfcs/0133-git-hashing.md | 69 ++++++++++++++++++++++++++++------------ 1 file changed, 49 insertions(+), 20 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 63c891a7a..7aa0d53c2 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -65,7 +65,7 @@ Like-minded projects emphasizing content-addressing are our *natural* partners, Each item can be done separately provided its dependent items are also done.
These are the items we wish to commit to at this time. -(See [Future Work](#future-work) for other steps left for later.) +(See the two Future Work sections for other steps left for later.) ## Git file hashing @@ -87,28 +87,37 @@ Git hashing would not support references (since references in Nix's sense are no The builtin fetchers can also be made to work with git file hashing just as they support the other types. In addition, Git repo fetching can leverage this better to than the other formats since the data in git repos is already content-addressed in this way. -## Content address or store path in Store interface +## Nix-agnostic content-addressing "stores" - **Purpose**: All distribution -Modify many store interface methods that today take store paths to instead accept *either* a store path or a content address. +We want to be able to substitute from an arbitrary store (in the general, non-Nix sense) of content-addressed objects. +For the purpose of this RFC, that means querying objects by git hash, and being able to trust the results because we can verify them against the git hash. -For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine. -But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them. -They can, however, understand content addresses. -And from such a content address we can always produce a store path again, so there is no loss of functionality with existing stores. +In the implementation, we could accomplish this in a variety of ways. -## NAR info or content address normative in `ValidPathInfo` +- On on extreme, we could have a `ContentAddressedSubstitutor` abstract interface completely separate from Nix's `Store` interface. 
-- **Purpose**: Source distribution and archival -- **Depends on**: Content address or store path in Store interface, +- On the other extreme, we can generalize `Store` itself to allow taking content addresses or store paths as references. + +Exactly how this shakes out is to be determined post-RFC, but it would be nice to use Nix-agnostic persistent methods with `--store` and `--substituters`. + +If we do go the route of modifying the `Store` class, note that these things will need to happen: + + - Many store interface methods that today take store paths will need to also accept names & content addresse pairs. -As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. -But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. -Relax the abstract `ValidPathInfo` type to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined. + For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine. + But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them. + Those store can, however, understand content addresses. + And from such a name + content address, we can always produce a store path again, so there is no loss of functionality with existing stores. -Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. -Only new nix-agnostic store types would take advantage of these new, relaxed rules. +- Relax `ValidPathInfo` to merely require that *either* the pair of `NarHash` and `NarSize` or just `CA` alone be defined. 
+ + As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. + But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. + + Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. + Only new nix-agnostic store types would take advantage of these new, relaxed rules. # Examples and Interactions [examples-and-interactions]: #examples-and-interactions @@ -166,8 +175,24 @@ For now they are all written together, but during the RFC meetings we will decid None at this time. +# Future work --- Software Heritage +[future]: #future-work-swh + +## Motivation + +Software Heritage is a great way to get source code that is no longer found at its original location online. +It would be really great to use it as a substituter to fetch source code that fails to be fetched the default way. + +## Detailed Designed + +- **Depends on**: Nix-agnostic content-addressing "stores" + +[`GET /api/1/raw/(swhid)/`](https://docs.softwareheritage.org/devel/swh-web/uri-scheme-api.html#get--api-1-raw-(swhid)-) +is a relatively new Software Heritage API endpoint to get a git object for the given SWHID. +We can used this to write a Software Heritage store, which will accomplish the task layed out in the previous motivation section. + # Future work --- IPFS -[future]: #future-work +[future]: #future-work-ipfs ## Motivation @@ -178,9 +203,13 @@ It out-of-the-box match's Nix's data model extremely well. ### Source distribution and archival -IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones SWH won't have. 
-Finally, per [Obsidian's bridging work](https://github.com/obsidiansystems/go-ipfs-swh-plugin), we have a 3 way integration between IPFS, SWH, and Nix. -Data can be directly downloaded from SWH via HTTPS, or indirectly via IPFS, which can act as a CDN to not put as much load on SWH's servers. +IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones Software Heritage won't have. +Finally, per [this IPFS plugin](https://blog.obsidian.systems/software-heritage-bridge/), we can have a 3 way integration between IPFS, Software Heritage, and Nix. +Data can be directly downloaded from Software Heritage via HTTPS, or indirectly via IPFS. + +This would hopefully supercede the direct Software Heritage store outlined above. +This is because Software Heritage is primarily an archive, and we should be careful not to waste their limitted resources with copious egress bandwidth. +IPFS can act as a CDN to not put as much load on Software Heritage's servers. ### Not just IPFS @@ -265,7 +294,7 @@ Printing out a new CID for the index root allows the store administrator to upda ### "stateless" IPFS store - **Purpose**: Source distribution and archival -- **Depends on**: NAR info or content address normative in `ValidPathInfo` +- **Depends on**: Nix-agnostic content-addressing "stores" Use the above functionality to create a "stateless" IPFS store. Opaque store path lookups always fail, but when the key is the new content address type, we can translate the key itself into a CID that we can look up. From 852d740448d0b7cc1795a6935a78a34d8cc25d37 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sat, 24 Jun 2023 11:56:19 -0400 Subject: [PATCH 20/27] Fix numerious typos Thanks, all of you! 
Co-authored-by: Kevin Cox Co-authored-by: Adam Joseph <54836058+amjoseph-nixpkgs@users.noreply.github.com> Co-authored-by: Linus Heckemann --- rfcs/0133-git-hashing.md | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 7aa0d53c2..315598e94 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -32,18 +32,17 @@ The first steps proposed below do *not* tackle this problem directly, but it lay ## Source distribution and archival -A goal of the Nix ecosystem is to package software in a way that never bitrots. -Getting in the way of that, however, is the fact source code frequently goes off-line. -The Software Heritage archive is the best in the world, and a natural partner in this effort. +Source code used by Nix expressions frequently goes off-line. It would be beneficial if there was some resistance to this form of bitrot. +The Software Heritage archive stores much of the source code that Nix expressions use. They would be a natural partner in this effort. -Unfortunately, as https://www.tweag.io/blog/2020-06-18-software-heritage/ describes at the end, a major challenge is the way nix content-addresses software. +Unfortunately, as https://www.tweag.io/blog/2020-06-18-software-heritage/ describes at the end, a major challenge is the way Nix content-addresses software. First of all, Nix hashes sources in bespoke ways that no other project will adopt. -Second of all, tarballs instead of the underlying files leaking non-normative details (compression, odd perms, etc.). +Second of all, hashing tarballs instead of the underlying files leads non-normative details (compression, odd perms, etc.). We should natively support git file hashing, which Git repos and Software Heritage both support. This will completely obliterate these issues. -Overall, are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. 
+Overall, we are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. ## Build adoption through seamless interop @@ -104,9 +103,9 @@ Exactly how this shakes out is to be determined post-RFC, but it would be nice t If we do go the route of modifying the `Store` class, note that these things will need to happen: - - Many store interface methods that today take store paths will need to also accept names & content addresse pairs. + - Many store interface methods that today take store paths will need to also accept names & content address pairs. - For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done store paths, so the current interface is fine. + For stores that are purpose-built for Nix, like the ones we support today, all addressing can be done with store paths, so the current interface is fine. But for Nix-agnostic stores, store paths are rather useless as a key type because Nix-agnostic tools don't know about them. Those store can, however, understand content addresses. And from such a name + content address, we can always produce a store path again, so there is no loss of functionality with existing stores. @@ -131,19 +130,19 @@ Note at the time of writing this guide uses our original 2020 fork of Nix. ## Complexity The main cost is more complexity to the store layer. -For a few reason we think this is not so bad. +For a few reasons we think this is not so bad. Most importantly is the division of the work into a dependency graph of steps. -This allows us to slowly try out things like IPFS that leverage git hashing, and not commit to more change than we want to up front. +This allows us to slowly try out things like IPFS that leverage Git hashing, and not commit to more change than we want to up front. Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable: -2. 
Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model. +1. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model. This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above: Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same. -3. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. - With Flakes and other post-2.0 features, the upper layer of Nix have gained an enormous amount of flexibility and sophistication. +2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. + With Flakes and other post-2.0 features, the upper layers of Nix have gained an enormous amount of flexibility and sophistication. RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too. Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control. @@ -158,7 +157,7 @@ Nix puts the permission info of a file (executable bit for now) with that file, The practical effect of this discrepancy is that a root file (as opposed to directory) in Nix has permission info, but does not in Git. If we are trying to convert existing Nix data into Git, this is a problem. 
-Assuming we treat "no permission bits" as meaning "non-executable", we will have a partial conversion that will fail on executable bare files. +Assuming we treat "no permission bits" as meaning "non-executable", we will have a partial conversion that will fail on executable files without a parent directory. Tricks like always wrapping everything in a directory get around this, but then we have to be careful the directory is exactly as expected when "unwrapping" in the other direction. For now, we only focus on ingesting data *from* Git *to* Nix, and this side-steps the issue. From 15c1cbc9d90d189fd666b2b48ae219596f054c89 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sat, 24 Jun 2023 11:57:23 -0400 Subject: [PATCH 21/27] Add RFC open PR date --- rfcs/0133-git-hashing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 315598e94..0956773f1 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -1,6 +1,6 @@ --- feature: git-hashing -start-date: (fill me in with today's date, YYYY-MM-DD) +start-date: 2022-08-27 author: John Ericsion (@Ericson2314) on behalf of [Obsidian Systems](https://obsidian.systems) co-authors: (find a buddy later to help out with the RFC) shepherd-team: edolstra, kevincox, gador, @amjoseph-nixpkgs From 165979cbedf6bff63dbf2379d61c8b38b631d63e Mon Sep 17 00:00:00 2001 From: John Ericson Date: Sat, 24 Jun 2023 12:22:40 -0400 Subject: [PATCH 22/27] Be clearer about not supporting references to start --- rfcs/0133-git-hashing.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 0956773f1..8280e6de9 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -76,7 +76,8 @@ This support entails two basic things: - Content addresses are used to compute store paths. - Content addresses are used to verify store object integrity. 
-Git hashing would not support references (since references in Nix's sense are not a Git concept), but that is not an issue for the intended use-case of exchanging source code. +Git hashing would not (in this first proposed version) support references, since references in Nix's sense are not part of Git's data model. +This is OK for now; encoding references is not needed for the intended initial use-case of exchanging source code. ## Git file hashing for `buitins.fetch*` From 3c3cac656b111e06d6d4be7dee4e4fb8bf3873c2 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Mon, 26 Jun 2023 09:48:15 -0400 Subject: [PATCH 23/27] Update rfcs/0133-git-hashing.md Co-authored-by: Kevin Cox --- rfcs/0133-git-hashing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 8280e6de9..7e5ca4b3b 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -199,7 +199,7 @@ We can used this to write a Software Heritage store, which will accomplish the t ### Binary Distribution IPFS is a potential solution leveraging content addressing for a P2P CDN. -It out-of-the-box match's Nix's data model extremely well. +Out-of-the-box it matches Nix's data model extremely well. ### Source distribution and archival From 641891b53fe168914c1f636a613c7f513ca480b1 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Mon, 26 Jun 2023 14:19:09 -0400 Subject: [PATCH 24/27] Rip out both RFC-scal Future Work sections They are now in an `ipfs-2` branch in this repo. --- rfcs/0133-git-hashing.md | 173 ++------------------------------------- 1 file changed, 6 insertions(+), 167 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 7e5ca4b3b..557777e3e 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -64,7 +64,7 @@ Like-minded projects emphasizing content-addressing are our *natural* partners, Each item can be done separately provided its dependent items are also done. 
These are the items we wish to commit to at this time. -(See the two Future Work sections for other steps left for later.) +(The goals mentioned future work are, in a separate document, also broken down into a dependency graph of smaller steps.) ## Git file hashing @@ -175,172 +175,11 @@ For now they are all written together, but during the RFC meetings we will decid None at this time. -# Future work --- Software Heritage -[future]: #future-work-swh +# Future work +[future]: #future-work -## Motivation +- Integrate with outside content-addressing storage/transmission like -Software Heritage is a great way to get source code that is no longer found at its original location online. -It would be really great to use it as a substituter to fetch source code that fails to be fetched the default way. + - The Software Heritage archive -## Detailed Designed - -- **Depends on**: Nix-agnostic content-addressing "stores" - -[`GET /api/1/raw/(swhid)/`](https://docs.softwareheritage.org/devel/swh-web/uri-scheme-api.html#get--api-1-raw-(swhid)-) -is a relatively new Software Heritage API endpoint to get a git object for the given SWHID. -We can used this to write a Software Heritage store, which will accomplish the task layed out in the previous motivation section. - -# Future work --- IPFS -[future]: #future-work-ipfs - -## Motivation - -### Binary Distribution - -IPFS is a potential solution leveraging content addressing for a P2P CDN. -Out-of-the-box it matches Nix's data model extremely well. - -### Source distribution and archival - -IPFS also supports git hashing, and so we also provide a good way for people and institutions to "pin" the sources they need, especially if those sources include private ones Software Heritage won't have. -Finally, per [this IPFS plugin](https://blog.obsidian.systems/software-heritage-bridge/), we can have a 3 way integration between IPFS, Software Heritage, and Nix. 
-Data can be directly downloaded from Software Heritage via HTTPS, or indirectly via IPFS. - -This would hopefully supercede the direct Software Heritage store outlined above. -This is because Software Heritage is primarily an archive, and we should be careful not to waste their limitted resources with copious egress bandwidth. -IPFS can act as a CDN to not put as much load on Software Heritage's servers. - -### Not just IPFS - -Many of the IPFS-specific logic could in fact live in a plugin if this is desired. -However, we still need to adjust core abstractions of Nix store layer (as described below) to interface with IPFS in the best possible way. -Those same adjustments would allow Nix to work better with *any* content-addressing system, so alternatives networks/projects to IPFS can also be just as easily experimented with. - -As always with my work, the mantra (from Scheme) to follow is - -> *x* should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. - -A ton of misc features have been added to Nix since 2.0, and we are very careful to not increase total ad-hoc complexity more than necessary. - -## Detailed Design - -Each item can be done separately provided its dependent items are also done. -These are the items we do *not* wish to commit to at this time, but leave for later. -(See [Detailed Design](#detailed-design) for other steps left for later.) - -### Augmented `narinfo` - -- **Purpose**: Binary distribution - -*This is taken from [RFC PR 122](https://github.com/lucasew/rfcs/blob/binary-cache-ipfs/rfcs/0122-binary-cache-ipfs.md), which was abandoned by its author.* - -The purpose of this is a "hybrid" store where the narinfo metadata is still served via HTTPS, but the data itself is served via IPFS. 
- -Today, a narinfo looks like this: - -``` -StorePath: /nix/store/gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10 -URL: nar/0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb.nar.xz -Compression: xz -FileHash: sha256:0i6ardx43rdg24ab1nc3mq7f5ykyiamymh1v37gxdv5xh5cm0cmb -FileSize: 40360 -NarHash: sha256:1ddv0iqq47j0awyw7a8dmm8bz71c6ifrliq53kmmsfzjxf3rwvb8 -NarSize: 197528 -References: 7gx4kiv5m0i7d7qkixq2cwzbr10lvxwc-glibc-2.27 gdh8165b7rg4y53v64chjys7mbbw89f9-hello-2.10 -Deriver: 5sj6fdfym58sdaf3r5p87v4l8sj2zlvn-hello-2.10.drv -Sig: cache.nixos.org-1:K0thQEG60rzAK8ZS9f1whb7eRlIshlMDJAm7xvX1oF284H+PTqlicv/wGW6BIj+wWWONHvUZ2MYc+KDArekjDA== -``` - -This RFC proposes new key-value pairs that in this example would be: - -``` -IpfsCid: Qmf8NfV2hnq44RoQw9vxmSpGYTwAovA8FUCxeCJCqmXeNN -IpfsEncoding: {"method":"wrapped-nar","chunking":{"leaf-format":"raw","strategy":"fixed-size"},"layout":"balanced","max-width":174} -``` - -Just as today, the `NarHash` and `NarSize` remain the *normative* way to lock down the store object the `narinfo` file describes. -Conversely, The `URL`, `FileHash` and `FileSize` by contrast are *informational*, describing not what the store object *is*, but *how to get it*. - -The `IpfsCid` and `IpfsEncoding` are likewise informational, describing how to get the store object: - -- `IpfsCid`: Native content address for IPFS. - -- `IpfsEncoding`: Enough info to deterministically rebuild the IPFS representation from a non-IPFS copy of the store object. - - For now, `IpfsEncoding` will only support `unixfs-nar`, which works as follows: - - The NAR is itself wrapped in IPFS's [UnixFS](https://github.com/ipfs/specs/blob/main/UNIXFS.md). - This other format can be extracted from the CID (which is conceptually a pair of encoding metadata and a hash). - For now, only IPFS's "unixfs" is supported. - `chunking`, `layout`, and `max-width` are tuning parameters for unixfs [described in the UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#importing). 
- - "UNIXFS" is not used directly because it doesn't support the "executable bit" Nix does on files. - NAR archive are not used directly because IPFS doesn't support arbitrarily large objects. - -### IPFS Narinfo and "stateful" IPFS Store - -- **Purpose**: Binary distribution -- **Depends on**: Augmented `narinfo` - -Instead of a "hybrid" store, where the narinfo index is served with HTTP but the data itself is served with IPFS, we can do an all-IPFS store with the data itself and mutable index stored in IPFS. -The Narinfo instead of being encoded the legacy line-oriented text format can be IPFS's native DAG-CBOR IPLD codec, which is like JSON + content address links (but stored as CBOR). -This allows Narinfos to reference each other and be nicely structured so the index is legible from Nix-agnostic IPFS tools and recursive pinning comes for free. - -Read-only is easier, since IPFS data is immutable but "writable" stores are supported by simple printing back a new CID for the new store root after some modifications, or modifying a mutable IPNS reference. -IPNS is historically slow, but the update is automatic. -Printing out a new CID for the index root allows the store administrator to update an out-of-bound mutable reference, but this cannot be automated because Nix doesn't know what the out-of-band method is. - -### "stateless" IPFS store - -- **Purpose**: Source distribution and archival -- **Depends on**: Nix-agnostic content-addressing "stores" - -Use the above functionality to create a "stateless" IPFS store. -Opaque store path lookups always fail, but when the key is the new content address type, we can translate the key itself into a CID that we can look up. - -Unlike the previous two flavours of IPFS store, this one is stateless in that there is no need for an index at all. -Only content-addressed data is looked up, and it doesn't need any nar-info metadata before the data is all there. - -We need the previous step for querying without fetching any data. 
-In that case since there is no narinfo index we're looking up, we don't get any additional metadata back. -But the content address key a successful query used is enough to create a bare-bones `ValidPathInfo` with a `CA` field, which with the enough step is valid. - -(A bare-bones `ValidPathInfo` might sound sub-par, but for plain old content-addressed data it is fine. -Most of the other metadata in `ValidPathInfo` is really just for input-addressed derivation outputs, and is thus obviated by CA derivation trust maps which contain the same data but more naturally.) - -### Wrapped git objects with references - -- **Purpose**: Binary distribution -- **Depends on**: Git file hashing - -Merkelized formats like git file hashing are better than NAR because that allow for very natural deduplication and minimal transfers. -This is the same benefit we get today with Nix within a closure of multiple store objects, now also *within a single store object*. -But git has no notion of Nix-style references, so plain git hashing is only suitable for leaf store objects without references (like source code). - -However, we can use IPLD to wrap git-hashed data with a reference set, and "has self reference" bit. -This easily creates a new content addressing scheme which handles all "shapes" of store objects. -This gives is a nice way to thus share arbitrary nix store data (provided it is content-addressed) over IPFS. - -Like with "IPFS Narinfo", this format is also very easy to understand with nix-agnostic native IPFS tools. -This is because, once again, the reference graph is made native to IPFS not done indirectly with store path strings which must be looked up. - -An interesting corollary to note: -Content addressing today is "shallow", in that references are arbitrary store paths. -With this form on content addressing, references are instead CIDs (native IPFS references) to other obligatorily content-addressed data. 
-This means the content addressing is "deep", such that any such content-addressed store object always has a content-addressed closure. -At the cost of interop with existing derivation outputs, this make such data easier to manage because there are fewer trust issues and degrees of freedom in general for something to go wrong. - -### IPLD Derivations - -- **Purpose**: Build plan distribution -- **Depends on**: Wrapped git objects with references, - IPFS as substitutor - -Natively represent derivations in IPFS, again with the same benefits of leverage the native graph representations. - -This is a culmination of all the features so far. -The derivations must be CA derivations (floating or fixed). -They must also produce wrapped git objects with references, though they can also depend on regular unwrapped git file hashed store objects. - -The derivations and their outputs are thus all fully IPFS native, leveraging the IPFS graph and trust vs plain old data separation for the high standard of interoperability. + - IPFS From 5828c41565d818fd1bcafbccb08bf3f6e61e2816 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Mon, 26 Jun 2023 14:27:02 -0400 Subject: [PATCH 25/27] Remove "Build adoption through seamless interop" That can go in a separate blog post. --- rfcs/0133-git-hashing.md | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 557777e3e..683fdbca5 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -44,21 +44,6 @@ This will completely obliterate these issues. Overall, we are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. -## Build adoption through seamless interop - -This last argument is more strategic than technical. 
- -A lot of people in this community would like to see Nix be used more widely, but as much as we all wish otherwise, the fact remains that there is some tension between making nix *better* and making it *more accessible*. - -Nix is very foreign from the "bad conventional" way things are done, and making Nix better can sometimes involve making it even more foreign. -We don't want to steepen the learning curve or make it "seem more weird". - -On the other hand, making Nix more accessible by making it more like tools users are already use-to can obscure or chip-away at Nix's benefits. -We don't want to "pander" in ways that will make Nix faddish but ultimately undermine it's popularity over the long haul (see Docker the company's woes). - -One way to get around this tension to me is rather than pushing Nix towards the rest of the world, pushing the rest of the world towards us. -Like-minded projects emphasizing content-addressing are our *natural* partners, and we should work with them to promote Nix-*agnostic* standards that further our values and mission. - # Detailed design [design]: #detailed-design From 9279a03712e7fad87f885ee56cb298a9be4ec720 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 28 Jun 2023 21:45:55 -0400 Subject: [PATCH 26/27] Apply suggestions from code review Thank you both!! Co-authored-by: Valentin Gagarin Co-authored-by: Ryan Lahfa --- rfcs/0133-git-hashing.md | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 683fdbca5..2c067af9f 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -37,9 +37,9 @@ The Software Heritage archive stores much of the source code that Nix expression Unfortunately, as https://www.tweag.io/blog/2020-06-18-software-heritage/ describes at the end, a major challenge is the way Nix content-addresses software. First of all, Nix hashes sources in bespoke ways that no other project will adopt. 
-Second of all, hashing tarballs instead of the underlying files leads non-normative details (compression, odd perms, etc.). +Second of all, hashing tarballs instead of the underlying files leads to non-normative details (compression, odd perms, etc.). -We should natively support git file hashing, which Git repos and Software Heritage both support. +We should natively support Git file hashing, which is supported both by Git repos and Software Heritage. This will completely obliterate these issues. Overall, we are building out a uniform way to work with source code, regardless of its origins or the exact tools involved. @@ -49,7 +49,7 @@ Overall, we are building out a uniform way to work with source code, regardless Each item can be done separately provided its dependent items are also done. These are the items we wish to commit to at this time. -(The goals mentioned future work are, in a separate document, also broken down into a dependency graph of smaller steps.) +(The goals mentioned under [future work](#future-work) are, in a separate document, also broken down into a dependency graph of smaller steps.) ## Git file hashing @@ -67,21 +67,21 @@ This is OK for now; encoding references is not needed for the intended initial u ## Git file hashing for `buitins.fetch*` - **Purpose**: Source distribution and archival -- **Depends on**: Git file hashing, +- **Depends on**: Git file hashing -The builtin fetchers can also be made to work with git file hashing just as they support the other types. -In addition, Git repo fetching can leverage this better to than the other formats since the data in git repos is already content-addressed in this way. +The built-in fetchers can also be made to work with Git file hashing just as they support the other types. +In addition, Git repo fetching can leverage this better than the other formats since the data in Git repos is already content-addressed in this way. 
## Nix-agnostic content-addressing "stores" - **Purpose**: All distribution We want to be able to substitute from an arbitrary store (in the general, non-Nix sense) of content-addressed objects. -For the purpose of this RFC, that means querying objects by git hash, and being able to trust the results because we can verify them against the git hash. +For the purpose of this RFC, that means querying objects by Git hash, and being able to trust the results because we can verify them against the Git hash. In the implementation, we could accomplish this in a variety of ways. -- On on extreme, we could have a `ContentAddressedSubstitutor` abstract interface completely separate from Nix's `Store` interface. +- On one extreme, we could have a `ContentAddressedSubstitutor` abstract interface completely separate from Nix's `Store` interface. - On the other extreme, we can generalize `Store` itself to allow taking content addresses or store paths as references. @@ -101,8 +101,8 @@ If we do go the route of modifying the `Store` class, note that these things wil As described in the first step, currently `NarHash` and `NarSize` are the *normative* fields which are used to verify a store object. But if the store object is content-addressed, we don't need these, because the content address (`CA` field) will also suffice, all by itself. - Existing Nix stores types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compat and don't come with a cost. - Only new nix-agnostic store types would take advantage of these new, relaxed rules. + Existing Nix store types are still required to contain a `NarHash` and `NarSize`, which is good for backwards compatibility and doesn't come with a cost. + Only new Nix-agnostic store types would take advantage of these new, relaxed rules.
# Examples and Interactions [examples-and-interactions]: #examples-and-interactions @@ -123,8 +123,8 @@ This allows us to slowly try out things like IPFS that leverage Git hashing, and Even if we do end up adopting everything though, we think for the following two reasons the complexity can still be kept manageable: -1. Per the abstract vs concrete model of the nix store in https://github.com/NixOS/nix/pull/6877 , everything we are doing is simply flushing out alternative interpretations of the abstract model. - This is the sense in which we are "removing the weaknesses and restrictions that make additional features appear necessary" per the Scheme mantra cited above: +1. Per the abstract vs concrete model of the Nix store in https://github.com/NixOS/nix/pull/6877, everything we are doing is simply fleshing out alternative interpretations of the abstract model. + This is the sense in which we are, per the Scheme mantra, "removing the weaknesses and restrictions that make additional features appear necessary": Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same. 2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. @@ -133,9 +133,7 @@ Even if we do end up adopting everything though, we think for the following two Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control. It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity. - That frees up "complexity budget" for project like this. - - We plan on more formally proposing this next. + That frees up "complexity budget" for projects like this.
## Git and Nix's file system data models do not entirely coincide @@ -147,7 +145,7 @@ Assuming we treat "no permission bits" as meaning "non-executable", we will have Tricks like always wrapping everything in a directory get around this, but then we have to be careful the directory is exactly as expected when "unwrapping" in the other direction. For now, we only focus on ingesting data *from* Git *to* Nix, and this side-steps the issue. -That conversation is total (though not surjective), and so there is no problem for now. +That mapping is total, i.e. all Git data can be mapped, and injective, i.e. each piece of Git data has a unique Nix representation (though not surjective, i.e. not all Nix data can be represented as Git data), and so there is no problem for now. # Alternatives [alternatives]: #alternatives From 3a083b21679573f36678471052f90831987217db Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 28 Jun 2023 22:15:43 -0400 Subject: [PATCH 27/27] Slim down the layering section The other stuff is already in flight, we don't need to talk about it so much here. Co-authored-by: Valentin Gagarin --- rfcs/0133-git-hashing.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/rfcs/0133-git-hashing.md b/rfcs/0133-git-hashing.md index 2c067af9f..81eb9f192 100644 --- a/rfcs/0133-git-hashing.md +++ b/rfcs/0133-git-hashing.md @@ -128,11 +128,7 @@ Even if we do end up adopting everything though, we think for the following two Instead of extending the model with new features, we are relaxing concrete model assumptions (e.g. references are always opaque store paths) while keeping the abstract model the same. 2. We also support plans to decouple the layers of Nix further, and update our educational and marketing material to reflect it. - With Flakes and other post-2.0 features, the upper layers of Nix have gained an enormous amount of flexibility and sophistication.
- RFCs like this show that the so-far more sleepy lower layers also have plenty of potential to gain sophistication too. - - Embracing layering on technical, educational, communications, and managerial levels can scale our capacity to manage complexity and sophistication without the project growing out of control. - It will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced preventing a combinatorial explosion in complexity. + Layering will "divide and conquer" the project so the interfaces between each layer are still rigorously enforced, preventing a combinatorial explosion in complexity. That frees up "complexity budget" for projects like this. ## Git and Nix's file system data models do not entirely coincide