Nix and IPFS #859

vcunat · 2016-03-24T10:10:49Z

(I wanted to split this thread from #296 (comment) .)

Let's discuss relations with IPFS here. As I see it, mainly a decentralized way to distribute nix-stored data would be appreciated.

What we might start with

The easiest usable step might be to allow distribution of fixed-output derivations over IPFS. That are paths that already are content-addressed, typically by (truncated) sha256 over either a flat file or a tar-like dump of a directory tree; more details are in the docs. These paths are mainly used for compressed tarballs of sources. This step itself should avoid lots of problems with unstable upstream downloads, assuming we could convince enough nixers to serve their files over IPFS.

Converting hashes

One of the difficulties is that we use different kinds of hashing than in IPFS, and I don't think it would be good to require converting those many thousands of hashes in our expressions. (Note that it's infeasible to convert among those hashes unless you have the whole content.) IPFS people might best suggest how to work around this. I imagine we want to "serve" a mapping from the hashes we use to the IPFS's hashes, perhaps realized through IPNS. (I don't know details of IPFS's design, I'm afraid.) There's an advantage that one can easily verify the nix-style hash in the end after obtaining the paths in any way.

Non-fixed content

If we get that far, it shouldn't be too hard to manage distributing everything via IPFS, as for all other derivations we use something we could call indirect content addressing. To explain that, let's look at how we distribute binaries now – our binary caches. We hash the build recipe, including all its recipe dependencies, and we inspect the corresponding narinfo URL on cache.nixos.org. If our build farm has built that recipe, various information is in that file, mainly the hashes of the content of the resulting outputs of that build and crypto-signatures of them.

Note that this narinfo step just converts our problem to the previous fixed-output case, and the conversion itself seems very reminiscent of IPNS.

Deduplication

Note that nix-built stuff has significantly greater than usual potential for chunk-level deduplication. Very often we do a rebuild of a package only because something in a dependency has changed, so there are only very minor changes expected in the results, mainly just exchanging the references to runtime dependencies as their paths have changed. (In seldom occasions even lengths of the paths would change.) There's a great potential to save on that during distribution of binaries, which would be utilized by implementing the section above, and even potential in saving disk space in comparison to our way of hardlinking equal files (the next paragraph).

Saving disk space

Another use might be to actually store the files in a FS similar to what IPFS uses. That seems a little more complex and tricky thing to deploy, e.g. I'm not sure someone already trusts the implementation of the FS enough to have the whole OS running of it.

It's probably premature to speculate too much on this use ATM; I'll just write I can imagine having symlinks from /nix/store/foo to /ipfs/*, representing the locally trusted version of that path. (That's working around the problems related to making /nix/store/foo content-addressed.) Perhaps it could start as a per-path opt-in, so one could move only the less vital paths out of /nix/store itself.

I can help personally with bridging the two communities in my spare time. Not too long ago, I spent many months on researching various ways to handle "highly redundant" data, mainly from the point of view of theoretical computer science.

The text was updated successfully, but these errors were encountered:

ehmry · 2016-03-24T10:19:51Z

I'm curious what the minimalist way to associate store paths to IPFS objects while interfering as little as possible with IPFS-unaware tools would be.

vcunat · 2016-03-24T10:29:36Z

I described such a way in the second paragraph from bottom. It should work with IPFS and nix store as they are, perhaps with some script that would move the data, create the symlink and pin the path in IPFS to avoid losing it during GC. (It could be unpinned when nix deletes the symlink during GC.)

ehmry · 2016-03-24T10:50:56Z

I was thinking about avoiding storing store objects in something that wouldn't require a daemon, but of course you can't have everything.

Ericson2314 · 2016-03-24T16:05:55Z

@vcunat Great write up! More thoughts on this later, but one thing that gets me is the tension between wanting incremental goals, and avoiding work we don't need long term. For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

Maybe the best first step is a new non-flat/non-NAR hashing strategy for fixed output derivations? We can slowly convert nixpkgs to use that, and get IPFS mirroring and dedup in the fixed-output case. Another step is using git tree hashes for fetch git. We already want to do that, and I suspect IPFS would want that too for other users. IPFS's multihash can certainly be heavily abused for such a thing :).

Ericson2314 · 2016-03-24T16:07:33Z

For me the end goal should be only using IPNS for the derivation -> build map. Any trust-based compatibility map between hashing schemes long term makes the perfectionist in me sad :).

vcunat · 2016-03-24T16:56:43Z

For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

I meant that we would "use" some IPFS hashes but also utilize a mapping from our current hashes, perhaps run over IPNS, so that it would still be possible to run our fetchurl { sha256 = "..." } without modification. Note that it's these flat tarball hashes that most upstreams release and sign, and that's not going to change anytime soon, moreover there's not much point in trying to deduplicate compressed tarballs anyway. (We might choose to use uncompressed sources instead, but that's just another partially independent decision I'm not sure about.)

Ericson2314 · 2016-03-24T17:11:24Z

For single files / IPFS blobs, we should be able to hash the same way without modification.

Ericson2314 · 2016-03-24T17:17:14Z

But for VCS fetches we currently do a recursive/nar hash right? That is what I was worried about.

Ericson2314 · 2016-03-24T18:14:21Z

@ehmry I assume it would be pretty easy to make the nix store an immutable FUSE filesystem backed by IPFS (hopefully such a thing exists already). Down the road I'd like to have package references and the other things currently in the SQLite database also backed by IPFS: they would "appear" in the fuse filesystem as specially-named symlinks/hard-links/duplicated sub-directories. "referees" is the only field I'm aware of that'd be a cache on top. Nix would keep track of roots, but IPFS would do GC itself, in the obvious way.

cleverca22 · 2016-04-08T10:22:23Z

one idea i had, was to keep all outputs in NAR format, and have the fuse layer dynamically unpack things on-demand, that can then be used with some other planned IPFS features to share a file without copying it into the block storage

then you get a compressed store and don't have to store 2 copies of everything (the nar for sharing and the installed)

nmikhailov · 2016-04-08T10:24:59Z

@cleverca22 yeah, I had same thoughts about that, its unclear how hard this would impact performance though

cleverca22 · 2016-04-08T11:01:12Z

could keep a cache of recently used files in a normal tmpfs, and relay things over to that to boost performance back up

davidar · 2016-04-08T13:55:01Z

@cleverca22 another idea that was mentioned previously was to add support for NAR to ipfs, so that we can transparently unpack it as we do with TAR currently (ipfs tar --help)

Ericson2314 · 2016-04-08T15:21:03Z

NAR sucks though---no file-level dedup we could otherwise get for free. The above might be fine as a temporary step, but Nix should learn about a better format.

davidar · 2016-04-09T02:42:17Z

@Ericson2314 another option that was mentioned was for Nix and IPFS (and perhaps others) to try to standardise on a common archive format

https://github.com/ipfs/archive-format

Ericson2314 · 2016-04-09T16:08:33Z

@davidar Sure that's always good. For the shortish term, I was leaning towards a stripped down unixfs with just the attributes NAR cares about. As far as Nix is concerned this is basically the same format but with a different hashing scheme.

Ericson2314 · 2016-04-09T16:13:51Z

Yeah looking at Car, it's seems to be both an "IPFS Schema" over the IPFS Merkel DAG (Unless it just reuses unixfs), and then an interchange format for packing the dag into one binary blob.

That former is cool, but I don't think Nix even needs the latter (except perhaps as a new way to fall back on http etc if IPFS is not available while using a compatible format). For normal operation, I'd hope nix could just ask IPFS to populate the fuse filesystem that is the store given a hash, and everything else would be transparent.

cleverca22 · 2016-04-09T17:19:46Z

https://github.com/cleverca22/fusenar

i now have a nixos container booting with a fuse filesystem at /nix/store, which mmap's a bunch of .nar files, and transparently reads the requested files

knupfer · 2016-07-20T17:07:55Z

What is currently missing for using IPFS? How could I contribute? I really need this feature for work.

knupfer · 2016-07-20T17:09:46Z

Pinging @jbenet and @whyrusleeping because they are only enlisted on the old issue.

copumpkin · 2016-07-20T17:34:51Z

@knupfer I think writing a fetchIPFS would be a pretty easy first step. Deeper integration will be more work and require touching Nix itself.

knupfer · 2016-07-28T14:19:05Z

Ok, I'm working on it but there are some problems. Apparently, ipfs doesn't save the executable flag, so stuff like stdenv doesn't work, because it expects an executable configure. The alternative would be to distribute tarballs and not directories, but that would be clearly inferior because it would exclude deduplication on file level. Any thoughts on that? I could make every file executable, but that would be not very nice...

copumpkin · 2016-07-28T14:23:37Z

@knupfer it's not great, but would it be possible to distribute a "permissions spec file" paired with a derivation, that specifies file modes out of band? Think of it like a JSON file or whatever format and your thing pulls from IPFS, then applies the file modes to the contents of the directory as specified in the spec. The spec could be identified unique by the folder it's a spec for.

copumpkin · 2016-07-28T14:25:17Z

In fact, the unit of distribution could be something like:

{
  "contents": "/ipfs/12345",
  "permissions": "/ipfs/647123"
}

knupfer · 2016-07-28T14:34:11Z

Yep, that would work. Albeit it makes it more complicated for the user to add some sources to ipfs. But we could for example give an additional url in the fetchIPFS which wouldn't be in ipfs, and if it fetches from normal web automatically generate the permissions file and add that to ipfs... I'll think a bit about it.

davidak · 2016-07-28T15:44:12Z

ipfs doesn't save the executable flag

should it? @jbenet

how is ipfs-npm doing it? maybe also just distributes tarballs. that is of course not the most elegant solution.

lordcirth · 2020-11-11T17:56:55Z

I think chunking should be set to Rabin; if the majority of these packages are going to be uploaded by this implementation anyway, there is little downside to being non-standard. Rabin is more advanced and should save on incremental update sizes. Though maybe that doesn't apply to compressed files?

kevincox · 2020-11-12T01:35:57Z

It depends on how you do the compression. gzip has the --rsyncable option but the xz cli doesn't appear to have it. Basically if you reset the compression stream every once and a while (ideally using some rolling hash yourself) the hashes can sync up.

Of course the ideal IPFS chunking would just reuse the compression chunking but I don't think that is supported by the current implementation (and would need to be custom for each compression format).

So I guess the answer is that as of today it probably won't help much.

ohAitch · 2020-11-12T01:52:47Z

You could always do the https://bup.github.io/ thing and store the explicitly chunked into files compressed version in IPFS, especially if the chunks are small enough you know they won't be separately chunked by IPFS itself.

lordcirth · 2020-11-12T18:53:23Z

I guess if there is no big win, then it is best to stick with IPFS defaults. Incremental dedup between versions isn't the main reason we want IPFS anyway.

kamadorueda · 2020-12-26T02:45:37Z

Guys let me introduce you to a beta of CachIPFS, an encrypted (and private) read/write Nix binary cache over IPFS

https://4shells.com/docs#about-cachipfs

Sure it has a lot of rough edges, but it is an MVP that works, so let me know your thoughts!

Demo.sh
Demo GIF

bbigras · 2020-12-30T18:09:35Z

@kamadorueda I tried CachIPFS twice. The first run took a long time (which I guess is normal) but the second run still took over 11 hours. Is that normal? I was publishing my nix-config.

I wasn't running 4s cachipfs daemon since in the demo it looks like it's only used for fetching.

kamadorueda · 2020-12-30T18:52:29Z

I think having faster second and later executions is a must

The current algorithm is very naive:

create a temporary directory
do nix copy the /nix/store/path you want to publish to the directory
encrypt the files
add them to IPFS
publish them to the CachIPFS coordinator

But naive is slow

We'll definitely improve it, thanks for the feedback @bbigras, I'm taking note! 😄

Does someone have a user case beyond publishing/retrieving from a private cache? we'd love to hear it

bbigras · 2020-12-31T00:26:30Z

Does someone have a user case beyond publishing/retrieving from a private cache? we'd love to hear it

If both a stranger and me publish our nix-config with CachIPFS. Could both caches by used by the 2 of us?

Maybe another similar use case would be, if a lot of people are using CachIPFS, and they are all using the same channel (let say unstable). It could be nice and efficient to all share the same stuff.

If the file are trustable. I didn't read everything on CachIPFS yet.

kamadorueda · 2021-01-02T20:13:19Z

If both a stranger and me publish our nix-config with CachIPFS. Could both caches by used by the 2 of us?

Yes! As long as both of you use the same CachIPFS API token

Every account has associated:

an api token
a secret key for encryption/decryption
a private binary cache that can be accessed with the api token and encrypted/decrypted with the secret key

We'll have the ability to rotate those secrets soon. If many machines (you + your friend) use the same api token, they use the same encryption keys, and upload/retrieve from the same private binary cache

This is the private layer of CachIPFS, it requires trust but this is actually a feature (we don't want untrusted people to read/modify our data)

Maybe another similar use case would be, if a lot of people are using CachIPFS, and they are all using the same channel (let say unstable). It could be nice and efficient to all share the same stuff.

If the file are trustable. I didn't read everything on CachIPFS yet.

I'm thinking on this one, this would be the CachIPFS public layer in which all Nix users share the binary cache with all Nix users. This creates a distributed binary cache over IPFS (this is my dream and purpose)

The problem is security, An attacker can place a virus under /nix/store/gqm07as49jn3gqmxlxrgpnqhzmm18374-gcc-9.3.0 and upload it to the binary cache. If someone else requires gcc, (s)he downloads the virus instead of gcc. This is why trust is very important, you only want to fetch data from people you can trust (not hackers)

But trust can be negotiated in many ways:

users can decide which users to share data with. Sounds like the CachIPFS private layer? well, it is
with algorithms: https://www.tweag.io/blog/2020-12-16-trustix-announcement/ which is not very different from CachIPFS private layer but we find this solution very cool, we just need to find a way in which we can guarantee to the network 100% confidence they are downloading legit nix store paths (I take security very seriously)
by implementing cryptographic protocols that make trust unnecessary, like git https://blog.ipfs.io/2020-09-08-nix-ipfs-milestone-1/ this is by far the ideal solution, but the implementation is hard and there may be many months/years until we can offer a good user experience to the community, also it may not be possible to have 100% content-addressability in all cases (https://www.tweag.io/blog/2020-11-18-nix-cas-self-references/) and those cases will require trust anyway

This is a very exciting topic, we are thinking about it every day

In the big picture, CachIPFS can be defined as a let's-implement-something-useful-with-the-things-we-have-today

bbigras · 2021-05-12T13:43:53Z

Any progress on IPFS now that there's the call to test Content-addressed Nix?

Ericson2314 · 2021-05-12T15:43:13Z

@bbigras We at have not done any more IPFS work lately, because the implementation was basically complete and the main blocker is consensus around merging. But rest assured, all the recent work polishing content-addressed Nix builds upon the foundation for CA we laid with @regnat and @edolstra last summer while working on IPFS × Nix, and a win for content-addressed Nix is a win for IPFS × Nix.

I have merged master in our outstanding PRs from time to time, maybe it's time for me to do that again.

stale · 2021-11-16T11:44:16Z

I marked this as stale due to inactivity. → More info

davidak · 2021-11-16T13:52:48Z

What are the next steps to have this as an official feature? Do we still have to wait for CA or is it good enough? Do we need an RFC?

Ericson2314 · 2021-11-16T15:58:48Z

@davidak As far as I am concerned, the functionality is good enough to head straight to code review and land some experimental features, as experimental features do not require an RFC. (c.f. NixOS/rfcs#92 (review)). (There are no a decent amount of conflicts in the later P.R.s, but I would happily go fix this if we started merging the earlier ones.)

Now, when I talked to @edolstra before, he was a bit skeptical of this, especially with there being good concrete use-cases from the get-go. I was hoping complete https://nlnet.nl/project/SoftwareHeritage-P2P/ to make a more concrete use-case before taking up the issue again. But we haven't started that yet because of staffing constraints which will hopefully dissipate soon.

I suppose I could start and RFC now anyways, even if it isn't strictly required, so the portion of the community interested can make itself heard, and we have a more spec-style feature list as opposed to the tutorial-style one that is https://github.com/obsidiansystems/ipfs-nix-guide/blob/master/tutorial.md. I didn't do that yet because, again, wanted the SWH use-case, and also because I have other RFCs in flight and limited time, but I could be convinced I ought to go write that RFC anyways :).

Ericson2314 · 2022-02-10T18:11:24Z

https://www.softwareheritage.org/2022/02/10/building-bridge-to-the-software-heritage-archive/ We have kicked off work on this! I hope once it wraps up, we will be able to a make a tighter case for Nix and IPFS, and open an RFC, so our stuff from 2020 can finally get merged on an experimental basis.

nixos-discourse · 2022-02-10T18:24:15Z

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/should-ipfs-be-used-as-a-source-for-fetchurl-in-nixpkgs/16312/19

Ericson2314 · 2022-09-07T17:41:30Z

NixOS/rfcs#133 we have an RFC for finally getting some or all of our work merged upstream.

lemanschik · 2022-12-04T12:45:53Z

oh i am happy that you understand that some one pointed me here i can tell you this is closed by

Tracking: ECOSYSTEM Adoption: V8 Is now a „General Recursive Applicative and Algorithmic Language“. Virtual Machine and so is the web! lemanschik/web-modules#4

to be more exact the linked PR: incremental refactoring to web-modules of vscode / code-oss it will integrate a distributed build cache on web scale via p2p i also superseeded the IPFS Standard via web-modules a interplanetary module system that is *nix compatible i will update you all soon sorry i am alone.

vcunat mentioned this issue Mar 24, 2016

Intensional store model #296

Closed

Ericson2314 mentioned this issue Mar 27, 2016

WIP: deterministic evaluation of expressions/derivations #709

Closed

14 tasks

copumpkin mentioned this issue Jul 14, 2016

consider a torrent for build inputs NixOS/nixpkgs#16953

Closed

vcunat mentioned this issue Jul 16, 2016

Additional binary caches for unusual flags NixOS/nixpkgs#16190

Closed

stale bot added the stale label Nov 16, 2021

stale bot removed the stale label Nov 16, 2021

Ericson2314 mentioned this issue Jan 14, 2022

Git hashing as a new file ingestion method --- contains #3754 #3635

Closed

mrusme mentioned this issue May 15, 2022

IPFS and Gentoo Portage (distfiles) ipfs/notes#296

Open

stale bot added the stale label Aug 12, 2022

stale bot removed the stale label Sep 7, 2022

fricklerhandwerk added feature Feature request or proposal performance UX The way in which users interact with Nix. Higher level than UI. labels Sep 12, 2022

roberth added significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world, input locking labels Jun 2, 2023

Nix and IPFS #859

Nix and IPFS #859

Comments

vcunat commented Mar 24, 2016

What we might start with

Converting hashes

Non-fixed content

Deduplication

Saving disk space

ehmry commented Mar 24, 2016

vcunat commented Mar 24, 2016

ehmry commented Mar 24, 2016

Ericson2314 commented Mar 24, 2016

Ericson2314 commented Mar 24, 2016

vcunat commented Mar 24, 2016

Ericson2314 commented Mar 24, 2016

Ericson2314 commented Mar 24, 2016

Ericson2314 commented Mar 24, 2016

cleverca22 commented Apr 8, 2016

nmikhailov commented Apr 8, 2016

cleverca22 commented Apr 8, 2016

davidar commented Apr 8, 2016

Ericson2314 commented Apr 8, 2016

davidar commented Apr 9, 2016

Ericson2314 commented Apr 9, 2016

Ericson2314 commented Apr 9, 2016

cleverca22 commented Apr 9, 2016

knupfer commented Jul 20, 2016

knupfer commented Jul 20, 2016

copumpkin commented Jul 20, 2016

knupfer commented Jul 28, 2016 • edited Loading

copumpkin commented Jul 28, 2016

copumpkin commented Jul 28, 2016 • edited Loading

knupfer commented Jul 28, 2016

davidak commented Jul 28, 2016

lordcirth commented Nov 11, 2020 • edited Loading

kevincox commented Nov 12, 2020

ohAitch commented Nov 12, 2020

lordcirth commented Nov 12, 2020

kamadorueda commented Dec 26, 2020

bbigras commented Dec 30, 2020

kamadorueda commented Dec 30, 2020

bbigras commented Dec 31, 2020

kamadorueda commented Jan 2, 2021

bbigras commented May 12, 2021

Ericson2314 commented May 12, 2021

stale bot commented Nov 16, 2021

davidak commented Nov 16, 2021

Ericson2314 commented Nov 16, 2021

Ericson2314 commented Feb 10, 2022

nixos-discourse commented Feb 10, 2022

Ericson2314 commented Sep 7, 2022

lemanschik commented Dec 4, 2022

knupfer commented Jul 28, 2016 •

edited

Loading

copumpkin commented Jul 28, 2016 •

edited

Loading

lordcirth commented Nov 11, 2020 •

edited

Loading