
Consider using ipfs or other existing content addressable stores for tarballs. #99

Open · Raynos opened this issue Jun 1, 2019 · 16 comments
Labels: enhancement (New feature or request), registry (the public API layer of the backend)

@Raynos

Raynos commented Jun 1, 2019

In the readme you mentioned that you want to use a content addressable storage.

There are existing content addressable systems like IPFS that you can leverage.

I’ve recently spoken with IPFS engineers, and they are really interested in making IPFS easy to use for package managers, so they might be open to implementing features you need.
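For concreteness, content addressing just means the storage key is derived from the bytes themselves, so the same tarball always resolves to the same address no matter who stores it. A minimal sketch (the `store` calls in the comments are hypothetical):

```js
// Content addressing in a nutshell: the address of a tarball is a
// hash of its bytes, so identical content maps to the same key no
// matter which node serves it. (IPFS wraps the digest in a CID,
// but the idea is the same.)
const { createHash } = require('crypto');
const { readFileSync } = require('fs');

function contentAddress(path) {
  const bytes = readFileSync(path);
  return createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical store keyed by digest; fetches are self-verifying,
// since the client can re-hash the bytes and compare to the key:
//   store.put(contentAddress('pkg-1.0.0.tgz'), bytes)
//   store.get(digest) -> bytes, assert sha256(bytes) === digest
```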

@zkat
Contributor

zkat commented Jun 1, 2019

zkat/pacote#173 (comment)

IPFS performance is disappointing. I'm not sure it's ready for something like this, tbh.

@ceejbot added the enhancement (New feature or request) and registry (the public API layer of the backend) labels on Jun 2, 2019
@ceejbot
Collaborator

ceejbot commented Jun 2, 2019

IPFS perf is a worry, but as an overall approach I think pluggable backends is good. My next task for the project is to make it possible to store the content blobs in S3 & other object stores so people who have durability requirements (and don't want to deal with backing up disks) can have this option.
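A pluggable backend could be as small as a two-method interface; here's a rough sketch (names are illustrative, not Entropic's actual API):

```js
// Hypothetical blob-store interface: every backend maps a content
// hash to bytes and back, so the registry never cares where the
// bytes physically live.
const fs = require('fs').promises;

class FsBlobStore {
  constructor(root) { this.root = root; }
  async put(hash, bytes) { await fs.writeFile(`${this.root}/${hash}`, bytes); }
  async get(hash) { return fs.readFile(`${this.root}/${hash}`); }
}

// An S3 backend would implement the same put/get against a bucket,
// and an IPFS backend would pin/fetch by CID; swapping them becomes
// a configuration choice rather than a code change.
```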

@fwip

fwip commented Jun 4, 2019

Dat is also a very good content-addressable system in the JavaScript space, and I think a lot of the work they've done could be helpful for this application. (If not used directly, at least as inspiration / problem solving).

@martinheidegger

I am involved in the DAT community and saying hi! So: DAT is pretty cool for this, but I wouldn't use it... yet... because it doesn't yet cover enough of the features a reasonably sized registry needs. That is changing, though: @andrewosh has made good progress adding a hypertrie structure (holepunchto/hyperdrive#233, available in an rc release). The new hyperdrive is tested with a lot of files and a lot of data (terabytes; the petabyte test is still running).

This makes it an interesting candidate for a decentralized data structure (a quick read sketch follows the list):

  1. Unlike someone@mydomain.com dependencies, which require server tooling, links could look like dat://<32-byte key> (optionally with domain names: dat://mydomain.com), which means you don't have to buy a domain to join the fun!
  2. DATs are by definition single-writer, which ensures both that ownership can't change silently and that no one can tamper with content in between.
  3. DAT uses a predefined networking stack, but it can easily be exchanged for a networking stack of your choice (DAT is nice but tricky that way).
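To make point 1 concrete, reading a file out of an archive referenced by a dat:// link looks roughly like this with hyperdrive (API details differ between hyperdrive 9 and the new hypertrie-based release, so treat this as a sketch):

```js
const hyperdrive = require('hyperdrive');

// The hex string in a dat:// link is the archive's public key.
const key = Buffer.from('<64-hex-char key from the dat:// link>', 'hex');
const drive = hyperdrive('./cache', key);

drive.on('ready', () => {
  // Once connected to peers via the DAT swarm, reads are served
  // sparsely from whichever peers hold the relevant blocks.
  drive.readFile('/package.json', 'utf-8', (err, json) => {
    if (err) throw err;
    console.log(JSON.parse(json).version);
  });
});
```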

But there are real challenges ahead, and good reasons not to use DAT yet:

  • What if someone lost the key to update a DAT, or published a malicious library? Having moderation roots that specify namespaces might be necessary for a reasonable user experience.
  • Multi-writer isn't here yet. This means only one person is allowed to publish a new version. Not ideal for a package manager.
  • Proper DAT link versioning is not trivial

@philippefutureboy

Would a protocol like BitTorrent be an interesting technology to support a package registry? It's already widely deployed for file distribution and seems to be quite performant.

Similarly, as stated in #252, if you are interested in exploring blockchain options I can link a few experts from the community here (Maidsafe, Skycoin, etc.).

Let me know what you think!
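For reference, BitTorrent's content address is the torrent's infohash, usually carried in a magnet URI, with discovery via trackers or the mainline DHT. A tarball could in principle be addressed like this (infohash illustrative):

```
magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=lodash-4.17.15.tgz
```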

@tomByrer

tomByrer commented Jun 16, 2019

IPFS perf is a worry, but as an overall approach I think pluggable backends is good. My next task for the project is to make it possible to store the content blobs in S3...

Perf issues were my instinct also.

I'm all for 'decentralized', but it seems that unless someone ensures there is an always-on, current, connected source, there can be no certainty of file availability, and that can't be allowed to happen. So there has to be one source of truth somewhere. But extra ad-hoc POPs for a CDN-like network is a cool idea, no matter the protocol.

May I suggest contacting jsDelivr for help with this? They built their own routing system that spreads file requests over 4+ CDNs, with backups for the backups. They might even be able to host the files through jsDelivr; they already mirror npm & a chunk of the JS on GitHub.
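Their npm mirror already addresses files by package name and version, e.g.:

```
https://cdn.jsdelivr.net/npm/<package>@<version>/<file>
https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js
```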

@ghost

ghost commented Jun 16, 2019

When

IPFS perf is a worry

is stated, is perf short for performance?

The protocol is decentralized, sure, but it is also distributed. Copies of the files stored in this protocol are automatically distributed, which means...

if the main host is down, copies are still available via other peers on the network...

Ideally the file will still be available from anyone else because it is a p2p protocol as well. This essentially allows it to behave like a CDN with redundant backups all over the net.

It's also faster and more efficient because you never download from a single server that may be a considerable distance away from you. Instead, you download incrementally from the peers closest to you.
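In IPFS terms, that publish/mirror/fetch flow looks roughly like this on the command line (the CID is illustrative):

```sh
# publish: hash the tarball into blocks; the printed CID is its address
ipfs add lodash-4.17.15.tgz

# any other node can mirror it by pinning the same CID
ipfs pin add QmExampleCid

# consume: blocks are fetched from whichever peers hold them
ipfs get QmExampleCid -o lodash-4.17.15.tgz
```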

Downloading from a centralized source has always been slower than downloading from a p2p source.

Sources disappearing has also always been an issue regardless of the protocol or hosting service.

I think it should be allowed if the host wants their files to disappear. There's a reason GDPR was put into effect: sometimes people want this.

@martinheidegger

I just stumbled on this, using ssb: https://github.com/noffle/ssb-npm-101

@hannahhoward

Hi! Just want to say I'm on the IPFS team and folks over there are discussing what we can do to support you all. You may have seen that @andrew linked ipfs-inactive/package-managers#64, where we are discussing Entropic.

As someone pretty close to the data transfer aspects of IPFS, I am super concerned about perf and working on it. In theory, it's like @averydark says -- always having the world's fastest and most redundant CDN at your fingertips -- but in practice there are a LOT of challenges, because IPFS is a truly distributed, global, content-addressed network without the magnet links and trackers you might have in BitTorrent, and that makes for some complicated problems. So we will hopefully be exactly what @averydark says eventually, but we're not quite there yet. One upside to having IPFS support is that as IPFS gets faster, you get the benefits -- and it will get faster, or at least I think it will.

@RangerMauve

Heyo, I'm also coming from the Dat community. I'm currently working on our SDK / developer experience stuff. Package managers are a pretty big deal.

One of my worries about registries is that they'd have centralized update mechanisms and would take control of both indexing / curating packages and storage. Decentralized tech can help a lot with this. Registries can focus more on the curation / search aspect, while users keep more control over the actual updates of their content and can move the storage of their packages as they see fit.

I think the current IPFS registry works as a sort of mirror of NPM and uses IPNS links to people's package history. Tracking down all the pieces that need to be pinned is extra effort, in that you need to parse the package metadata to find all the IPFS links.
Swarming the files inside packages globally seems like it'd be a lot of overhead for packages with many files / IPNS updates are still pretty slow so updating might be a bit of a hassle.
Swarming per file is going to be great for sparsely downloading files from packages, however, since it'll be easier to find peers just for the files you want.

With Dat, you could have a different workflow using the upcoming mounts feature. I wrote a blog post about it early last year.
Basically, if you have a package, you can create a Dat archive to keep all its files in one place. Any version metadata can be placed in a file at the root, and you can either have the files directly in it, or have some sort of fancy setup for linking to versions (or something simpler, like the manifest IPFS is using).

Then registries become archives with folders for package names, each folder mounting the archive for that package.
A cool thing about this setup is that updates to any packages or to the registry should propagate fairly quickly, and can reasonably be processed in real-time by listening to the change events. This could be used for all sorts of hooks for doing automated testing / changes / etc.
Keeping copies of a package online is a bit easier, too, since you can say "pin this archive" to keep the entire history updated, or "pin this package but only with the latest changes", or "pin this package at this specific version".
Similarly, mirrors of registries can either be pinned sparsely, or pinned fully, or even mounted within other registries to group them together or federate them.
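A sketch of that registry-of-mounts idea with the upcoming hyperdrive mounts API (unreleased, so names and signatures may change; leftPadKey is a hypothetical archive key):

```js
const hyperdrive = require('hyperdrive');

// The registry is itself an archive; each package name becomes a
// mount point for the author's own archive key.
const registry = hyperdrive('./registry-store');

registry.on('ready', () => {
  // leftPadKey: hypothetical 32-byte public key of the package archive
  registry.mount('/packages/left-pad', leftPadKey, (err) => {
    if (err) throw err;
    // reads now cross transparently into the mounted archive
    registry.readFile('/packages/left-pad/package.json', 'utf-8', console.log);
  });
});
```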

Search indexes could also be stored in dat archives and distributed over the P2P network, so you wouldn't need a central server to serve the search: it could be processed in one place and propagated over p2p networks using the internet, local wifi, or fancy mesh network setups. It'd also have the advantage of working offline out of the box. Of course this could also be accomplished with IPFS, but again the updates would take more time to propagate, and individually swarming for each file in each archive would be a larger overhead than one swarm per archive / registry.

The mounts feature isn't out yet, so I'd wait a month or two while we integrate it into the SDK. Also, updating a package from multiple devices is still in the design phase. 😅

@RangerMauve

By the way: @dpaez from @geut recently put together a package manager built on top of Dat, as a sort of toy project.

https://github.com/geut/building-up-on-dat/tree/master/packages/gpm

@tinchoz49

By the way: @dpaez from @geut recently put together a package manager built on top of Dat, as a sort of toy project.

https://github.com/geut/building-up-on-dat/tree/master/packages/gpm

A short demo video from his talk at NodeConf Colombia: https://streamable.com/l52ba

@marcusnewton

Holochain is a much better approach than IPFS. You can guarantee availability of packages, you can establish shared rules about registering packages, the registry API would be automatically generated for you, and a GUI for browsing packages can be served over Holo. The entire thing can be fully distributed with no central servers, without sacrificing performance. Uptime is 100% because you retrieve packages via a DHT, peer to peer. It's the obvious solution, IMO.

@pegaltier

In addition to @marcusnewton's comment, I would like to add another positive thing about Holochain: it's actually built in Rust (the future), which is why they are late, but in the end it will be a strong argument in the project's favor.

@spiralcrew-ou

This is a git-like protocol created with Holochain. It could be a source of inspiration: https://github.com/uprtcl/hc-uprtcl

@yanmaani

yanmaani commented Feb 8, 2022

Consider using BitTorrent, which is not a project created as a marketing scheme for cryptocurrency.
