This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

npm on IPFS - post mortem analysis #132

Closed
daviddias opened this issue Dec 11, 2015 · 5 comments

Comments

@daviddias
Member

This is the "post-mortem" analysis of our adventure of adding npm to IPFS ("post-mortem" as in after Node.js Interactive; nothing is over, we are still making it :D). A lot of key things were learned and improvements were made. We still have some work on our hands, and a really good platform on which to test our improvements.

I'll keep this to a concise list of the things we've identified so that they can be translated into action items. If needed, I can write a more detailed account of the entire adventure for the historical record (it was fun systems engineering :))

Problems/bugs found

  • ipfs add -r is slow for large data sets
    • improved 1000x by the new ipfs add -r made possible by mfs (I'll call it mfs ipfs add -r for clarity). Still not screaming fast, though
  • ipfs leaks memory. It consumes a huge amount of memory during a long ipfs add -r (easily over 12 GB if available)
  • mfs ipfs add -r creates a ton of debris because the old MerkleDAG directory nodes are not gc'ed in time. This debris is significant: it increases the space required by a factor of 4x to 6x for a dataset like npm.
  • mfs ipfs add -r is not pinning the data set correctly; an ipfs repo gc will delete most of it.
  • ipfs is not able to add files concurrently (doing so could bring a lot of speed improvements)
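
To make the debris and pinning items above more concrete, here is a minimal toy sketch of a content-addressed store with recursive pins and a mark-and-sweep gc. It is not how IPFS or mfs is implemented; `ToyBlockStore`, `add_files_one_by_one`, and the hashing scheme are all hypothetical names and choices made up for illustration. It only shows the general mechanism: rebuilding a directory node after every added file leaves the superseded nodes behind until a gc runs, and a gc removes everything that is not reachable from a pin. It does not attempt to reproduce the 4x to 6x figure.

```python
import hashlib
import json


class ToyBlockStore:
    """Hypothetical content-addressed store with recursive pins and mark-and-sweep gc."""

    def __init__(self):
        self.blocks = {}   # key -> (payload bytes, list of child keys)
        self.pins = set()  # roots that gc keeps, together with everything they link to

    def put(self, payload, links=()):
        """Store a block and return its key, derived from the payload and its links."""
        links = sorted(links)
        key = hashlib.sha256(payload + json.dumps(links).encode()).hexdigest()
        self.blocks[key] = (payload, links)
        return key

    def pin(self, key):
        self.pins.add(key)

    def gc(self):
        """Delete every block not reachable from a pinned root; return how many were removed."""
        live, stack = set(), list(self.pins)
        while stack:
            key = stack.pop()
            if key in live:
                continue
            live.add(key)
            stack.extend(self.blocks[key][1])
        dead = [key for key in self.blocks if key not in live]
        for key in dead:
            del self.blocks[key]
        return len(dead)


def add_files_one_by_one(store, files):
    """Rebuild the directory node after each file; superseded nodes stay in the store."""
    entries = []
    root = store.put(b"dir", entries)
    for data in files:
        entries.append(store.put(data))
        root = store.put(b"dir", entries)  # the previous root is now debris
    return root


if __name__ == "__main__":
    store = ToyBlockStore()
    root = add_files_one_by_one(store, [f"package-{i}".encode() for i in range(5)])
    print(len(store.blocks))  # 5 file blocks plus 6 directory nodes (1 empty, 5 updates)
    store.pin(root)
    print(store.gc())         # only the superseded directory nodes are removed
    store.pins.clear()
    print(store.gc())         # with no pin, gc removes everything that is left
```

In this toy model, pinning the final root before the gc is what keeps the data set alive; if the root is never pinned, the second behaviour (everything deleted) is what a gc produces.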

Other learnings

  • A lot of the challenges caught us off guard because we lacked previous experience of how IPFS would behave with such a large dataset. Metrics and projections are key to making sure we are prepared and can plan accordingly for this kind of situation
  • registry-mirror performs well on its own; ipfs add was the bottleneck
@davidar
Member

davidar commented Dec 12, 2015

ipfs add -r is slow for large data sets ... A lot of the challenges caught us off guard by the lack of previous experience on how IPFS would behave for such large dataset

FTR, I've been talking about this in the context of ipfs/archives for quite a while...

ipfs memory leaks. It consumes a huge amount of memory during a long ipfs add -r

ipfs/kubo#1222

@jbenet
Member

jbenet commented Dec 13, 2015

@davidar yeah, we know you've been running up against all of this for a while, so have i and others. this doc is accounting for the various things that happened in this effort

@ghost

ghost commented Aug 26, 2016

@diasdavid can this be moved somewhere and closed?

@daviddias
Member Author

@lgierth somewhere, where?

@ghost

ghost commented Aug 26, 2016

I don't know, probably the info isn't even relevant anymore. Mind if I just close?

@ghost ghost added the storage label Nov 3, 2016
@ghost ghost closed this as completed Aug 6, 2018