This repository has been archived by the owner on Jun 2, 2020. It is now read-only.

Guide: “Running an IPFS pinning service and making it fast” #62

Closed
Mr0grog opened this issue Mar 28, 2018 · 11 comments
Labels
dif/medium Prior experience is likely helpful effort/days Estimated to take multiple days, but less than a week help wanted Seeking public contribution on this issue topic/design-content Content design, writing, information architecture topic/docs Documentation

Comments

@Mr0grog
Collaborator

Mr0grog commented Mar 28, 2018

This issue is part of Epic 3B: Fixes from legacy issue queue.

Or: how to run IPFS on a server so stuff stays online when your computer goes offline.

This is from a discussion on Slack with @flyingzumwalt. Keeping it here so we don’t lose track of it.

@Mr0grog
Collaborator Author

Mr0grog commented Mar 28, 2018

Some useful basics to make sure we cover:

@nothingismagick

Maybe also with an optional part about how to mount a mutable folder...

@flyingzumwalt

@lgierth and @hsanjuan does our advice here basically boil down to "use IPFS Cluster"?

  • Can we imagine any cases where we would advise running a pinning service that doesn't use cluster?
  • Are there specific instructions for people who want to run a performant, reliable pinning service? system configs, system requirements, etc.
  • Is there any other documentation specific to this use case that isn't addressed by the IPFS cluster docs? Should we fold those instructions into the Cluster docs?

@ghost

ghost commented Apr 4, 2018

An important question is probably "what kind of IPFS pinning service": do you want to offer it to others, do you just want to host a bunch of your own stuff, etc. These come at different levels of seriousness and involve a complexity trade-off. For example, to my knowledge cluster isn't of much use below 3 nodes (@hsanjuan, can you confirm?), but setting up 3 nodes and clustering them is quite the entry barrier.

Are there specific instructions for people who want to run a performant, reliable pinning service? system configs, system requirements, etc.

  • Use ipfs init --profile=server when in a datacenter
  • Use the badger datastore (experimental). The current default flatfs datastore gets slower the more data it holds. This was quite noticeable with the media.ccc.de dataset.
  • SSDs are always good
  • With the flatfs datastore, use an underlying filesystem that doesn't have a fixed inode limit (e.g. btrfs instead of ext4), since flatfs stores each block as its own file
  • Explain why S3 isn't really an option for the datastore (we used to have an s3 datastore, and people still ask about this from time to time)
  • Maybe it's worth mentioning @kyledrake's IPFS-on-Ceph setup. ipfs-cluster is clearly the recommendation as it continues to emerge, but it's worth pointing out all the options of what you can do (and how that pretty much follows from our stack's modularity).
  • What to do in certain failure scenarios (e.g. failing disks, or split-brain clusters, which haven't been an issue so far, but still)
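
As a rough illustration of the first two bullets, a provisioning script might build the `ipfs init` invocation like this. This is only a sketch: the helper names `init_command` and `run_init` are hypothetical, and it assumes that the `badgerds` profile is available in your go-ipfs version and that several profiles can be passed as a comma-separated list. Check `ipfs init --help` on your install before relying on either.

```python
import shlex
import subprocess


def init_command(profiles):
    """Build the argv for `ipfs init` with the given config profiles.

    `profiles` is a list of profile names, e.g. ["server"] (from the list
    above) or ["server", "badgerds"] (assumed to exist in your version).
    """
    if not profiles:
        return ["ipfs", "init"]
    return ["ipfs", "init", "--profile=" + ",".join(profiles)]


def run_init(profiles, dry_run=True):
    """Return the command line as a string; run it only if dry_run is False."""
    argv = init_command(profiles)
    if not dry_run:
        # Requires the `ipfs` binary on PATH; raises if the command fails.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # Dry run: print the command instead of creating a repo.
    print(run_init(["server", "badgerds"]))
```

The dry-run default is deliberate, so the command can be reviewed before a repo is actually created on the server.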

Is there any other documentation specific to this use case that isn't addressed by the IPFS cluster docs? Should we fold those instructions into the Cluster docs?

How to build/operate storage in general: RAID, NAS, HDD/SSD/NVMe/Flash, filesystems, monitoring

@ghost

ghost commented Apr 4, 2018

Pinning lots of (big?) things performantly

This gets straight into data structures and IPLD land :) Which is great - every big data structure is its own beast. We've made optimizations for a couple of different scenarios by now:

  • Unixfs directory sharding was implemented in response to the issues with large numbers of directory entries in the npm-on-ipfs dataset (10s of thousands). This has helped tremendously with the wikipedia-on-ipfs datasets (millions of directory entries), and others.
  • The media.ccc.de dataset was simply so large in total size that performance problems in the flatfs datastore emerged (i.e. O(n) writes). With badger it's totally fine (O(1) writes).
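
To make the directory sharding idea concrete, here is a toy sketch of why splitting a huge directory by a hash of the entry name keeps each piece small. It is not the UnixFS implementation: the real thing is a HAMT (a multi-level hashed trie), whereas this uses a single level of buckets, SHA-256 from the standard library as a stand-in hash, and an assumed fanout of 256. The function names are made up for illustration.

```python
import hashlib
from collections import defaultdict


def shard_index(name, fanout=256):
    """Map a directory entry name to one of `fanout` buckets."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % fanout


def shard_directory(names, fanout=256):
    """Group entry names into buckets so no single node lists them all."""
    shards = defaultdict(list)
    for name in names:
        shards[shard_index(name, fanout)].append(name)
    return dict(shards)


if __name__ == "__main__":
    entries = [f"pkg-{i}" for i in range(10_000)]
    shards = shard_directory(entries)
    largest = max(len(bucket) for bucket in shards.values())
    print(f"{len(entries)} entries -> {len(shards)} shards, largest has {largest}")
```

Looking up one name then only touches the bucket its hash points to, rather than a single node holding every entry.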

There are also specific datasets that we currently have issues with, e.g. the wikipedia-on-ipfs datasets are terribly slow to fetch and pin, and we're not yet sure why. That Stanford dataset of Jack's was also pretty nasty to IPFS but we eventually managed with a big machine.

An interesting thing to convey will be "how to actually see that it's pinning performantly".
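
One crude way to put a number on this is to time a pin and divide the data size by the elapsed time. The sketch below assumes you already know the size of the data in bytes and that `ipfs pin add --progress` is available on your install. The helper names are hypothetical, and wall-clock time for a single pin also includes network fetch time, so treat the result as a rough indicator only.

```python
import subprocess
import time


def throughput_mib_per_s(num_bytes, seconds):
    """Convert a byte count and a duration into MiB per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / (1024 * 1024) / seconds


def time_pin(cid, runner=subprocess.run):
    """Pin `cid` and return the elapsed wall-clock seconds.

    `runner` defaults to subprocess.run; pass a fake to test without ipfs.
    """
    start = time.monotonic()
    runner(["ipfs", "pin", "add", "--progress", cid], check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Pure arithmetic example: 10 MiB pinned in 2 seconds.
    print(throughput_mib_per_s(10 * 1024 * 1024, 2.0))
```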

@hsanjuan
Member

hsanjuan commented Apr 4, 2018

Can we imagine any cases where we would advise running a pinning service that doesn't use cluster?

It all depends on how well cluster's feature set matches the pinning service's requirements.

Also, note that we haven't scaled cluster with, say, millions of pins...

@nothingismagick

I just outlined one of our mission-critical processes over at:

https://discuss.ipfs.io/t/millions-of-pins-in-a-transient-ipfs-cluster/2494

There I explain how and why we will have millions of pins. FYI, I plan to be testing this in early May.

@philips

philips commented Jul 13, 2018

Personally, I am interested in the explanation of why pinning to an object store isn't an option. It seems obvious, but for whatever reason no one says why, and it isn't documented.

@Stebalien
Contributor

You can. You'd have to configure go-ipfs to use https://github.com/ipfs/go-ds-s3 (will require some code modification to set it up).

It's not documented because it hasn't really been tested and/or integrated (we don't use it internally and have higher priority issues at the moment).

@Mr0grog Mr0grog added dif/medium Prior experience is likely helpful P2 - Medium labels Aug 24, 2018
@asimoneo

It's still not clear what the hardware requirements are for running IPFS as a NAS, or instead of one.

@meiqimichelle meiqimichelle added topic/design-content Content design, writing, information architecture and removed blocked topic/design-content Content design, writing, information architecture labels Jun 4, 2019
@jessicaschilling jessicaschilling changed the title Write a guide to “Running an IPFS pinning service and making it fast” Guide: “Running an IPFS pinning service and making it fast” Jul 26, 2019
@jessicaschilling jessicaschilling added the help wanted Seeking public contribution on this issue label Jul 26, 2019
@jessicaschilling jessicaschilling added topic/docs Documentation effort/days Estimated to take multiple days, but less than a week and removed Priority: P2 (Medium) labels Sep 19, 2019
@jessicaschilling
Contributor

Closing due to overlap with #94. Also noting that substantial development has been made on Cluster since this issue was originally opened.


9 participants