This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Identify the "big bugs" and "big optimizations" relevant for data.gov #105

Closed
flyingzumwalt opened this issue Jan 16, 2017 · 5 comments

@flyingzumwalt
Contributor

flyingzumwalt commented Jan 16, 2017

Make a short list of the "big bugs" and "big optimizations" relevant for this sprint. We'll want a good list to have in mind.

e.g. file attrs, bitswap supporting paths (kills so many RTTs)

@Kubuxu
Contributor

Kubuxu commented Jan 17, 2017

Storage is the one that everyone notices, but in my opinion it isn't the limiting factor in this case.
300 TiB of data is such a huge amount that just adding it to go-ipfs would currently take about 6 months (300*2^40 / (20*2^20) /60/60/24/30; about 20 MiB/s is the add speed I got from go-ipfs on a beefy PC with SSD caching and config optimizations). That can surely be improved.
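
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the comment: the 300 TiB and 20 MiB/s figures come from the comment itself, the 30-day month matches the `/30` in its formula, and the function name `add_time_months` is made up for illustration.

```python
TIB = 2**40  # bytes in one tebibyte
MIB = 2**20  # bytes in one mebibyte


def add_time_months(total_bytes, bytes_per_second, days_per_month=30):
    """Time to add total_bytes at a constant rate, in months of days_per_month days."""
    seconds = total_bytes / bytes_per_second
    return seconds / 60 / 60 / 24 / days_per_month


# 300 TiB at roughly 20 MiB/s, the figures quoted in the comment.
months = add_time_months(300 * TIB, 20 * MIB)
print(f"{months:.2f} months")  # prints "6.07 months"
```

Since the time is inversely proportional to the add rate, doubling the rate to 40 MiB/s would halve it to about 3 months.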

In my opinion we should focus on getting the performance for those big datasets into a range where it is possible to use them at all. This includes adding, fetching, and rechecking.

Also, I have no idea if our GC will run at all with such a number of keys (it might run out of memory).

The DHT is another major problem that might make go-ipfs choke even if you've got the disk space.

Filestore would be nice space-wise, but it could probably be a sprint on its own (or not, depending on how cleanly we want to do it), and even if we got it there might be other barriers to deploying this data into IPFS.

@whyrusleeping
Contributor

Some notes I wrote down the other day:

  • UX
    • Adds are slow
      • fetches are bursty
        • this is likely due to the bitswap concurrency factor per peer being too high
      • small files are slow
        • this is likely due to us having poor batching code, doing one batch per tiny file
    • managing things you've added is hard
      • Automatically add entries to mfs, maybe an equivalent of the 'Downloads' directory on mfs
      • Should make a survey for what UX people want here
  • Scaling Performance
    • Does flatfs degrade? (need metrics)
    • Look at alternate datastores
      • SQL
      • Bolt
      • "RoundFS"
    • Content Routing is slow
      • Provide selectors could help, harder to do
      • Trackers are a fairly easy thing to do
      • Larger block size could help scale the problem down by a constant factor
        • Need 'importer parameters' on objects to validate things properly
      • Need bitswap without providing
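
To make the "constant factor" point about larger block sizes concrete, here is a rough sketch that counts leaf blocks for a 300 TiB dataset (the figure from the earlier comment) at a few block sizes. It assumes 256 KiB as the baseline chunk size and ignores intermediate DAG nodes, so treat the numbers as order-of-magnitude only. The helper name `leaf_block_count` is hypothetical.

```python
KIB = 2**10
MIB = 2**20
TIB = 2**40


def leaf_block_count(total_bytes, block_size):
    """Number of fixed-size leaf blocks needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


total = 300 * TIB
baseline = leaf_block_count(total, 256 * KIB)  # assumed baseline chunk size

for size in (256 * KIB, 1 * MIB, 4 * MIB):
    count = leaf_block_count(total, size)
    # Every block is a key that content routing may have to announce.
    print(f"{size // KIB:>5} KiB blocks: {count:>13,} keys ({baseline // count}x fewer than baseline)")
```

Going from 256 KiB to 1 MiB blocks cuts the key count from roughly 1.26 billion to roughly 315 million: a 4x reduction, but still the same order of problem for content routing.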

@jbenet
Contributor

jbenet commented Jan 17, 2017

@whyrusleeping and I added some notes here ipfs/notes#216 -- copied here


data.gov

  • @whyrusleeping has a diagram (post it here maybe?)
  • improving add perf
  • improving UX
  • went over possible on-disk datastore changes
    • single mmapped file, btree index of offsets, unordered blocks after
  • went over ipfs-pack, manifest, verify
    • how it combines very well with filestore
    • importer string
  • other-repo-datastore
  • accumulators wish list
  • to discuss still:
    • s3-datastore
    • filestore implementation details
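
As a rough illustration of the "single mmapped file, index of offsets, unordered blocks" idea from the notes, here is a toy sketch. It is not the design that was discussed: a plain in-memory dict stands in for the btree index, SHA-256 hex digests stand in for real block keys, and the class name `SingleFileBlockstore` is invented for this example.

```python
import hashlib
import mmap
import os
import tempfile


class SingleFileBlockstore:
    """Toy append-only block file with an in-memory offset index (assumed design)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length); stand-in for an on-disk btree
        open(path, "ab").close()  # create the data file if it does not exist

    def put(self, data):
        """Append a block in arrival order and record where it landed."""
        key = hashlib.sha256(data).hexdigest()  # hypothetical key scheme
        if key in self.index:
            return key  # already stored, nothing to write
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        self.index[key] = (offset, len(data))
        return key

    def get(self, key):
        """Read a block back through a read-only memory map of the data file."""
        offset, length = self.index[key]
        if length == 0:
            return b""  # an empty file cannot be mapped
        with open(self.path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    store = SingleFileBlockstore(os.path.join(tmp, "blocks.dat"))
    first = store.put(b"hello")
    second = store.put(b"world")
    print(store.index[first], store.index[second])
    print(store.get(second))
```

Compared with one file per block, lookups here cost one index probe plus a slice of the mapped file. A real implementation would also need to persist the index and handle deletes and compaction, which this sketch leaves out.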

@flyingzumwalt
Contributor Author

Relevant notes from the sprint planning call:

Big Bugs & Optimizations

TODO: dig up diagram @whyrusleeping created

@flyingzumwalt
Contributor Author

Marking the issue "Done" because we've identified the list, but we will still be using it as a reference.
