This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Story: Test Suite for 1MB -> 100TB Payloads #102

Open · 4 tasks
flyingzumwalt opened this issue Jan 16, 2017 · 12 comments

Comments

@flyingzumwalt
Contributor

flyingzumwalt commented Jan 16, 2017

We don't have good metrics, graphs, or reports about performance as we increase data sizes and loads -- where, when, and how performance dips under particular circumstances. We need to know more than "does it scale?"; we need to know "how does it scale?" so we can identify the problem domains, etc.

Acceptance Scenario

This story will be done when IPFS Maintainers (or, ideally, anyone) can run a suite of scripts that test IPFS at each order of magnitude of total data, from 1MB -> 100TB (up to 500TB or 1PB).

For each magnitude the tests should cover a variety of payloads. At the very least, there should be

  • a giant file payload
  • a payload with lots of little files
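A minimal sketch of how these two payload shapes could be generated with plain shell tools; the sizes, file counts, and paths below are placeholders for illustration, not agreed targets:

# Hypothetical payload generation -- all sizes and counts are placeholders.
# Giant-file payload: one 10GB file of random data.
mkdir -p payload-giant
dd if=/dev/urandom of=payload-giant/giant.bin bs=1M count=10240

# Many-small-files payload: 100,000 files of 4KB each.
mkdir -p payload-small
for i in $(seq 1 100000); do
  dd if=/dev/urandom of=payload-small/file-$i.bin bs=4K count=1 status=none
done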

Tasks

  • Identify or Generate Test Data for each of the Workloads
  • Create the tests. Add them to https://github.com/ipfs/fs-stress-test (or another appropriate location)
  • Provide documentation on how to run the tests
  • Have at least one collaborator run the tests based on the provided documentation
@flyingzumwalt
Contributor Author

@jbenet @whyrusleeping What can we add to make the acceptance scenario more precise? What are the tests checking for?

  • Are they reporting on performance?
  • Are they watching memory load?
  • Are they watching blockstore performance?
  • Are they running checks to confirm that data were replicated properly?

@ghost

ghost commented Jan 16, 2017

Another note: getting hold of 100TB of hardware is non-trivial and expensive.

@flyingzumwalt
Contributor Author

Yeah, but there are definitely orgs out there who do have easy access to that kind of storage and want to test IPFS at those volumes. It might make sense to write these tests with the assumption that they will be run by 3rd parties who then hand back reports after running them.

@flyingzumwalt
Contributor Author

@Kubuxu this might be a good place for you to start tomorrow before we get to do a proper sprint planning call. Also watch the issues in the "Ready" column in https://waffle.io/ipfs/archives. At the moment most of them are for @jbenet to review stuff but you can review them too!

@flyingzumwalt flyingzumwalt changed the title Story: Test Suite for 1M -> 100TB Payloads Story: Test Suite for 1MB -> 100TB Payloads Jan 16, 2017
@Kubuxu
Contributor

Kubuxu commented Jan 17, 2017

The major thing we need to know is how blockstores scale with the number of items and the size of those items. Those are the two variables that will probably characterize performance.

Setup for these tests is expensive and slow (you need to write those GiBs or TiBs of data to disk). We can do the tests incrementally, but we need a setup dedicated to it.

Are they running checks to confirm that data were replicated properly?

In IPFS we have the HashOnRead option, which for archives IMO should be on by default; disk corruption happens even in RAID setups, and without it we have no way of noticing it. We have to check that it screams loudly about corruption, and that we have a way to recover (linking the corrupted block back to the original file and re-reading it to restore the block).
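A rough sketch of what that check could look like from the CLI; Datastore.HashOnRead and ipfs repo verify are existing go-ipfs options, but the workflow below is an assumption, not an existing test:

# Assumed workflow: enable hash verification on read, then re-read the data so a
# flipped bit in the blockstore is reported instead of silently returned.
ipfs config --json Datastore.HashOnRead true
ipfs cat <root-hash> > /dev/null   # <root-hash> is a placeholder for the added file
# 'ipfs repo verify' additionally walks the local blockstore and reports corrupt blocks.
ipfs repo verify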

@whyrusleeping
Contributor

Tests we want:

  • 1 node, adding dataset
  • 1 node adding dataset, same node cat'ing dataset
  • 2 nodes, add on one, cat on the other
  • 3 nodes, add on 1, cat on other two at the same time
  • 3 nodes, add on 1, cat on other two, one after the other
  • 10 nodes, add on 1, cat on others concurrently
  • 10 nodes, add on 1, cat on others serially
  • 100 nodes, add on 1, cat on others concurrently
  • 100 nodes, add on 1, cat on others in sets of 10 (first ten nodes concurrently, next ten, etc)
  • 100 nodes, add on 1, cat on others serially

Each of these tests will be run on each dataset, where the datasets vary by the following variables:

  • number of files
  • size of files (min/max)
  • nesting depth of directories

Each test should also be run for the following different node configurations:

  • routing = { normal, dhtclient, none}
  • NoSync = {true, false}
  • --raw-leaves on add
  • --chunker={normal, rabin}

during these tests, we want to gather the metrics described in this issue: ipfs/kubo#3607
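A hedged sketch of the simplest case above (1 node adding the dataset, then reading it back), using the per-add flags from the configuration list; the dataset path, output path, and use of GNU time for measurement are assumptions, not an agreed harness:

# Assumed one-node smoke test: add a dataset, then fetch it back on the same node.
export IPFS_PATH=$(mktemp -d)
ipfs init

# Add with the flag variants under test (--raw-leaves, --chunker=rabin); -Q prints only the root hash.
ROOT=$(/usr/bin/time -v ipfs add -r -Q --raw-leaves --chunker=rabin ./dataset)

# "cat" the dataset back: for a directory root, ipfs get reads every block under it.
/usr/bin/time -v ipfs get "$ROOT" -o "$(mktemp -d)/roundtrip"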

@whyrusleeping
Contributor

Ideally, we can start any of these tests with a UX of something like:

> iptest 10n-concur-cat --datasets=datadef.json --nodecfg=ncfg.json
Running... [ elapsed time: 1m23s ]
Test Complete!
Results available at: /ipfs/QmFooBarBaz

@jbenet
Contributor

jbenet commented Jan 23, 2017

something like

> iptest --routing=<value> --repo-sync=<value> --raw-leaves=<value> --chunker=<value> --num-files=<value> --file-size-min=<val>  --file-size-max=<val>
Running... [elapsed time: 1m23s]
Test complete!
results available at: /ipfs/QmFooBarBaz

@whyrusleeping
Contributor

@jbenet made a rudimentary tool to do this here: https://github.com/whyrusleeping/iptest

@jbenet jbenet self-assigned this Jan 25, 2017
@flyingzumwalt
Contributor Author

@jbenet reminder: For #122 we need you to clarify which tests we're aiming for. That mainly involves rearranging this list (#102 (comment)) and calling out which parts are important for #122 to enable.

@jbenet
Contributor

jbenet commented Jan 25, 2017

Clarifying the tests we need from #102 (comment):

We

  • MUST have P0
  • SHOULD have P1 and P2

where:

  • P0 (easiest, get these working first)
    • 1 node, adding dataset
    • 1 node adding dataset, same node cat'ing dataset
  • P1 (should not be much more, we should aim to get these)
    • 2 nodes, add on one, cat on the other
    • 3 nodes, add on 1, cat on other two at the same time (concurrent)
    • 3 nodes, add on 1, cat on other two, one after the other (serially)
  • P2 (10 -> 100 should not be much work, can be just one line change)
    • 10 nodes, add on 1, cat on others concurrently
    • 10 nodes, add on 1, cat on others serially
    • 100 nodes, add on 1, cat on others concurrently
    • 100 nodes, add on 1, cat on others serially
    • 100 nodes, add on 1, cat on others in sets of 10 (first ten nodes concurrently, next ten, etc)

As described in #102 (comment), each of these tests will be run on each dataset, varying these variables:

  • number of files
  • size of files (min/max)
  • nesting depth of directories

We can adjust these variables above to play with go-random-files better, or modify go-random-files' flags to take these variables better.

With tuples like:

num files, min file size, max file size, directory nesting depth
1, 1KB, 1KB, 1
1000, 10KB, 1MB, 10
10000, 500B, 1KB, 100
100000, 500B, 1KB, 1000
1000000, 500B, 1KB, 1000
10000000, 500B, 1KB, 1000
1, 1MB, 1MB, 1
10, 1MB, 10MB, 1
100, 1MB, 10MB, 5
1, 100MB, 100MB, 1
10, 20MB, 100MB, 3
100, 20MB, 100MB, 5
1000, 20MB, 100MB, 10
1, 1GB, 1GB, 1
10, 500MB, 1GB, 1
100, 100MB, 1GB, 5
1000, 100MB, 1GB, 10
1, 1TB, 1TB, 1
10, 500GB, 1TB, 1

We can auto-generate the tuples or pick a few we think are interesting. I think auto-generating may produce a lot more than we need, so making a standard list will probably be useful. We could auto-generate that list and then prune it, or something.
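A small sketch of the auto-generate-then-prune idea, crossing a few hand-picked axes; the specific values are illustrative only:

# Assumed tuple generator: cross file counts, size ranges, and depths, then prune by hand.
for count in 1 1000 100000 1000000; do
  for range in "500B 1KB" "1MB 10MB" "20MB 100MB" "100MB 1GB"; do
    for depth in 1 10 100; do
      echo "$count, ${range% *}, ${range#* }, $depth"
    done
  done
done > tuples.txt
# edit tuples.txt by hand to drop combinations that are uninteresting or too large to run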

Also, this makes me think that we should improve go-random-files to sample from other (non-uniform) distributions in the future, or support things like a total max size with file sizes randomly distributed, to generate more realistic workloads.

If any of the tuples above are too big to hit now, that's fine. Let's aim to get a bunch of them working first.

Each test should also be run for the following different node configurations:

  • routing = { normal, dhtclient, none}
  • NoSync = {true, false}
  • --raw-leaves on add
  • --chunker={normal, rabin}
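For reference, a sketch of how these four axes could map onto go-ipfs settings; Routing.Type and Datastore.NoSync are real config keys, but treating them as the complete set of knobs for these runs is an assumption:

# Assumed per-run node configuration for the axes above:
ipfs config Routing.Type dhtclient         # routing: dht (normal) / dhtclient / none
ipfs config --json Datastore.NoSync true   # NoSync: true / false
# the other two axes are per-add flags rather than config:
ipfs add -r --raw-leaves --chunker=rabin ./dataset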

during these tests, we want to gather the metrics described in this issue: ipfs/kubo#3607

@whyrusleeping
Contributor

whyrusleeping commented Jan 25, 2017

Generating these datasets from scratch on each test run is impractical. We should have a way to generate them deterministically, and be able to reuse previously generated datasets across test runs.
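One way to sketch that: key each generated dataset by its parameter tuple plus a fixed seed, and only generate it when the directory does not exist yet. The random-files invocation and its flags below are assumptions about how a seeded generator like go-random-files could be driven, not its current interface:

# Assumed deterministic, cached dataset generation.
SEED=42
TUPLE="1000-20MB-100MB-10"                    # num files, min size, max size, depth
DIR="datasets/${TUPLE}-seed${SEED}"

if [ ! -d "$DIR" ]; then
  # hypothetical invocation of a seeded generator in the spirit of go-random-files
  random-files --seed "$SEED" --files 1000 --depth 10 "$DIR"
fi
# reuse "$DIR" across test runs; the same tuple and seed should yield identical content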

@jbenet jbenet removed their assignment Jan 25, 2017