Story: Test Suite for 1MB -> 100TB Payloads #102
@jbenet @whyrusleeping What can we add to make the acceptance scenario more precise? What are the tests checking for?
Another note: getting hold of 100TB of hardware is non-trivial, and expensive.
Yeah, but there are definitely orgs out there who do have easy access to that kind of storage and want to test IPFS at those volumes. It might make sense to write these tests with the assumption that they will be run by 3rd parties who then hand back reports after running them.
@Kubuxu this might be a good place for you to start tomorrow before we get to do a proper sprint planning call. Also watch the issues in the "Ready" column in https://waffle.io/ipfs/archives. At the moment most of them are for @jbenet to review stuff, but you can review them too!
The major thing we need to know is how blockstores scale with the number of items and the size of those items; those are the two variables that will most likely characterize performance. Setup for those tests is expensive and long (you need to write those GiBs or TiBs of data to disk). We can do those tests incrementally, but we need a dedicated setup for it.
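As a minimal sketch of the kind of blockstore-scaling measurement described above (deliberately not tied to any particular go-ipfs blockstore API; the flat, hash-named on-disk layout below is purely illustrative), a benchmark could write N pseudo-random blocks of size S and report throughput for a few values of N:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/rand"
	"os"
	"path/filepath"
	"time"
)

// writeBlocks writes n pseudo-random blocks of blockSize bytes into dir,
// naming each block by the hex of its SHA-256 (a stand-in for a CID).
// It returns the elapsed wall time.
func writeBlocks(dir string, n, blockSize int, seed int64) (time.Duration, error) {
	rng := rand.New(rand.NewSource(seed))
	buf := make([]byte, blockSize)
	start := time.Now()
	for i := 0; i < n; i++ {
		rng.Read(buf)
		sum := sha256.Sum256(buf)
		name := hex.EncodeToString(sum[:])
		if err := os.WriteFile(filepath.Join(dir, name), buf, 0o644); err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}

func main() {
	dir, err := os.MkdirTemp("", "blockstore-bench")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Sweep block counts at a fixed 256 KiB block size; a real run would
	// also sweep block sizes and target each real blockstore backend.
	for _, n := range []int{1_000, 10_000, 100_000} {
		elapsed, err := writeBlocks(dir, n, 256*1024, 42)
		if err != nil {
			panic(err)
		}
		mib := float64(n) * 256 / 1024 // total MiB written
		fmt.Printf("n=%d blocks: %v (%.1f MiB/s)\n", n, elapsed, mib/elapsed.Seconds())
	}
}
```

Sweeping both the item count and the item size, and re-running the same sweep against each real blockstore backend, is the part that actually needs the dedicated setup mentioned above.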
In IPFS we have:
Tests we want:
Each of these tests will be run on each dataset, where the datasets vary by the following variables:
Each test should also be run for the following different node configurations:
During these tests, we want to gather the metrics described in this issue: ipfs/kubo#3607
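The concrete metric list is the one in ipfs/kubo#3607; purely as an illustration of how a harness might sample metrics during a run (the field names below are made up, not that list), something like this could write JSON-lines samples alongside the test output:

```go
package metrics

import (
	"encoding/json"
	"os"
	"runtime"
	"time"
)

// Sample is an illustrative set of process-level numbers; the real list of
// metrics to collect is the one described in ipfs/kubo#3607.
type Sample struct {
	Elapsed    time.Duration `json:"elapsed"`
	HeapBytes  uint64        `json:"heap_bytes"`
	NumGC      uint32        `json:"num_gc"`
	Goroutines int           `json:"goroutines"`
}

// Record appends one sample per interval to a JSON-lines file until stop is closed.
func Record(path string, interval time.Duration, stop <-chan struct{}) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	start := time.Now()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return nil
		case <-ticker.C:
			var ms runtime.MemStats
			runtime.ReadMemStats(&ms)
			s := Sample{
				Elapsed:    time.Since(start),
				HeapBytes:  ms.HeapAlloc,
				NumGC:      ms.NumGC,
				Goroutines: runtime.NumGoroutine(),
			}
			if err := enc.Encode(s); err != nil {
				return err
			}
		}
	}
}
```

A test would start Record in a goroutine, close the stop channel when the run finishes, and attach the resulting file to the report it hands back.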
Ideally, we can start any of these tests with a UX of something like:
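As a purely hypothetical illustration of that UX (none of these flag names come from go-ipfs, go-random-files, or iptest), a small Go driver could expose the dataset and node-configuration variables as flags:

```go
package main

import (
	"flag"
	"fmt"
	"log"
)

// Illustrative flags only; a real driver would map these onto whatever
// dataset-generation and node-setup tooling the tests settle on.
func main() {
	var (
		totalSize = flag.String("total-size", "1GB", "total payload size per dataset (e.g. 1MB, 1GB, 1TB)")
		fileCount = flag.Int("files", 1000, "number of files in the dataset")
		depth     = flag.Int("depth", 3, "directory tree depth")
		nodeCfg   = flag.String("node-config", "default", "node configuration profile to test against")
		seed      = flag.Int64("seed", 1, "seed for deterministic dataset generation")
		report    = flag.String("report", "report.json", "where to write collected metrics")
	)
	flag.Parse()

	fmt.Printf("running payload test: size=%s files=%d depth=%d node=%s seed=%d\n",
		*totalSize, *fileCount, *depth, *nodeCfg, *seed)

	// Generate (or reuse) the dataset, run the tests against the configured
	// node, and write the collected metrics to *report.
	if err := run(*totalSize, *fileCount, *depth, *nodeCfg, *seed, *report); err != nil {
		log.Fatal(err)
	}
}

// run is a placeholder for the actual test pipeline.
func run(size string, files, depth int, nodeCfg string, seed int64, report string) error {
	return nil // not implemented in this sketch
}
```

The point is just that a single command, parameterized by the dataset and node-configuration variables, should be enough to kick off any one test and produce a report.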
@jbenet made a rudimentary tool to do this here: https://github.com/whyrusleeping/iptest
@jbenet reminder: for #122 we need you to clarify which tests we're aiming for. That mainly involves rearranging this list (#102 (comment)) and calling out which parts are important for #122 to enable.
Clarifying the tests we need from #102 (comment):
As described in #102 (comment), each of these tests will be run on each dataset, varying these variables:
We can adjust these variables to play with go-random-files better, or modify the go-random-files flags to take these variables more directly. With tuples like:
We can auto-generate the tuples or pick a few we think are interesting. Auto-generating may produce a lot more than we need, so making a standard list will probably be useful: we can auto-generate that list and prune it, or something. Also, this makes me think that we should improve go-random-files to sample from other (non-uniform) distributions in the future, or to support things like a total max size, randomly distributed, to generate more realistic workloads. If any of the tuples above are too big to hit now, that's fine; let's aim for getting a bunch of them working first.

Each test should also be run for the following different node configurations:
During these tests, we want to gather the metrics described in this issue: ipfs/kubo#3607
Generating these datasets from scratch on each test run is impractical. We should have a way to generate them deterministically, and to reuse previously generated datasets across test runs.
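A minimal sketch of that idea (independent of go-random-files; all names here are invented for illustration): key each generated dataset by a hash of its generation parameters, and skip regeneration when that directory already exists in a shared cache:

```go
package dataset

import (
	"crypto/sha256"
	"fmt"
	"math/rand"
	"os"
	"path/filepath"
)

// Params fully determines a dataset; identical Params always yield
// byte-identical files, so previously generated datasets can be reused.
type Params struct {
	Seed     int64
	Files    int
	FileSize int // bytes per file
	Depth    int // kept for completeness; this sketch writes a flat layout
}

// key is a stable identifier for a parameter tuple.
func (p Params) key() string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d/%d/%d/%d", p.Seed, p.Files, p.FileSize, p.Depth)))
	return fmt.Sprintf("%x", sum[:8])
}

// Ensure generates the dataset under cacheRoot if it is not already there,
// and returns its directory either way.
func Ensure(cacheRoot string, p Params) (string, error) {
	dir := filepath.Join(cacheRoot, p.key())
	if _, err := os.Stat(dir); err == nil {
		return dir, nil // already generated on a previous run
	}
	tmp := dir + ".partial"
	if err := os.MkdirAll(tmp, 0o755); err != nil {
		return "", err
	}
	rng := rand.New(rand.NewSource(p.Seed))
	buf := make([]byte, p.FileSize)
	for i := 0; i < p.Files; i++ {
		rng.Read(buf)
		if err := os.WriteFile(filepath.Join(tmp, fmt.Sprintf("f%08d", i)), buf, 0o644); err != nil {
			return "", err
		}
	}
	// Rename only after everything is written, so a crash never leaves a
	// half-built directory that looks complete.
	return dir, os.Rename(tmp, dir)
}
```

Because the files come from a seeded PRNG, two runs with the same Params produce byte-identical datasets, so a cached directory can be trusted across test runs and even shared between machines.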
We don't have good metrics, graphs, or reports about performance as we increase sizes and loads -- where/when/how performance dips under certain circumstances. We need to know more than "does it scale?"; we need to know "how does it scale?", so we can identify the domain of problems, etc.
Acceptance Scenario
This story will be done when IPFS Maintainers (or, ideally, anyone) can run a suite of scripts that test IPFS at each order of magnitude of total data, from 1MB -> 100TB (up to 500TB or 1PB).
For each magnitude the tests should cover a variety of payloads. At the very least, there should be
Tasks