Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Jae/simulator improvements #2900

Merged
merged 9 commits into from
Nov 27, 2018
Merged

WIP: Jae/simulator improvements #2900

merged 9 commits into from
Nov 27, 2018

Conversation

jaekwon
Copy link
Contributor

@jaekwon jaekwon commented Nov 26, 2018

See the Tendermint PR for more details: tendermint/tendermint#2913

This PR makes the CacheKVStore.Iterator/ReverseIterator faster, and also makes simulation faster by faking the IAVL store mount and using a db adapter instead.

Requires corresponding dependency changes in Tendermint and IAVL.

@cwgoes
Copy link
Contributor

cwgoes commented Nov 26, 2018

Working on fixes to this PR in a separate one for clarity - #2901.

baseapp/baseapp.go Outdated Show resolved Hide resolved
@@ -62,3 +62,29 @@ func RandTimestamp(r *rand.Rand) time.Time {
unixTime := r.Int63n(253373529600)
return time.Unix(unixTime, 0)
}

// Derive a new rand deterministically from a rand.
// Unlike rand.New(rand.NewSource(seed)), the result is "more random"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this "more random"? I don't really follow - is there a substantive difference between reinitializing a new PRNG and just stepping the same PRNG?

cc @ValarDragon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not more random in a meaningful way. This doesn't even increase the repetitivity of the internal RNG state. (Since your combining multiple PRNG's with the same cycle size via xor)

This typically may be done to combine more entropy in some strange settings, however here that is also not the case since each seed is derived from the original r instance.

The rng used in golang is essentially a lagged fibonacci random number generator, with fixes that improve the seeding. (Source: https://groups.google.com/forum/#!topic/golang-nuts/RZ1G3_cxMcM) That has an insane cycle length (2^603) and is sufficiently random for our purposes. It is not suitable for things like monte-carlo simulations, but nor is this. It would make sense to switch to a monte-carlo safe rng in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repetitivity/cycle-length doesn't matter, as you mentioned. A problem of the original math.Rand (with default Source) is that seed values that have the same remainder when divided by 2^31-1 generate the same pseudo-random sequence.. There are only two billion unique initial random sequences for a new Rand (with default Source)!

In order to make simulation deterministic while being able to make changes in the middle somewhere, we can't just re-use the same *Rand, because changes to the usage of the *Rand of a middle transaction causes rippling changes for all future operations. In some situations this can make debugging more difficult, because e.g. just because you mutate/mute/remove a single operation of a block doesn't mean that you're really testing the same scenario, as later operations of the same block (as an example) would be completely different.

The solution is to construct a tree of rands, as it's already done here... Here, we construct a predetermined set of operations, and then for further logic within an operation, we pass in a derived Rand. In the future we may want to derive third and fourth order rands in this way.

Once you're deriving a bunch of rands in this way, the limitation of 2^31-1 possible sequences does become meaningful. Sometimes you'll end up with the same seed for a derived rand, which means for that branch of rand usage, it's no longer random... the same sequence was used in another branch of simulation logic!

This DeriveRand() function is designed to solve this issue. The possibility of deriving the same rand sequence becomes negligible.

Copy link
Contributor

@ValarDragon ValarDragon Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, I didn't know that! The comment should be amended from making the result more "random", which its not, to instead increasing the seed space.

However I still don't think this is worth doing. With there being 2**31 different valid seeds, by application of birthday collisions we'd have to do 54 thousand simulations with no change to the simulator in order for there to be a 50% probability of a single duplicate seed. We'd have to do 8.3 million simulations before there is a 1% probability each successive simulation was for nought. I frankly don't think we will do 54 thousand long simulations ever before changing the simulator, let alone 8.3 million. I don't think this is worth the performance trade-off, as speed of the simulator matters.

@cwgoes
Copy link
Contributor

cwgoes commented Nov 26, 2018

Merging #2901 since it seems non-controversial and I want to debug other sim fixes separately.

* Fix upstream versions

* Linter fix

* Pin exact revisions, not branches

* Rename 'MountStoresIAVL' to 'MountStores'
@cwgoes cwgoes requested a review from zramsay as a code owner November 26, 2018 14:47
@cwgoes
Copy link
Contributor

cwgoes commented Nov 26, 2018

The normal Gaia simulation fails with seed 99:

make test_sim_gaia_fast
Simulating... block 14/500, operation 650/696. Invariants broken after BeginBlock
unexpected leftover validator pool coins: 23.2500000000
Too many logs to display, instead writing to simulation_log_2018-11-26 15:32:43.txt
--- FAIL: TestFullGaiaSimulation (3.68s)
    invariants.go:27: 
FAIL

Everything else seems OK, import/export & simulation after import work on all other seeds.

Still running 500-block multi-seed sim.

@rigelrozanski
Copy link
Contributor

interestingly enough, this is not an accum issue, I ran the validator accum invariants when this fails and everything is proper. This suggests to me that this is possibly a rounding issue when we withdraw.

baseapp/baseapp.go Outdated Show resolved Hide resolved
cmd/gaia/app/app.go Show resolved Hide resolved
store/cachekvstore.go Show resolved Hide resolved
// make a tree and save it
func newTree(t *testing.T, db dbm.DB) (*iavl.MutableTree, CommitID) {
// make a tree with data from above and save it
func newAlohaTree(t *testing.T, db dbm.DB) (*iavl.MutableTree, CommitID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Que?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, 🌴 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's full of some random data include the word "aloha". Differentiates it from a "newTree".

store/prefixstore.go Show resolved Hide resolved
@rigelrozanski
Copy link
Contributor

figured out what the new bug is related to for seed 99- basically in block 14 there are no bonded validators, but validators are still expected to validate for a block or two after they’re no longer bonded - so what is happening is that fees are being paid during those blocks but there is no accum which is being created, hence the fees go to nobody!
oddly enough within this simulation during this brief state when the validator set dips down to zero (but is still running for a block) new validators are bonded and the network seems to want to keep running… so even if I skip this invariant check for block 14, the invariant still fails for block 15 somehow… this seems sketchy to me lol - we should probably shut down everything as soon as there are no bonded tokens left

@codecov
Copy link

codecov bot commented Nov 26, 2018

Codecov Report

Merging #2900 into develop will decrease coverage by 0.37%.
The diff coverage is 44%.

@@             Coverage Diff             @@
##           develop    #2900      +/-   ##
===========================================
- Coverage    56.72%   56.35%   -0.38%     
===========================================
  Files          120      120              
  Lines         8396     8416      +20     
===========================================
- Hits          4763     4743      -20     
- Misses        3311     3351      +40     
  Partials       322      322

@rigelrozanski
Copy link
Contributor

rigelrozanski commented Nov 26, 2018

fixed in #2905

Also opened up a reference issue for the early shutdown (#2906)

I don't forsee this circumstance happening in GoS as AiB will maintain the large backbone validator running... however this should get resolved prelaunch

@jaekwon jaekwon merged commit d1e7622 into develop Nov 27, 2018
@cwgoes cwgoes deleted the jae/simulator_improvements branch November 27, 2018 12:44
@alexanderbez alexanderbez mentioned this pull request Nov 27, 2018
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants