Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to SDK branch that lowers epoch time #451

Merged
merged 3 commits into from
Aug 29, 2021
Merged

Conversation

ValarDragon
Copy link
Member

@ValarDragon ValarDragon commented Aug 28, 2021

This switches us to the SDK branch that fixes the N^2 issues within the CacheKV store (osmosis-labs/cosmos-sdk#24 )

It also adds some scaffolding so that we can test with LevelDB backends, which at some point ended up being necessary due to a memDB bug and makes benchmarks more accurate due to avoiding some weird oddities with the memDB.

(I'm not sure if the memDB bug still gets triggered if we make some insanely high loads, but it definitely seemed like a bug on google's sides, as I traced it back when it applied to a perfectly normal set operation that caused google's BTree to infinitely hang)

Before:

BenchmarkDistributionLogicMedium-16            4         270370131 ns/op        96074306 B/op    1334220 allocs/op

After:

BenchmarkDistributionLogicMedium-16          285           3644674 ns/op         1192226 B/op      20120 allocs/op

Speedup:

>>> old = 270370131
>>> new = 3644674
>>> 100 * (old - new) / old
98.6519686969416

~99% speedup of the benchmark!

We should backport this to v3.2.0, and check if this empirically improves the epoch time performance moreso than prior WIP v3.2.0 branch.

Also interestingly, now bech32 encoding / decoding work takes 10% of the compute time after this change.

addr1 := sdk.AccAddress([]byte("addr1---------------"))
coins := sdk.Coins{sdk.NewInt64Coin("stake", 10)}
startAveragingAt := 1000
totalNumLocks := 2000
totalNumLocks := 5000
Copy link
Member Author

@ValarDragon ValarDragon Aug 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now works just fine with 20000, that'd take 7 seconds on my laptop. Just kept it lower to not take more time in CI / local testing.

@ValarDragon
Copy link
Member Author

ValarDragon commented Aug 28, 2021

Hrmm, on mainnet this still has execution time at 2 minutes total, which seems like a bit over a 2x improvement. Trying to see if I can make the benchmark parameterization comparable.

@ValarDragon
Copy link
Member Author

ValarDragon commented Aug 28, 2021

Ok, benchmark accuracy improved! It still was that the bulk of the time before was in the CacheKV effects, but the prior benchmark wasn't well capturing the other things.

Now the time that is split around: 30% is in bech32 encoding / decoding, 10-15% in emitting events, 8% in validating denoms, and a lot in many random places in bank. I'm going to work on knocking out a few of these simple fixes at least, that I'm hoping should be a quick additional 20-30% speedup.

After next commit:

BenchmarkDistributionLogicLarge-16    	       1	165136166385 ns/op	43062670968 B/op	732285874 allocs/op

(Thats taking an insane 43 GB of data written to/from heap 😬 -- at least only 700MB of heap space actually needed to be allocated tho)

@codecov-commenter
Copy link

codecov-commenter commented Aug 28, 2021

Codecov Report

Merging #451 (3897547) into main (1a1c591) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #451   +/-   ##
=======================================
  Coverage   19.00%   19.00%           
=======================================
  Files         144      144           
  Lines       22546    22546           
=======================================
  Hits         4285     4285           
  Misses      17517    17517           
  Partials      744      744           
Impacted Files Coverage Δ
x/incentives/keeper/gauge.go 68.66% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a1c591...3897547. Read the comment docs.

@ValarDragon
Copy link
Member Author

ValarDragon commented Aug 29, 2021

The above numbers have been improved (with only non-breaking changes) to be:

BenchmarkDistributionLogicLarge-16             1        88318071639 ns/op       34189730880 B/op        601589333 allocs/op

** Use some grains of salt, the benchmark itself seems to have an ~8% variance in each number on my machine, due to lock distribution things.

The next step non-state breaking big steps are basically:

  1. Remove events from the Bank module that are triggered here - Should save massive amounts of computation time, as we will be allocating less data and garbage collecting less data.
  2. Lower the number of calls to SubtractCoins by making a single SendCoinsFromModuleToManyAccounts
  3. Lower bech32 decryption overhead

Theres a lot of state breaking options to do afterwards, and changes to the logic code to combine processing of all gauges for denom at once.

I think the main immediate thing worth doing is (1), and saving the rest for later. We'll start hitting marginal utility on this side, versus the refactor of distribution logic. (That could risk being state breaking due to IAVL insertion order dependencies) The small-medium refactoring of the distribution logic should be a 2-3x speedup. (But also obviated with F1 work later)

@ValarDragon ValarDragon changed the title Switch to SDK branch that eliminates epoch time Switch to SDK branch that lowers epoch time Aug 29, 2021
@ValarDragon ValarDragon merged commit 6d082da into main Aug 29, 2021
@ValarDragon ValarDragon deleted the speedup_epochs branch August 29, 2021 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants