Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inspect and fix a slow pattern in InitGenesis of certain modules #7766

Closed
4 tasks
robert-zaremba opened this issue Oct 30, 2020 · 9 comments · Fixed by #8719
Closed
4 tasks

Inspect and fix a slow pattern in InitGenesis of certain modules #7766

robert-zaremba opened this issue Oct 30, 2020 · 9 comments · Fixed by #8719
Labels
C:genesis relating to chain genesis C:Store T: Performance Performance improvements
Milestone

Comments

@robert-zaremba
Copy link
Collaborator

robert-zaremba commented Oct 30, 2020

Summary of a Problem

Starting Gaia chain takes lot of time. The issue was reported here: #7682.
We removed unnecessary balance checks and made x/crisis AssertInvariants optional during InitGenesis.

In the aforementioned issue we profiled the app and found that the bottleneck is the storage layer.

Here the goal is to inspect the storage patterns we are using and optimize it. Hints:


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@robert-zaremba robert-zaremba added T: Performance Performance improvements C:genesis relating to chain genesis C:Store labels Oct 30, 2020
@robert-zaremba robert-zaremba added this to the v1.0 LTS milestone Oct 30, 2020
@clevinson clevinson changed the title Inspect and ifx a slow pattern in InitGenesis of certain modules Inspect and fix a slow pattern in InitGenesis of certain modules Nov 3, 2020
@zmanian
Copy link
Member

zmanian commented Feb 21, 2021

I've done some investigation of this issue and come closer to isolating the problem.

First I added logging and determined virtually the entire start up time is spent initializing the bank module.

Screen Shot 2021-02-20 at 11 05 28 PM

I'll note that both the auth and bank modules perform O(n) operations on the store where n is number of accounts.

Profiling SetAccount and SetBalance though have given inconclusive results.

https://github.com/cosmos/cosmos-sdk/blob/master/x/auth/keeper/account.go#L50-L60

SetAccount appears to have an execution time of 3-5 microseconds.

The execution time of SetBalance on 0.41.* takes 3-50 microseconds.

Now I observe that Robert refactored this code path in #8473 but I don't see anything that I would expect to change performance.

This looks firmly like an IAVL performance issue to me.

@faddat
Copy link
Contributor

faddat commented Feb 24, 2021

I'll add an observation: while this is happening, we're barely touching the CPU. It did not look like a single-thread bottleneck.

@odeke-em
Copy link
Collaborator

odeke-em commented Feb 26, 2021

Thank you for this report @robert-zaremba, thank you @zmanian and @faddat for the input!

I just encountered this while creating benchmarks with 100K accounts and thanks to @ethanfrey for pointing me to a similar issue.

Preliminary investigation

Given this code

func BenchmarkOneBankMultiSendTxPerBlock100K(b *testing.B) {
        exp, xerr := exporter.New(context.Background(), exporter.WithInsecureAddress(":8444"))
        if xerr != nil {
                panic(xerr)
        }
        defer exp.Stop()

        startTime := time.Now()
        b.ReportAllocs()
        db, err := dbm.NewGoLevelDB("onebankmultisend100k", ".")
        if err != nil {
                b.Fatal(err)
        }
        defer db.Close()

        // 1. Generate n validators.
        valSet, _ := tmtypes.RandValidatorSet(100, 1e6)
        println("generated validatorset", time.Since(startTime).String())

        // 3. Generate balances and accounts.
        addrs, _ := makeRandomAddressesAndPublicKeys(100000)
        println("generated random addresses and publickeys", time.Since(startTime).String())
        genTokens := sdk.TokensFromConsensusPower(67)
        genCoin := sdk.NewCoin("benchcoin", genTokens)
        var balances []banktypes.Balance
        var accounts []types.GenesisAccount
        coins := sdk.Coins{genCoin}

        for i, addr := range addrs {
                addrStr := addr.String()
                balances = append(balances, banktypes.Balance{
                        Address: addrStr,
                        Coins:   coins,
                })
                accounts = append(accounts, &authtypes.BaseAccount{
                        Address:       addrStr,
                        AccountNumber: uint64(i + 1),
                })
        }
        println("generated balances and accounts", time.Since(startTime).String())

        app := simapp.SetupWithGenesisValSetAndDB(b, db, valSet, accounts, balances...)
        println("finished generating the app", time.Since(startTime).String())
        benchmarkOneBankMultiSendTxPerBlock(b, app)
}

func makeRandomAddressesAndPublicKeys(n int) (accL []sdk.AccAddress, pkL []*ed25519.PubKey) {
        for i := 0; i < n; i++ {
                pk := ed25519.GenPrivKey().PubKey().(*ed25519.PubKey)
                pkL = append(pkL, pk)
                accL = append(accL, sdk.AccAddress(pk.Address()))
        }
        return accL, pkL
}

TL;DR:

  • sdk.AccAddressFromBech32 is very expensive yet used in a quicksort best case O(nlogn) algorithm, or worst case O(n**2), per
    func SanitizeGenesisBalances(balances []Balance) []Balance {
    sort.Slice(balances, func(i, j int) bool {
    addr1, _ := sdk.AccAddressFromBech32(balances[i].Address)
    addr2, _ := sdk.AccAddressFromBech32(balances[j].Address)
    return bytes.Compare(addr1.Bytes(), addr2.Bytes()) < 0
  • Keeper.setBalances -> Keeper.ClearBalances is very slow when iterating through the entire database to clear prior values

Deep dive

I have some continuous profiling internally at my company and it collects profiles for 2 minutes periodically, and indeed InitGenesis is taking forever and the culprits so far are:
snapshot

Total: 41.48s
ROUTINE ======================== github.com/cosmos/cosmos-sdk/types/module.(*Manager).InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/types/module/module.go
         0     16.32s (flat, cum) 39.34% of Total
         .          .    305:			continue
         .          .    306:		}
         .          .    307:
         .          .    308:                t0 := time.Now()
         .          .    309:                println("starting initGeneis for ", moduleName, time.Since(startTime).String())
         .     16.32s    310:		moduleValUpdates := m.Modules[moduleName].InitGenesis(ctx, cdc, genesisData[moduleName])
         .          .    311:                println("finished initGeneis for ", moduleName, time.Since(startTime).String(), "spent", time.Since(t0).String())
         .          .    312:
         .          .    313:		// use these validator updates if provided, the module manager assumes
         .          .    314:		// only one module will update the validator set
         .          .    315:		if len(moduleValUpdates) > 0 {
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/auth.AppModule.InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/auth/module.go
         0      1.96s (flat, cum)  4.73% of Total
         .          .    133:
         .          .    134:// InitGenesis performs genesis initialization for the auth module. It returns
         .          .    135:// no validator updates.
         .          .    136:func (am AppModule) InitGenesis(ctx sdk.Context, cdc codec.JSONMarshaler, data json.RawMessage) []abci.ValidatorUpdate {
         .          .    137:	var genesisState types.GenesisState
         .      1.40s    138:	cdc.MustUnmarshalJSON(data, &genesisState)
         .      560ms    139:	InitGenesis(ctx, am.accountKeeper, genesisState)
         .          .    140:	return []abci.ValidatorUpdate{}
         .          .    141:}
         .          .    142:
         .          .    143:// ExportGenesis returns the exported genesis state as raw bytes for the auth
         .          .    144:// module.
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/auth.InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/auth/genesis.go
         0      560ms (flat, cum)  1.35% of Total
         .          .     11:// CONTRACT: old coins from the FeeCollectionKeeper need to be transferred through
         .          .     12:// a genesis port script to the new fee collector account
         .          .     13:func InitGenesis(ctx sdk.Context, ak keeper.AccountKeeper, data types.GenesisState) {
         .          .     14:	ak.SetParams(ctx, data.Params)
         .          .     15:
         .       10ms     16:	accounts, err := types.UnpackAccounts(data.Accounts)
         .          .     17:	if err != nil {
         .          .     18:		panic(err)
         .          .     19:	}
         .       20ms     20:	accounts = types.SanitizeGenesisAccounts(accounts)
         .          .     21:
         .          .     22:	for _, a := range accounts {
         .       70ms     23:		acc := ak.NewAccount(ctx, a)
         .      460ms     24:		ak.SetAccount(ctx, acc)
         .          .     25:	}
         .          .     26:
         .          .     27:	ak.GetModuleAccount(ctx, types.FeeCollectorName)
         .          .     28:}
         .          .     29:
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank.AppModule.InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/module.go
         0     14.36s (flat, cum) 34.62% of Total
         .          .    136:// InitGenesis performs genesis initialization for the bank module. It returns
         .          .    137:// no validator updates.
         .          .    138:func (am AppModule) InitGenesis(ctx sdk.Context, cdc codec.JSONMarshaler, data json.RawMessage) []abci.ValidatorUpdate {
         .          .    139:	start := time.Now()
         .          .    140:	var genesisState types.GenesisState
         .      1.17s    141:	cdc.MustUnmarshalJSON(data, &genesisState)
         .          .    142:	telemetry.MeasureSince(start, "InitGenesis", "crisis", "unmarshal")
         .          .    143:
         .     13.19s    144:	am.keeper.InitGenesis(ctx, &genesisState)
         .          .    145:	return []abci.ValidatorUpdate{}
         .          .    146:}
         .          .    147:
         .          .    148:// ExportGenesis returns the exported genesis state as raw bytes for the bank
         .          .    149:// module.
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/keeper.BaseKeeper.InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/keeper/genesis.go
         0     13.19s (flat, cum) 31.80% of Total
         .          .     11:func (k BaseKeeper) InitGenesis(ctx sdk.Context, genState *types.GenesisState) {
         .          .     12:	k.SetParams(ctx, genState.Params)
         .          .     13:
         .          .     14:	var totalSupply sdk.Coins
         .          .     15:
         .      6.39s     16:	genState.Balances = types.SanitizeGenesisBalances(genState.Balances)
         .          .     17:	for _, balance := range genState.Balances {
         .       20ms     18:		addr, err := sdk.AccAddressFromBech32(balance.Address)
         .          .     19:		if err != nil {
         .          .     20:			panic(err)
         .          .     21:		}
         .          .     22:
         .      6.78s     23:		if err := k.setBalances(ctx, addr, balance.Coins); err != nil {
         .          .     24:			panic(fmt.Errorf("error on setting balances %w", err))
         .          .     25:		}
         .          .     26:
         .          .     27:		totalSupply = totalSupply.Add(balance.Coins...)
         .          .     28:	}

Culprits

and to further zoom in

ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/types.SanitizeGenesisBalances.func1 in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/types/balance.go
      20ms      6.36s (flat, cum) 15.33% of Total
         .          .     66:}
         .          .     67:
         .          .     68:// SanitizeGenesisBalances sorts addresses and coin sets.
         .          .     69:func SanitizeGenesisBalances(balances []Balance) []Balance {
         .          .     70:	sort.Slice(balances, func(i, j int) bool {
      10ms      2.64s     71:		addr1, _ := sdk.AccAddressFromBech32(balances[i].Address)
      10ms      3.71s     72:		addr2, _ := sdk.AccAddressFromBech32(balances[j].Address)
         .       10ms     73:		return bytes.Compare(addr1.Bytes(), addr2.Bytes()) < 0
         .          .     74:	})
         .          .     75:
         .          .     76:	for _, balance := range balances {
         .          .     77:		balance.Coins = balance.Coins.Sort()
         .          .     78:	}

Keeper.setBalances

ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/keeper.BaseKeeper.InitGenesis in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/keeper/genesis.go
         0     13.19s (flat, cum) 31.80% of Total
         .          .     11:func (k BaseKeeper) InitGenesis(ctx sdk.Context, genState *types.GenesisState) {
         .          .     12:	k.SetParams(ctx, genState.Params)
         .          .     13:
         .          .     14:	var totalSupply sdk.Coins
         .          .     15:
         .      6.39s     16:	genState.Balances = types.SanitizeGenesisBalances(genState.Balances)
         .          .     17:	for _, balance := range genState.Balances {
         .       20ms     18:		addr, err := sdk.AccAddressFromBech32(balance.Address)
         .          .     19:		if err != nil {
         .          .     20:			panic(err)
         .          .     21:		}
         .          .     22:
         .      6.78s     23:		if err := k.setBalances(ctx, addr, balance.Coins); err != nil {
         .          .     24:			panic(fmt.Errorf("error on setting balances %w", err))
         .          .     25:		}
         .          .     26:
         .          .     27:		totalSupply = totalSupply.Add(balance.Coins...)
         .          .     28:	}

and goes down to

Total: 41.48s
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/keeper.BaseSendKeeper.setBalances in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/keeper/send.go
      10ms      6.77s (flat, cum) 16.32% of Total
         .          .    235:
         .          .    236:// setBalances sets the balance (multiple coins) for an account by address. It will
         .          .    237:// clear out all balances prior to setting the new coins as to set existing balances
         .          .    238:// to zero if they don't exist in amt. An error is returned upon failure.
         .          .    239:func (k BaseSendKeeper) setBalances(ctx sdk.Context, addr sdk.AccAddress, balances sdk.Coins) error {
         .      6.75s    240:	k.ClearBalances(ctx, addr)
         .          .    241:
      10ms       10ms    242:	for _, balance := range balances {
         .       10ms    243:		err := k.setBalance(ctx, addr, balance)
         .          .    244:		if err != nil {
         .          .    245:			return err
         .          .    246:		}
         .          .    247:	}
         .          .    248:

and into ClearBalances we see

Total: 41.48s
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/keeper.BaseSendKeeper.ClearBalances in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/keeper/send.go
         0      6.75s (flat, cum) 16.27% of Total
         .          .    219:}
         .          .    220:
         .          .    221:// ClearBalances removes all balances for a given account by address.
         .          .    222:func (k BaseSendKeeper) ClearBalances(ctx sdk.Context, addr sdk.AccAddress) {
         .          .    223:	keys := [][]byte{}
         .      6.75s    224:	k.IterateAccountBalances(ctx, addr, func(balance sdk.Coin) bool {
         .          .    225:		keys = append(keys, []byte(balance.Denom))
         .          .    226:		return false
         .          .    227:	})
         .          .    228:
         .          .    229:	accountStore := k.getAccountStore(ctx, addr)

and down to

Total: 41.48s
ROUTINE ======================== github.com/cosmos/cosmos-sdk/x/bank/keeper.BaseViewKeeper.IterateAccountBalances in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/x/bank/keeper/view.go
         0      6.74s (flat, cum) 16.25% of Total
         .          .    114:// provides the token balance to a callback. If true is returned from the
         .          .    115:// callback, iteration is halted.
         .          .    116:func (k BaseViewKeeper) IterateAccountBalances(ctx sdk.Context, addr sdk.AccAddress, cb func(sdk.Coin) bool) {
         .          .    117:	accountStore := k.getAccountStore(ctx, addr)
         .          .    118:
         .      6.74s    119:	iterator := accountStore.Iterator(nil, nil)
         .          .    120:	defer iterator.Close()
         .          .    121:
         .          .    122:	for ; iterator.Valid(); iterator.Next() {
         .          .    123:		var balance sdk.Coin
         .          .    124:		k.cdc.MustUnmarshalBinaryBare(iterator.Value(), &balance)

then to

ROUTINE ======================== github.com/cosmos/cosmos-sdk/store/prefix.Store.Iterator in /Users/emmanuelodeke/go/src/github.com/cosmos/cosmos-sdk/store/prefix/store.go
         0      6.74s (flat, cum) 16.25% of Total
         .          .     90:		newend = cpIncr(s.prefix)
         .          .     91:	} else {
         .          .     92:		newend = cloneAppend(s.prefix, end)
         .          .     93:	}
         .          .     94:
         .      6.74s     95:	iter := s.parent.Iterator(newstart, newend)
         .          .     96:
         .          .     97:	return newPrefixIterator(s.prefix, start, end, iter)
         .          .     98:}
         .          .     99:
         .          .    100:// ReverseIterator implements KVStore

Suggestions

  • SanitizeGenesisBalances could use a smarter based sorting that doesn't compound the slowness of AccAddressFromBech32 for every single comparison growing with the number of n2. This sort should take less than 60 milliseconds for 100K accounts, it currently takes 6+ seconds. The idea to speed it up is simple: to avoid the O(n2) AccAddressFromBech32, make a type that indexes Balances[i] with the byte value per item, and the sort can proceed normally then using the already retrieved and not discarded value and then a swap can be done trivially. The speed of this on a 3.1Ghz machine will drop down to like 50 milliseconds with the same results
  • We need to fix the database, we should page the database team or if we have cycles work on clearing databases without using the current interfaces, or work around them

@odeke-em
Copy link
Collaborator

@zmanian spot on, iavl is quite slow; along with SanitizeGenesisBalances.

@odeke-em
Copy link
Collaborator

@zmanian there are about 3 optimizations we can do to heavily alleviate that iavl code in dirtyItems, without having to replace it entirely:
firstly let's look at the CPU profile, see this
Screen Shot 2021-02-26 at 6 46 08 AM

If we look at this code

unsorted := make([]*kv.Pair, 0)
for key := range store.unsortedCache {
cacheValue := store.cache[key]
if dbm.IsKeyInDomain([]byte(key), start, end) {
unsorted = append(unsorted, &kv.Pair{Key: []byte(key), Value: cacheValue.value})
delete(store.unsortedCache, key)
}
}

1. Avoid unconditional map lookups

we'll notice that even from the CPU profile, there was a heavy map access with runtime.mapaccess1_faststr, which is the result of this unconditionally executed code

cacheValue := store.cache[key]
which isn't needed unless the key is in the range
We can change it to this

        for key := range store.unsortedCache {
-               cacheValue := store.cache[key]
-
-               if dbm.IsKeyInDomain([]byte(key), start, end) {
+               if dbm.IsKeyInDomain(strToByte(key), start, end) {
+                        cacheValue := store.cache[key]
+                        unsorted = append(unsorted, &kv.Pair{Key: []byte(key), Value: cacheValue.value})

                         delete(store.unsortedCache, key)
+               }
+       }

2. Let the Go compiler use the map clearing idiom which is super fast -- black magic

We can detect and see if all the elements are to be cleared from so use this pattern, and avoid deleting in the same loop

for k := range m {
   delete(m, k)
}

and to do this run through

n := len(store.unsortedCache)
for key := range... {
}

if len(unsorted) == n { // All the elements are to be cleared
    for key := range store.unsortedCache {
          delete(sotre.unsortedCache, key)
    }
} else {
    // Delete normally
}

3. Use zero allocations to convert from kv.Pair.Key a []byte slice to string

After fix 2, we can do this

func strToByte(s string) []byte {
        var b []byte
        hdr := (*reflect.SliceHeader)(unsafe.Pointer(&b))
        hdr.Cap = len(s)
        hdr.Len = len(s)
        hdr.Data = (*reflect.StringHeader)(unsafe.Pointer(&s)).Data
        return b
}

func (store *Store) dirtyItems(start, end []byte) {
        ....
        if len(unsorted) == n {
                for key := range store.unsortedCache {
                        delete(store.unsortedCache, key)
                }
        } else {
                for _, kv := range unsorted {
                        delete(store.unsortedCache, byteSliceToStr(kv.Key))
                }
        }

which cleans up the CPU profile like this
Screen Shot 2021-02-26 at 7 03 02 AM

with that, it takes my runtime for bank down to 5m50s instead of like 20+ minutes of which at times it didn't terminate before.
and with those changes :-)

@alessio
Copy link
Contributor

alessio commented Feb 26, 2021

This is amazing @odeke-em. I'm happy to delay v0.41.4 release if we manage to get these fixes in in a timely fashion. Do we have any PRs fixing this already?

@odeke-em
Copy link
Collaborator

Sorry for the late reply, I was asleep. Sure, I'll mail the PRs for these fixes.

@odeke-em
Copy link
Collaborator

I've mailed out PR #8719 and added some benchmarks too.

odeke-em added a commit that referenced this issue Feb 27, 2021
…code

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766
@aaronc aaronc removed the backlog label Feb 27, 2021
odeke-em added a commit that referenced this issue Feb 27, 2021
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
zmanian pushed a commit that referenced this issue Feb 27, 2021
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
zmanian pushed a commit that referenced this issue Feb 27, 2021
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
alessio pushed a commit that referenced this issue Mar 1, 2021
From: #8719
Fixes: #7766

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Alessio Treglia <alessio@tendermint.com>
Co-authored-by: Federico Kunze <31522760+fedekunze@users.noreply.github.com>
alessio pushed a commit that referenced this issue Mar 1, 2021
* store/cachekv, x/bank/types: algorithmically fix pathologically slow code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>

* Fix issues with merge

* Apply suggestions from code review

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Alessio Treglia <alessio@tendermint.com>
Co-authored-by: Federico Kunze <31522760+fedekunze@users.noreply.github.com>
daeMOn63 pushed a commit to fetchai/cosmos-sdk that referenced this issue Aug 19, 2021
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
cosmos/cosmos-sdk#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
daeMOn63 pushed a commit to fetchai/cosmos-sdk that referenced this issue Aug 20, 2021
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
cosmos/cosmos-sdk#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
daeMOn63 pushed a commit to fetchai/cosmos-sdk that referenced this issue Mar 1, 2022
…code (#8719)

After continuously profiling InitGensis with 100K accounts, it showed
pathologically slow code, that was the result of a couple of patterns:
* Unconditional and not always necessary map lookups
* O(n^2) sdk.AccAddressFromBech32 retrievals when the code is expensive,
during a quicksort

The remedy involved 4 parts:
* O(n) sdk.AccAddressFromBech32 invocations, down from O(n^2) in the quicksort
* Only doing map lookups when the domain key check has passed
* Using a black magic compiler technique of the map clearing idiom
* Zero allocation []byte<->string conversion

With 100K accounts, this brings InitGenesis down to ~6min, instead of
20+min, it reduces the sort code from ~7sec down to 50ms.

Also some simple benchmark reflect the change:
```shell
name                    old time/op    new time/op    delta
SanitizeBalances500-8     19.3ms ±10%     1.5ms ± 5%  -92.46%  (p=0.000 n=20+20)
SanitizeBalances1000-8    41.9ms ± 8%     3.0ms ±12%  -92.92%  (p=0.000 n=20+20)

name                    old alloc/op   new alloc/op   delta
SanitizeBalances500-8     9.05MB ± 6%    0.56MB ± 0%  -93.76%  (p=0.000 n=20+18)
SanitizeBalances1000-8    20.2MB ± 3%     1.1MB ± 0%  -94.37%  (p=0.000 n=20+19)

name                    old allocs/op  new allocs/op  delta
SanitizeBalances500-8      72.4k ± 6%      4.5k ± 0%  -93.76%  (p=0.000 n=20+20)
SanitizeBalances1000-8      162k ± 3%        9k ± 0%  -94.40%  (p=0.000 n=20+20)
```

The CPU profiles show the radical change as per
cosmos/cosmos-sdk#7766 (comment)

Later on, we shall do more profiling and fixes but for now this brings
down the run-time for InitGenesis.

Fixes #7766

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:genesis relating to chain genesis C:Store T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
@aaronc @zmanian @alessio @robert-zaremba @clevinson @odeke-em @faddat and others