
x/bank: CPU: exorcise app.Deliver->runTx as it consumes more CPU cycles than app.Commit per BenchmarkOneBankMultiSendTxPerBlock #8697

Closed
odeke-em opened this issue Feb 25, 2021 · 9 comments
Labels
T: Performance Performance improvements


@odeke-em
Collaborator

Summary of Bug

As part of cosmos-sdk benchmarking, this issue provides a guide to identifying the culprits and what needs to be investigated and improved. Inside x/bank/bench_test.go there is a looming need to figure out which phase consumes more CPU:

```go
// Run this with a profiler, so it's easy to distinguish what time comes from
// committing, and what time comes from Check/Deliver Tx.
for i := 0; i < b.N; i++ {
	benchmarkApp.BeginBlock(abci.RequestBeginBlock{Header: tmproto.Header{Height: height}})
	_, _, err := benchmarkApp.Check(txGen.TxEncoder(), txs[i])
	if err != nil {
		panic("something is broken in checking transaction")
	}
	_, _, err = benchmarkApp.Deliver(txGen.TxEncoder(), txs[i])
	require.NoError(b, err)
	benchmarkApp.EndBlock(abci.RequestEndBlock{Height: height})
	benchmarkApp.Commit()
	height++
}
```

The target here is to figure out how to improve throughput and what is worth optimizing. We've built continuous benchmarking infrastructure that extracts a git commit from a PR, runs the benchmarks, and posts the results so cosmos-sdk engineers can see what could have changed.

Results

app.Deliver->runTx consumes far more CPU cycles (10.30s) than app.Commit (1.16s)

Commit CPU graph: (pprof CPU graph for app.Commit)

Deliver CPU graph: (pprof CPU graph for app.Deliver)

Version

Latest at 90d799f

Steps to Reproduce

To reproduce, please run

```shell
go test -run='^$' -bench=OneBankMultiSendTxPerBlock -cpuprofile=ms.cpu -memprofile=ms.mem
```

and when it completes, run pprof to examine the respective profiles (e.g. `go tool pprof ms.cpu` and `go tool pprof ms.mem`).

/cc @ethanfrey @cuonglm @okwme


@odeke-em odeke-em added the T: Performance Performance improvements label Feb 25, 2021
@github-actions github-actions bot added the stale label Aug 11, 2021
@odeke-em odeke-em reopened this Aug 21, 2021
@ethanfrey
Contributor

> app.Deliver->runTx consumes far more CPU cycles (10.30s) than app.Commit (1.16s)

Yes, this seems like a reasonable conclusion.

Commit may have more I/O (it has all the writes, while Deliver has the possibly cached reads). It would be interesting to have a graph of I/O time here as well.

Also, are you using LevelDB with disk-backed storage? That is what makes Commit and Deliver reads slower.

@github-actions github-actions bot removed the stale label Aug 22, 2021
@alexanderbez
Contributor

alexanderbez commented Aug 23, 2021


Snapshot from Osmosis. Not sure if it's helpful or relevant here. Note: Osmosis has distribution epochs, which is why you see the uniform spikes.

@ethanfrey
Contributor

Interesting real-world graph, thank you @alexanderbez.

Are the commit times maxing out at 100ms, or is that an average? I have heard of multi-second EndBlock times for the distribution (though that might be EndBlock rather than Commit).

@alexanderbez
Contributor

This is an average. I'll post the Prom query shortly.

@odeke-em
Collaborator Author

> This is an average. I'll post the Prom query shortly.

Averages are fallacious -- imagine my sister, Bill Gates, my 8 friends, and I all walked into a room and someone asked for the average wealth; you can see how skewed that is.

I'd highly encourage using percentiles rather than averages. For example, here is how to get the 95th percentile for a metric named "metric" in Prometheus:

```
histogram_quantile(0.95,
    sum(rate(metric_bucket[5m])) by (job, le))
```

cc @kirbyquerby and @cuonglm

@alexanderbez
Contributor

> Averages are fallacious -- imagine my sister, Bill Gates, my 8 friends, and I all walked into a room and someone asked for the average wealth; you can see how skewed that is.

Sure, but that's not at all what's happening here.

The query is:

```
rate(abci_deliver_tx_sum{job="tendermint", hostname="$hostname", host="", chain_id="$chain_id"}[5m]) / rate(abci_deliver_tx_count{job="tendermint", hostname="$hostname", host="", chain_id="$chain_id"}[5m])
```

But this thread isn't about Prom, so use this how you wish.

@odeke-em
Collaborator Author

> > Averages are fallacious -- imagine my sister, Bill Gates, my 8 friends, and I all walked into a room and someone asked for the average wealth; you can see how skewed that is.
>
> Sure, but that's not at all what's happening here.
>
> The query is:
>
> rate(abci_deliver_tx_sum{job="tendermint", hostname="$hostname", host="", chain_id="$chain_id"}[5m]) / rate(abci_deliver_tx_count{job="tendermint", hostname="$hostname", host="", chain_id="$chain_id"}[5m])
>
> But this thread isn't about Prom, so use this how you wish.

@alexanderbez I raised that because we are dealing with latencies. Could you help us produce the bucketized p95 latency graph?

@alexanderbez
Contributor

Yup, I'll run the query in a bit! I also have a public Gaia instance if you want to run your own Prom queries.

@elias-orijtech
Contributor

Gentle ping. Still relevant?
