EPIC: Optimizing transaction processing #14929

elias-orijtech · 2023-02-06T21:14:04Z

Summary

As brought up in a recent team meeting, optimizing the transaction processing of Cosmos is a top priority. As a point of comparison, Cosmos is described as an order of magnitude (or two?) slower than Tendermint itself.

Problem Definition

Performance is important for keeping the resource requirements of Cosmos chains in check, and to alleviate the effect of denial-of-service attacks.

Work Breakdown

As usual for performance optimization:

- Create or run benchmarks with realistic loads.
- Optimize hotspots.

CC @odeke-em for reference.

CC @tac0turtle to get the ball rolling. What are the most realistic benchmarks to focus on? Are there other issues relevant to this work?

I've played around with the benchmarks and tests in order to find something relevant. make test-sim-benchmark seems relevant, but it fails:

$ make test-sim-benchmark
Running application benchmark for numBlocks=500, blockSize=200. This may take awhile!
main module (github.com/cosmos/cosmos-sdk) does not contain package github.com/cosmos/cosmos-sdk/simapp
make: *** [test-sim-benchmark] Error 1

Running

$ cd simapp
$ go test -mod=readonly -benchmem -run=^$ -bench ^BenchmarkFullAppSimulation$ -Enabled=true -NumBlocks=500 -BlockSize=200 -Commit=true -timeout=24h
goos: darwin
goarch: arm64
pkg: cosmossdk.io/simapp
BenchmarkFullAppSimulation-8   	       1	126039193916 ns/op	86157807584 B/op	1111032767 allocs/op
PASS
ok  	cosmossdk.io/simapp	126.989s

gives me some result, but is it a realistic load?
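To make the numbers above easier to reason about, here is a minimal sketch that averages the reported totals over the 500 simulated blocks. The helper name perBlock is made up for illustration, and the averages say nothing about whether the load is realistic; they only put the single ns/op figure on a per-block scale.

```go
package main

import "fmt"

// perBlock averages a whole-run benchmark total over the simulated blocks.
func perBlock(total, numBlocks uint64) uint64 {
	return total / numBlocks
}

func main() {
	const (
		numBlocks = 500          // -NumBlocks from the command above
		nsPerOp   = 126039193916 // ns/op; one op is the entire simulation
		bytesOp   = 86157807584  // B/op
		allocsOp  = 1111032767   // allocs/op
	)
	fmt.Printf("time per block:   %d ms\n", perBlock(nsPerOp, numBlocks)/1_000_000)
	fmt.Printf("memory per block: %d MiB\n", perBlock(bytesOp, numBlocks)/(1<<20))
	fmt.Printf("allocs per block: %d\n", perBlock(allocsOp, numBlocks))
}
```

That works out to roughly 252 ms, 164 MiB allocated, and 2.2 million allocations per simulated block on this machine.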


alexanderbez · 2023-02-06T21:55:28Z

This is going to have a very large variance depending on how/what the application does with transactions. My suggestion is to consider utilizing the simulations for benchmarking

elias-orijtech · 2023-02-06T22:29:05Z

How would I utilize the simulations? Isn't simapp a simulator?

tac0turtle · 2023-02-07T05:07:34Z

I think this is a fairly large scope that may make sense to break down into phases.

There is the ante handler, which checks transactions, followed by the execution, storage, and commitment phases. It might make sense to start with those instead of benchmarking a chain.

Most of these items can be done without running or benchmarking a chain, by benchmarking the components that take part in the execution path instead. Tx processing is also up to applications, so benchmarking modules may not need to be part of the first phase here.
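As a rough illustration of benchmarking one component in isolation, the sketch below times a stand-in transaction check with the standard library's testing.Benchmark, which runs outside of go test. Tx, AnteHandler, and checkTx are hypothetical placeholders, not the SDK's types; a real benchmark would construct the actual ante handler chain and feed it realistic transactions.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"testing"
)

// Tx and AnteHandler are hypothetical stand-ins, not the SDK's types.
type Tx struct {
	Payload  []byte
	GasLimit uint64
}

type AnteHandler func(tx Tx) error

// sink keeps the hash result alive so the work is not optimized away.
var sink [32]byte

// checkTx is placeholder work: it rejects zero gas and hashes the payload.
func checkTx(tx Tx) error {
	if tx.GasLimit == 0 {
		return errors.New("gas limit must be positive")
	}
	sink = sha256.Sum256(tx.Payload)
	return nil
}

// benchmarkAnte measures a handler on its own, without a running chain.
func benchmarkAnte(h AnteHandler, tx Tx) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if err := h(tx); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := benchmarkAnte(checkTx, Tx{Payload: make([]byte, 256), GasLimit: 200000})
	fmt.Println(res.String(), res.MemString())
}
```

The same shape could be repeated per phase (checking, execution, storage, commitment), so that a regression in one component shows up without application-specific noise.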

elias-orijtech · 2023-02-07T11:53:04Z

Sounds good to me, in particular the part about leaving out application specific processing for now.

How would I go about running the ante handler?

alexanderbez · 2023-02-07T14:49:21Z

> How would I utilize the simulations? Isn't simapp a simulator?

No, SimApp is a basic reference application implementation. The simulation framework in the SDK uses it for simulations.

yihuang · 2023-02-08T04:27:54Z

With some changes like this, I'm able to profile block delivery on production data using tendermint block replay:

1. Reset application.db to an old version.
2. Run cronosd start --home /chain/.cronosd --cpu-profile /tmp/cpu.profile; it starts replaying blocks at full speed.
3. Wait at least 5 seconds, then interrupt the process whenever you want.
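For readers unfamiliar with how such a CPU profile is produced, here is a minimal sketch using Go's runtime/pprof. It is not cronosd's actual implementation; profileToTempFile and busyWork are hypothetical names, and busyWork merely stands in for block replay.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// profileToTempFile records a CPU profile of work into a temporary file
// and returns the file's path and final size.
func profileToTempFile(work func()) (path string, size int64, err error) {
	f, err := os.CreateTemp("", "cpu-*.profile")
	if err != nil {
		return "", 0, err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return "", 0, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile data to f

	info, err := f.Stat()
	if err != nil {
		return "", 0, err
	}
	return f.Name(), info.Size(), nil
}

// busyWork is a placeholder for the block replay being profiled.
func busyWork() {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	_ = sum
}

func main() {
	path, size, err := profileToTempFile(busyWork)
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes; inspect with: go tool pprof -top %s\n", size, path)
}
```

The go tool pprof -top command prints a flat/cum table in the same format as the one in this comment.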

In my test run, most of the blocks were empty. The profile result looks like this:

      flat  flat%   sum%        cum   cum%
     6.21s 28.66% 28.66%      6.29s 29.03%  runtime.cgocall
     5.76s 26.58% 55.24%      5.76s 26.58%  [librocksdb.so.7.9.2]
     1.99s  9.18% 64.42%      1.99s  9.18%  [libc.so.6]
     0.79s  3.65% 68.07%      0.79s  3.65%  [liblz4.so.1.9.3]
     0.69s  3.18% 71.25%      0.69s  3.18%  github.com/tendermint/tendermint/types.(*Validator).CompareProposerPriority
     0.59s  2.72% 73.97%      0.59s  2.72%  [libstdc++.so.6.0.28]
     0.33s  1.52% 75.50%      0.92s  4.25%  runtime.mallocgc
     0.32s  1.48% 76.97%      0.32s  1.48%  github.com/tendermint/tendermint/types.safeAdd (inline)
     0.30s  1.38% 78.36%      1.66s  7.66%  github.com/tendermint/tendermint/types.(*ValidatorSet).incrementProposerPriority
     0.24s  1.11% 79.46%      0.93s  4.29%  github.com/tendermint/tendermint/types.(*ValidatorSet).getValWithMostPriority (inline)
     0.24s  1.11% 80.57%      0.79s  3.65%  runtime.scanobject
     0.18s  0.83% 81.40%      0.23s  1.06%  runtime.findObject

What's interesting is that tendermint's IncrementProposerPriority/CompareProposerPriority show up as hotspots. There's O(n) processing there; I'm not sure whether it's a concern.
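For intuition about where an O(n) cost per round can come from, here is a simplified weighted round-robin sketch. It is not Tendermint's implementation (which has more steps); the validator type and nextProposer function are hypothetical, and the slice is assumed to be non-empty. Every round touches each validator once to bump its priority and once more to find the maximum.

```go
package main

import "fmt"

// validator is a hypothetical, simplified record.
type validator struct {
	name     string
	power    int64
	priority int64
}

// nextProposer runs one selection round over a non-empty slice.
// Both loops are O(n) in the number of validators.
func nextProposer(vals []validator) *validator {
	var total int64
	for i := range vals {
		vals[i].priority += vals[i].power
		total += vals[i].power
	}
	best := 0
	for i := 1; i < len(vals); i++ {
		if vals[i].priority > vals[best].priority {
			best = i
		}
	}
	// The winner pays back the total power, so others catch up over time.
	vals[best].priority -= total
	return &vals[best]
}

// countSelections tallies how often each validator is chosen.
func countSelections(vals []validator, rounds int) map[string]int {
	counts := make(map[string]int)
	for r := 0; r < rounds; r++ {
		counts[nextProposer(vals).name]++
	}
	return counts
}

func main() {
	vals := []validator{{name: "a", power: 1}, {name: "b", power: 2}, {name: "c", power: 3}}
	fmt.Println(countSelections(vals, 6))
}
```

Over six rounds with powers 1, 2, and 3, the validators are selected one, two, and three times respectively. With empty blocks there is little else to do per block, which may be why this per-round scan stands out in the profile.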

lasarojc · 2023-02-08T10:57:22Z

Even though the applications built on Cosmos may be very different from "regular" applications, it may be worth looking into classical benchmarks to gather extra data points, such as in https://arxiv.org/pdf/2210.04484.pdf

elias-orijtech · 2023-02-08T13:31:10Z

@yihuang that sounds like exactly what I want. Can you please explain to me how I acquire a snapshot application.db and a body of block data to replay?

My only concern is that any snapshot may not have any outlier transactions: unusual transactions taking a disproportionate amount of processing. They're juicy targets for DoS attacks, yet presumably rarely seen in normal transaction logs.

yihuang · 2023-02-08T14:42:48Z

> @yihuang that sounds like exactly what I want. Can you please explain to me how I acquire a snapshot application.db and a body of block data to replay?

On startup, if tendermint has newer blocks than application.db, it'll replay those blocks automatically, so you just need to roll back your application.db to an earlier version. There are a few options:

- Restore application.db from a statesync snapshot. There's a PR for restoring from a local snapshot (Enable local statesync/snapshot restore #13521); I'm not sure of its status.
- Restore from a db backup, or create a backup now, wait for the node to sync for a while, then restore.
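The replay condition described above can be sketched as a simple height comparison. This is only an illustration of the idea, not the actual startup code; replayRange, appHeight, and storeHeight are hypothetical names.

```go
package main

import "fmt"

// replayRange reports which block heights would need re-executing when the
// application state (appHeight) lags behind the block store (storeHeight).
func replayRange(appHeight, storeHeight int64) (from, to int64, ok bool) {
	if storeHeight <= appHeight {
		return 0, 0, false // nothing to replay
	}
	return appHeight + 1, storeHeight, true
}

func main() {
	// Example: application.db rolled back to height 100, blocks stored up to 150.
	if from, to, ok := replayRange(100, 150); ok {
		fmt.Printf("replay blocks %d..%d\n", from, to)
	}
}
```

The further application.db is rolled back, the more blocks there are to replay and the longer the profiling window.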

It was convenient for me because I'm developing this "versiondb" feature, for which I built a set of tools that replay the change set to any target version and dump the IAVL snapshot, and that can also restore application.db from those snapshots. So basically I can restore application.db to any version in several minutes. You can find more about them here; they should work for any cosmos-sdk chain, since we have the same db structure.

> My only concern is that any snapshot may not have any outlier transactions: unusual transactions taking a disproportionate amount of processing. They're juicy targets for DoS attacks, yet presumably rarely seen in normal transaction logs.

Yeah, that's hard to detect in benchmarks; you can't cover all the cases. You'd probably need to monitor each block's processing time for abnormal numbers.
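One possible way to flag abnormal blocks, sketched under the assumption that per-block processing times are already being collected in milliseconds: compare each block against a multiple of the median. The function names, the threshold factor, and the sample timings are all made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of the timings (0 for an empty slice).
func median(ms []int64) int64 {
	s := append([]int64(nil), ms...) // copy so the caller's order is kept
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// slowBlocks returns the indices of blocks whose processing time exceeds
// factor times the median processing time.
func slowBlocks(ms []int64, factor int64) []int {
	limit := median(ms) * factor
	var out []int
	for i, d := range ms {
		if d > limit {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// Made-up per-block processing times in milliseconds.
	timings := []int64{12, 15, 11, 14, 480, 13}
	fmt.Println("suspicious block indices:", slowBlocks(timings, 5))
}
```

A median-based threshold is used here because a single very slow block barely moves the median, whereas it would inflate a mean. The flagged blocks could then be replayed under a profiler to see which transactions are responsible.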

elias-orijtech · 2023-02-08T15:40:18Z

How do you do it without having an existing node running? I don't have one locally, but more importantly I think it's crucial to be able to run benchmarks continuously. Otherwise, performance will surely backslide in time.

yihuang · 2023-02-08T15:52:51Z

> How do you do it without having an existing node running? I don't have one locally, but more importantly I think it's crucial to be able to run benchmarks continuously. Otherwise, performance will surely backslide in time.

I was just trying to get a feel for the production behavior. For benchmarks that need to run continuously, we'll need a more isolated environment.

tac0turtle · 2023-08-18T14:49:09Z

Closing this for now, as the work is part of a simulator rewrite that is getting started.

github-actions bot added the needs-triage Issue that needs to be triaged label Feb 6, 2023

tac0turtle changed the title ~~Optimizing transaction processing~~ EPIC: Optimizing transaction processing Feb 7, 2023

tac0turtle added T:Epic Epics and removed needs-triage Issue that needs to be triaged labels Feb 7, 2023

tac0turtle closed this as completed Aug 18, 2023
