txIndexer.report() method uses too much cpu on sepolia network #28907

Closed

windycrypto opened this issue Feb 1, 2024 · 7 comments

@windycrypto
Contributor

windycrypto commented Feb 1, 2024

System information

Geth version: v1.13.11
CL client & version: prysm@v4.2.1
OS & Version: Linux

Steps to reproduce the behaviour

We have been running an archive node for the Sepolia network. The node also serves eth_getTransactionReceipt calls at a rate of 150 QPS. This call performed very well until we upgraded to v1.13.11, after which eth_getTransactionReceipt latency became very high (p95 > 5 seconds).

I dug into the eth_getTransactionReceipt method and found that in v1.13.11, if the transaction is not found, it triggers a progress query via txIndexer.txIndexProgress(). The txIndexProgress() method then triggers txIndexer.report() and gets the progress from the txIndexer main loop through a channel. (I don't think it's a good idea to have a channel on the path of serving an RPC call, but let's put that aside for now.)
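To make the shape of that code path concrete, here is a minimal, self-contained sketch of the pattern described above: the RPC side asks the indexer's main loop for its progress over a channel and blocks until the loop answers. The type and field names are simplified placeholders, not the actual go-ethereum implementation.

```go
// Illustrative sketch only -- simplified names, not the real go-ethereum types.
// The RPC path (txIndexProgress) round-trips through the indexer main loop,
// so any slow work inside report() delays the RPC response directly.
package main

import "fmt"

type progress struct{ remaining uint64 }

type txIndexer struct {
	progressCh chan chan progress // progress requests from RPC callers
	closed     chan struct{}
}

// txIndexProgress is what the receipt lookup falls back to when a
// transaction is not found: it blocks until the main loop replies.
func (ix *txIndexer) txIndexProgress() progress {
	reply := make(chan progress, 1)
	ix.progressCh <- reply
	return <-reply
}

// loop is the indexer main loop; report is where the expensive work
// (e.g. a database read) would happen.
func (ix *txIndexer) loop(report func() progress) {
	for {
		select {
		case reply := <-ix.progressCh:
			reply <- report() // every RPC miss pays for this call
		case <-ix.closed:
			return
		}
	}
}

func main() {
	ix := &txIndexer{progressCh: make(chan chan progress), closed: make(chan struct{})}
	go ix.loop(func() progress { return progress{remaining: 0} })
	fmt.Println(ix.txIndexProgress())
	close(ix.closed)
}
```

The point is that whatever report() does, including any database reads, sits directly on the latency path of every eth_getTransactionReceipt miss.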

The real cause of the high latency is inside txIndexer.report(): the call traces down into the deep internals of Pebble and finally ends up in crc32.Update() and Syscall6().

[CPU profile flamegraph of the Sepolia node]

As you can see, txIndexer.report() takes 81% of the sampled CPU time.

node startup options

geth --sepolia \
  --datadir=/var/data/geth \
  --syncmode=full \
  --snapshot=false \
  --cache=1024 \
  --txlookuplimit=0 \
  --http \
  --http.addr=0.0.0.0 \
  --http.port=8545 \
  --http.vhosts=* \
  --http.api=net,web3,eth,txpool,pre,engine,trace \
  --ws \
  --ws.addr=0.0.0.0 \
  --ws.port=8546 \
  --ws.origins=* \
  --ws.api=net,web3,eth,txpool,engine \
  --rpc.allow-unprotected-txs \
  --metrics \
  --pprof \
  --pprof.addr=0.0.0.0 \
  --pprof.port=9260 \
  --allow-insecure-unlock \
  --rpc.gascap=250000000 \
  --authrpc.addr=0.0.0.0 \
  --authrpc.port=8551 \
  --authrpc.jwtsecret=/var/data/geth/jwt.hex \
  --authrpc.vhosts=* \
  --state.scheme=path \
  --db.engine=pebble \
  --history.transactions=0

Update

That's not the whole story. We also run a mainnet Ethereum node, likewise upgraded to v1.13.11, but its eth_getTransactionReceipt latency is just as low as before (p95 ~1ms). So I also captured a CPU profile for the mainnet node. The mainnet node and the Sepolia node roughly serve the same kinds and the same amount of RPC calls, but as we can see in the mainnet CPU profile, txIndexer.report() only takes 0.4% of the sampled CPU time.

[CPU profile flamegraph of the mainnet node]

We can see in the mainnet node's pprof that it doesn't run into the crc32.Update and Syscall6 methods. Maybe this is related to Pebble, or to differences in the database level hierarchy between Sepolia and mainnet...

@holiman
Contributor

holiman commented Feb 1, 2024

Wow, not often we see such detailed reports with flamegraphs and all. Thanks!

@rjl493456442 self-assigned this Feb 1, 2024
@rjl493456442
Member

The indexing progress is only queried if the transaction/receipt is not found.

Are you sending lots of queries for non-existent receipts?

@rjl493456442
Member

@cryptocifer Can you please try this branch by any chance? #28908

@windycrypto
Contributor Author

windycrypto commented Feb 1, 2024

The indexing progress is only queried if the transaction/receipt is not found.

Are you sending lots of queries for non-existent receipts?

Yep. In our scenario, wallets send transactions through our node and then poll eth_getTransactionReceipt to check whether the transaction succeeded and to get its receipt. Since a transaction is only included in a block after ~15 seconds, most of the eth_getTransactionReceipt calls are actually for non-existent receipts.
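For reference, this is roughly what that client-side polling loop looks like, sketched with go-ethereum's ethclient; the endpoint, hash and interval are placeholders. Until the transaction is mined, every call takes the "receipt not found" path on the server.

```go
// Minimal sketch of the polling pattern described above, using go-ethereum's
// ethclient. Endpoint, hash and timings are placeholders.
package main

import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder transaction hash of a just-submitted transaction.
	txHash := common.HexToHash("0x0000000000000000000000000000000000000000000000000000000000000000")

	for {
		receipt, err := client.TransactionReceipt(context.Background(), txHash)
		if errors.Is(err, ethereum.NotFound) {
			// Not mined yet: these are the "miss" calls that hit the
			// indexer-progress path on the node.
			time.Sleep(2 * time.Second)
			continue
		}
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("mined in block %d, status %d", receipt.BlockNumber, receipt.Status)
		return
	}
}
```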

let me try #28908, thanks

@windycrypto
Contributor Author

windycrypto commented Feb 1, 2024

@rjl493456442 thanks! the branch does work!

I'm trying to understand the real cause; could you help me confirm whether my guess is right?

Our mainnet node was also upgraded to v1.13.11 and serves the same RPC queries as the Sepolia node. The only difference between these two nodes is that the mainnet node is a full node and keeps only the latest 90000 blocks' transaction indices, while the Sepolia node is an archive node that keeps all of the transaction indices.

So the "TxIndexTail" in the Sepolia node's db constantly points to block 0 and never gets modified, meaning the "TxIndexTail" entry in Pebble gradually sinks to a deeper level. In the mainnet node, by contrast, "TxIndexTail" gets updated frequently, so it stays at a shallow level? And without #28908's fix, repeatedly reading "TxIndexTail" from a deep level costs a lot more than reading it from a shallow level?

Actually I have doubts about this suspicion, because I believe Pebble has a cache to eliminate the high cost of frequently reading from a deep level. I just can't think of other causes...
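For illustration only, here is a hypothetical sketch of the general mitigation being discussed: keep the last known indexing progress in memory, updated by the indexer loop, so the receipt-miss path never has to read the tail marker from the database. This only shows the idea; it is not the actual change in #28908.

```go
// Hypothetical sketch: cache the indexing progress in memory so the RPC miss
// path does a cheap atomic load instead of a Pebble read. Not the real
// go-ethereum code or the contents of #28908.
package main

import (
	"fmt"
	"sync/atomic"
)

type cachedProgress struct {
	remaining atomic.Uint64 // blocks left to index; 0 means fully indexed
}

// update is called by the indexer loop whenever it advances, so any
// expensive database read stays on the indexing path, not the RPC path.
func (c *cachedProgress) update(remaining uint64) {
	c.remaining.Store(remaining)
}

// get is what the eth_getTransactionReceipt miss path would call:
// an in-memory load instead of a trip through the database.
func (c *cachedProgress) get() uint64 {
	return c.remaining.Load()
}

func main() {
	var p cachedProgress
	p.update(0) // placeholder value: fully indexed
	fmt.Println("remaining blocks to index:", p.get())
}
```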

@rjl493456442
Member

rjl493456442 commented Feb 2, 2024

I do agree with your analysis.

I believe Pebble has a cache to eliminate the high cost of frequently reading from a deep level

Pebble does have a cache for recently loaded block data, but a read still needs to traverse the levels from top to bottom; in particular, all the level-0 files need to be checked one by one.

We do configure a bloom filter to mitigate the cost, but I guess it's still an expensive operation (way more expensive than I imagined, tbh).
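As a rough illustration of that filter configuration (not geth's actual settings, and the Pebble options layout may differ between versions), this is how a per-level bloom filter is attached in Pebble; even with it, a point lookup still walks the level structure, and level-0 tables are checked individually.

```go
// Rough illustration: attach a bloom filter policy to each Pebble level so
// point lookups can skip SSTables that definitely lack the key. The path,
// the 10 bits-per-key value and the key name are placeholders.
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/bloom"
)

func main() {
	levels := make([]pebble.LevelOptions, 7)
	for i := range levels {
		levels[i].FilterPolicy = bloom.FilterPolicy(10) // ~10 bits per key
	}

	db, err := pebble.Open("/tmp/pebble-demo", &pebble.Options{Levels: levels}) // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A point lookup like the tail-marker read discussed in this thread
	// (placeholder key name): the filter skips most tables, but the
	// traversal over the level structure still happens.
	if _, closer, err := db.Get([]byte("TxIndexTail")); err == nil {
		closer.Close()
	} else if err != pebble.ErrNotFound {
		log.Fatal(err)
	}
}
```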

@rjl493456442
Member

Closing this as the fix has been merged. Feel free to reopen if anything unexpected occurs again.
