Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database corruption during restarts since switching to upstream v1.14.7 #360

Closed
sebastianst opened this issue Aug 13, 2024 · 3 comments · Fixed by #368
Closed

Database corruption during restarts since switching to upstream v1.14.7 #360

sebastianst opened this issue Aug 13, 2024 · 3 comments · Fixed by #368

Comments

@sebastianst
Copy link
Member

sebastianst commented Aug 13, 2024

We've recently merged in upstream v1.14.7 and during a sequencer restart, the following errors popped up:

t=2024-08-13T15:18:31+0000 lvl=info msg="Loaded most recent local block" number=8539128 hash=0x3cef05fd474e7e7dc7eaff71e135604e47985a394a59b8dccd4fe408f3c2888b td=0 age=7s
t=2024-08-13T15:18:31+0000 lvl=info msg="Loaded most recent local finalized block" number=8538465 hash=0x810a9d113721b03303e0daffd8a6e6edb62e7baf6bb1481fee6c586959d4112a td=0 age=22m13s
t=2024-08-13T15:18:31+0000 lvl=warn msg="Head state missing, repairing" number=8539128 hash=0x3cef05fd474e7e7dc7eaff71e135604e47985a394a59b8dccd4fe408f3c2888b snaproot=0xc8380132dab17c437b74e60250d0e854e15af86a97bd3158d79caf6eee78540e
t=2024-08-13T15:18:36+0000 lvl=info msg="Rewound to block with state" number=8460000 hash=0xe2d7f61e7b3f63a8e57662e28a22a0092b03fd6b5ca135ca7e9322fa9b3ca589
t=2024-08-13T15:19:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8534620"
t=2024-08-13T15:20:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8530106"
t=2024-08-13T15:21:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8525615"
t=2024-08-13T15:22:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8521225"
t=2024-08-13T15:23:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8516828"
t=2024-08-13T15:24:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8512200"
t=2024-08-13T15:25:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8507555"
t=2024-08-13T15:26:29+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 8503264"
... (repeating every minute)

So at startup, safe and finalized are reset to genesis. The op-node will then currently walk back to genesis. This was fixed locally by shutting down op-node, and using op-wheel engine rewind --set-head --to 8460000, but the source for this db corruption isn't clear yet.

@sebastianst
Copy link
Member Author

sebastianst commented Aug 14, 2024

It happened again a couple of times. That's a log excerpt from inactive sepolia seq-2.

t=2024-08-14T19:54:49+0000 lvl=info msg="Enabling metrics collection"
t=2024-08-14T19:54:49+0000 lvl=info msg="Enabling stand-alone metrics HTTP endpoint" address=0.0.0.0:6060
t=2024-08-14T19:54:49+0000 lvl=info msg="Starting metrics server" addr=http://0.0.0.0:6060/debug/metrics
t=2024-08-14T19:54:49+0000 lvl=info msg="Maximum peer count" ETH=30 total=30
t=2024-08-14T19:54:49+0000 lvl=info msg="Smartcard socket not found, disabling" err="stat /run/pcscd/pcscd.comm: no such file or directory"
t=2024-08-14T19:54:49+0000 lvl=info msg="Using pebble as db engine"
t=2024-08-14T19:54:49+0000 lvl=info msg="Set global gas cap" cap=50000000
t=2024-08-14T19:54:50+0000 lvl=info msg="Initializing the KZG library" backend=gokzg
t=2024-08-14T19:54:50+0000 lvl=info msg="Allocated trie memory caches" clean="1.20 GiB" dirty="2.00 GiB"
t=2024-08-14T19:54:50+0000 lvl=info msg="Using pebble as the backing database"
t=2024-08-14T19:54:50+0000 lvl=info msg="Allocated cache and file handles" database=/db/geth/chaindata cache="4.00 GiB" handles=524288
t=2024-08-14T19:54:51+0000 lvl=info msg="Opened ancient database" database=/db/geth/chaindata/ancient/chain readonly=false
t=2024-08-14T19:54:51+0000 lvl=info msg="State scheme set by user" scheme=path
t=2024-08-14T19:54:51+0000 lvl=warn msg="Sanitizing invalid node buffer size" provided="2.00 GiB" updated="256.00 MiB"
t=2024-08-14T19:54:51+0000 lvl=info msg="Failed to load journal, discard it" err="unexpected journal version want 1 got 0"
t=2024-08-14T19:54:51+0000 lvl=info msg="Opened ancient database" database=/db/geth/chaindata/ancient/state readonly=false
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=account.index items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=storage.index items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=account.data items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=storage.data items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncating freezer table" database=/db/geth/chaindata/ancient/state table=history.meta items=3889669 limit=3887492
t=2024-08-14T19:54:51+0000 lvl=warn msg="Truncated extra state histories" number=2177
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg=---------------------------------------------------------------------------------------------------------------------------------------------------------
t=2024-08-14T19:55:12+0000 lvl=info msg="Chain ID:  11155420 (OP Sepolia Testnet)"
t=2024-08-14T19:55:12+0000 lvl=info msg="Consensus: Optimism"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Pre-Merge hard forks (block based):"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Homestead:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/homestead.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Tangerine Whistle (EIP 150): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/tangerine-whistle.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Spurious Dragon/1 (EIP 155): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Spurious Dragon/2 (EIP 158): #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/spurious-dragon.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Byzantium:                   #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/byzantium.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Constantinople:              #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/constantinople.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Petersburg:                  #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/petersburg.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Istanbul:                    #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/istanbul.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Muir Glacier:                #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/muir-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Berlin:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/berlin.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - London:                      #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/london.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Arrow Glacier:               #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/arrow-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Gray Glacier:                #0        (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/gray-glacier.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Merge configured:"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Hard-fork specification:    https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/paris.md"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Network known to be merged: true"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Total terminal difficulty:  0"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Merge netsplit block:       #0       "
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg="Post-Merge hard forks (timestamp based):"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Shanghai:                    @1699981200 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/shanghai.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Cancun:                      @1708534800 (https://github.com/ethereum/execution-specs/blob/master/network-upgrades/mainnet-upgrades/cancun.md)"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Regolith:                    @0         "
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Canyon:                      @1699981200"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Ecotone:                     @1708534800"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Fjord:                       @1716998400"
t=2024-08-14T19:55:12+0000 lvl=info msg=" - Granite:                     @1723478400"
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:12+0000 lvl=info msg=---------------------------------------------------------------------------------------------------------------------------------------------------------
t=2024-08-14T19:55:12+0000 lvl=info msg=""
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded most recent local block" number=15931368 hash=0x3c42264d97297745ee9fdb8de3583e0abb86b9328b8e5ff1d0be5998e5dc7f9a td=0 age=37s
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded most recent local finalized block" number=15930652 hash=0xcacd6a6921079b6f87d05e81dbdf53616dc2ae92162817f56bf6ad1843b6f4fb td=0 age=24m29s
t=2024-08-14T19:55:13+0000 lvl=info msg="Loaded last snap-sync pivot marker" number=12041571
t=2024-08-14T19:55:13+0000 lvl=warn msg="Head state missing, repairing" number=15931368 hash=0x3c42264d97297745ee9fdb8de3583e0abb86b9328b8e5ff1d0be5998e5dc7f9a snaproot=0x3bcc4bbbdf638894f82d4dcf10e1654a357266c765344863b94f39b5b18f2574
t=2024-08-14T19:55:31+0000 lvl=info msg="Block state missing, rewinding further" number=15930447 hash=0x389279f3c431ee5b8e53ef16d560538b8df0bc6f83811c013ca5183d3cfed4fa elapsed=18.412s
t=2024-08-14T19:55:49+0000 lvl=info msg="Block state missing, rewinding further" number=15929815 hash=0xe23fda5a618c5a1631a3a5ce430bea581dbf6e93b49a252cfce3f209a6943b7c elapsed=36.009s
t=2024-08-14T19:55:49+0000 lvl=info msg="Rewound to block with state" number=15929063 hash=0xa574d6ddc91d920271ae3e15e61f6e43f61d70ff9a314cfcc9c21653e3a600fb
t=2024-08-14T19:55:51+0000 lvl=error msg="Error in block freeze operation" err="canonical hash missing, can't freeze block 15930535"
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded most recent local block" number=15929063 hash=0xa574d6ddc91d920271ae3e15e61f6e43f61d70ff9a314cfcc9c21653e3a600fb td=0 age=1h18m24s
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded most recent local finalized block" number=15930652 hash=0xcacd6a6921079b6f87d05e81dbdf53616dc2ae92162817f56bf6ad1843b6f4fb td=<nil> age=25m26s
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded last snap-sync pivot marker" number=12041571
t=2024-08-14T19:56:10+0000 lvl=warn msg="Enabling snapshot recovery" chainhead=15929063 diskbase=15930615
t=2024-08-14T19:56:10+0000 lvl=warn msg="Snapshot is not continuous with chain" snaproot=0x839974d8a9d9852588fd32eebfde41e3f07f2d0b4806c953db41a44c22137807 chainroot=0xaef4e98a80d75974e247db4e0a9bb355657d404955416f7652ae49333cf87c55
t=2024-08-14T19:56:10+0000 lvl=info msg="Initialized transaction indexer" range="last 2350000 blocks"
t=2024-08-14T19:56:10+0000 lvl=info msg="Initialising Ethereum protocol" network=11155420 dbversion=8
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded local transaction journal" transactions=951 dropped=0
t=2024-08-14T19:56:10+0000 lvl=info msg="Regenerated local transaction journal" transactions=811 accounts=145
t=2024-08-14T19:56:10+0000 lvl=warn msg="Switch sync mode from snap sync to full sync"
t=2024-08-14T19:56:10+0000 lvl=info msg="Gasprice oracle is ignoring threshold set" threshold=2
t=2024-08-14T19:56:10+0000 lvl=warn msg="Engine API enabled" protocol=eth
t=2024-08-14T19:56:10+0000 lvl=info msg="Starting peer-to-peer node" instance=Geth/v1.101407.0-rc.1-ee83530a-20240808/linux-amd64/go1.22.6
t=2024-08-14T19:56:10+0000 lvl=info msg="New local node record" seq=1719261383223 id=89b03f986538b49bad99202e4429aac50b3870c6f1e14c8e9fe27ac9d5536c71 ip=10.17.64.82 udp=30303 tcp=30304
t=2024-08-14T19:56:10+0000 lvl=info msg="Started P2P networking" self="enode://1a5d373dbc1cf14646d68dc6cd396fea57409e51088a0249ce96c5daeeb5453421f59c6dc3ce67cc60450ea8acc95cd8a44bcd4f4fdc25f6ed55901e7b4756ed@10.17.64.82:30304?discport=30303"
t=2024-08-14T19:56:10+0000 lvl=info msg="IPC endpoint opened" url=/db/geth.ipc
t=2024-08-14T19:56:10+0000 lvl=info msg="Loaded JWT secret file" path=/etc/op-geth/jwt-secret.txt crc32=0x815e9662
t=2024-08-14T19:56:10+0000 lvl=info msg="HTTP server started" endpoint=[::]:8545 auth=false prefix="" cors=* vhosts=*
t=2024-08-14T19:56:10+0000 lvl=info msg="WebSocket enabled" url=ws://[::]:8546
t=2024-08-14T19:56:10+0000 lvl=info msg="WebSocket enabled" url=ws://[::]:8551
t=2024-08-14T19:56:10+0000 lvl=info msg="HTTP server started" endpoint=[::]:8551 auth=true prefix="" cors=localhost vhosts=localhost
t=2024-08-14T19:56:11+0000 lvl=info msg="Indexed transactions" blocks=2305 txs=14023 tail=13579064 elapsed=887.910ms
t=2024-08-14T19:56:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:57:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:58:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T19:59:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:00:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:01:51+0000 lvl=error msg="Number of finalized block is missing"
t=2024-08-14T20:02:51+0000 lvl=error msg="Number of finalized block is missing"

@sebastianst
Copy link
Member Author

Related: ethereum/go-ethereum#30119

@sebastianst
Copy link
Member Author

sebastianst commented Aug 22, 2024

Fix implemented in #368 - adds journal v0 to v1 migration code. ethereum/go-ethereum#28940 introduced a new journal format, but didn't add migration code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant