Skip to content
This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Subnet tests failed after finality timer exceeded #248

Open
dnkolegov opened this issue Aug 27, 2022 · 2 comments
Open

Subnet tests failed after finality timer exceeded #248

dnkolegov opened this issue Aug 27, 2022 · 2 comments
Labels
bug Something isn't working Hierarchical (B)

Comments

@dnkolegov
Copy link
Collaborator

dnkolegov commented Aug 27, 2022

Sometimes tests for subnets failed with the following artifacts:

Multiple actor not found records.

2022-08-27T16:09:56.228Z	WARN	rpc	go-jsonrpc@v0.1.5/handler.go:279	error in RPC call to 'Filecoin.SubnetStateGetActor': resolution lookup failed (t1ddvt45g3xpgo3z6exmiuscfgb5vead7j5qqvtna):
    github.com/filecoin-project/lotus/chain/state.(*StateTree).GetActor
        /home/circleci/project/chain/state/statetree.go:362
  - resolve address t1ddvt45g3xpgo3z6exmiuscfgb5vead7j5qqvtna:
    github.com/filecoin-project/lotus/chain/state.(*StateTree).lookupIDinternal
        /home/circleci/project/chain/state/statetree.go:327
  - actor not found

Multiple previous checkpoint not consistent with previous check committed records:

2022-08-27T16:09:53.220Z	INFO	mir-manager	mir/logger.go:29	PBFT: Done with segment.	{"info": ["epoch",3,"instance",0]}
2022-08-27T16:09:53.220Z	WARN	vm	vm/vm.go:543	Send actor error	{"from": "t1n6subb3h5yar4vv5th2ho4kjdrbq4b7tdtwa6nq", "to": "t01002", "nonce": 4, "method": "5", "height": "101", "error": "previous checkpoint not consistent with previous check committed (RetCode=16):\n    github.com/filecoin-project/lotus/chain/consensus/hierarchical/actors/subnet.(*SubnetState).verifyCheck\n        /home/circleci/project/chain/consensus/hierarchical/actors/subnet/subnet_actor.go:254"}
2022-08-27T16:09:53.220Z	WARN	vm	vm/runtime.go:350	Abortf: previous checkpoint not consistent with previous check committed
2022-08-27T16:09:53.220Z	INFO	mir-manager	mir/logger.go:29	PBFT: Done with segment.	{"info": ["epoch",3,"instance",0]}
2022-08-27T16:09:53.220Z	INFO	mir-manager	mir/logger.go:29	PBFT: Done with segment.	{"info": ["epoch",3,"instance",1]}
2022-08-27T16:09:53.220Z	WARN	vm	vm/runtime.go:161	LegacyVM.Call failure in call from: t01001 to t01002: previous checkpoint not consistent with previous check committed (RetCode=16):
    github.com/filecoin-project/lotus/chain/consensus/hierarchical/actors/subnet.(*SubnetState).verifyCheck
        /home/circleci/project/chain/consensus/hierarchical/actors/subnet/subnet_actor.go:254
2022-08-27T16:09:53.220Z	WARN	vm	vm/vm.go:543	Send actor error	{"from": "t1n6subb3h5yar4vv5th2ho4kjdrbq4b7tdtwa6nq", "to": "t01002", "nonce": 4, "method": "5", "height": "101", "error": "previous checkpoint not consistent with previous check committed (RetCode=16):\n    github.com/filecoin-project/lotus/chain/consensus/hierarchical/actors/subnet.(*SubnetState).verifyCheck\n        /home/circleci/project/chain/consensus/hierarchical/actors/subnet/subnet_actor.go:254"}

The result after 40 minutes:

bafy2bzaceczfowk7bvbk4upmt6vmkx6s42uszjatv57rui6tbpybysfqyjtfe	{"miner": "/root:twa6nq", "epoch": "2478"}
    eudico_hc_test.go:615: 
        	Error Trace:	eudico_hc_test.go:615
        	Error:      	Received unexpected error:
        	            	finality timer exceeded:
        	            	    github.com/filecoin-project/lotus/itests/kit.WaitSubnetActorBalance
        	            	        /home/circleci/project/itests/kit/tools_eudico.go:75
        	Test:       	TestHC_TwoNodesTestsWithMirConsensus//root/mir-/subnet/mir/testBasicFlowOnTwoNodes
    eudico_hc_test.go:353: [*] defer: cancelling test context
2022-08-27T16:29:42.212Z	INFO	mir-consensus	mir/mine.go:66	Mir miner completed	{"addr": "t1q4e5bykxiq3cd4miifahxkai3s253urn266cq3a"}
    eudico_hc_test.go:422: [*] miner A in root net stopped
2022-08-27T16:29:42.212Z	INFO	mir-consensus	mir/mine.go:66	Mir miner completed	{"addr": "t1n6subb3h5yar4vv5th2ho4kjdrbq4b7tdtwa6nq"}
    eudico_hc_test.go:436: [*] miner B in root net stopped
    eudico_hc_test.go:345: [*] stopping test ensemble
    eudico_hc_test.go:349: [*] ensemble stopped

The full log is here - https://app.circleci.com/pipelines/github/filecoin-project/eudico/503/workflows/9082ed8e-888c-4a35-9ef4-4035be2f2cde/jobs/6168/tests

@adlrocha Let's try to catch this bug.

@adlrocha adlrocha added Hierarchical (B) bug Something isn't working labels Aug 29, 2022
@adlrocha
Copy link
Collaborator

Do we know after what code change this bug was introduced, or it just happens sometimes?

The actor error happens because the account actor for the user hasn't been initialized in the subnet, this can be either because the fund message (or the cross-net message assigning funds) to it hasn't been proposed successfully in a block.

The checkpoint error may also be related, and it usually happens when the local view of a peer in the checkpoint process is not in sync with the state on-chain due to a checkpoint not being committed successfully (if this is the reason, it is definitely a bug. The checkpoint protocol should be able to recover from checkpoint not being successfully committed, etc.).

@dnkolegov
Copy link
Collaborator Author

dnkolegov commented Aug 29, 2022

It has been happening from time to time

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working Hierarchical (B)
Projects
None yet
Development

No branches or pull requests

2 participants