Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to start graph syncer actor panic #297

Open
gpBlockchain opened this issue Oct 31, 2024 · 1 comment · May be fixed by #301
Open

Failed to start graph syncer actor panic #297

gpBlockchain opened this issue Oct 31, 2024 · 1 comment · May be fixed by #301

Comments

@gpBlockchain
Copy link

When node 1 is connecting to node 2, there is a chance that node 1 will panic when node 2 stop.

  2024-10-31T03:41:31.089148Z DEBUG fnn::fiber::network: Auto announcing our node to peer PeerId(QmS8y8sAoF7DH89st7fquUVW9Y1W3cgcnPgWjPe6tcm1dw) (message: NodeAnnouncement { signature: Some(EcdsaSignature(304402200e642919ae0c2c981208c27745f78ba7b2679236bdf42f61324b170e19f4d1860220673f288f5972fae0dd71ca478ba8861eae6bf2c7644b335e870cdb07982b8f73)), features: 0, version: 1730346082486, node_id: Pubkey(PublicKey(f034c78a975fda47fc895f5cb17a44331e455ba7630c2ea2d0c050983eb814c35514771f971e93c778fd1cd9bfd5957c947dfd1cf023415d3af7008c82d2140e)), alias: AnnouncedNodeName(), addresses: ["/ip4/127.0.0.1/tcp/8228/p2p/QmY8dEogQ4GK2KwZLmrtnJLULAuWuE8yPDqzegMjD7LmM7"], chain_hash: Hash256(0x0000000000000000000000000000000000000000000000000000000000000000), auto_accept_min_ckb_funding_amount: 16200000000, udt_cfg_infos: UdtCfgInfos([UdtArgInfo { name: "RUSD", script: UdtScript { code_hash: H256 ( [ 0x11, 0x42, 0x75, 0x5a, 0x04, 0x4b, 0xf2, 0xee, 0x35, 0x8c, 0xba, 0x9f, 0x2d, 0xa1, 0x87, 0xce, 0x92, 0x8c, 0x91, 0xcd, 0x4d, 0xc8, 0x69, 0x2d, 0xed, 0x03, 0x37, 0xef, 0xa6, 0x77, 0xd2, 0x1a ] ), hash_type: Type, args: "0x878fcc6f1f08d48e87bb1c3b3d5083f23f8a39c5d5c764f253b55b998526439b" }, auto_accept_amount: Some(10000000000), cell_deps: [UdtCellDep { dep_type: Code, tx_hash: H256 ( [ 0xed, 0x7d, 0x65, 0xb9, 0xad, 0x3d, 0x99, 0x65, 0x7e, 0x37, 0xc4, 0x28, 0x5d, 0x58, 0x5f, 0xea, 0x8a, 0x5f, 0xca, 0xf5, 0x81, 0x65, 0xd5, 0x4d, 0xac, 0xf9, 0x02, 0x43, 0xf9, 0x11, 0x54, 0x8b ] ), index: 0 }] }]) })
    at src/fiber/network.rs:3325
    in ractor::actor::Actor with id: "0.2", name: "Network QmY8dEogQ4GK2KwZLmrtnJLULAuWuE8yPDqzegMjD7LmM7"

  2024-10-31T03:41:31.092668Z DEBUG fnn::fiber::network: Adding peer to dynamic syncing peers list: peer PeerId(QmS8y8sAoF7DH89st7fquUVW9Y1W3cgcnPgWjPe6tcm1dw), succeeded syncing 0, failed syncing 1, pinned syncing peers 0
    at src/fiber/network.rs:2421
    in ractor::actor::Actor with id: "0.2", name: "Network QmY8dEogQ4GK2KwZLmrtnJLULAuWuE8yPDqzegMjD7LmM7"

thread 'tokio-runtime-worker' panicked at /Users/guopenglin/demo2/fiber/src/fiber/network.rs:2445:14:
Failed to start graph syncer actor: ActorAlreadyRegistered("Graph syncer to QmS8y8sAoF7DH89st7fquUVW9Y1W3cgcnPgWjPe6tcm1dw")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@contrun
Copy link
Collaborator

contrun commented Oct 31, 2024

The log here is incomplete. Can you paste the whole log here?

My hunch is that there is a race condition between quitting the old graph syncer on peer disconnect and starting a new graph syncer. This is because, currently we are exiting the graph syncer by the following procedures.

  1. tentacle tells network actor peer has disconnected.
  2. Network actor tells graph syncer peer has disconnected.
  3. Graph syncer confirms that peer disconnected, and sends a syncing status back to the network actor.
  4. Network actor stops the graph syncer.

Step 2, 3, 4 can be running concurrently with a new peer connection. So we may start a new graph syncer with the same name to the older one, thus a panic.

In fact step 1 (tentacle notifies network actor a peer has disconnected) may run concurrently with the event that tentacle notifies the same peer is connecting. This is because tentacle does not guarantee anything about the order of different session.

I think we need to

  1. Make the name of graph syncer unique even for the same peer.
  2. Shorten the process of stopping graph syncer. We can just stop the graph syncer actor on peer disconnected.

@contrun contrun linked a pull request Nov 6, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants