bug: markets - restarting during Publish stage causes piece CID mismatch at AddPiece #7815

Closed
1 task done
dirkmc opened this issue Dec 17, 2021 · 3 comments
Labels
area/markets/provider area/markets Area: Markets kind/bug Kind: Bug P1 P1: Must be resolved
Comments

@dirkmc
Contributor

dirkmc commented Dec 17, 2021

Lotus component

  • lotus miner/market - storage deal

Lotus Version

lotus-miner at commit hash a4728d3

Describe the Bug

Similar to #7577

Restarting the markets process while a deal is at the Publish stage causes the hand-off to fail with the error: AddPiece failed: got unexpected piece CID: ...

Repro Steps

  1. Make a deal
  2. Wait for it to get into the publish state
  3. Restart the Market node (or the miner if no MRA)
  4. Trigger deal publishing
  5. See if the deal is added to the sector (on master it will fail with commp mismatch)

Markets:

2021-12-17T15:37:01.890+0100    WARN    providerstates  providerstates/provider_states.go:561   deal bafyreiekx5n5go6z2zknrgwzi3rzeaxwt4uum7otndwwlfmbnqbcm57zuy failed: handing off deal to node: packing piece baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski: AddPiece failed: got unexpected piece CID: expected:baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski, got:baga6ea4seaqfctcdlq6qju2juu3f7pkz77drgyurcf4fteobupctv4rapf2buly
2021-12-17T15:37:01.902+0100    INFO    markets loggers/loggers.go:20   storage provider event  {"name": "ProviderEventFailed", "proposal CID": "bafyreiekx5n5go6z2zknrgwzi3rzeaxwt4uum7otndwwlfmbnqbcm57zuy", "state": "StorageDealError", "message": "handing off deal to node: packing piece baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski: AddPiece failed: got unexpected piece CID: expected:baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski, got:baga6ea4seaqfctcdlq6qju2juu3f7pkz77drgyurcf4fteobupctv4rapf2buly"}

Miner:

2021-12-17T15:37:01.890+0100    WARN    sectors storage-sealing/fsm.go:622      sector 2566 got error event sealing.SectorAddPieceFailed: got unexpected piece CID: expected:baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski, got:baga6ea4seaqfctcdlq6qju2juu3f7pkz77drgyurcf4fteobupctv4rapf2buly
2021-12-17T15:37:01.890+0100    WARN    rpc     go-jsonrpc@v0.1.5/handler.go:279        error in RPC call to 'Filecoin.SectorAddPieceToAny': got unexpected piece CID: expected:baga6ea4seaqp4hnfnt3khmcea7zl2f4v42w26fzlkxonqckypd3cgsnu23g2ski, got:baga6ea4seaqfctcdlq6qju2juu3f7pkz77drgyurcf4fteobupctv4rapf2buly:
    github.com/filecoin-project/lotus/extern/storage-sealing.(*Sealing).handleAddPiece
        /home/magik6k/github.com/filecoin-project/go-lotus/extern/storage-sealing/input.go:230

At Publish stage:

$ cp .lotusminer/deal-staging/fstmp1108837364 fs-before

$ ls .lotusminer/deal-staging/fstmp1108837364 -lh
-rw------- 1 root root 71M Dec 17 16:31 .lotusminer/deal-staging/fstmp1108837364

$ sha256sum .lotusminer/deal-staging/fstmp1108837364
b0748f8f53d1f630254114d74d977ea32a92ed26dcc1afaa4f00b9269382fcc6  .lotusminer/deal-staging/fstmp1108837364

After restart:

$ ls .lotusminer/deal-staging/fstmp1108837364 -l
-rw------- 1 root root 73697647 Dec 17 16:32 .lotusminer/deal-staging/fstmp1108837364

$ sha256sum .lotusminer/deal-staging/fstmp1108837364
4168c3f5da8e54559c6e103c159bca5870c4e097e14e356f9491cd2e1d0168af  .lotusminer/deal-staging/fstmp1108837364

Diff of the files

@magik6k
Contributor

magik6k commented Dec 17, 2021

The issue appears to be caused by reopening the CAR file with OpenReadWrite.
[screenshot: 2021-12-17-170519_3838x2158_scrot]

@magik6k magik6k added this to the v1.13.3 milestone Dec 17, 2021
@dirkmc
Contributor Author

dirkmc commented Dec 17, 2021

I believe the problem may be that we're opening a CAR file for read when it's already open for read/write.

We open the CAR file for read/write on restart:
https://github.com/filecoin-project/go-fil-markets/blob/e111ec29d24d02586718dbab465571e0b8220a24/storagemarket/impl/provider.go#L617-L622

	// re-track all deals for whom we still have a local blockstore.
	for _, d := range deals {
		if _, err := os.Stat(d.InboundCAR); err == nil && d.Ref != nil {
			_, _ = p.stores.GetOrOpen(d.ProposalCid.String(), d.InboundCAR, d.Ref.Root)
		}
	}

Then we open it for read when we do the deal hand-off:
https://github.com/filecoin-project/go-fil-markets/blob/e111ec29d24d02586718dbab465571e0b8220a24/storagemarket/impl/providerstates/provider_states.go#L340-L342

		carFilePath = deal.InboundCAR

		v2r, err := environment.ReadCAR(deal.InboundCAR)

Probably when we do deal hand-off we should call stores.GetOrOpen() on the CAR file.

@jennijuju jennijuju modified the milestones: v1.13.3, v1.15.0 Jan 2, 2022
@jennijuju jennijuju added the P1 P1: Must be resolved label Jan 2, 2022
@jennijuju
Member

fixed in filecoin-project/go-fil-markets#658
