Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WASM files to state sync #816

Closed
wants to merge 1 commit into from
Closed

Add WASM files to state sync #816

wants to merge 1 commit into from

Conversation

assafmo
Copy link
Contributor

@assafmo assafmo commented Apr 25, 2022

This helps state sync to track the WASM files too, as they're not in the IAVL.

This helps state sync to track the WASM files too, as they're not in IAVL
@assafmo assafmo requested a review from alpe as a code owner April 25, 2022 19:19
@@ -682,6 +682,11 @@ func NewWasmApp(
app.scopedIBCKeeper = scopedIBCKeeper
app.scopedTransferKeeper = scopedTransferKeeper
app.scopedWasmKeeper = scopedWasmKeeper

app.SnapshotManager().RegisterExtensions(
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably a nicer way of doing this, but honestly this directory structure is buried so deep inside https://github.com/CosmWasm/cosmwasm/blob/be4d94a8d73ceab359226b8ea0a5b4be032a07e6/packages/vm/src/cache.rs#L105 so maybe it's better to leave it that way for now. Just a thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that top level wasm comes from wasmd from long ago.
I have thought of changing it, but it would be a very painful breaking change to all running nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, notice wasmDir := filepath.Join(homePath, "wasm") on line 448 or so. That already contains the first wasm.

It should be filepath.Join(wasmDir, "state", "wasm")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file structure is private and does not provide any stability guarantees. There is not even a guarantee Wasm is stored as file on disk in the next version of cosmwasm-vm. The file system should not be accessed from wasmd. Instead there is GetCode and Create in wasmvm to load and store a Wasm blob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another "wasm" along the way

wasmer, err := wasmvm.NewVM(filepath.Join(homeDir, "wasm"), supportedFeatures, contractMemoryLimit, wasmConfig.ContractDebugMode, wasmConfig.MemoryCacheSize)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file system should not be accessed from wasmd. Instead there is GetCode and Create in wasmvm to load and store a Wasm blob.

This is a good point, we can iterate over the values in the iavl tree rather than filenames. That is cleaner. And it will also "force" the node restoring from snapshotting to do the precompiling of the wasm files upon syncing, not later, providing more stable execution times later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds legit!

Copy link
Member

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

It looks solid in general. Added a few comments, also an idea how to make a simple unit test just copying arbitrary files based on names between two dirs. That would be a nice sanity check before manual testing.

What do you think of a custom SnapshotItem payload format (not just appending bytes)?

@@ -682,6 +682,11 @@ func NewWasmApp(
app.scopedIBCKeeper = scopedIBCKeeper
app.scopedTransferKeeper = scopedTransferKeeper
app.scopedWasmKeeper = scopedWasmKeeper

app.SnapshotManager().RegisterExtensions(
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, notice wasmDir := filepath.Join(homePath, "wasm") on line 448 or so. That already contains the first wasm.

It should be filepath.Join(wasmDir, "state", "wasm")

return err
}

// In case snapshotting needs to be deterministic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah although I'm still not sure whether that's needed.

return err
}

// snapshotItem is 64 bytes of the file name, then the actual WASM bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works. But maybe we want a real (eg. more future-proof) format.

Eg. like a Protobuf object with two fields and a type. Especially if we ever want to sync anything else

return nil
}

func (ws *WasmSnapshotter) Restore(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do a unit test with some simple data, by feeding the output of snapshot into restore (or another instance with different directory) and ensuring the proper files are copied verbatim (and misnamed files ignored)?


func (ws *WasmSnapshotter) Restore(
height uint64, format uint32, protoReader protoio.Reader,
) (snapshot.SnapshotItem, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to read the entire data and return nothing ever. Why does it return (snapshot.SnapshotItem, error) rather than error? (eg. Is there something I/we are missing?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not your design, but wonder why it is there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I took this from the sdk tests. Not sure if the return value goes anywhere here, but I can confirm that this works without any issue (tested manually on Secret). Also might be irrelevant for this case as we're dealing with files and not IAVL.

Copy link
Contributor

@jhernandezb jhernandezb Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this logic is in a loop where it seems like its the extension responsibility to handle the next SnapshotItem to the next one, which doesn't make sense because a plugin can break another one without knowing.

See https://github.com/cosmos/cosmos-sdk/blob/6f070623741fe0d6851d79ada41e6e2b1c67e236/snapshots/manager.go#L333

My guess is that is possible that in the protobuf stream there might be more items and if the format is not the one that wasmd can understand then it should return it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that is possible that in the protobuf stream there might be more items and if the format is not the one that wasmd can understand then it should return it.

Yes, very good idea. It seems like we should not error on some unknown extension, but just return it, as there may in the future be multiple extensions than need to cooperate somehow. Not sure how this all works together.

Maybe @yihuang could clarify how we should handle this properly

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, very good idea. It seems like we should not error on some unknown extension, but just return it, as there may in the future be multiple extensions than need to cooperate somehow. Not sure how this all works together.

exactly, we should return the item if it's unknown to us.

Copy link

@yihuang yihuang Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which doesn't make sense because a plugin can break another one without knowing.

yeah, that's possible unfortunately, maybe we can create some helpers at module side to retain the last item automatically, without changing the api in cosmos-sdk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I got it... we either end on EOF, or when we hit something where GetExtensionPayload() returns nil (another type we don't expect)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which doesn't make sense because a plugin can break another one without knowing.

yeah, that's possible unfortunately, maybe we can create some helpers at module side to retain the last item automatically, without changing the api in cosmos-sdk.


// ExtensionPayloadReader only reads the payloads of current extension.
type ExtensionPayloadReader struct {
	protoReader protoio.Reader
	LastItem    types.SnapshotItem
}

func NewExtensionPayloadReader(protoReader protoio.Reader) *ExtensionPayloadReader {
	return &ExtensionPayloadReader{
		protoReader: protoReader,
		LastItem:    types.SnapshotItem{},
	}
}

// ReadPayload reads a payload item from the stream.
func (epr *ExtensionPayloadReader) ReadPayload() ([]byte, error) {
	epr.LastItem = types.SnapshotItem{}
	if err := epr.protoReader.ReadMsg(&epr.LastItem); err != nil {
		return nil, err
	}
	payload := epr.LastItem.GetExtensionPayload()
	if payload == nil {
		// end of payload stream of current extension
		return nil, io.EOF
	}
	return payload.Payload, nil
}

I think of sth like this could help, the manager pass the ExtensionPayloadReader extension's Restore method, but it's api breaking, so maybe change this in next version.

Copy link
Member

@webmaster128 webmaster128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Assaf

@@ -682,6 +682,11 @@ func NewWasmApp(
app.scopedIBCKeeper = scopedIBCKeeper
app.scopedTransferKeeper = scopedTransferKeeper
app.scopedWasmKeeper = scopedWasmKeeper

app.SnapshotManager().RegisterExtensions(
wasm.NewWasmSnapshotter(filepath.Join(wasmDir, "wasm", "state", "wasm")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file structure is private and does not provide any stability guarantees. There is not even a guarantee Wasm is stored as file on disk in the next version of cosmwasm-vm. The file system should not be accessed from wasmd. Instead there is GetCode and Create in wasmvm to load and store a Wasm blob.

var wasmFileNameRegex = regexp.MustCompile(`^[a-f0-9]{64}$`)

func (ws *WasmSnapshotter) Snapshot(height uint64, protoWriter protoio.Writer) error {
wasmFiles, err := ioutil.ReadDir(ws.wasmDirectory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasmFiles, i.e. the list of checksums needs to come from the main Tendermint database. Otherwise a node can inject all kind of data here bypassing consensus.

return []uint32{1}
}

var wasmFileNameRegex = regexp.MustCompile(`^[a-f0-9]{64}$`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those file names can change any time. I'm actually thinking about adding a .wasm file extension for node operator friendlyness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, but for a given wasmd version, there will be only one storage format (and realistically, changing this will be a major state-breaking change needing an internal wasmvm migration, so unlikely to happen, or at least very infrequently).

If it does change, we make a new wasmd (eg. 2.0) with snapshot format 2. Upon receiving that the data is parsed in a different way (eg the new internal format). The snapshotter itself (they one reading), will just need to implement the new format when we break this internal storage format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I do agree with using the GetCode and Create calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's better to do it through the proper API, and let future migrations less to worry about WRT snapshotting.

Copy link
Contributor Author

@assafmo assafmo Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also "solves" the wasm/wasm/state/wasm issue here.

@assafmo
Copy link
Contributor Author

assafmo commented Apr 25, 2022

@webmaster128 is right, it's better to go through the existing API instead of accessing the filesystem directly.

@webmaster128
Copy link
Member

@webmaster128 is right, it's better to go through the existing API instead of accessing the filesystem directly.

Thank you. Let me know if you need additional APIs or more advances features than the two mentioned above. But those are the two basic operations.

Comment on lines +98 to +100
snapshotItem := append([]byte(wasmFile.Name()), wasmBytes...)

snapshot.WriteExtensionItem(protoWriter, snapshotItem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered compressing the Wasm before transfer? Or is this taken care of at a different level of the state sync stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't consider it, not sure if it's a problem. The wasm files are usually far smaller than the rest of the blockchain's state.

It can be a nice addition though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure if it's taken care of by the sdk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, no priority or blocker for sure. I was just curious since Wasm blobs can be compressed nicely. But we can always make that an optional extension later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting observation nonetheless. 💪

Copy link
Contributor

@jhernandezb jhernandezb Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasmd already gzips wasm blobs when you submit a tx or through governance.

if wasmUtils.IsWasm(wasm) {

func GzipIt(input []byte) ([]byte, error) {

and then uncompressed when the keeper.Create is called

func uncompress(src []byte, limit uint64) ([]byte, error) {

maybe some of this can be reused it really reduces the size but it will depend on each chain on how many contracts they have. Although as previously mentioned smaller than the actual chain data.

@giansalex
Copy link
Contributor

giansalex commented Apr 26, 2022

I was working on this feature, but tests are pending... I have updated with master branch, maybe it can be used as a reference.

Snapshot:

  • Iterate codes
  • Remove duplicate hash
  • Save CodeHash, WasmBytes using SnapshotWasmItem

Restore:

  • Get codehash, wasmbytes from SnapshotPayload
  • Create wasm using vm
  • Verify hash

https://github.com/disperze/wasmd/blob/state-sync-sdk-45.2/x/wasm/keeper/snapshot.go
[ full diff ]

@webmaster128
Copy link
Member

@giansalex Thank you! Could you add your work as a second PR to provide better visibility and being able to comment inline?

Copy link
Member

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and all the good comments here. Nice collaboration!
I think @assafmo solution was a very pragmatic approach and automates what people already do with sharing zipped wasm folders.
I completely agree with @webmaster128 suggestions for decoupling and a more resilient solution that uses existing apis. Upstream implementations may change and can break this easily otherwise.
Proper tests can be built with our mock

@giansalex Please do a draft PR, so that we can comment already. 🙏
I peeked at the code and it seems to address all the raised points. Good work

@ethanfrey
Copy link
Member

I got a anon review via DM:

When restoring all the wasm files, we must ensure to tell the wasmvm to pin the proper codes.

If we use the Create method, this will just be one more check on CodeInfo and possibly another call to wasmvm

@alpe
Copy link
Member

alpe commented Apr 26, 2022

InitializePinnedCodes can be used to pin all codes again.

@assafmo
Copy link
Contributor Author

assafmo commented Apr 26, 2022

I got a anon review via DM:

When restoring all the wasm files, we must ensure to tell the wasmvm to pin the proper codes.

If we use the Create method, this will just be one more check on CodeInfo and possibly another call to wasmvm

I'm not sure it that's necessary. Codes info are already stored in the IAVL.

@webmaster128
Copy link
Member

webmaster128 commented Apr 26, 2022

I'm not sure it that's necessary. Codes info are already stored in the IAVL.

InitializePinnedCodes is basically the answer to that DM request. It's really about lifting contracts into cache after a node restart or state sync.

@tomtau
Copy link

tomtau commented Apr 27, 2022

copy-pasting from Discord:

fyi we have this simple test on CI for state sync: https://github.com/crypto-org-chain/chain-main/blob/master/integration_tests/test_basic.py#L96 via this tool: https://github.com/crypto-com/pystarport ... so I imagine a test on CI for wasmd may be an extension of it: create a network, deploy a contract there, create a new node with state sync to join the network, submit a tx to invoke the contract-related functionality, assert that query responses from the state sync'ed and the original nodes are the same

@ethanfrey
Copy link
Member

@assafmo Very helpful PR showing how to implement such an extension.

There have been a number of wonderful comments explaining how to improve the process, and add more features (pinning, compression, etc).

I would like to get a version of this in incorporating all the input. Do you want to update this PR with all feedback? Or shall I write one? I'm happy either day. I would try to write it or review it tomorrow (Thursday) to get this in soon, and let people test manually on some tag.

@assafmo
Copy link
Contributor Author

assafmo commented Apr 27, 2022

I'm not sure if I can get to it that soon, so go for it, no ego here. ❤️
BTW I'm not sure that pinning is necessary here.

@ethanfrey ethanfrey mentioned this pull request Apr 27, 2022
Copy link
Member

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on this, I have made a new PR that attempts to incorporate all the discussion.

Please check out #823

It should be complete now, "just" missing tests, but good for a first review.


func (ws *WasmSnapshotter) Restore(
height uint64, format uint32, protoReader protoio.Reader,
) (snapshot.SnapshotItem, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I got it... we either end on EOF, or when we hit something where GetExtensionPayload() returns nil (another type we don't expect)

@ethanfrey
Copy link
Member

I was working on this feature, but tests are pending... I have updated with master branch, maybe it can be used as a reference.

@giansalex sorry I didn't see your code earlier. I was looking at open PRs... even draft.

I will look now, definitely would have saved me time yesterday and some things I can learn from it still

@assafmo
Copy link
Contributor Author

assafmo commented Apr 28, 2022

Closing in favor of #823.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants