state-sync cannot restore payloads larger than 64MB #8325

Closed
mhofman opened this issue Sep 12, 2023 · 5 comments · Fixed by agoric-labs/cosmos-sdk#304 or #8507
Labels: agoric-cosmos, bug

Comments

@mhofman
Member

mhofman commented Sep 12, 2023

Describe the bug

cosmos-sdk hardcodes a maximum payload size of 64MB for items restored from a state-sync snapshot.

Unfortunately, mainnet has an artifact larger than that (the bootstrap vat's first transcript span), which makes it impossible to restore from state-sync.

The error is pretty cryptic, simply:

ERR failed to restore snapshot err="extension swingset restore: short buffer"
ERR State sync failed err="state sync aborted" module=statesync

No stack trace is available because this is a built-in error, but it is returned from the protobuf io package.

To Reproduce

Try to restore from state-sync using agoric-upgrade-11 software

Expected behavior

State-sync restore succeeds.

Platform Environment

agoric-upgrade-11

Additional context

I have verified that locally patching /go/pkg/mod/github.com/agoric-labs/cosmos-sdk@v0.45.11-alpha.agoric.3/snapshots/manager.go to increase this limit allows a state-sync restore to succeed.

While this is safe to patch locally because it doesn't affect consensus, releasing it as a hotfix is more difficult because of the risk of the cosmos-sdk package version sneaking into JS errors, which are part of consensus.

mhofman added the bug and agoric-cosmos labels Sep 12, 2023
@mhofman
Member Author

mhofman commented Sep 12, 2023

We need to pick a higher limit. For reference, we have similar size-prefix constraints in our netstring implementation between Swingset and xsnap, where the limit is 1GB - 1 byte (9 decimal digits). I would want at least 256 MB in case we end up with larger artifacts down the road. I believe the upper bound is mainly a memory allocation constraint.

Testing plan: extend the cosmos-sdk state-sync unit tests to attempt restoring 2 payloads: one just below the increased limit and one just over it.

mhofman self-assigned this Sep 12, 2023
@mhofman
Member Author

mhofman commented Sep 12, 2023

It looks like protobuf only allocates as much as the actual item length requires, so it should be safe to raise this limit.

@mhofman
Member Author

mhofman commented Sep 13, 2023

Reopening until cosmos-sdk is updated, assigning to @JimLarson

@mhofman
Member Author

mhofman commented Oct 5, 2023

Closing in favor of #8223 which will pick up the new Cosmos SDK tag.

mhofman closed this as completed Oct 5, 2023
mhofman assigned mhofman and unassigned JimLarson Oct 5, 2023
mhofman reopened this Nov 7, 2023
@mhofman
Member Author

mhofman commented Nov 7, 2023

We cannot wait for #8223 after all, so I am reopening this to track the cosmos-sdk update.
