Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce xsnap version on chain start #7012

Closed
mhofman opened this issue Feb 16, 2023 · 5 comments · Fixed by #9618
Closed

Enforce xsnap version on chain start #7012

mhofman opened this issue Feb 16, 2023 · 5 comments · Fixed by #9618
Labels
cosmic-swingset package: cosmic-swingset enhancement New feature or request SwingSet package: SwingSet vaults_triage DO NOT USE xsnap the XS execution tool

Comments

@mhofman
Copy link
Member

mhofman commented Feb 16, 2023

What is the Problem Being Solved?

If an upgrade contains an xsnap-worker minor version change (the kind that is snapshot and JS execution compatible), the validator must build and use this new version. See #6361

If they don't their node will likely fall off consensus at some point. While our update instructions include yarn install && yarn build, which will rebuild XS, the chain process would likely not fail early enough if those steps were not performed, resulting in a hard to recover state.

Description of the Design

Include a check in when swingset or cosmic-swingset starts that the xsnap version is the expected one.

This check should be by-passable so that we can manually use other XS revisions that are supposed to be consensus matching.

Likely related to #6596

Security Considerations

This would ensure all nodes will run as expected after an upgrade, guaranteeing the health of the chain.

Scaling Considerations

Test Plan

@mhofman mhofman added enhancement New feature or request SwingSet package: SwingSet cosmic-swingset package: cosmic-swingset xsnap the XS execution tool labels Feb 16, 2023
@ivanlei ivanlei added the vaults_triage DO NOT USE label Feb 21, 2023
@ivanlei ivanlei added this to the Vaults EVP milestone Feb 21, 2023
@warner
Copy link
Member

warner commented Apr 19, 2023

@mhofman What were you thinking we could use as a version number/string? If the goal is to detect when agoric-sdk has been updated but packages/xsnap has not been recompiled, then it sounds like we need to put something in the compiled binary, which can either be reported with e.g. xsnap --version, or returned as part of some special netstring-pipe command. And then compared against.. something we check into the packages/xsnap/ JS code?

@warner
Copy link
Member

warner commented Apr 24, 2023

@ivanlei and I figure that it'd be nice to have this, but it's not such a huge footgun that we really need it in this release. Validators have multiple ways to fail to recompile everything, but failing to run yarn build is not as big as failing to run make (to rebuild the golang code). So we're kicking this out of the release.

@warner warner removed this from the Vaults EVP milestone Apr 24, 2023
@mhofman
Copy link
Member Author

mhofman commented Apr 24, 2023

Given that we have stricter consensus checks now (snapshot hashes, and soon computrons per delivery), and that a validator can recover by using state-sync, I agree this is not really needed anymore.

@warner
Copy link
Member

warner commented Apr 25, 2023

Note to self: xsCommon.h has a triplet of version constants:

https://github.com/agoric-labs/moddable/blob/428864a754c48486f42bd43a1fdf3e3de8aaefe6/xs/sources/xsCommon.h#L107-L109

  • 1: these would only represent changes to the upstream XS code, on release boundaries
    • not any changes we might cherry-pick from between releases
    • and not any local changes we apply that don't go into upstream
  • 2: we'd need to export them somehow
    • probably return them from the setBundle command, or some other command that happens very early, and only once per worker

warner added a commit to agoric-labs/xsnap-pub that referenced this issue May 24, 2023
This is a quick hack to embed a distinctive string into the xsnap
binary, for use by the agoric `agd` command, to ensure that xsnap has
actually been recompiled recently. If a validator somehow forgets to
re-compile xsnap, `agd` will notice the mismatch and complain, rather
than proceeding with the wrong executable.

Running `xsnap -n` emits this "network upgrade version" string,
currently `agoric-upgrade-10`.

In the future, we'll update this string to something new (and perhaps
more fine-grained than a once-per-chain-upgrade value) when we make
changes to xsnap.

refs Agoric/agoric-sdk#7012
warner added a commit that referenced this issue May 24, 2023
This is a quick hack to assert that xsnap has been recompiled
recently. `bin/agd` now runs `xsnap -n` to get the "network upgrade
version", and asserts that it matches an expected value of
`agoric-update-10`. If a user has somehow failed to recompile `xsnap`,
their old copy won't understand the `-n` command, and `agd` will fail,
rather than running a stale version (with highly confusing results).

refs #7012
@warner
Copy link
Member

warner commented May 24, 2023

@ivanlei and @arirubinstein and I added a hack just now, in #7831 and agoric-labs/xsnap-pub#41 , to add an xsnap -n command which emits agoric-upgrade-10 (the "network name" of the new chain release, so -n stands for "network upgrade version"), and to have our bin/agd shell script use a relative path (relative to thisdir, a one-level -deep readline of the agd being invoked), to call xsnap -n and insist that output is agoric-upgrade-10.

This is enough to solve today's problem, which is to insist that xsnap has been recompiled recently (at least after the landing of #7831). We'll need to reconsider this more carefully next week, to solve the next problem, which begins when we make any change to XS or xsnap, or when we want to introduce the agoric-upgrade-11 handler into the cosmos/golang code (to facilitate automated upgrade-from-current-mainnet tests, and which really ought to come with a change to agd to look for agoric-upgrade-11 instead of -10). Some things we'd like to improve:

  • xsnap -v already includes the XS version triple, and a string copied in from the package.json version of packages/xsnap (aka @agoric/xsnap). This is pretty close to what we want, so it'd be nice to use it directly somehow.
    • like, maybe we could update agd to expect xsnap -v to match a particular value
    • but, our build/release process (MAINTAINERS.md) updates the package.json versions in one fell swoop, and we'd need to modify bin/agd after running that process and learning the new version, but then does bin/agd get modified in the same commit as the one lerna version creates? Or do we have to add an extra commit, which would mean the sdk-wide agoric-sdk tag wouldn't include the newer agd.
    • and, if we make changes to xsnap-platform.c, they wouldn't be included in the xsnap -v output until after we update packages/xsnap/ and perform a version-bumping release sequence, whereas I'd really prefer to discover the mismatch earlier than that
  • agd now has detailed knowledge about where the xsnap executable lives, which I'd prefer to be hidden within packages/xsnap
    • if agd weren't a shell script, we could add a (JS) API to packages/xsnap which runs xsnap -v and reports back the string
    • maybe packages/xsnap could have a tool at a well-known path, which agd could invoke with node .../packages/xsnap/tools/xsnap-version.js
    • but really, packages/xsnap/src/xsnap.js should be the outer half, not agd
  • frequency of update: ideally we'd like this string to be updated upon any change to XS or xsnap. Incrementing a counter on both sides is simple but manual and prone to being forgotten. In a perfect world this would be driven by git hashes somehow.

warner added a commit to agoric-labs/xsnap-pub that referenced this issue May 24, 2023
This is a quick hack to embed a distinctive string into the xsnap
binary, for use by the agoric `agd` command, to ensure that xsnap has
actually been recompiled recently. If a validator somehow forgets to
re-compile xsnap, `agd` will notice the mismatch and complain, rather
than proceeding with the wrong executable.

Running `xsnap -n` emits this "network upgrade version" string,
currently `agoric-upgrade-10`.

In the future, we'll update this string to something new (and perhaps
more fine-grained than a once-per-chain-upgrade value) when we make
changes to xsnap.

refs Agoric/agoric-sdk#7012
warner added a commit that referenced this issue May 24, 2023
This is a quick hack to assert that xsnap has been recompiled
recently. `bin/agd` now runs `xsnap -n` to get the "network upgrade
version", and asserts that it matches an expected value of
`agoric-update-10`. If a user has somehow failed to recompile `xsnap`,
their old copy won't understand the `-n` command, and `agd` will fail,
rather than running a stale version (with highly confusing results).

refs #7012
warner added a commit that referenced this issue May 24, 2023
This is a quick hack to assert that xsnap has been recompiled
recently. `bin/agd` now runs `xsnap -n` to get the "network upgrade
version", and asserts that it matches an expected value of
`agoric-update-10`. If a user has somehow failed to recompile `xsnap`,
their old copy won't understand the `-n` command, and `agd` will fail,
rather than running a stale version (with highly confusing results).

refs #7012
warner added a commit that referenced this issue May 24, 2023
This is a quick hack to assert that xsnap has been recompiled
recently. `bin/agd` now runs `xsnap -n` to get the "network upgrade
version", and asserts that it matches an expected value of
`agoric-update-10`. If a user has somehow failed to recompile `xsnap`,
their old copy won't understand the `-n` command, and `agd` will fail,
rather than running a stale version (with highly confusing results).

refs #7012
@mergify mergify bot closed this as completed in #9618 Jul 1, 2024
@mergify mergify bot closed this as completed in 78b6fec Jul 1, 2024
mhofman pushed a commit that referenced this issue Jul 1, 2024
closes: #9614
closes: #7012
refs: agoric-labs/xsnap-pub#49

## Description
Wire a mechanism to force rebuilds of xsnap when some build environments change, or if the xsnap binary is found to be out of date through the embedded xsnap package version.

Moves the `agd build` check to depend on a version check implemented by the xsnap package, which now only depends on the package version that is dynamically inserted instead of an explicit magic string that wasn't getting updated.

#7012 (comment) expressed concerns with relying on the xsnap package version, however:
- any changes to `xsnap-pub` or `moddable` submodules would result in a change to the checked in `build.env` file, which would be detected by `lerna` during the publish process. While a version bump would not apply for contributors of the sdk (aka not every xsnap change results in a version change), it will apply for anyone using published versions, even if the change is in submodules only.
- By having the version check look up the expected version from the `package.json` file directly, we avoid having to modify both  the `package.json` file and the check itself. Only the automatically generated single publish commit will trigger a forced rebuild, and satisfy the check on its own,
- At the same time we remove the dependency of `agd build` onto the the internal structure of xsnap.

### Security Considerations
Forces all validators to use the expected version of xsnap

### Scaling Considerations
Does a few more checks when building xsnap, and causes full rebuilds in some cases where they might not be strictly necessary.

### Documentation Considerations
Should all be transparent

### Testing Considerations
Manually tested by touching the xsnap package version, or reverting just the release binary to a previous copy (or deleting altogether). If the binary is meddled with like this, `agd build` checks will fail, but a manual `yarn build` will fix. I thought this was better than transparently forcing a rebuild in that abnormal situation.

### Upgrade Considerations
Smoother upgrades by validators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cosmic-swingset package: cosmic-swingset enhancement New feature or request SwingSet package: SwingSet vaults_triage DO NOT USE xsnap the XS execution tool
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants