Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intermittent test failure in vat replay #5266

Closed
warner opened this issue May 2, 2022 · 4 comments
Closed

intermittent test failure in vat replay #5266

warner opened this issue May 2, 2022 · 4 comments
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet test

Comments

@warner
Copy link
Member

warner commented May 2, 2022

https://github.com/Agoric/agoric-sdk/runs/6242581254?check_suite_focus=true#step:5:702

has an intermittent test failure:

anachrophobia strikes vat v2
delivery completed with 2 expected syscalls remaining
expected: {"0":"vatstoreGet","1":"vom.rc.o-50","length":2}
expected: {"0":"vatstoreGetAfter","1":"","2":"vom.ir.o-50|","length":3}
REJECTED from ava test: (Error#14)
Error#14: historical inaccuracy in replay of v2
  at Object.finishReplayDelivery (.../swingset-vat/src/kernel/vat-loader/transcript.js:91:13)
  at replayOneDelivery (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:189:19)
  at async Object.replayTranscript (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:220:1)
  at async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:134:1)
  at async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:177:1)
  at async Object.start (.../swingset-vat/src/kernel/kernel.js:1314:1)
  at async makeSwingsetController (packages/SwingSet/src/controller/controller.js:310:3)
  at async buildVatController (packages/SwingSet/src/controller/controller.js:503:22)
  at async packages/SwingSet/test/vat-admin/terminate/test-terminate.js:187:16

  ✖ vat-admin › terminate › terminate › replay does not resurrect dead vat Rejected promise returned by test

in a PR based on current trunk (fce1034).

@warner warner added bug Something isn't working SwingSet package: SwingSet labels May 2, 2022
@warner
Copy link
Member Author

warner commented May 3, 2022

This appears to be vat-admin (v2) on deliveryNum: 5 receiving an import of the root object (of the soon-to-be-terminated dynamic vat exercised by this test), sending it to the parent vat, and dropping it. Then the dispatch.bringOutYourDead on deliveryNum: 6 performs GC, which should notice the now-finalized Presence object, check that nothing else is referencing the vref, and then informs the kernel.

In most runs (including all the ones I do locally, as I haven't been able to reproduce this), both the original pass and the replay will observe this o-50 Presence get GCed during delivery 6, so they query the virtual refcounts (vom.rc.o-50) and the used-as-a-weak-key status (vom.ir.o-50|), see that neither is used, and then perform a syscall.dropImports and syscall.retireImports for o-50.

In the CI failure, the original pass did the same, but the replay did not. Our direct evidence is that the crank finished without seeing the two expected vatstore calls. Our best guess is that something else (perhaps some AVA logging/debugging mechanism?) retained a reference to the Presence, preventing it from being collected. Or something interfered with V8 doing a complete GC, and the Presence was not collected during the gc() call within bringOutYourDead.

My pet theory is that something about AVA's test parallelism is causing some clever incremental gc() behavior to get interrupted, and V8 doesn't try (or know how to) resume after the interruption, causing gc() to be incomplete, where normally it would be complete. But I have no idea how to test such a theory.

Incidentally, note that GC syscalls are not recorded in the transcript, under the somewhat outdated theory that this would enable us to tolerate GC variation between validators. However, the vatStore syscalls which inform the GC decisions are part of the transcript (and must be, because their return values are critical), so realistically speaking, we should probably go back to recording those syscalls, and enforcing their correct replay.

We've never seen this sort of thing happen under XS, so we're not worried about production systems. So we're going to deprioritize this one.

I'm going to merge this into #3240 , since this seems to be a modern example of that same problem, and #3240 has more pointers to V8 implementation details.

@warner warner closed this as completed May 3, 2022
@warner
Copy link
Member Author

warner commented May 3, 2022

Oh, you know, test-terminate.js has a test.serial on two of the unit tests (including the one that failed), but not on the others, and I think one of our conclusions from #3240 was that maybe you need test.serial on all tests in the file. So I'm going to re-open this one with the task of changing test-terminate.js to be fully serialized.

@warner warner reopened this May 3, 2022
warner added a commit that referenced this issue May 3, 2022
The extra information might help with debugging.

refs #5266
warner added a commit that referenced this issue May 4, 2022
We think GC under Node.js is a bit squirrly in the face of AVA's
parallelism, and moving the intermittently-failing test into its own
file might help

refs #5266
@warner
Copy link
Member Author

warner commented May 4, 2022

Let's apply #5291, which might help (or might not), and then close this in favor of the longer-term keep-an-eye-on-it #3240 or #5202

warner added a commit that referenced this issue May 5, 2022
We think GC under Node.js is a bit squirrly in the face of AVA's
parallelism, and moving the intermittently-failing test into its own
file might help

refs #5266
@warner
Copy link
Member Author

warner commented May 5, 2022

That's landed, closing this in the hopes that it goes away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet test
Projects
None yet
Development

No branches or pull requests

3 participants