Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

match incref/decref calls within kernel #3264

Closed
warner opened this issue Jun 7, 2021 · 3 comments · Fixed by #3270
Closed

match incref/decref calls within kernel #3264

warner opened this issue Jun 7, 2021 · 3 comments · Fixed by #3270
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Jun 7, 2021

While working through unit tests for #3109, I noticed that we weren't consistently performing matched incref/decref calls in all places within the kernel.

If we trace a typical message, it begines life in the kernel-side handler for syscall.send, which routes to kernelSyscall.js: doSend():

  • slots are increfed in doSend() before the message is passed to kernelKeeper.addToRunQueue
  • later, when the send event is pulled off the run-queue by kk.getNextMsg, the slots are not decrefed at that time
  • the event is classified by processQueueMessage
    • for send events, the slots are decrefed in processQueueMessage before being sent to deliverToTarget
    • create-vat events do not have any slots, so nothing needs to be done for them
    • notify events name a promise, however I don't think we need to hold a reference to it (and in fact we probably shouldn't, since it might have been resolved and delivered already, and thus we might have been able to delete it already)
    • GC actions have slots but should specifically not cause refcounts to be retained
  • deliverToTarget breaks the target down into several cases:
    • targets which are objects, or promises resolved to a single object, route to deliverToVat
    • targets which are orphaned objects (exports of a terminated vat) drop the message and provoke resolveToError on their result promise, if any
    • targets which are rejected promises provoke resolveToError
    • targets which are resolved to non-objects provoke resolveToError
    • targets which are unresolved promises whose decider enabled pipelining use deliverToVat
    • unresolved promises whose decider did not enable pipelining, or without a decider, use kk.addMessageToPromiseQueue to move the message to the kernel promise table's queue entry
  • when a promise is resolved in kernel.js: doResolve, kk.getResolveablePromise returns the complete state of the promise (including the queued messages)
    • we incref the resolution data's slots, then call kk.resolveKernelPromise is used to replace the saved state with the resolution data
    • then kernel.js: notifySubscribersAndQueue is called to push a notify event for each subscriber, and call kernelSyscallHandler.send() for each queued message
      • this delegates to doSend, so it increfs the message slots just as if a vat did a brand new syscall.send
  • when a resolved promise becomes unreferenced, it is deleted with a call to kernelKeeper.js: deleteKernelPromise() from within processRefcounts(), which uses deleteKernelPromiseState() to delete all the relevant keys. This does not change the refcounts of the resolution data's slots.

The first mismatch I'm looking at is a message which gets queued on an unresolved promise. We decref its slots when pulling it off the run-queue, but do not increment them when adding to the promise queue. We do not decref them when removing from the promise queue, but we do incref them when adding the message back to the run-queue. For promises that get resolved, the final refcount is correct, but they'll be too low while the message is on the promise queue, and are thus vulnerable to being dropped early.

The second mismatch is in the promise's resolution data. This is incremented during doResolve, and decremented in processRefcounts just before calling deleteKernelPromise. I think the incref/decrefs are matched and sound, but it feels dangerous to have the two calls happen in different places: I kind of want both increment and decrement to happen in kernelKeeper.js.

@warner warner added bug Something isn't working SwingSet package: SwingSet labels Jun 7, 2021
@warner warner added this to the Testnet: Stress Test Phase milestone Jun 7, 2021
@warner warner self-assigned this Jun 7, 2021
@warner
Copy link
Member Author

warner commented Jun 7, 2021

cc @FUDCo

@warner
Copy link
Member Author

warner commented Jun 7, 2021

For messages, here's one approach:

  • decref the slots when we pull the event off the run-queue (in kernelKeeper.getNextMsg)
  • incref them when adding it to either the run-queue (in kernelKeeper.addToRunQueue) or a promise-queue (in kernelKeeper.addMessageToPromiseQueue)
  • decref them when removing all messages from the promise queue (currently done in kernel.js: notifySubscribersAndQueue, but maybe we should add a kernelKeeper.drainMessagesFromPromiseQueue iterator that also decrefs)
  • incref them when adding back to the run-queue just like we currently do in notifySubscribersAndQueue / kernelSyscallHandler.send

I like the idea of strictly managing the refcounts from within kernelKeeper.js, that feels far more likely to remain correct under maintenance.

One downside is the DB churn when we pull an item off the run-queue only to push it onto a promise-queue, and likewise when promise-queue messages are transferred right back to the run-queue. If/when we implement promise forwarding (resolving a promise to a different promise), we might do more of the same, depending upon the implementation and the ordering properties we choose to provide.

So an alternative approach would avoid the churn when moving messages, and instead aim to incref/decref only upon the creation or destruction of messages:

  • incref during syscall.send
  • decref in deliverToTarget, but only if we call deliverToVat or resolveToError
    • if we call kk.addMessageToPromiseQueue, leave the refcounts alone
    • if we recurse back into deliverToTarget, delegate the decision to the recursive call
  • notifySubscribersAndQueue must not do a new incref, so it must join the control flow of syscall.send/kernelSyscallHandler.send/doSend somewhere after the incref it does

This would result in less down-then-up refcount churn, but would distribute the incref/decref calls out through more code, which feels less reliable to me.

@warner
Copy link
Member Author

warner commented Jun 8, 2021

@FUDCo had a great idea: add a method to kernelKeeper that moves all the promise-queued messages to the run-queue, leaving their refcounts intact. That will reduce half the churn.

warner added a commit that referenced this issue Jun 8, 2021
Add more type assertions to kernelKeeper.addMessageToPromiseQueue(), which is
defined to take a Message (with method, result, and args capdata).

test-state was giving it the wrong things (run-queue events, which include
both 'send' for messages, and 'notify' for promise resolution notifications).
Promise queue entries are only ever message sends, never anything else. I
clean up test-state to only queue messages, and to track a few more values
that will be meaningful when the refcounts and queue handling change soon.

I also changed the resolution tests to use properly-allocated objects, rather
than arbitrarily-chosen object references. This doesn't matter now, but when
the upcoming refcount improvements are made,
`kernelKeeper.resolveKernelPromise` will need the capdata to reference real
objects with real refcounts that can be modified.

refs #3264 (prep/cleanup)
warner added a commit that referenced this issue Jun 8, 2021
As described in #3264, we were not maintaining correct reference counts when
messages get queued to an unresolved promise.

In the new approach, the lifetime of a message is:

* all krefs (target, args, and optional result) of a message are increfed
  during `doSend`, as before
  * this adds a `type: 'send'` event to the run-queue
* when the `send` event is pulled off the run-queue, all krefs are
  decrefed (as before)
* if the event is delivered to a vat, great
* if the event is delivered to a promise,
  `kernelKeeper.addMessageToPromiseQueue` increfs all krefs
  * (this is new: previously `addMessageToPromiseQueue` did not change the
    refcounts)
* later, if/when the promise is resolved, the messages are transferred
  laterally from the promise queue to `send` events on the run-queue, without
  changing their refcounts
  * (this is new: previously the transfer was done with `doSend`, which
    incremented the refcounts)
  * the counts are decremented when the `send` event is removed from the
    run-queue, as usual

The result is the same number of increments and decrements as before, but the
increment is done earlier, so messages on a promise queue will maintain a
refcount on all their krefs (target, args, and any result). This protects
objects and promises which would otherwise have been eligible for collection
while they sat on the promise queue.

Strictly speaking we don't need to maintain a refcount on the target (which
is also kind of redundant: everything on the queue for promise `kp1`
obviously has `target: 'kp1'`), since that will always be additionally held
by the c-list of the decider (at least). But by making all promise queues do
the same thing as the main run-queue, we can laterally transfer messages from
promise- to run-queue without DB churn (decrementing then immediately
incrementing the counts).

This change moves the responsibility for the lateral transfer from
`notifySubscribersAndQueue` in kernel.js to
`kernelKeeper.resolveKernelPromise()`, deleting the former entirely and
merging the subscription loop into `doResolve`.

closes #3264
warner added a commit that referenced this issue Jun 8, 2021
As described in #3264, we were not maintaining correct reference counts when
messages get queued to an unresolved promise.

In the new approach, the lifetime of a message is:

* all krefs (target, args, and optional result) of a message are increfed
  during `doSend`, as before
  * this adds a `type: 'send'` event to the run-queue
* when the `send` event is pulled off the run-queue, all krefs are
  decrefed (as before)
* if the event is delivered to a vat, great
* if the event is delivered to a promise,
  `kernelKeeper.addMessageToPromiseQueue` increfs all krefs
  * (this is new: previously `addMessageToPromiseQueue` did not change the
    refcounts)
* later, if/when the promise is resolved, the messages are transferred
  laterally from the promise queue to `send` events on the run-queue, without
  changing their refcounts
  * (this is new: previously the transfer was done with `doSend`, which
    incremented the refcounts)
  * the counts are decremented when the `send` event is removed from the
    run-queue, as usual

The result is the same number of increments and decrements as before, but the
increment is done earlier, so messages on a promise queue will maintain a
refcount on all their krefs (target, args, and any result). This protects
objects and promises which would otherwise have been eligible for collection
while they sat on the promise queue.

Strictly speaking we don't need to maintain a refcount on the target (which
is also kind of redundant: everything on the queue for promise `kp1`
obviously has `target: 'kp1'`), since that will always be additionally held
by the c-list of the decider (at least). But by making all promise queues do
the same thing as the main run-queue, we can laterally transfer messages from
promise- to run-queue without DB churn (decrementing then immediately
incrementing the counts).

This change moves the responsibility for the lateral transfer from
`notifySubscribersAndQueue` in kernel.js to
`kernelKeeper.resolveKernelPromise()`, deleting the former entirely and
merging the subscription loop into `doResolve`.

closes #3264
warner added a commit that referenced this issue Jun 9, 2021
As described in #3264, we were not maintaining correct reference counts when
messages get queued to an unresolved promise.

In the new approach, the lifetime of a message is:

* all krefs (target, args, and optional result) of a message are increfed
  during `doSend`, as before
  * this adds a `type: 'send'` event to the run-queue
* when the `send` event is pulled off the run-queue, all krefs are
  decrefed (as before)
* if the event is delivered to a vat, great
* if the event is delivered to a promise,
  `kernelKeeper.addMessageToPromiseQueue` increfs all krefs
  * (this is new: previously `addMessageToPromiseQueue` did not change the
    refcounts)
* later, if/when the promise is resolved, the messages are transferred
  laterally from the promise queue to `send` events on the run-queue, without
  changing their refcounts
  * (this is new: previously the transfer was done with `doSend`, which
    incremented the refcounts)
  * the counts are decremented when the `send` event is removed from the
    run-queue, as usual

The result is the same number of increments and decrements as before, but the
increment is done earlier, so messages on a promise queue will maintain a
refcount on all their krefs (target, args, and any result). This protects
objects and promises which would otherwise have been eligible for collection
while they sat on the promise queue.

Strictly speaking we don't need to maintain a refcount on the target (which
is also kind of redundant: everything on the queue for promise `kp1`
obviously has `target: 'kp1'`), since that will always be additionally held
by the c-list of the decider (at least). But by making all promise queues do
the same thing as the main run-queue, we can laterally transfer messages from
promise- to run-queue without DB churn (decrementing then immediately
incrementing the counts).

This change moves the responsibility for the lateral transfer from
`notifySubscribersAndQueue` in kernel.js to
`kernelKeeper.resolveKernelPromise()`, deleting the former entirely and
merging the subscription loop into `doResolve`.

closes #3264
warner added a commit that referenced this issue Jun 9, 2021
As described in #3264, we were not maintaining correct reference counts when
messages get queued to an unresolved promise.

In the new approach, the lifetime of a message is:

* all krefs (target, args, and optional result) of a message are increfed
  during `doSend`, as before
  * this adds a `type: 'send'` event to the run-queue
* when the `send` event is pulled off the run-queue, all krefs are
  decrefed (as before)
* if the event is delivered to a vat, great
* if the event is delivered to a promise,
  `kernelKeeper.addMessageToPromiseQueue` increfs all krefs
  * (this is new: previously `addMessageToPromiseQueue` did not change the
    refcounts)
* later, if/when the promise is resolved, the messages are transferred
  laterally from the promise queue to `send` events on the run-queue, without
  changing their refcounts
  * (this is new: previously the transfer was done with `doSend`, which
    incremented the refcounts)
  * the counts are decremented when the `send` event is removed from the
    run-queue, as usual

The result is the same number of increments and decrements as before, but the
increment is done earlier, so messages on a promise queue will maintain a
refcount on all their krefs (target, args, and any result). This protects
objects and promises which would otherwise have been eligible for collection
while they sat on the promise queue.

Strictly speaking we don't need to maintain a refcount on the target (which
is also kind of redundant: everything on the queue for promise `kp1`
obviously has `target: 'kp1'`), since that will always be additionally held
by the c-list of the decider (at least). But by making all promise queues do
the same thing as the main run-queue, we can laterally transfer messages from
promise- to run-queue without DB churn (decrementing then immediately
incrementing the counts).

This change moves the responsibility for the lateral transfer from
`notifySubscribersAndQueue` in kernel.js to
`kernelKeeper.resolveKernelPromise()`, deleting the former entirely and
merging the subscription loop into `doResolve`.

closes #3264
warner added a commit that referenced this issue Jun 9, 2021
Add more type assertions to kernelKeeper.addMessageToPromiseQueue(), which is
defined to take a Message (with method, result, and args capdata).

test-state was giving it the wrong things (run-queue events, which include
both 'send' for messages, and 'notify' for promise resolution notifications).
Promise queue entries are only ever message sends, never anything else. I
cleaned up test-state to only queue messages, and to track a few more values
that will be meaningful when the refcounts and queue handling change soon.

I also changed the resolution tests to use properly-allocated objects, rather
than arbitrarily-chosen object references. This doesn't matter now, but when
the upcoming refcount improvements are made,
`kernelKeeper.resolveKernelPromise` will need the capdata to reference real
objects with real refcounts that can be modified.

refs #3264 (prep/cleanup)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant