Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Await actor alarm deletion cache flush outputGate when alarm succeeds #1601

Merged
merged 1 commit into from
Feb 5, 2024

Conversation

jqmmes
Copy link
Contributor

@jqmmes jqmmes commented Feb 1, 2024

When an alarm handler successfully completes its execution, the alarm is marked ready for deletion in actor-cache (unless it's been updated since the start of handler execution). This alarm change will only be reflected in storage, once cache flushes and changes are written.

If we have access to actor-cache, we must await for the alarm deletion or update to flush(), meaning the changes were written to storage before we return the alarm result.

If we don't await, it's possible for alarm manager to pull the wrong alarm value (the same alarm that just completed) from storage before these changes are actually made, rerunning it, when it shouldn't.

@jqmmes jqmmes requested review from a team as code owners February 1, 2024 15:39
@jqmmes jqmmes requested review from kentonv and hoodmane February 1, 2024 15:39
// same alarm that just completed) from CRDB before these changes are actually made,
// rerunning it, when it shouldn't.
KJ_IF_SOME(persistent, actor.getPersistent()) {
KJ_IF_SOME(pendingFlush, persistent.onNoPendingFlush()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this maybe be waiting on the output gate, instead of directly on underlying storage? They are usually the same thing but if there's a case where they don't match, I would hope that breaking the output gate will prevent an alarm from being treated as successful.

Copy link
Contributor Author

@jqmmes jqmmes Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC you're suggesting to await for actor->getOutputgate() instead of actor-cache::onNoPendingFlush(), since the output gate, to prevent situation where the output gate is not the same, correct?
That makes sense, since the onNoPendingFlush will return a promise that is output gate locked, if there's a flush pending.

@jqmmes jqmmes force-pushed the joaquim/await-alarm-deletion-output-gate branch from c277944 to 7808de9 Compare February 5, 2024 16:00
@jqmmes jqmmes force-pushed the joaquim/await-alarm-deletion-output-gate branch from 7808de9 to a2cb92a Compare February 5, 2024 16:00
@jqmmes jqmmes merged commit 9d9608b into main Feb 5, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants