Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add replay phase to JournalFailureExceptions in typed #32557

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

leviramsey
Copy link
Contributor

onRecoveryFailure can occur and log its message in three ways:

  • failed to apply event to state
  • recovery completed handler failed
  • the journal reported a failure

The second one, in particular, is difficult to distinguish from the third (both will log the message without regard to a particular event), which is unfortunate since they're very different problems (the second is a problem in the recovery completed handler, the third is either infra (incl. journal plugin) or serialization).

Copy link
Member

@johanandren johanandren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, seems useful

case NonFatal(cause) =>
onRecoveryFailure(cause, None)
case ReplayMessagesFailure(cause) =>
onRecoveryFailure(cause, Some(response), "failure-from-journal")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this threw previously, that would lead to a new call to onRecoveryFailure, but maybe it makes more sense that it doesn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't throw since we have:

setup.onSignal(state.state, RecoveryFailed(cause), catchAndLog = true)

but much other code there as well, which hopefully doesn't throw. Should we guard against that with an additional catch inside onRecoveryFailure?

It should always end with:

throw new JournalFailureException(msg, cause)

} catch {
case ex: UnstashException[_] =>
// let supervisor handle it, don't treat as recovery failure
throw ex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is correct that unstash exceptions can only ever happen in onRecoveryComplete.

Copy link
Member

@patriknw patriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good

} catch {
case NonFatal(ex) =>
state = state.copy(repr.sequenceNr)
onRecoveryFailure(ex, eventForErrorReporting.toOption, "replaying-event")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, onRecoveryFailure will always end with:

throw new JournalFailureException(msg, cause)

and then we caught that again in the outer catch, and called onRecoveryFailure.

Doesn't make sense with that double handling, so this should be fine.

case NonFatal(cause) =>
onRecoveryFailure(cause, None)
case ReplayMessagesFailure(cause) =>
onRecoveryFailure(cause, Some(response), "failure-from-journal")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't throw since we have:

setup.onSignal(state.state, RecoveryFailed(cause), catchAndLog = true)

but much other code there as well, which hopefully doesn't throw. Should we guard against that with an additional catch inside onRecoveryFailure?

It should always end with:

throw new JournalFailureException(msg, cause)

@leviramsey leviramsey merged commit 508de29 into akka:main Oct 14, 2024
5 checks passed
@leviramsey leviramsey deleted the persistence-recovery-failures branch October 14, 2024 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants