undo action error type could be more explicit [incomplete] #283

Draft: wants to merge 2 commits into base: main

davepacheco
Collaborator

Following up on #138, this PR makes two changes:

  • Changed UndoActionError (which was an enum with one variant) to a simpler struct representing only that case. I decided while working on this that the generality of an enum didn't seem worth it.
  • Changed the type signature for undo actions to return UndoActionPermanentError.

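As a rough illustration of the two changes above, here is a minimal sketch of what a struct-shaped error and an undo action returning it might look like. The `message` field, the Display text, and the `undo_create_record` function are hypothetical stand-ins, not the actual definitions in this PR, and the function is synchronous only to keep the sketch dependency-free.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Assumed shape of the new type: one struct for the single case that the
/// old one-variant enum described. The `message` field is hypothetical.
#[derive(Debug)]
struct UndoActionPermanentError {
    message: String,
}

impl fmt::Display for UndoActionPermanentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "undo action failed permanently: {}", self.message)
    }
}

impl StdError for UndoActionPermanentError {}

/// Hypothetical undo action. The return type itself tells the author that
/// any error returned from here is treated as permanent.
fn undo_create_record(record_is_corrupt: bool) -> Result<(), UndoActionPermanentError> {
    if record_is_corrupt {
        return Err(UndoActionPermanentError {
            message: "record is corrupt; cannot delete".to_string(),
        });
    }
    Ok(())
}
```

The point of the sketch is only that "permanent" appears in the signature, so an author has to read that word before writing an error path.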
The goal here is to make it more obvious to authors of undo action functions that returning an error from an undo action is a big deal, and that they should do so only for permanent errors. This isn't a change in behavior.

This is the sort of change where we'll definitely want to convert important consumers before committing to this (i.e., landing this PR). I've started converting Omicron, and I'm honestly not sure if this is worthwhile. There are over 50 undo actions, and each may have a few code paths that need adjustments, and I'm not sure enough about this change to go do that.

Some design notes about this:

  • I think what's most important is that when you write the type signature, the type says that this is a permanent failure.
  • At one point I also thought it would be nice if consumers explicitly had to wrap any error they produced with UndoActionPermanentError, but as I started making such a change mechanically, it seemed like a waste of time to convert existing stuff.
  • As a result, I chose to impl From<anyhow::Error> for UndoActionPermanentError, figuring that any existing consumers are already producing anyhow::Error. This helps, but in practice most of the undo actions in Omicron do not produce anyhow::Error directly. They produce omicron_common::api::external::Error, which impls std::error::Error, for which anyhow::Error impls From. That is two conversions, and ? only applies one, so quite a lot of call sites need to be updated anyway.
  • I considered a blanket impl of From for UndoActionPermanentError covering every type that impls StdError, but this would preclude UndoActionPermanentError from impl'ing StdError itself (the blanket impl would overlap with the standard library's impl From<T> for T). That means we couldn't use thiserror, and the type couldn't be passed around as a StdError like many other kinds of errors. (That drawback doesn't rule the option out, though.)
  • Anything that directly produces an ActionError and then uses ? needs no changes, because ActionError can be converted to UndoActionPermanentError. This is convenient for the various helper functions that Steno provides that produce ActionError (like lookup()).
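The conversion behavior described in these notes can be sketched with standard-library types only. Everything below is a hypothetical stand-in: `AnyError` plays the role of anyhow::Error (it converts from any std error but does not impl StdError itself), `ExternalError` plays the role of omicron_common::api::external::Error, and `ActionError`, `lookup`, and `delete_remote` are simplified placeholders rather than Steno's or Omicron's real APIs.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Stand-in for `anyhow::Error`: it has a blanket `From` for std errors,
/// which is only coherent because it does not impl `StdError` itself.
#[derive(Debug)]
struct AnyError(Box<dyn StdError + Send + Sync + 'static>);

impl<E: StdError + Send + Sync + 'static> From<E> for AnyError {
    fn from(e: E) -> Self {
        AnyError(Box::new(e))
    }
}

/// Stand-in for `omicron_common::api::external::Error`.
#[derive(Debug)]
struct ExternalError(String);

impl fmt::Display for ExternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl StdError for ExternalError {}

/// Stand-in for Steno's `ActionError`.
#[derive(Debug)]
struct ActionError(String);

#[derive(Debug)]
struct UndoActionPermanentError {
    message: String,
}

impl fmt::Display for UndoActionPermanentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl StdError for UndoActionPermanentError {}

impl From<AnyError> for UndoActionPermanentError {
    fn from(e: AnyError) -> Self {
        Self { message: e.0.to_string() }
    }
}

impl From<ActionError> for UndoActionPermanentError {
    fn from(e: ActionError) -> Self {
        Self { message: e.0 }
    }
}

/// Placeholder for a helper that produces `ActionError`.
fn lookup(found: bool) -> Result<u32, ActionError> {
    if found { Ok(7) } else { Err(ActionError("lookup failed".to_string())) }
}

/// Placeholder for a call that produces the external error type.
fn delete_remote(ok: bool) -> Result<(), ExternalError> {
    if ok { Ok(()) } else { Err(ExternalError("remote delete failed".to_string())) }
}

/// `?` works unchanged: ActionError -> UndoActionPermanentError is one hop.
fn undo_via_action_error(found: bool) -> Result<(), UndoActionPermanentError> {
    let _id = lookup(found)?;
    Ok(())
}

/// A bare `delete_remote(ok)?` would not compile here, because `?` applies
/// a single `From` conversion. The call site must make the first hop itself.
fn undo_via_external_error(ok: bool) -> Result<(), UndoActionPermanentError> {
    delete_remote(ok).map_err(AnyError::from)?;
    Ok(())
}
```

The explicit `map_err` in the last function is the kind of per-call-site adjustment the notes refer to.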

We'd want to check that this change doesn't break compatibility with previously-serialized saga logs (or else that we don't care about that).


There are plenty of bigger questions here, too, like: should Steno treat retryable failures as a first-class concept? I want to keep that discussion out of this PR, unless we feel it invalidates this step. There are a lot of tricky questions around doing this. (What policy do we use? How does Steno tell whether an error is retryable or not? How does it report on transient failures?) I think we can best approach this by prototyping with a helper function (i.e., in Omicron) that we use to wrap undo actions. If we get to the point where we've got a stable abstraction that's a clear win and it makes sense to put it into Steno, then we can do it.
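To make the prototyping idea concrete, here is one possible shape for such a wrapper. It is only a sketch under stated assumptions: the `AttemptError` classification, the `retry_undo` name, and the bounded-attempts-with-fixed-delay policy are all hypothetical choices made for illustration, and the open questions above (policy, classification, reporting) are exactly the parts this sketch hard-codes. It is synchronous for brevity; a real helper would need to wrap async undo actions.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical classification that the wrapped undo body reports.
#[derive(Debug)]
enum AttemptError {
    Transient(String),
    Permanent(String),
}

/// Simplified stand-in for the error type proposed in this PR.
#[derive(Debug)]
struct UndoActionPermanentError {
    message: String,
}

/// Retries `attempt` while it reports transient failures. Only a permanent
/// failure, or running out of attempts (an assumed policy), is surfaced as
/// `UndoActionPermanentError`.
fn retry_undo<F>(
    max_attempts: u32,
    delay: Duration,
    mut attempt: F,
) -> Result<(), UndoActionPermanentError>
where
    F: FnMut() -> Result<(), AttemptError>,
{
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(()) => return Ok(()),
            Err(AttemptError::Permanent(message)) => {
                return Err(UndoActionPermanentError { message });
            }
            Err(AttemptError::Transient(message)) if tries >= max_attempts => {
                return Err(UndoActionPermanentError {
                    message: format!("gave up after {tries} attempts: {message}"),
                });
            }
            // Transient failure with attempts remaining: wait, then retry.
            Err(AttemptError::Transient(_)) => thread::sleep(delay),
        }
    }
}
```

Whether exhausting the attempt budget should count as a permanent failure at all is one of the policy questions left open above.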

@davepacheco
Collaborator Author

I started updating Omicron in oxidecomputer/omicron#5908.

@davepacheco changed the title from "undo action error type could be more explicit (incomplete)" to "undo action error type could be more explicit [incomplete]" on Jun 17, 2024