undo action error type could be more explicit [incomplete] #283
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Following up on #138: this PR makes a few changes:
UndoActionError
(which was an enum with one variant) to a simpler struct representing only that case. I decided while working on this that the generality of an enum didn't seem worth it.UndoActionPermanentError
The goal here is to make it more obvious to authors of undo action functions that it's rather a big deal to return an error from an undo action and they should only do that for permanent errors. This isn't a change in behavior.
This is the sort of change where we'll definitely want to convert important consumers before committing to this (i.e., landing this PR). I've started converting Omicron, and I'm honestly not sure if this is worthwhile. There are over 50 undo actions, and each may have a few code paths that need adjustments, and I'm not sure enough about this change to go do that.
Some design notes about this:
impl From<anyhow::Error> for UndoActionPermanentError
, figuring that any existing consumers are already producinganyhow::Error
. This helps, but in practice, most of the undo actions in Omicron are not directly producinganyhow::Error
, but rather producingomicron_common::api::external::Error
, which impl'sstd::error::Error
, for whichanyhow::Error
implsFrom
. So in practice, quite a lot of call sites need to be updated anyway.From<StdError> for UndoActionPermanentError
, but this would preclude it from impl'ingStdError
directly, which means we can't usethiserror
and can't use it like many other kinds of errors. (This isn't a non-starter, though.)ActionError
and then uses?
doesn't need to do anything because that can be converted toUndoActionPermanentError
. This is convenient for the various helper functions that Steno provides that produceActionError
(likelookup()
).We'd want to check that this change doesn't break compatibility with previously-serialized saga logs (or else that we don't care about that).
There are plenty of bigger questions here, too, like: should Steno first-class the idea of retryable failures? I want to defer this discussion from this PR, unless we feel it invalidates this step. There are a lot of tricky questions around doing this. (What policy do we use? How does Steno tell whether an error is retryable or not? How does it report on transient failures?) I think we can best approach this by prototyping with a helper function (i.e., in Omicron) that we use to wrap undo actions. If we get to the point where we've got a stable abstraction that's a clear win and it makes sense to put it into Steno, then we can do it.