-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: errors per module [WPB-14354] #791
Conversation
Currently there are errors when running
~10 other errors. The set of failing cases changes each time I run the test command, and running these cases individually always seems to produce a passing output. So I'm going to mark this down to flakiness and not worry too much about fixing them. |
f966cbb
to
a14bad0
Compare
a14bad0
to
fc14d7b
Compare
Looks like the flaky cases have vanished after this rebase; presumably the work done in the interim made the difference. |
a62ed73
to
0971126
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #791 +/- ##
==========================================
- Coverage 75.15% 74.21% -0.94%
==========================================
Files 107 116 +9
Lines 19599 20729 +1130
==========================================
+ Hits 14729 15384 +655
- Misses 4870 5345 +475
Continue to review full report in Codecov by Sentry.
|
fc70892
to
7f3296d
Compare
By always returning a local error type, we open the door to removing the centralized error types entirely in favor of a more specific error stack. Note that this commit introduces a deprecated wrapper aroud `CryptoError` within `e2e_identity::error::Error`. This will be removed shortly, but it is necessary as an interim step, so that we don't have to rewrite actually everything, all at once. Note also that this affects some files within `mls`. This is required in order to keep everything buildable.
f8a756b
to
3dedc9e
Compare
.mls_client() | ||
.await | ||
.map_err(RecursiveError::root("getting mls client"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repeated in quite a few places. Are we ever getting calling mls_client()
with another intention than "getting mls client"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question also applies for mls_provider()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I don't see a better way to structure this.
The .mls_client()
method (and mls_provider()
as well) returns a root error: core_crypto::Error
, implemented in crypto/src/error/mod.rs
. But we call that method from all over the place, and we need to create all kinds of different errors from that. In this case what the function actually returns is a conversation error: core_crypto::mls::conversation::Error
, defined in crypto/src/mls/conversation/error.rs
.
You can see for yourself what's going on here: RecursiveError::root
returns a function which wraps the root error and adds the context line, and then the ?
operator uses the From<RecursiveError>
implementation (from the #[from]
annotation) to actually convert the error. But the point of the context field is to distinguish the cases. Quite likely, this "getting mls client" error is not the top-level error. It's common to get a MLS client and provider very near each other, and both operations will end up being mapped with RecursiveError::root
. Having this context allows us to distinguish which of those operations failed.
Maybe you're thinking that the answer would be to have the mls_client
method do its own wrapping: instead of returning Result<Client, core_crypto::Error>
, it might return Result<Client, E> where RecursiveError: Into<E>
. That would allow us to inline the context at least, avoid explicitly specifying that we're getting the mls client each time we get the mls client. Unfortunately, there are problems with that approach:
- It breaks the pattern that each group of module always returns the internal error type for that group of modules. Now we're returning something based on this internal error type, but we can't say exactly what until we know the context of our caller.
- It forces us to wrap everything everywhere in
RecursiveError
. That makes it more difficult, not less, to determine what the internal error type is. - It breaks the
?
operator. The reason for this is that this defines the returned type in terms ofInto
. But the?
operator also applies anInto
conversion. So desugaring everything looks like:But Rust can't infer what typelet client = match self.mls_client() { Ok(client) => client, Err(e) => return e.into(), };
e
should have there, because any number of types might potentially satisfy bothinto
conditions.
Maybe I'm missing something. But I do think that this somewhat-repetitive pattern is the best option we have for now.
(But it's worth mentioning that avoiding this kind of mess is a big driver of my desire to do a big refactoring that we can talk about next month! I don't have the domain knowledge yet to come into that week with a plan for what precisely we should do. But my hope is by the end of the week we will have designed a structure that lets avoid RecursiveError
entirely.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing, impressive work :)
I'm wondering if we can find a better name for |
I would drop the two empty commits: to avoid confusing whoever comes across them later. |
…al error type Note that there are a number of test failures as of this commit, as the error types change. I am intentionally permitting this, with the intent to fix them all up once we have arrived at the final state of the error stacks.
Also eliminates module-internal references to the `CryptoError` type. Intentionally does not yet remove the `CryptoError` type, because it is likely that I've missed something in different build configurations than I am testing by default, so I want to keep it around until I have more confidence that it is really obsolete.
Also get rid of the entirely superfluous documentation requirements. Error variants are simple, and have explicit documentation anyway in the form of the display implementation.
…m various modules Certain error variants were duplicated across a variety of module-specific errors. This could lead to problems in case we wanted a special-case handler for one of those error variants. This commit moves leaf error types into a common error type which can be extracted from module-specific errors.
Three main objectives in this commit: - introduce a small module hierarchy for errors, so that it is possible to better navigate through the error types - reduce code duplication by factoring out common patterns - remove unnecessary error dynamism by utilizing existing error kind enums We're finally beginning to approach a situation I would consider satisfactory with regard to error handling: - There is a top-level `error` module containing general-purpose crate-level errors - It also contains specialty errors with which we can construct general-purpose wrapped error stacks - Each code module group (i.e. folder) also contains its own internal error type, which contains unique leaf errors and sufficient information to wrap other crate error types. I still need to analyze and eliminate leaf types which are never constructed, fix the tests, and get rid of the legacy `CryptoError` entirely, but finally the end of this work is in sight.
At this point the next failing case appears to be related to logic errors, not type errors. Which is unfortunate, as none of the logic was supposed to change.
Though running `cargo nextest run -p core-crypto --no-fail-fast` always produces approx 10 failing tests, going through that list is not profitable. The failing cases fall into two buckets: - `mls::conversation::leaf_node_validation::tests::stages::should_validate_leaf_node_when_adding`, which also appears to fail on `main` - the rest of the errors. This set varies each time I run a test job, and each case I've tried passes when run in isolation. So I have to assume that there is some kind of concurrency problem in the test suite, which is outside the scope of this PR. As such I am deciding to move on to broader testing instead of continuing to debug these test cases.
The error-handling status quo in `core_crypto` is wildly different now, so we needed to change quite a lot about how `core_crypto_ffi` used those errors. This commit causes `core-crypto-ffi` to build properly. Weirdly, this was _much_ simpler in wasm than doing so for non-wasm. I'll take it though!
These are _very sensitive_ to Cargo conditions though, on my machine at least. So now that they've passed once locally I am going to very carefully set them on the shelf and not breathe on them.
Basic error reporting and debug representations of things hide `Box`es. It's also not trivial to downcast into them, particularly in the context of a macro like this! Non-trivial does not mean impossible, though. Adding this capability to the macro means that we can avoid an ugly workaround in the test suite.
80ae45a
to
e93cee6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a heroic effort. Well done! 🍷
What's new in this PR
Refactors
core-crypto
such that each logical module defines its own internal error type, and adds context whenever wrapping another module's error type.core_crypto::e2e_identity
core_crypto::mls::client
core_crypto::mls::conversation
core_crypto::mls::credential
core_crypto::mls
core_crypto
Logical modules are assumed to be the set of peer modules at a particular level in the directory tree.
PR notes
Verbose
{context, source}
variantsThere's quite a lot of manual coding here, and the usage pattern is ok, but it could be improved. I put together a macro library (
context-err
) a while ago which is intended to improve the situation, but I don't recommend it for production use yet. Right now the macro is very verbose, but still restrictive: it doesn't allow for boxing or'static str
contexts. We should think about whether we want to take this more in that direction.General-purpose variants which should be collapsed
There are at least a few general-purpose
{context, source}
variants which could usefully be collapsed into more specific variants. I haven't yet done the analysis to identify all of these. In general the rule I will use is, if there are two or fewer distinct contexts in use, then they should just be variants of their own. If there are three or more, we should use the more general form.Overall factoring
I've factored down the error types as finely as I think I reasonably can. Usage is relatively straightforward, and we end up with nice error stacks with a lot of context.
The downside of this is that when you actually get into the weeds of it, the error model in this crate is now insanely complex. Lots of simple pieces attaching to each other and converting between each other in non-simple ways.
I think this is the best we can do for now, but if we decide to do a Big Refactor after the Whiteboard Week in January, then we should definitely refactor and simplify the error model at that time as well.
PR Submission Checklist for internal contributors
SQPIT-764
feat(conversation-list): Sort conversations by most emojis in the title #SQPIT-764
.