Integrate Error Chaining in a js-errors or @matrixai/errors package #304
Comments
In our root exception handler, or our |
Error chaining would be useful in a range of areas, because right now we are losing error context. One example of this is in the networking: when an error event is emitted to the TLS socket, it gets printed out, but it's not clear what the actual error is and where it is truly coming from. |
Here's an example problem from: #321 (comment) In our GRPC However not all errors are going to be
In fact, this could be done for Meaning the resulting exception being thrown might actually be a specific exception like And it's the chain itself dispatched in |
One issue is the serialisation of the cause chain. The metadata doesn't allow any kind of nested structure. However all the metadata properties can be multi-valued. This means that one could flatten the chain so that each error property goes into the array to represent the error chain.
for (const e of errorChain) {
metadata.add('name', e.name);
metadata.add('message', e.message);
// This needs to be considered as well
if (e.data != null) {
metadata.add('data', JSON.stringify(e.data));
}
} |
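A minimal sketch of the flattening idea, and of reading the chain back in insertion order. `MultiValueMetadata`, `ChainedError`, `flattenChain` and `unflattenChain` are hypothetical names for illustration; the container only mimics multi-valued `add`/`get` and is not the gRPC metadata class. One assumption differs from the snippet above: a `data` entry is always added (as `null` when absent) so that the indexes of `name`, `message` and `data` stay aligned.

```typescript
// Hypothetical stand-in for a multi-valued metadata container.
// It only mimics add/get semantics; it is not the gRPC Metadata class.
class MultiValueMetadata {
  protected entries: Map<string, Array<string>> = new Map();

  public add(key: string, value: string): void {
    const values = this.entries.get(key) ?? [];
    values.push(value);
    this.entries.set(key, values);
  }

  public get(key: string): Array<string> {
    return this.entries.get(key) ?? [];
  }
}

// Hypothetical shape of one link in the error chain
type ChainedError = { name: string; message: string; data?: unknown };

// Flatten the chain: each property becomes one more value under its key
function flattenChain(errorChain: Array<ChainedError>): MultiValueMetadata {
  const metadata = new MultiValueMetadata();
  for (const e of errorChain) {
    metadata.add('name', e.name);
    metadata.add('message', e.message);
    // Assumption: always add a data entry so indexes line up across keys
    metadata.add('data', JSON.stringify(e.data ?? null));
  }
  return metadata;
}

// Rebuild the chain by reading the values back in insertion order
function unflattenChain(metadata: MultiValueMetadata): Array<ChainedError> {
  const names = metadata.get('name');
  const messages = metadata.get('message');
  const datas = metadata.get('data');
  return names.map((name, i) => ({
    name,
    message: messages[i] ?? '',
    data: JSON.parse(datas[i] ?? 'null'),
  }));
}
```

If `data` were only added when present, as in the snippet above, the `data` array could end up shorter than the `name` array and the receiver could not tell which error each entry belongs to.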
Remember that |
I think this could work, and then in
const errorName = e.metadata.get('name')[0] as string;
const errorMessage = e.metadata.get('message')[0] as string;
const errorData = e.metadata.get('data')[0] as string;
We could just iterate through them all, returning an array of errors rather than a single one. I'm not sure if this would maintain the correct order though. It would then be up to #323 to decide how this array of errors is displayed. |
Yea that should maintain the chain order because the multivalues are iterated in order of insertion. |
Slightly different from the Another way is to use JSON encode for all error properties rather than spreading it into the GRPC metadata. This way we could use |
Will be done here: https://github.com/MatrixAI/js-errors |
From this: #357 (comment) It looks like it would be useful debugging to have a timestamp set for when an exception is constructed. Ideally also when it is thrown, but there's no hook for this in the JS runtime I can find. This is because there's the time when an exception is constructed, an arbitrary delay, when it was thrown, an arbitrary delay, and finally when an error is actually reported using the js-logger. We would need high precision for this, and not just unixtime. We can use |
Being resolved in MatrixAI/js-errors#1 |
Not resolved until |
Integrated into Still to do:
|
I'm noticing that the vaults service handlers aren't using |
We only want to log out server errors on the server side, so we now apply filters on errors before logging them. #304
While I was debugging some issues with the vaults tests I was able to see the new error chain - it was definitely helpful!
I realised though that in this case the original error is intended for the client, not the server, but since it's wrapped in an ErrorPolykeyRemote it doesn't get caught as a client error and gets logged out by the agent. The source of the error is invalid data coming from the client, but since it doesn't get picked up until it reaches a call on a remote agent maybe it could still be relevant to the server? Not sure on this one. |
So this is an error from client to agent 1 to agent 2. A validation error of agent 2 to agent 1. I argue that agent 1 should have validated the data before passing to agent 2. This means it would have been a client error. If agent 2 validates again and says it's wrong, then agent 1 is at fault and that's a server error. |
Agent 1 can't really validate the data, since it's possible to pass in either a Vault Name or a Vault Id. Since Agent 1 doesn't know the names of Agent 2's vaults it can't know whether a random string that is passed in is a valid vault name or not. It's not until it reaches Agent 2 and it can check that the string is not the name of any of its vaults that it can check if it's a valid Vault Id instead. |
Ok so it's more of the case that this vault ID is missing. This is because we first check if there is a matching vault name on agent 2, and then check if it is a base58 encoded string. If it is not, then it's really just a missing vault. The exception should be changed to state that it is simply a missing vault. In that case, it is a client error, and therefore agent 1 should use the special filter to say that it is in fact a client error. Right now I believe it's just an array of exception classes. I imagine to do this check, you would need to change the array of exception classes to also optionally take a chain: `Array<typeof Error | Array<typeof Error>>`. That way if it is a single error, you just check that; if there are multiple errors, you check it in a chain: `isClientError(e, [ErrorOtherClientError, [ErrorPolykeyRemote, ErrorVaultsVaultMissing]])`. Then you would first check if it is Alternatively... it may be better that the service handler on agent 1 does a |
Rather than having to check against an array of errors, it might be better to just iterate through the chain of the error and check each cause against the set of client errors. Otherwise it seems too narrow of a check - if the underlying error is wrapped by more than one ErrorPolykeyRemote then the check would fail. |
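A rough sketch of how the two suggestions could be combined. The error classes here are local stand-ins that reuse the names from this thread, not the real Polykey classes, and the matching rule is an assumption: a single class must match the top-level error, while an array must match the top-level error first and then the remaining classes in order further down the cause chain, skipping any extra wrappers in between.

```typescript
// Local stand-ins for illustration only; not the real Polykey classes
class ErrorBase extends Error {
  public cause?: Error;
  constructor(message: string = '', options: { cause?: Error } = {}) {
    super(message);
    // Keep instanceof working when compiled to older targets
    Object.setPrototypeOf(this, new.target.prototype);
    this.cause = options.cause;
  }
}
class ErrorPolykeyRemote extends ErrorBase {}
class ErrorVaultsVaultMissing extends ErrorBase {}
class ErrorOtherClientError extends ErrorBase {}

type ErrorClass = new (...args: Array<any>) => Error;
type ClientErrorRule = ErrorClass | Array<ErrorClass>;

// Collect the error and all of its causes, top-level error first
function causeChain(e: unknown): Array<unknown> {
  const chain: Array<unknown> = [];
  let current: unknown = e;
  // The indexOf check guards against a cyclic cause chain
  while (current != null && chain.indexOf(current) === -1) {
    chain.push(current);
    current = current instanceof ErrorBase ? current.cause : undefined;
  }
  return chain;
}

// The first class must match the top-level error; later classes must
// appear in order further down the chain, with wrappers allowed between
function matchesChain(chain: Array<unknown>, classes: Array<ErrorClass>): boolean {
  if (classes.length === 0 || !(chain[0] instanceof classes[0])) return false;
  let position = 1;
  for (const errorClass of classes.slice(1)) {
    while (position < chain.length && !(chain[position] instanceof errorClass)) {
      position++;
    }
    if (position >= chain.length) return false;
    position++;
  }
  return true;
}

function isClientError(e: unknown, rules: Array<ClientErrorRule>): boolean {
  const chain = causeChain(e);
  return rules.some((rule) =>
    matchesChain(chain, Array.isArray(rule) ? rule : [rule]),
  );
}
```

With the rule `[ErrorPolykeyRemote, ErrorVaultsVaultMissing]`, an `ErrorVaultsVaultMissing` wrapped in one or several `ErrorPolykeyRemote` layers is treated as a client error, while an `ErrorPolykeyRemote` wrapping any other error is not.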
Sometimes an error is only considered a client error when part of the chain of another error (for example ErrorPolykeyRemote) so it is now possible to specify this array. #304
The `data` POJO that was originally supplied in the constructor as the second parameter is now part of the `options` parameter and most places in the code have been updated to match this. #304
When any error is thrown as the result of another error occurring, the original error is now contained within the `cause` property of the new error. #304
…or chaining Our gRPC `toError` and `fromError` utils are now able to serialise and deserialise Polykey and non-Polykey errors (as well as non-errors), including the entire error chain if this exists. Also includes the ability to filter out sensitive data, for example when the error is being sent to another agent. Errors sent over the network in this way are now additionally wrapped on the receiving side in an `ErrorPolykeyRemote` to make the source of the error more clear. #304
All of the agent and client service handlers are now passed a shared logger that is used to log errors to stderr. #304
It's better to be explicit than implicit in this case though. In any case this is merged into staging now, so we should consider this to be done. Further improvements exist in structured logging. |
This was done when it was merged into staging. Further work is in structured logging. |
Specification

Now that `@matrixai/errors` is ready, we can use it to incorporate error chaining into PK. This will involve a series of steps:

Chaining
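As a rough illustration of the chaining rule in this section, the sketch below catches an error and throws a new one that keeps the original as its `cause`. `ErrorChained`, `ErrorConfigParse` and `parseConfig` are hypothetical names, and the `{ cause }` options object is an assumption modelled on the `options` parameter mentioned in this thread, not the confirmed `@matrixai/errors` signature.

```typescript
// Hypothetical base class for illustration; not the @matrixai/errors API
class ErrorChained extends Error {
  public cause?: Error;
  constructor(message: string, options: { cause?: Error } = {}) {
    super(message);
    // Keep instanceof working when compiled to older targets
    Object.setPrototypeOf(this, new.target.prototype);
    this.cause = options.cause;
  }
}

class ErrorConfigParse extends ErrorChained {}

function parseConfig(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch (e) {
    // The cause should always be an error, so wrap anything else
    const cause = e instanceof Error ? e : new Error(String(e));
    // Throw the domain error without losing the original context
    throw new ErrorConfigParse('Failed to parse config', { cause });
  }
}
```

A caller that catches `ErrorConfigParse` can still reach the underlying `SyntaxError` through `cause`, which is the context that would otherwise be lost.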
In all places where we are catching an error and then throwing a new error in its place, we need to include the original error as the `cause` of the new error.

Note that the `cause` should always be an exception/error. If not specified it defaults to `unknown`. When an error is thrown, the top-level error should contain the full instances of the errors in its cause chain, as one big nested object. This data structure can be serialised and deserialised recursively, where every error in the chain has a single cause, and that cause is contained as a property within it.

It is important to realise that the `replacer` will remove any entry if the returned value is `undefined`, unless the entry is an array element, in which case the value is replaced with `null`. This means that because our `cause` may be `undefined`, it may not exist during deserialisation. Any usage of `fromJSON` must be aware that the `cause` property may not exist.

Client Server Error Architecture
It's important to realise:

- … `ErrorPolykeyRemote` …
- `ErrorPolykeyRemote` should contain `nodeId: NodeId`, `host`, `port` and other connection information in its own `data`, which can aid debugging which server responded with this error.
- `ErrorPolykeyRemote` should also have information about which call triggered this, and perhaps even request information; it can have a lot of information which we can add later.
- … `data` property of `ErrorPolykeyRemote` …
- … `data` a more specific type like `data: ClientMetadata & POJO` so that it forces that `ClientMetadata` must be available: `type ClientMetadata = { nodeId: NodeId; host: Host; port: Port }`. Additional information can be provided later, like what the call was.
- … `ClientMetadata` type should be in `src/types.ts` to avoid import loops, since it has to use the `NodeId` type from `nodes` and `Host` and `Port` from the `network` domains.
- … `ErrorPolykeyRemote` together at the top level `src`.
- … `ErrorX` … that `ErrorX` may be any kind of error. The server must filter the information it sends to the client to ensure it is not leaking sensitive information. This can include the `stack`, since the `stack` is rarely required by the client.
- … `ErrorUnknown` at the root of the JSON data. Any other unknown data should be returned as-is.
- … `ErrorPolykeyRemote`, but the `cause` may be `ErrorUnknown` or `ErrorX`, where `ErrorX` may contain anything after that (as long as the runtime schema checking during `fromJSON` calls works).
- … `error`; in the future, if we switch to using JSON RPC, this error JSON will probably be encoded as part of the JSON RPC protocol - see https://www.jsonrpc.org/specification#error_object and https://eth.wiki/json-rpc/json-rpc-error-codes-improvement-proposal
- … `UNKNOWN`. We use that to represent an "application" error. This basically means we don't use most of GRPC/HTTP2's standard error codes, as our "application layer errors" are on top. This is because GRPC/HTTP2's standard error codes were designed more for the HTTP layer, although some of the HTTP error codes are still used internally by GRPC.

JSON serialisation/deserialisation
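As a rough illustration of the serialisation approach in this section, the sketch below builds the nested `{ type, data }` structure first and then filters it with a `JSON.stringify` replacer. `ErrorJSON`, `filterSensitive` and `chainTypes` are hypothetical names, and the sample values are made up; the only library behaviour relied on is that a replacer returning `undefined` drops the property.

```typescript
// Hypothetical wire shape: every error carries at most one nested cause
type ErrorJSON = {
  type: string;
  data: { message: string; stack?: string; cause?: ErrorJSON };
};

// Structure first: the full chain as one nested object (sample values)
const errorJSON: ErrorJSON = {
  type: 'ErrorPolykeyRemote',
  data: {
    message: 'remote call failed',
    stack: 'ErrorPolykeyRemote: remote call failed\n    at ...',
    cause: {
      type: 'ErrorX',
      data: {
        message: 'original failure',
        stack: 'ErrorX: original failure\n    at ...',
        // The end of the chain has no cause
        cause: undefined,
      },
    },
  },
};

// Filter afterwards: returning undefined removes the property entirely
function filterSensitive(key: string, value: unknown): unknown {
  if (key === 'stack') return undefined;
  return value;
}

const wire = JSON.stringify(errorJSON, filterSensitive);
const parsed = JSON.parse(wire) as ErrorJSON;

// Deserialisation must tolerate a cause property that does not exist
function chainTypes(json: ErrorJSON): Array<string> {
  const types: Array<string> = [];
  let current: ErrorJSON | undefined = json;
  while (current != null) {
    types.push(current.type);
    current = current.data.cause;
  }
  return types;
}
```

After the round trip, the innermost error has no `cause` key at all rather than a `cause` set to `undefined`, which is why `chainTypes` checks for absence instead of assuming the property is there.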
Our gRPC `toError` and `fromError` utilities will need to be modified to be able to serialise and deserialise the error chain when sending errors between clients/agents or agents/agents.

- … `{ type: 'ClassName', data: { ... } }` and also filter out sensitive data (the stack) when errors are being sent to an agent (as opposed to a client) - it works top to bottom, creating a structure first and filtering out unwanted fields afterwards
- … `ErrorPolykeyUnknown`, with the unknown data in the `data` field
- … `name`/`message`/`data` and this is now just one data structure - this can just be one `error` field
- … `stack` and other special data. This would be on a case by case basis for specific exception classes. That is, the `replacer` will need to execute a filter by checking against specific exception classes and filtering them out. Filter rules should be acquired from `src/client` and `src/agent`.

Logging of Errors at the Service Handlers
- … `src/client` and `src/server` (code duplication is fine here), since each service can define their own set of what is considered a "client error".

Reporting the Error to the user at the Root Exception Handler
We have 2 root exception handlers, one at the client and one at the agent. This addresses the client side, however unless things are different, the same applies to the agent side.

On the client side, this is done in `src/bin/polykey.ts`.

When an exception is received, we must interpret 3 things:

- … `cause` chain
- `exitCode`
- `data`

According to issue #323, the `--format json` doesn't currently affect the STDERR. This needs to be done now because the errors being reported are quite complex, and during testing, our expectation utilities should be parsing JSON to make it easier to test.

Therefore `binUtils.outputFormatter` will need to ensure that `error` type is a human formatted JSON, while `json` type can now be the JSON output for exceptions. This is doable now due to the `toJSON` utility function inherited from `AbstractError`.

Furthermore, in order to acquire the options passed into the command, use `rootCommand.opts()`; this method was added to `PolykeyCommand` to enable us to acquire the options.

The desired format for human format should be something like:

…

Note the usage of `\t` for separation while spaces are used for indentation. We can change this format later.

The `exitCode` is currently a bit ambiguous; it was originally intended to mean that if this exception was the last exception caught by the process, this code should be used for the process exit code.

It seems more ideal to allow the process to decide what the exit code should be based on the family of exceptions. But since we have built our exit code this way, we need to continue to use it like this.

However now that we have a `cause` chain, we have to decide how to deal with the exit code when we are getting `ErrorPolykeyRemote`. Because of this scenario:

…

We must differentiate exceptions that originate from the client, or the first agent or the second agent. If an exception comes from A2, we may see `ErrorPolykeyRemote` wrapping another `ErrorPolykeyRemote`.

So right now a policy can be made:

- … `cause` property that is not `ErrorPolykeyRemote`, and use that `exitCode`
- … `exitCode` of the first exception.

This only works because `ErrorPolykeyRemote`'s cause type is limited to `ErrorPolykey`; it cannot be undefined or anything else.

Additional context
Tasks

- … `CustomError` to new `AbstractError<T>` from `@matrixai/errors`
- … `errCode` …
- … `cause` …
- … `fromError` and `toError` functions to properly handle chained errors
- … `js-errors` to `1.1.0` to use the `fromJSON` and `toJSON`
- … `src/client` and `src/agent`. Can use DI in the client service vs the agent service to change the serialisation/deserialisation.
- … `src/bin/polykey.ts` and `src/bin/polykey-agent.ts`
- … `AbstractError.fromJSON` can handle non-existent `cause` property