Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3848: Introduce errcodes for specific event sending failures #3848

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 73 additions & 0 deletions proposals/3848-errcode-specific-failure.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# MSC3848: Introduce errcodes for specific event sending failures.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this MSC is trying to move faster than safety allows for. While implementations can (and should) move faster than the spec, and the spec should not arbitrarily delay process, this is the sort of MSC which needs time to settle and think about the problem at hand.

Specifically, I'm concerned that this MSC hasn't taken into consideration backwards compatibility of the ecosystem: we have a very long tail of old clients which will not have had a chance to upgrade, and may never will. To help move the spec along faster than the eternal tide of old clients, we'd effectively have to consider this a breaking change.

Breaking changes to endpoints are relatively easy, but also a bit messy in implementation: we'd bump the endpoint version and declare that the error codes are only valid on the new endpoint version. Thus, clients opt-in to using the error codes. We would also deprecate the now-old endpoints to force clients to consider using the new endpoints. Eventually, we'd remove the old ones.

The complexity for implementations is that clients would need to version check the server (or possibly rely on 404-like semantics), and servers would need to specifically handle their error codes per endpoint version (it would not be legal to return the new error codes on the old endpoint version).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention isn't to move particularly fast (and doesn't have an urgent project backing it), but while there is spare bandwidth going it felt wise to keep moving on an MSC before it gets forgotten about. That said, happy to have this sit for a bit and have implementations play with it so we can get a wider understanding of whether the MSC works.

I can happily include a passage about requiring v4 (or vNext) endpoints for this errcode to be returned, otherwise servers will continue to respond with M_FORBIDDEN.

Wrt implementations, would it be better for us to continue to have an experimental errcode key returned (like the proposal currently suggests) or use a unstable version prefix like /_matrix/client/org.matrix.msc3848/room/{rId}/invite which would probably match the reality of what clients have to do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSC would likely claim v4, and if for some reason something else beats it to it then we might consider bundling the two MSCs into one version, or ask that this MSC use v5 or whatever. For now though, it's much simpler to assume v4 will be the thing.

I'd encourage both implementations, I think. Not at the same time mind you, but it can be worth seeing which of the approaches works "better" for clients and what their concerns are. We might also explore alternative error code formats, as suggested in the SCT office room (array of error codes?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - I don't think we can, in good conscience, change the error codes on existing endpoints. It's frustrating, because this is very much unspecced behaviour that servers should be within their rights to change, but our continual failure to spec useful error codes means that the existing error codes become part of the de-facto spec.

I'd be more relaxed about replacing M_UNKNOWN. Anyone attaching semantic meaning to that deserves to keep both pieces when it breaks.

So yeah, new endpoint versions here, please, unless you can justify any particular change as being ok.

(OTOH: I very much applaud efforts to make the error codes more useful.)


When performing an action on the C-S API, sometimes requests will fail with
a generic `M_FORBIDDEN` error with implementations providing a more meaningful
context in the `error` field. While some client implementations have taken it
upon themselves to scan these error fields for probable causes, this isn't a
particularly safe or spec-complaint way to react to errors.

Some examples of failures which require more standardized error information
include already-joined errors when a user tries to invite another user to a room,
or being unable to send an event into a room due to lacking the required power level.

For this reason, this proposal suggests including new `errcode` definitions
which provide more specific information about the failure.

## Proposal

New `errcode` would be introduced into the error body of a response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds like we're adding a whole new property, which I don't think is the case?

(https://spec.matrix.org/v1.3/client-server-api/#standard-error-response).

`M_ALREADY_JOINED` would be fired when a membership action fails when the authenticated user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you maybe structure this so that it's clearer which text applies to which error code? Maybe subsection headers?

is already joined to the room.
This would cover endpoints:
- [POST /_matrix/client/v3/rooms/{roomId}/invite](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3roomsroomidinvite)
- [POST /_matrix/client/v3/rooms/knock/{roomIdOrAlias}](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3knockroomidoralias)
- [PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey)
Note that it would not cover endpoints where trying to join a room when the
user is already joined would no-op, like `POST /_matrix/client/v3/join/{roomIdOrAlias}`.

`M_INSUFFICIENT_POWER` would be when the authenticated user does not have the specific required power level to
Half-Shot marked this conversation as resolved.
Show resolved Hide resolved
perform an action in the room.
`M_NOT_JOINED` would be when the authenticated user is not joined to a room, but attempts to perform
an action in it.
Both errcodes would cover endpoints:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synapse also returns not in room when /leaveing a room you're not in (I'm not sure if that should actually just be a no-op without erroring, but if it returns an error, it should be included in endpoints that return M_NOT_JOINED)

Copy link
Contributor Author

@Half-Shot Half-Shot Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this on Synapse and found:

  • Leaving a room that doesn't exist on the host returns "M_UNKNOWN" -> Not a known room
  • Leaving a room that I was previously in will always no-op with a 200.
  • Leaving a room on the host that I was never in will result in a M_FORBIDDEN.

I think we should consistently define the behavior for trying to leave a room you are not in, rather than having 3 different error responses (differing messages are fine) for effectively the same state. My gut feeling is that we should probably be a no-op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- [POST /_matrix/client/v3/rooms/{roomId}/invite](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3roomsroomidinvite)
- [POST /_matrix/client/v3/rooms/knock/{roomIdOrAlias}](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3knockroomidoralias)
- [POST /_matrix/client/v3/rooms/{roomId}/unban](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3roomsroomidban)
- [POST /_matrix/client/v3/rooms/{roomId}/ban](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3roomsroomidban)
- [POST /_matrix/client/v3/rooms/{roomId}/kick](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3roomsroomidkick)
- [PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey)
- [PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid)

Half-Shot marked this conversation as resolved.
Show resolved Hide resolved

## Potential issues

Changing long-established error codes in Matrix will be fraught with risk, as many
clients will need updating to support the new error types. Failure to do so might lead
to unexpected behaviours or confusing error messages. For this reason, the unstable implementation
will continue to provide the old errcode in the body of the error while providing the
new proposed errcode under it's own field. This gives clients a chance to adapt to the
Half-Shot marked this conversation as resolved.
Show resolved Hide resolved
new errcode / ensure their behaviours with unexpected errcodes are acceptable.

## Alternatives
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is making errcode an array of errors, which the client can pick from based on what errcodes it's aware of.

The drawbacks are that this is still a breaking change today, and potentially over-engineering the problem. If we expect errcodes to change infrequently, it might be better to continue to version bump the spec. The other problem is maintaining a potentially long tail of errcodes for older and older generations of the spec.


We could introduce a second field to the error body for more specific errors, but this would likely make
error handling in clients much more complicated. There is precedence already in the spec for specific
error codes for specific failures, so there is little need to subclass.

## Security considerations

None.

## Unstable prefix

While this MSC is not considered stable for implementation, implementations should use `org.matrix.unstable.errcode`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikjohnston raised the point in the Synapse PR that we should still namespace this field, so implementations will need to adapt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A generic unstable errcode field would be nice so each MSC improving errors wouldn't need a different one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The counterpoint made was that other MSCs can re-use this MSC's key matrix-org/synapse#13343 (comment)

Copy link
Member

@tulir tulir Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that makes sense, hopefully we remember to use that key (although a generic key could be forgot too, so namespacing makes sense either way)

as a prefix to the fields on the error body. `M_FORBIDDEN` will still be emitted as a `errcode` while the
MSC is unstable, and will be replaced when the spec stabilizes.

The errcodes will be have the prefix `M_` replaced with `ORG.MATRIX.MSC3848.`.

## Dependencies

None.