[WIP] MSC2438: Local and Federated User Erasure Requests #2438

Draft
wants to merge 4 commits into
base: old_master
164 changes: 164 additions & 0 deletions proposals/2438-local-and-federated-erasure-requests.md
@@ -0,0 +1,164 @@
# MSC2438: Local and Federated User Erasure Requests

When communicating across Matrix, it's not uncommon for user data and
metadata to be strewn across many different servers and services. Given this,
it is necessary to have a mechanism for removing as much personal data as
possible across the ecosystem upon user request.

This proposal specifies a best-effort method for erasing a user's presence
across the Matrix federation, beginning with their own homeserver.

This proposal mentions 'personal data', but intentionally leaves the
definition vague. Implementations SHOULD remove as much identifying
information about a user as they can.

## Proposal

Changes across multiple APIs are necessary to communicate requests of user
data erasure across all the different bits and pieces of the Matrix
ecosystem. We start with the initial erasure request from a user to their
homeserver.

A new parameter, `erase`, will be added to the
[`/account/deactivate`](https://matrix.org/docs/spec/client_server/r0.6.0#post-matrix-client-r0-account-deactivate)
Client-Server API endpoint. It is a boolean that specifies whether the
homeserver MUST attempt to erase all personal data pertaining to the user
from the homeserver and from as much of the rest of the federation as it
can.
Comment on lines +22 to +27
Member

We ended up implementing erase long ago, but without the federation propagation bit. #4025 covers the history here (and may affect how this proposal operates).


Example request:

```
POST /_matrix/client/r0/account/deactivate

{
"auth": {
"type": "example.type.foo",
"session": "xxxxx",
"example_credential": "verypoorsharedsecret"
},
"erase": true
}
```
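
As a rough client-side illustration, the sketch below builds the request shown above. The helper name `build_deactivate_request` is hypothetical and not part of the proposal; only the endpoint path and the `auth` and `erase` fields come from the text.

```python
import json
from typing import Any, Dict, Tuple


def build_deactivate_request(auth: Dict[str, Any], erase: bool = False) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for an account deactivation request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    body = {"auth": auth, "erase": erase}
    return "POST", "/_matrix/client/r0/account/deactivate", json.dumps(body)


method, path, payload = build_deactivate_request(
    {"type": "example.type.foo", "session": "xxxxx"}, erase=True
)
```

Leaving `erase` at its default of `False` in this sketch produces a plain deactivation request.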

Example response:

```
{
"id_server_unbind_result": "success",
Contributor

Kinda sucks that we have feedback for identity servers, but not for other services. It would be helpful if the API supported some granularity about how successful the erase has been.

Member

This field is a hack anyways. Granular responses are probably something for a different MSC, unless this MSC can somehow make it easy and not a nightmare to deal with. There's so much more UX to consider with granularity when the user probably only cares about whether it worked or not.

Member Author

Indeed, this MSC is only concerned with getting the request out there, not getting feedback to the user. I'm still inclined to remove this field and just have the absence of an HTTP error mean success.

"erased": true
}
```

The `erased` field in the response tells the client whether the erasure
requested alongside the deactivation was successful. At this time the author
is unsure about including it, because:

* It is unclear to the client whether this means erasure was successful on
the user's homeserver only, or across the whole federation
* It is undecided whether the request should instead fail entirely if local
user erasure was unsuccessful
Comment on lines +59 to +60
Member

this

Member Author

We may need to introduce a custom error code for this then?

Member

yup


A call to this endpoint from the user kicks off the erasure flow. From this
point, we would like to communicate the erasure request to:

* Other homeservers
* Application services
* Identity servers
* Any other service in the Matrix ecosystem

which may have data (e.g. messages) pertaining to this user.

Upon receiving this request, the homeserver should forward it to every
homeserver it believes could also hold that user's data. How it determines
that set is left as an implementation detail. Once decided, the request will be
communicated over a new Federation API, `/_matrix/federation/v1/user/erase`.
Member

We're on v2 now:

Suggested change
communicated over a new Federation API, `/_matrix/federation/v1/user/erase`.
communicated over a new Federation API, `/_matrix/federation/v2/user/erase`.

Member Author

It was my understanding that we'd start new APIs with v1, and only introduce a v2 for them if they need to be updated with breaking changes. Is this not right?

Member

We don't have any formal standard for this, but using the identity server as an example we added /v2 endpoints without /v1 counterparts, implying that we ratchet the whole version and not on a per-endpoint basis.

Contributor

@babolivier, Mar 27, 2020

There's also a difference here which is that in the IS spec every existing endpoint got a v2 (even those that haven't changed in the transition), which imho means vX is the version of the API, whereas in the S2S spec most existing endpoints are only v1 and we've only been bumping per endpoint when we need to change it, which imho means vX is the version of the endpoint.

Argh all this is confusing.

Member

I've opened #2475 to settle this.

Member

> vX is the version of the API,

no. The API doesn't have a single version. It just sounds like we happened to bump all the IS APIs at once.


Example request:

```
POST /_matrix/federation/v1/user/erase
Member

We're on v2 now:

Suggested change
POST /_matrix/federation/v1/user/erase
POST /_matrix/federation/v2/user/erase


{
"user_id": "@bob:example.com"
}
```

Example response:

```
{}
```
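
The proposal leaves the choice of target homeservers as an implementation detail. One possible approach, sketched here under the assumption that the homeserver can list the members of every room the erased user has participated in, is to collect the server names of those members. The function names and the `rooms_to_members` mapping are hypothetical.

```python
from typing import Dict, Iterable, List


def server_name(user_id: str) -> str:
    """Extract the server name from a Matrix user ID such as @bob:example.com."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a valid user ID: {user_id!r}")
    # Split on the first colon only, since the server name may include a port.
    return user_id.split(":", 1)[1]


def erasure_targets(user_id: str, rooms_to_members: Dict[str, Iterable[str]]) -> List[str]:
    """Return remote servers sharing at least one room with the erased user."""
    own_server = server_name(user_id)
    targets = set()
    for members in rooms_to_members.values():
        members = list(members)
        if user_id not in members:
            continue  # the user never shared this room, so skip it
        for member in members:
            remote = server_name(member)
            if remote != own_server:
                targets.add(remote)
    return sorted(targets)


targets = erasure_targets(
    "@bob:example.com",
    {
        "!a:example.com": ["@bob:example.com", "@alice:other.org"],
        "!b:example.com": ["@carol:third.net"],
    },
)
```

A real implementation would probably also need to consider rooms the user has already left, which this sketch only covers if they are included in the mapping.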

It should be noted here that erasure requests for a given user should only be
allowed from the homeserver the user belongs to. If this isn't the case, the
other homeserver should respond with a `403 M_FORBIDDEN`.
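
A minimal sketch of that origin check follows. It assumes the receiving homeserver has already authenticated the request and knows the sending server's name as `origin`; the function name and the error message text are illustrative only.

```python
from typing import Any, Dict, Tuple


def check_erase_origin(origin: str, user_id: str) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status, JSON body) for a federated erasure request."""
    # Split on the first colon only, since the server name may include a port.
    _, sep, domain = user_id.partition(":")
    if not user_id.startswith("@") or not sep or domain != origin:
        # The user does not belong to the requesting homeserver.
        return 403, {
            "errcode": "M_FORBIDDEN",
            "error": "Erasure may only be requested by the user's own homeserver",
        }
    return 200, {}
```

In this sketch a malformed user ID is also rejected with `M_FORBIDDEN`, since it cannot belong to the requesting server; the proposal itself does not say how malformed IDs should be handled.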

For all application services, a new API endpoint will be added on the
application service: `POST /_matrix/app/v1/users/erase`. Its request body
contains a single, required field `user_id`, which is the ID of the user
whose identifying data should be erased.

Example request:

```
{
"user_id": "@someone:example.com"
}
```

At this point, the application service SHOULD try to erase as much
Contributor

Worth noting that this proposal sends a 200 on acknowledgement, but says nothing about whether the erasure was successful, unsuccessful or even possible. It would be nice if the appservice could give feedback in these different cases.

Member Author

At the moment we don't plan to send feedback to the user about which application services out there couldn't process the request.

And I'm not sure what the homeserver would do in this case. It could keep retrying, but if the application service couldn't delete it the first time, why would a second request make any difference?

identifying information about this user as possible. Upon successfully
acknowledging the request, the application service should return a `200 OK`
Contributor

What's the expected response if the application service fails to remove the user?

Member Author

Currently this MSC is designed to just fan out the request, without returning any success/failure back to the user on whether the data was actually deleted. Thus the 200 OK here only means that the AS acknowledged the request.

with an empty JSON body.
Contributor

What's the expected response if the application service hasn't ever seen the user?

Member Author

A 200 OK, as it'd be the same result as the AS deleting all data of a known user.


Example response:

```
{}
```
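
The application service side might look roughly like the sketch below. The in-memory `store` dictionary and the `handle_erase` name are hypothetical, and the `400` response for a missing `user_id` is an assumption that this proposal does not specify. An unknown user still yields `200 OK`, as the end state is the same as after a successful deletion.

```python
from typing import Any, Dict, Tuple


def handle_erase(store: Dict[str, Any], body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Handle POST /_matrix/app/v1/users/erase against an in-memory store."""
    user_id = body.get("user_id")
    if not isinstance(user_id, str):
        # Assumption: reject requests lacking the required field.
        return 400, {"errcode": "M_BAD_JSON", "error": "Missing user_id"}
    # Best effort: drop whatever is held for this user, if anything.
    store.pop(user_id, None)
    # 200 only acknowledges the request; it carries no success detail.
    return 200, {}


store = {"@someone:example.com": {"display_name": "Someone"}}
known = handle_erase(store, {"user_id": "@someone:example.com"})
unknown = handle_erase(store, {"user_id": "@ghost:example.com"})
```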

For identity servers... (is reusing unbind enough, or do we need a separate
endpoint to delete the db rows?).

## Potential issues

As we live in an open federation, other services have the right to refuse
erasure requests. (XXX: Does this mean anything legally?). It is not the
responsibility of the user's homeserver to guarantee that all data about
this user across the federation has been deleted, since that is impossible
to verify. It simply needs to make its best attempt to request data erasure
from all necessary sources.

## Alternatives

This proposal relies on sending a federation request to another homeserver
(ideally retrying for a while if the other homeserver is currently offline),
which could fail if the other homeserver doesn't come back online for a long
time.
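
To make the retry behaviour concrete, here is one possible schedule using capped exponential backoff. The proposal does not prescribe any retry policy, so the function name and every default value below are assumptions chosen for illustration.

```python
from typing import List


def retry_schedule(
    initial: float = 60.0,
    factor: float = 2.0,
    max_delay: float = 3600.0,
    give_up_after: float = 86400.0,
) -> List[float]:
    """Return the waits (in seconds) between attempts before giving up."""
    delays = []
    elapsed = 0.0
    delay = initial
    # Keep scheduling retries while the next wait still fits in the budget.
    while elapsed + delay <= give_up_after:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)  # cap the growth
    return delays


small = retry_schedule(initial=1, factor=2, max_delay=8, give_up_after=20)
```

With these small values the waits double from 1 to 8 seconds and stop once another 8-second wait would exceed the 20-second budget. A homeserver that stays offline longer than the budget would miss the request, which is the failure mode described above.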

Alternative solutions have been considered:

* homeservers could maintain a public list of Matrix IDs that other
servers/services could poll periodically.

This solves the problem with servers/ASes which are offline at the point of
the request, but instead gives us a "how often to poll" problem. It's also
slightly questionable to maintain a publicly-available list of "everybody
who has asked to be erased" - if nothing else, it seems counter to the
spirit of GDPR.

* An `m.room.erasure` state event could be sent that contains the erased user's
Matrix ID.

This works and uses existing mechanisms for reliable communication, but it
comes with the same awkward public-list scenario as the above solution. It
also adds yet more state to large rooms and raises state event permission
considerations.


## Security considerations

Malicious server admins can send out erasure requests for their own users
across the federation. However, users already include their own homeserver
in their trust model, so this is a non-issue.