Encrypted appservices: how to? #10653
Comments
Do we have a use case for this? Reading the to device messages doesn't let you decrypt them without knowing the secrets generated by one of the clients. |
Not at present, but there is potential for it to be used in some of the customer sectors. Device messages don't have to be encrypted, and custom events are likely to be involved. |
Just an update on this: I think as a first cut I can bludgeon my way through enough of the project to get something testable and measure performance from that. Then from there we can see just how bad it actually is, similar to the conclusions found in #8903 |
I'm not quite clear what messages should go into each pile. Is the explicit pile for to-device messages sent to users in the appservice's namespace, whereas the implicit pile is just for users that share rooms? |
It'd be tied to the An example of an implicit appservice is the communities v2 proxy we ended up writing to prototype an early version of Spaces: it needed to impersonate the caller, so had a |
@turt2live have you got what you need from the backend team to be getting on with? |
@erikjohnston Note that I've now taken over the Synapse side of this work. |
@turt2live One optimisation we could make is deciding which AS should receive a particular to-device message upon its creation/receiving it, and storing those mappings as a table somewhere in the database. This would be in contrast to calculating it all after the fact. That would be more efficient to pull out, but it does have limitations for some usecases. One I can think of is:
Is deciding AS destinations when to-device messages come in a problem anywhere, or would that generally be "fine"? |
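To make the "decide destinations on arrival" idea concrete, here is a minimal sketch rather than Synapse's actual storage code. The table name `device_inbox_appservice`, the helper names, and the one-regex-per-appservice shape are all assumptions made for illustration. It records which appservices should receive a to-device message when the message comes in, so that building a transaction later is a simple range query.

```python
import re
import sqlite3
from typing import Dict, List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical mapping table: one row per (message, destination appservice).
    conn.execute(
        "CREATE TABLE device_inbox_appservice ("
        " stream_id INTEGER NOT NULL,"
        " appservice_id TEXT NOT NULL)"
    )


def record_destinations(
    conn: sqlite3.Connection,
    stream_id: int,
    recipient: str,
    user_regex_by_appservice: Dict[str, str],
) -> List[str]:
    """Work out, at arrival time, which appservices want this message."""
    destinations = [
        as_id
        for as_id, pattern in user_regex_by_appservice.items()
        if re.fullmatch(pattern, recipient)
    ]
    conn.executemany(
        "INSERT INTO device_inbox_appservice (stream_id, appservice_id)"
        " VALUES (?, ?)",
        [(stream_id, as_id) for as_id in destinations],
    )
    return destinations


def stream_ids_for(
    conn: sqlite3.Connection, appservice_id: str, from_id: int, to_id: int
) -> List[int]:
    """Fetch the precomputed messages for one appservice between two tokens."""
    rows = conn.execute(
        "SELECT stream_id FROM device_inbox_appservice"
        " WHERE appservice_id = ? AND stream_id > ? AND stream_id <= ?"
        " ORDER BY stream_id",
        (appservice_id, from_id, to_id),
    )
    return [row[0] for row in rows]
```

The read path never has to scan every inbox, which is the efficiency gain mentioned above. The cost is that interest is evaluated once, when the message is written.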
I think it'd be fine. We also have the appservice flag on the user account to know if the user account was created by the appservice, which should be a good metric for deciding if the appservice has authority over the device messages. |
@turt2live @reivilibre and I had a conversation about how sending one-time key counts and fallback key information to application services would work. For a bit of background, devices claim one-time keys from other devices when starting a new encryption session with them. These one-time keys are uploaded by a device to the homeserver. Typically for clients, the count of how many one-time keys you have left (per algorithm type) is returned in the response from Fallback keys are pretty much what they say on the tin. In the case of running out of one-time keys on the server, a fallback key will be returned instead of a one-time key when another device tries to claim it. These fallback keys are initially generated by the device, and can be useful when the device goes offline for a long time while other devices encrypt messages to it. When the device comes back online, they'll be told via The problem is that the current design 1. always returns one-time key counts and 2. always returns the state of whether your fallback has been used in the response for each sync request. This would be very noisy for the many, many potential users an appservice could have interest in. To help mitigate this, the three of us came up with a design to help limit the amount of necessary traffic, while still conveying the same information. For each EDU and PDU sent to an application service, the AS users that are the recipients of each EDU and PDU in the transaction should have their one-time key counts included for all of their devices. The same goes for fallback keys - however fallback key information is only included when a fallback key has been used (a client's device doesn't need to take action otherwise). 
A user is defined as a "recipient" of a PDU/EDU if that message would normally be received by them down Note that limiting the sending of these counts, rather than sending them for all users on every AS transaction, is currently considered an implementation optimisation, rather than something that should be baked into the spec - thus why I'm discussing it here rather than on MSC3202. The application service is expected to use One outstanding question: should one-time key and fallback key information for |
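The selection rule described above could be sketched roughly as follows. This is an illustration under assumptions, not the Synapse implementation: the function name, the input shapes, and the output keys `one_time_key_counts` and `fallback_keys_used` are made up for the example and are not the field names from MSC3202.

```python
from typing import Dict, List, Set, Tuple

DeviceKey = Tuple[str, str]  # (user_id, device_id)


def key_info_for_transaction(
    recipients: Set[str],
    appservice_users: Set[str],
    otk_counts: Dict[DeviceKey, Dict[str, int]],
    fallback_used: Set[DeviceKey],
) -> Dict[str, dict]:
    """Attach key information only for AS users receiving something.

    - One-time key counts are included for every device of each AS user
      that is a recipient of a PDU or EDU in this transaction.
    - Fallback key information is included only when a fallback key
      has actually been used.
    """
    counts: Dict[str, Dict[str, Dict[str, int]]] = {}
    used: Dict[str, List[str]] = {}
    for (user_id, device_id), per_algorithm in otk_counts.items():
        if user_id not in recipients or user_id not in appservice_users:
            continue
        counts.setdefault(user_id, {})[device_id] = dict(per_algorithm)
        if (user_id, device_id) in fallback_used:
            used.setdefault(user_id, []).append(device_id)
    # Hypothetical output keys, chosen for readability only.
    return {"one_time_key_counts": counts, "fallback_keys_used": used}
```

Users in the appservice's namespace that receive nothing in the transaction contribute no entries at all, which is what keeps the payload small.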
Always including the |
That matches up with what I understood; thanks for writing it up!
I believe the spec only says that they need be sent when they change; as such, sending them whenever a message is inbound for the user is actually chattier than the spec requires. |
I've updated MSC3202 with the above to explicitly write down how implementations might want to take this approach, and why it's not too bad. |
It's worth noting a few points for clarity, sorry if this is regurgitating stuff already said. The OTK pool of a user can be exhausted without the user ever receiving an event or to-device message (though this would usually be due to a malicious actor). I think it's probably fine as we a) have fallback keys and b) the OTK counts would get sent the next time the AS user got a message. I wonder if we should only include counts for AS users receiving a to-device message? That is the only real reason for an OTK to get consumed. Though that means it'll take longer for the AS to see updated counts if a malicious actor has drained the OTK pool. Has any thought gone into how to handle auto-provisioning AS users? Right now a real user can start a DM with an AS user and the AS will auto-provision it if it doesn't already exist. That will no longer work if the AS now has to upload keys before an AS user can receive a message. I'm wondering if a different solution here would be to proxy requests for OTKs to the AS itself? That way the AS can auto-provision things, and we can skip all the logic for sending OTK counts. |
Auto-provisioning should be fine? The request being blocked for user creation can wait a little bit longer while the appservice uploads a handful of keys (instead of all 50 or whatever it ends up wanting to generate). The remainder of the keys can be uploaded in the background.
It's possible for the counts to diminish without a device message, so the theory was that being involved in an encrypted conversation is the next best point to consider including counts. |
The end goal of this adventure is to have an appservice which can participate in encrypted rooms. While this doesn't necessarily provide end-to-end encryption, it does mean that encrypted messages can be decrypted by application services. The driving usecase is largely private context (DM/ping me internally), but is effectively a high-traffic single-user highly reliable bot using the appservice API (sync won't keep up). Other usecases like bridges are worth considering, even if not driving this work, due to the fact that someone will try it despite warnings.
There's 3-4 major pieces in order to get this support working:
to-device messages to appservices for their interested users. This has security context to consider.
Other considerations, like setting up key backups, are already largely solved due to the masquerading support on the endpoints. Account data changes might need to make it down to appservices, but this can be considered future work for the purposes of this conversation.
Sending device messages (to-device)
This is effectively sending ephemeral events to appservices, which Synapse already supports for read receipts, typing, and presence. Currently the streams appear to operate off a from and to token system, grabbing a batch of events between those tokens with some filter criteria, however for appservices the filter criteria is a bit more nuanced. Specifically, the appservice can easily get to millions of users under its namespace which means millions of inboxes to check. We could add an array of user IDs to the DB fetch function for inboxes to check, however this can, again, be millions of entries long: this would be bad for the DB server.
Appservices already have to care about implicit versus explicit user reservations and will likely have to do additional filtering based upon that, so the proposal I have is largely that the appservice handler grab all device messages for all inboxes between the stream tokens, then filter that down to interested users in code. The function would build two piles: implicit interest and explicit interest. Explicit interest would cause the function to delete those messages from the inbox as they'll shortly be "delivered". Implicit interest means those messages won't be deleted, but will still be sent to the appservice. This is where the security context comes into play: this provides a way for appservices to theoretically intercept device messages without the user knowing. However, the server admin will have had to approve the appservice implicitly by installing it, so it might be fine. This might need more discussion.
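A rough sketch of the two-pile filter, to make the proposal easier to discuss. Everything here is hypothetical: the `AppserviceNamespace` class, the message dict shape, and in particular the assumption that "explicit" maps to an exclusive namespace match while "implicit" maps to a non-exclusive one. It is not how Synapse models appservice interest today.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AppserviceNamespace:
    """Hypothetical stand-in for one appservice's user namespaces."""

    exclusive_user_regexes: List[str] = field(default_factory=list)
    shared_user_regexes: List[str] = field(default_factory=list)

    def interest_in(self, user_id: str) -> Optional[str]:
        # Assumption: exclusive reservation = explicit interest,
        # non-exclusive reservation = implicit interest.
        if any(re.fullmatch(p, user_id) for p in self.exclusive_user_regexes):
            return "explicit"
        if any(re.fullmatch(p, user_id) for p in self.shared_user_regexes):
            return "implicit"
        return None


def split_device_messages(
    messages: List[Dict], namespace: AppserviceNamespace
) -> Tuple[List[Dict], List[Dict]]:
    """Split messages fetched between two stream tokens into two piles.

    Explicit: sent to the appservice and then deleted from the inbox.
    Implicit: sent to the appservice but left in the inbox.
    """
    explicit: List[Dict] = []
    implicit: List[Dict] = []
    for message in messages:
        kind = namespace.interest_in(message["recipient"])
        if kind == "explicit":
            explicit.append(message)
        elif kind == "implicit":
            implicit.append(message)
    return explicit, implicit
```

Only the explicit pile would be passed on to whatever deletes delivered messages, and ideally only after the transaction has actually been acknowledged by the appservice.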
An issue with deleting the device messages is reliability: where the code appears to build the transaction and where that transaction is sent to the appservice is disjointed. This isn't important for things like presence, typing, or read receipts (to a degree), however for device messages if the server were to be restarted while an undelivered transaction is in the queue then the messages will be lost, leading to UISIs (in this use case).
If Synapse's streams send events and not just stream positions over replication then the appservice handler might be able to just queue the device messages into the transaction straight from there, thus not having to bother with any stream ID nonsense. However, this assumes that replication is reliable and that the events are in fact sent over replication.
Sending device list changes
The current proposal for this is to send device list changes through at the top level of the transaction: matrix-org/matrix-spec-proposals#3202
Filtering changes is relatively easy as the logic should already exist for presence. Determining when a device list has left the appservice's point of view is a bit more of a challenge, I think, as it'd mean tying the appservice handling to membership events and running a fairly expensive loop of checking all the rooms the user was in prior to the leave and matching those rooms against appservice interest.
For the purposes of determining interest, usually this sort of thing would be checked against whether the appservice was interested in the room or not, however given the intended use case of device lists it seems fair to ensure the appservice has a relevant member in the room which would trigger interest instead. This changes the check to determine if the appservice no longer shares a room with the user after the leave event, which might be even more expensive but involve less spam?
This also has the potential to be the same problem as device messages: reliability during downtime of the appservice is diminished.
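The "no longer shares a room" check could look something like the sketch below. The in-memory dicts stand in for whatever room membership lookups the server would really use, and the function and parameter names are invented for the example.

```python
from typing import Callable, Dict, Set


def appservice_still_shares_room(
    user_id: str,
    left_room_id: str,
    rooms_by_user: Dict[str, Set[str]],
    members_by_room: Dict[str, Set[str]],
    is_appservice_user: Callable[[str], bool],
) -> bool:
    """Return True if, after `user_id` leaves `left_room_id`, some other
    room still contains both that user and a user the appservice controls.
    """
    for room_id in rooms_by_user.get(user_id, set()):
        if room_id == left_room_id:
            continue  # the room being left no longer counts
        for member in members_by_room.get(room_id, set()):
            if member != user_id and is_appservice_user(member):
                return True
    return False
```

If this returns False, the user's device list would be reported as having left the appservice's point of view. The loop is proportional to the user's room count times room size, which is the expense the paragraph above is worried about.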
OTKs & fallback keys
This is an unsolved problem as of writing. The MSC has a thread to say that this information shouldn't be sent over in a similar fashion to /sync (which always presents it for the user) due to performance concerns: if it always included the information, there'd be millions of objects in the JSON to worry about. Instead, the suggestion is to include the field when the counts/fallback key usage changes. I have no idea if this is even possible to detect in Synapse, but the cheap solution would be to include the fields if the relevant user is receiving a device message.
Possible alternatives
We could invest even more heavily in /sync, however that doesn't help the encrypted bridges use case. A million sync streams sounds worse than an expensive appservice loop, but neither is going to perform well anyways.