Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sender Data: Fetch and store sender data when we receive a new megolm session #3543

Closed
andybalaam opened this issue Jun 13, 2024 · 1 comment
Assignees

Comments

@andybalaam
Copy link
Contributor

andybalaam commented Jun 13, 2024

Update: the algorithm below describes background tasks but after discussion on a PR for this, we think we can avoid that, and instead trigger a retry when receive_keys_query_response runs. I updated #3546 to specify that we should do this during receive_keys_query_response as well as on a schedule.

When we receive a to-device message containing a megolm session, fetch and store the sender data for it.

Part of #3544 which is part of Invisible Crypto. Depends on #3542 because it holds the information in the data structures created there.

This task includes adding a SessionManager to prevent clashes between concurrent tasks.

Preventing clashes

Cross-process, we are protected by the cross-process lock - the entire OlmMachine will be reloaded if some other process takes the lock. But we need a way to prevent 2 tasks both updating a session at the same time.

Possibly something like this (but could do with some validation from the Rust team):

struct SessionManager {
    sessions_being_processed: HashSet<OwnedSessionId>
}

impl SessionManager {
    fn try_lock(session_id: &SessionId) -> Option<SessionGuard>;  // If None, give up on this session
}

Pass the SessionGuard in to any async tasks you spawn. I.e. you keep hold of the lock even across the async boundary.

Algorithm

When we receive a to-device message establishing a megolm session:

A (start)

  • [take the lock] TODO: if we fail to get it, can we just bail out here, dropping the information in this to-device message? What if it contained device info that we need? How will this work if someone maliciously sent us a duplicate of someone else's session id?

  • Does the to-device message contain the device_keys property from MSC4147? Yes->D No->B

B (no device info in to-device message)

We need to find the device details. If we have them in the store, we should use them immediately (rather than waiting for a background task to pick up the session for further processing).

  • Does the locally-cached (in the store) devices list contain a device with the curve key of the sender of the to-device message? Yes->D No->C

C (no device info locally)

  • Save this session into the store with no device info, marked as not-legacy, next_retry_time_ms = now (in case the app gets killed) and retry_count = 0.

  • ↗️ Return, and kick off an async task [keep the lock]: run OlmMachine::get_user_devices (which waits for /keys/query to complete, then fetches all device info for the user.) Then it should find a device with the curve key we know we used to decrypt the to-device message (same as in get_verification_state. Probably we want to move the impl of get_verification_state into another function we call now, and get_verification_state will look up what we stored instead of calculating it at the time it is called).

If the device is there, -> D
If we still don’t have the device info, -> 😴 Wait to see whether we get device info later. Increment retry_count and set next_retry_time_ms per backoff algorithm; let the background job pick it up [drop the lock]

D (we have device info)

  • Is the device info cross-signed?

No -> 😴 Wait to see if the device becomes cross-signed soon. Increment retry_count and set next_retry_time_ms per backoff algorithm; let the background job pick it up [drop the lock]
Yes -> E

E (we have cross-signed device info)

  • Do we have the cross-signing key for this user? Yes -> G No -> F

F (we have cross-signed device info, but no cross-signing keys)

  • Upsert the session with the (cross-signed) device info we have, still marked as not-legacy. Set next_retry_time_ms = now and retry_count = 0.
  • ↗️ Return, and kick off an async task [keep the lock]: run OlmMachine::get_identity (which waits for /keys/query to complete, then fetches this user's cross-signing key from the store.) If we still don’t have a cross-signing key -> 😴 Wait to see if we get one soon. Do nothing; let the background job pick it up [drop the lock]

G (we have cross-signing key)

  • Does the cross-signing key match that used to sign the device info?

Yes -> H
No -> 😴 Wait to see if the cross-signing key is updated soon. Increment retry_count and set next_retry_time_ms per backoff algorithm; let the background job pick it up [drop the lock]

H (cross-signing key matches that used to sign the device info!)

  • Is the signature in the device info (ed25519:<ssk_id>) valid (SelfSigningPubKey::verify_device_keys)?

Yes -> J
No -> ❗Session is invalid: drop it from the store and forget it (also the device???)

J (device info is verified by matching cross-signing key)

  • Look up the MXID and MSK for the user sending the to-device message.
  • Decide the MSK trust level based on whether we have verified this user (matrix_sdk_crypto::identities::user::UserIdentity::is_verified).
  • Upsert the session including the MXID, MSK and trust level. Remove the device info and retries since we don't need them.
  • Add this information to the sender_data.
  • [drop the lock]

Note: the sender data may become out-of-date if we later verify the user. We have no plans to update it if so.

@andybalaam andybalaam self-assigned this Jun 17, 2024
andybalaam added a commit that referenced this issue Jul 16, 2024
Part of #3543.
Builds on top of #3556

Implements the "fast lane" as described in
#3544

This will begin to populate `InboundGroupSession`s with the new
`SenderData` struct introduced in
#3556 but it will only
do it when the information is already available in the store. Future PRs
for this issue will query Matrix APIs using spawned async tasks.

Future issues will do retries and migration of old sessions.

---------

Signed-off-by: Andy Balaam <mail@artificialworlds.net>
Co-authored-by: Damir Jelić <poljar@termina.org.uk>
@andybalaam
Copy link
Contributor Author

Given that the background task stuff is no longer relevant, this is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant