-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Recheck if remote device is cached before requesting it #16252
Conversation
This fixes a bug where we could get stuck re-requesting the device over replication again and again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK, but...I have questions.
user_result, user_failed = await self._user_device_resync_returning_failed( | ||
user_id | ||
) | ||
async with self._resync_linearizer.queue(user_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is queueing the right thing here? Should we instead share the results via an ObservableDeferred
? (If you have two requests for the same user in a row there's no reason to service the second if the first is ongoing?)
Edit: It looks like this is essentially what we're doing by checking the cache inside of _user_device_resync_returning_failed
?
Will this make it harder to batch these? (Maybe not the immediate concern, but don't want to design ourselves into a corner.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these weren't device lists I would be more gung ho about batching these up. However, we need to make sure that we don't break any semantics. Specifically I'm worried about the situations where we get two requests for a remote users device lists, one that happens before we get poked about a new device and one after, if we coalesce those two remote calls we need to make sure that we don't return the old set of devices (retrieved by the first request) to the second request. IYSWIM....
I don't think this backs us in to a corner, we can always change this down the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're not talking about the same batching -- I was talking about batching multiple users (based on the comment a few lines above this), but maybe that's tricky.
# Check that we haven't gone and fetched the devices since we last | ||
# checked if we needed to resync these device lists. | ||
if await self.store.get_users_whose_devices_are_cached([user_id]): | ||
cached = await self.store.get_cached_devices_for_user(user_id) | ||
return cached, False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safe because if we linearized on user ID, then request the same user again but have not received a request to invalidate that data (via a resync request) then it must still be valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we already do do this check before entering this block, so this is safe. The flow is:
- Client reader receives a requests for keys.
- Client reader checks if keys are cached, if so returns them.
- If not, client reader queries master to fetch the keys.
- New Master linearizes requests for a users keys
- New Master checks if we now have the keys in the cache, if so returns.
- Fetches keys from the remote server, and caches them if we share a room with the remote user.
So this is step 5, which is safe because we're basically re-doing a check we have previously done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough explanation. Looks like you need to merge in develop
though!
No significant changes since 1.93.0rc1. The following issues are fixed in 1.93.0 (and RCs). - [GHSA-4f74-84v3-j9q5](GHSA-4f74-84v3-j9q5) / [CVE-2023-41335](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-41335) — Low Severity Temporary storage of plaintext passwords during password changes. - [GHSA-7565-cq32-vx2x](GHSA-7565-cq32-vx2x) / [CVE-2023-42453](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-42453) — Low Severity Improper validation of receipts allows forged read receipts. See the advisories for more details. If you have any questions, email security@matrix.org. - Add automatic purge after all users have forgotten a room. ([\matrix-org#15488](matrix-org#15488)) - Restore room purge/shutdown after a Synapse restart. ([\matrix-org#15488](matrix-org#15488)) - Support resolving homeservers using `matrix-fed` DNS SRV records from [MSC4040](matrix-org/matrix-spec-proposals#4040). ([\matrix-org#16137](matrix-org#16137)) - Add the ability to use `G` (GiB) and `T` (TiB) suffixes in configuration options that refer to numbers of bytes. ([\matrix-org#16219](matrix-org#16219)) - Add span information to requests sent to appservices. Contributed by MTRNord. ([\matrix-org#16227](matrix-org#16227)) - Add the ability to enable/disable registrations when using CAS. Contributed by Aurélien Grimpard. ([\matrix-org#16262](matrix-org#16262)) - Allow the `/notifications` endpoint to be routed to workers. ([\matrix-org#16265](matrix-org#16265)) - Enable users to easily unsubscribe to notifications emails via the `List-Unsubscribe` header. ([\matrix-org#16274](matrix-org#16274)) - Report whether a user is `locked` in the [List Accounts admin API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#list-accounts), and exclude locked users by default. ([\matrix-org#16328](matrix-org#16328)) - Fix a long-standing bug where multi-device accounts could cause high load due to presence. ([\matrix-org#16066](matrix-org#16066), [\matrix-org#16170](matrix-org#16170), [\matrix-org#16171](matrix-org#16171), [\matrix-org#16172](matrix-org#16172), [\matrix-org#16174](matrix-org#16174)) - Fix a long-standing bug where appservices using [MSC2409](matrix-org/matrix-spec-proposals#2409) to receive `to_device` messages would only get messages for one user. ([\matrix-org#16251](matrix-org#16251)) - Fix bug when using workers where Synapse could end up re-requesting the same remote device repeatedly. ([\matrix-org#16252](matrix-org#16252)) - Fix long-standing bug where we kept re-requesting a remote server's key repeatedly, potentially causing delays in receiving events over federation. ([\matrix-org#16257](matrix-org#16257)) - Avoid temporary storage of sensitive information. ([\matrix-org#16272](matrix-org#16272)) - Fix bug introduced in Synapse 1.49.0 when using dehydrated devices ([MSC2697](matrix-org/matrix-spec-proposals#2697)) and refresh tokens. Contributed by Hanadi. ([\matrix-org#16288](matrix-org#16288)) - Fix a long-standing bug where invalid receipts would be accepted. ([\matrix-org#16327](matrix-org#16327)) - Use standard name for UTF-8 charset in emails. ([\matrix-org#16329](matrix-org#16329)) - Don't try refetching device lists for users on remote hosts that are marked as "down". ([\matrix-org#16298](matrix-org#16298)) - Fix typos in the documentation. ([\matrix-org#16282](matrix-org#16282)) - Link to the Alpine Linux community package for Synapse. ([\matrix-org#16304](matrix-org#16304)) - Use string for `federation_client_minimum_tls_version` documentation examples. Contributed by @jcgruenhage. ([\matrix-org#16353](matrix-org#16353)) - Allow modules to delete rooms. ([\matrix-org#15997](matrix-org#15997)) - Add GCC and GNU Make to the Nix flake development environment so that `ruff` can be compiled. ([\matrix-org#16090](matrix-org#16090), [\matrix-org#16263](matrix-org#16263)) - Fix type checking when using the new version of Twisted. ([\matrix-org#16235](matrix-org#16235)) - Delete device messages asynchronously and in staged batches using the task scheduler. ([\matrix-org#16240](matrix-org#16240), [\matrix-org#16311](matrix-org#16311), [\matrix-org#16312](matrix-org#16312), [\matrix-org#16313](matrix-org#16313)) - Bump minimum supported Rust version to 1.61.0. ([\matrix-org#16248](matrix-org#16248)) - Update rust to version 1.71.1 in the nix development environment. ([\matrix-org#16260](matrix-org#16260)) - Simplify server key storage. ([\matrix-org#16261](matrix-org#16261)) - Reduce CPU overhead of change password endpoint. ([\matrix-org#16264](matrix-org#16264)) - Stop purging from tables slated for removal. ([\matrix-org#16273](matrix-org#16273)) - Improve type hints. ([\matrix-org#16276](matrix-org#16276), [\matrix-org#16301](matrix-org#16301), [\matrix-org#16325](matrix-org#16325), [\matrix-org#16326](matrix-org#16326)) - Raise `setuptools_rust` version cap to 1.7.0. ([\matrix-org#16277](matrix-org#16277)) - Fix using the new task scheduler causing lots of CPU to be used. ([\matrix-org#16278](matrix-org#16278)) - Upgrade CI run of Python 3.12 from rc1 to rc2. ([\matrix-org#16280](matrix-org#16280)) - Include values in SQL debug when using `execute_values` with Postgres. ([\matrix-org#16281](matrix-org#16281)) - Enable additional linting checks. ([\matrix-org#16283](matrix-org#16283)) - Refactor `receipts_graph` Postgres transactions to stop error messages. ([\matrix-org#16299](matrix-org#16299)) - Small improvements to logging in replication code. ([\matrix-org#16309](matrix-org#16309)) - Remove a reference cycle in background processes. ([\matrix-org#16314](matrix-org#16314)) - Only use literal strings for background process names. ([\matrix-org#16315](matrix-org#16315)) - Refactor `get_user_by_id`. ([\matrix-org#16316](matrix-org#16316)) - Speed up task to delete to-device messages. ([\matrix-org#16318](matrix-org#16318)) - Avoid patching code in tests. ([\matrix-org#16349](matrix-org#16349)) - Test against PostgreSQL 16. ([\matrix-org#16351](matrix-org#16351)) * Bump mypy from 1.4.1 to 1.5.1. ([\matrix-org#16300](matrix-org#16300)) * Bump black from 23.7.0 to 23.9.1. ([\matrix-org#16295](matrix-org#16295)) * Bump docker/build-push-action from 4 to 5. ([\matrix-org#16336](matrix-org#16336)) * Bump docker/login-action from 2 to 3. ([\matrix-org#16339](matrix-org#16339)) * Bump docker/metadata-action from 4 to 5. ([\matrix-org#16337](matrix-org#16337)) * Bump docker/setup-qemu-action from 2 to 3. ([\matrix-org#16338](matrix-org#16338)) * Bump furo from 2023.8.19 to 2023.9.10. ([\matrix-org#16340](matrix-org#16340)) * Bump gitpython from 3.1.32 to 3.1.35. ([\matrix-org#16267](matrix-org#16267), [\matrix-org#16279](matrix-org#16279)) * Bump mypy-zope from 1.0.0 to 1.0.1. ([\matrix-org#16291](matrix-org#16291)) * Bump pillow from 10.0.0 to 10.0.1. ([\matrix-org#16344](matrix-org#16344)) * Bump regex from 1.9.4 to 1.9.5. ([\matrix-org#16233](matrix-org#16233)) * Bump ruff from 0.0.286 to 0.0.290. ([\matrix-org#16342](matrix-org#16342)) * Bump serde_json from 1.0.105 to 1.0.107. ([\matrix-org#16296](matrix-org#16296), [\matrix-org#16345](matrix-org#16345)) * Bump twisted from 22.10.0 to 23.8.0. ([\matrix-org#16235](matrix-org#16235)) * Bump types-pillow from 10.0.0.2 to 10.0.0.3. ([\matrix-org#16293](matrix-org#16293)) * Bump types-setuptools from 68.0.0.3 to 68.2.0.0. ([\matrix-org#16292](matrix-org#16292)) * Bump typing-extensions from 4.7.1 to 4.8.0. ([\matrix-org#16341](matrix-org#16341))
No significant changes since 1.93.0rc1. The following issues are fixed in 1.93.0 (and RCs). - [GHSA-4f74-84v3-j9q5](GHSA-4f74-84v3-j9q5) / [CVE-2023-41335](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-41335) — Low Severity Temporary storage of plaintext passwords during password changes. - [GHSA-7565-cq32-vx2x](GHSA-7565-cq32-vx2x) / [CVE-2023-42453](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-42453) — Low Severity Improper validation of receipts allows forged read receipts. See the advisories for more details. If you have any questions, email security@matrix.org. - Add automatic purge after all users have forgotten a room. ([\matrix-org#15488](matrix-org#15488)) - Restore room purge/shutdown after a Synapse restart. ([\matrix-org#15488](matrix-org#15488)) - Support resolving homeservers using `matrix-fed` DNS SRV records from [MSC4040](matrix-org/matrix-spec-proposals#4040). ([\matrix-org#16137](matrix-org#16137)) - Add the ability to use `G` (GiB) and `T` (TiB) suffixes in configuration options that refer to numbers of bytes. ([\matrix-org#16219](matrix-org#16219)) - Add span information to requests sent to appservices. Contributed by MTRNord. ([\matrix-org#16227](matrix-org#16227)) - Add the ability to enable/disable registrations when using CAS. Contributed by Aurélien Grimpard. ([\matrix-org#16262](matrix-org#16262)) - Allow the `/notifications` endpoint to be routed to workers. ([\matrix-org#16265](matrix-org#16265)) - Enable users to easily unsubscribe to notifications emails via the `List-Unsubscribe` header. ([\matrix-org#16274](matrix-org#16274)) - Report whether a user is `locked` in the [List Accounts admin API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#list-accounts), and exclude locked users by default. ([\matrix-org#16328](matrix-org#16328)) - Fix a long-standing bug where multi-device accounts could cause high load due to presence. ([\matrix-org#16066](matrix-org#16066), [\matrix-org#16170](matrix-org#16170), [\matrix-org#16171](matrix-org#16171), [\matrix-org#16172](matrix-org#16172), [\matrix-org#16174](matrix-org#16174)) - Fix a long-standing bug where appservices using [MSC2409](matrix-org/matrix-spec-proposals#2409) to receive `to_device` messages would only get messages for one user. ([\matrix-org#16251](matrix-org#16251)) - Fix bug when using workers where Synapse could end up re-requesting the same remote device repeatedly. ([\matrix-org#16252](matrix-org#16252)) - Fix long-standing bug where we kept re-requesting a remote server's key repeatedly, potentially causing delays in receiving events over federation. ([\matrix-org#16257](matrix-org#16257)) - Avoid temporary storage of sensitive information. ([\matrix-org#16272](matrix-org#16272)) - Fix bug introduced in Synapse 1.49.0 when using dehydrated devices ([MSC2697](matrix-org/matrix-spec-proposals#2697)) and refresh tokens. Contributed by Hanadi. ([\matrix-org#16288](matrix-org#16288)) - Fix a long-standing bug where invalid receipts would be accepted. ([\matrix-org#16327](matrix-org#16327)) - Use standard name for UTF-8 charset in emails. ([\matrix-org#16329](matrix-org#16329)) - Don't try refetching device lists for users on remote hosts that are marked as "down". ([\matrix-org#16298](matrix-org#16298)) - Fix typos in the documentation. ([\matrix-org#16282](matrix-org#16282)) - Link to the Alpine Linux community package for Synapse. ([\matrix-org#16304](matrix-org#16304)) - Use string for `federation_client_minimum_tls_version` documentation examples. Contributed by @jcgruenhage. ([\matrix-org#16353](matrix-org#16353)) - Allow modules to delete rooms. ([\matrix-org#15997](matrix-org#15997)) - Add GCC and GNU Make to the Nix flake development environment so that `ruff` can be compiled. ([\matrix-org#16090](matrix-org#16090), [\matrix-org#16263](matrix-org#16263)) - Fix type checking when using the new version of Twisted. ([\matrix-org#16235](matrix-org#16235)) - Delete device messages asynchronously and in staged batches using the task scheduler. ([\matrix-org#16240](matrix-org#16240), [\matrix-org#16311](matrix-org#16311), [\matrix-org#16312](matrix-org#16312), [\matrix-org#16313](matrix-org#16313)) - Bump minimum supported Rust version to 1.61.0. ([\matrix-org#16248](matrix-org#16248)) - Update rust to version 1.71.1 in the nix development environment. ([\matrix-org#16260](matrix-org#16260)) - Simplify server key storage. ([\matrix-org#16261](matrix-org#16261)) - Reduce CPU overhead of change password endpoint. ([\matrix-org#16264](matrix-org#16264)) - Stop purging from tables slated for removal. ([\matrix-org#16273](matrix-org#16273)) - Improve type hints. ([\matrix-org#16276](matrix-org#16276), [\matrix-org#16301](matrix-org#16301), [\matrix-org#16325](matrix-org#16325), [\matrix-org#16326](matrix-org#16326)) - Raise `setuptools_rust` version cap to 1.7.0. ([\matrix-org#16277](matrix-org#16277)) - Fix using the new task scheduler causing lots of CPU to be used. ([\matrix-org#16278](matrix-org#16278)) - Upgrade CI run of Python 3.12 from rc1 to rc2. ([\matrix-org#16280](matrix-org#16280)) - Include values in SQL debug when using `execute_values` with Postgres. ([\matrix-org#16281](matrix-org#16281)) - Enable additional linting checks. ([\matrix-org#16283](matrix-org#16283)) - Refactor `receipts_graph` Postgres transactions to stop error messages. ([\matrix-org#16299](matrix-org#16299)) - Small improvements to logging in replication code. ([\matrix-org#16309](matrix-org#16309)) - Remove a reference cycle in background processes. ([\matrix-org#16314](matrix-org#16314)) - Only use literal strings for background process names. ([\matrix-org#16315](matrix-org#16315)) - Refactor `get_user_by_id`. ([\matrix-org#16316](matrix-org#16316)) - Speed up task to delete to-device messages. ([\matrix-org#16318](matrix-org#16318)) - Avoid patching code in tests. ([\matrix-org#16349](matrix-org#16349)) - Test against PostgreSQL 16. ([\matrix-org#16351](matrix-org#16351)) * Bump mypy from 1.4.1 to 1.5.1. ([\matrix-org#16300](matrix-org#16300)) * Bump black from 23.7.0 to 23.9.1. ([\matrix-org#16295](matrix-org#16295)) * Bump docker/build-push-action from 4 to 5. ([\matrix-org#16336](matrix-org#16336)) * Bump docker/login-action from 2 to 3. ([\matrix-org#16339](matrix-org#16339)) * Bump docker/metadata-action from 4 to 5. ([\matrix-org#16337](matrix-org#16337)) * Bump docker/setup-qemu-action from 2 to 3. ([\matrix-org#16338](matrix-org#16338)) * Bump furo from 2023.8.19 to 2023.9.10. ([\matrix-org#16340](matrix-org#16340)) * Bump gitpython from 3.1.32 to 3.1.35. ([\matrix-org#16267](matrix-org#16267), [\matrix-org#16279](matrix-org#16279)) * Bump mypy-zope from 1.0.0 to 1.0.1. ([\matrix-org#16291](matrix-org#16291)) * Bump pillow from 10.0.0 to 10.0.1. ([\matrix-org#16344](matrix-org#16344)) * Bump regex from 1.9.4 to 1.9.5. ([\matrix-org#16233](matrix-org#16233)) * Bump ruff from 0.0.286 to 0.0.290. ([\matrix-org#16342](matrix-org#16342)) * Bump serde_json from 1.0.105 to 1.0.107. ([\matrix-org#16296](matrix-org#16296), [\matrix-org#16345](matrix-org#16345)) * Bump twisted from 22.10.0 to 23.8.0. ([\matrix-org#16235](matrix-org#16235)) * Bump types-pillow from 10.0.0.2 to 10.0.0.3. ([\matrix-org#16293](matrix-org#16293)) * Bump types-setuptools from 68.0.0.3 to 68.2.0.0. ([\matrix-org#16292](matrix-org#16292)) * Bump typing-extensions from 4.7.1 to 4.8.0. ([\matrix-org#16341](matrix-org#16341)) # -----BEGIN PGP SIGNATURE----- # # iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmUS8iEQHGVyaWtAbWF0 # cml4Lm9yZwAKCRClQuTtGw+sCXFgB/912+T+BydS290UECCXp9kpRB5xo3aWe8mX # NCx9Oor1TRLBpLhlQWk786gP1Q9JAQpmA4z6kovjKaLG1b4oLbZNjbPG4hEYc8ow # /rVzGor52pfyS7uS5GW+rRmapcw4AYND6hA9XGELupf2joC8LXioSCEVG4cxwD8E # IgIbLc87C7KpaUkNbDEz3jzZ3/BVRGcIYyhF3zTK2ZApvH2qsegq8wKYx4EYJnfh # 87DXtTCNwA+bW6XZYPtUwPKjZ+TGB11IizxmQySGLbAxvH+GUan8X8TizGyxaqaA # FDk3yMBbUo0R7ljDgL5YsZXT6qsZz+IBz/bsMzSbZ39f/yEUqHak # =1/pL # -----END PGP SIGNATURE----- # gpg: Signature made Tue Sep 26 16:00:49 2023 BST # gpg: using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09 # gpg: issuer "erik@matrix.org" # gpg: Can't check signature: No public key # Conflicts: # .github/workflows/docker.yml # .github/workflows/push_complement_image.yml # .github/workflows/release-artifacts.yml # .github/workflows/tests.yml # poetry.lock # synapse/appservice/scheduler.py # synapse/handlers/pagination.py # synapse/handlers/room.py # synapse/rest/client/account_data.py # tests/rest/client/test_receipts.py
This fixes a bug where we could get stuck re-requesting the device over replication again and again.