-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix serialization errors when rotating notifications #13118
Conversation
a3c49e8
to
5583dca
Compare
5583dca
to
e8c71de
Compare
@@ -150,7 +175,7 @@ def _mark_read(stream: int, depth: int) -> None: | |||
|
|||
_assert_counts(1, 0) | |||
|
|||
_mark_read(7, 7) | |||
_mark_read(6, 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a change in behavior then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is just making sure we're inject a read receipt at an actual event (which we have to), rather than at some arbitrary place.
(Side note: we need to rewrite this test to not poke things directly into the DB, and instead create real rooms, events and receipts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I don't understand wtf is going on here. Happy to hop on a call to talk about it, but also I'd generally like to see much clearer comments and docstrings about what we're doing and why.
Major concern: if we're now only processing read receipts every 30 seconds, does that mean it takes 30 seconds for the unread indicator to disappear?
@@ -800,6 +811,18 @@ async def _rotate_notifs(self) -> None: | |||
self._doing_notif_rotation = True | |||
|
|||
try: | |||
# First we handle any new receipts that have happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "handle" mean here? have happened since when?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded
-- are up to date after a new read receipt has been sent. Null means that we can | ||
-- skip that check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does null mean that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have expanded
# We always update `event_push_summary_last_receipt_stream_id` to | ||
# ensure that we don't rescan the same receipts for remote users. | ||
# | ||
# This requires repeatable read to be safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to explain why it needs repeatable read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added some words.
# If `last_receipt_stream_ordering` is null then that means its up to | ||
# date. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does null mean "up to date" ?
No, it means that for those 30s the
|
ok, makes sense. it would be great to include that sort of explanation in the comments in the code! |
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
49def3b
to
12efcd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit clearer now. I'm still not entirely grokking it though. At some point I think it would be good to write down how event_push_summary
actually works, but that is probably a different PR.
# that. In the latter case we simply ignore `event_push_summary` counts | ||
# and do a manual count of rows in `event_push_actions` table below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still don't get why this is an ok thing to do. Haven't we deleted rows from event_push_actions
, so relying on it (rather than including event_push_summary
) will give numbers that are too low?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We delete rows at the same time as we update the event_push_summary
. After a user sends a receipt in the room there are two cases:
- The receipt has been processed in the
_handle_new_receipts_for_notifs_txn
background task, and so we have deleted the push actions but updated theevent_push_summary
. - The receipt has not been processed, so
event_push_summary
is stale but we haven't yet deleted the push actions so we can correctly calculate the count fromevent_push_actions
.
Maybe:
and do a manual count of rows in
event_push_actions
table below.
This gives the correct result as we won't have deleted the push actions yet.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, but suppose:
- a user has 5 unread messages in a room (say E1...E5)
- A RR arrives for E2. We process it, update
event_push_summary
to say there are 3 remaining unread messages, and delete all the entries fromevent_push_actions
. - A RR arrives for E4, but before we process it fully, we do this query. We ignore
event_push_summary
and there is nothing left inevent_push_actions
. So we return 0 unread messages instead of 1?
Or am I fundamentally misunderstanding how this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We delete all entries from event_push_actions
from before E2, so there will still be the three push actions for the three unread messages in there. So in the final step we'd still have push actions E3, E4 and E5 in the event_push_actions
table
# We always update `event_push_summary_last_receipt_stream_id` to | ||
# ensure that we don't rescan the same receipts for remote users. | ||
# | ||
# This requires repeatable read to be safe, as we need the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be quite nice to do an assertion that we have the right isolation level, but that looks a bit fiddly and probably a thing for another time.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense.
# we haven't yet updated the counts in `event_push_summary` to reflect | ||
# that; in that case we simply ignore `event_push_summary` counts | ||
# and do a manual count of all of the rows in the `event_push_actions` table | ||
# for this user/room. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some edge case where this will go wrong if the previous RR was over 24 hours ago (so we've started removing things from event_push_actions
)? Or do we not really care?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we don't care. The previous behaviour is that if you send a read receipt that is over 24 hours old then the updated counts will only include unread messages less than 24 hours old. I think this keeps that same behaviour.
Synapse 1.62.0 (2022-07-05) =========================== No significant changes since 1.62.0rc3. Authors of spam-checker plugins should consult the [upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.62/docs/upgrade.md#upgrading-to-v1620) to learn about the enriched signatures for spam checker callbacks, which are supported with this release of Synapse. Synapse 1.62.0rc3 (2022-07-04) ============================== Bugfixes -------- - Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](matrix-org/matrix-synapse-ldap3#163) with usernames containing uppercase characters. ([\matrix-org#13156](matrix-org#13156)) - Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\matrix-org#13168](matrix-org#13168)) Synapse 1.62.0rc2 (2022-07-01) ============================== Bugfixes -------- - Fix unread counts for users on large servers. Introduced in v1.62.0rc1. ([\matrix-org#13140](matrix-org#13140)) - Fix DB performance when deleting old push notifications. Introduced in v1.62.0rc1. ([\matrix-org#13141](matrix-org#13141)) Synapse 1.62.0rc1 (2022-06-28) ============================== Features -------- - Port the spam-checker API callbacks to a new, richer API. This is part of an ongoing change to let spam-checker modules inform users of the reason their event or operation is rejected. ([\matrix-org#12857](matrix-org#12857), [\matrix-org#13047](matrix-org#13047)) - Allow server admins to customise the response of the `/.well-known/matrix/client` endpoint. ([\matrix-org#13035](matrix-org#13035)) - Add metrics measuring the CPU and DB time spent in state resolution. ([\matrix-org#13036](matrix-org#13036)) - Speed up fetching of device list changes in `/sync` and `/keys/changes`. ([\matrix-org#13045](matrix-org#13045), [\matrix-org#13098](matrix-org#13098)) - Improve URL previews for sites which only provide Twitter Card metadata, e.g. LWN.net. ([\matrix-org#13056](matrix-org#13056)) Bugfixes -------- - Update [MSC3786](matrix-org/matrix-spec-proposals#3786) implementation to check `state_key`. ([\matrix-org#12939](matrix-org#12939)) - Fix a bug introduced in Synapse 1.58 where Synapse would not report full version information when installed from a git checkout. This is a best-effort affair and not guaranteed to be stable. ([\matrix-org#12973](matrix-org#12973)) - Fix a bug introduced in Synapse 1.60 where Synapse would fail to start if the `sqlite3` module was not available. ([\matrix-org#12979](matrix-org#12979)) - Fix a bug where non-standard information was required when requesting the `/hierarchy` API over federation. Introduced in Synapse v1.41.0. ([\matrix-org#12991](matrix-org#12991)) - Fix a long-standing bug which meant that rate limiting was not restrictive enough in some cases. ([\matrix-org#13018](matrix-org#13018)) - Fix a bug introduced in Synapse 1.58 where profile requests for a malformed user ID would ccause an internal error. Synapse now returns 400 Bad Request in this situation. ([\matrix-org#13041](matrix-org#13041)) - Fix some inconsistencies in the event authentication code. ([\matrix-org#13087](matrix-org#13087), [\matrix-org#13088](matrix-org#13088)) - Fix a long-standing bug where room directory requests would cause an internal server error if given a malformed room alias. ([\matrix-org#13106](matrix-org#13106)) Improved Documentation ---------------------- - Add documentation for how to configure Synapse with Workers using Docker Compose. Includes example worker config and docker-compose.yaml. Contributed by @Thumbscrew. ([\matrix-org#12737](matrix-org#12737)) - Ensure the [Poetry cheat sheet](https://matrix-org.github.io/synapse/develop/development/dependencies.html) is available in the online documentation. ([\matrix-org#13022](matrix-org#13022)) - Mention removed community/group worker endpoints in upgrade.md. Contributed by @olmari. ([\matrix-org#13023](matrix-org#13023)) - Add instructions for running Complement with `gotestfmt`-formatted output locally. ([\matrix-org#13073](matrix-org#13073)) - Update OpenTracing docs to reference the configuration manual rather than the configuration file. ([\matrix-org#13076](matrix-org#13076)) - Update information on downstream Debian packages. ([\matrix-org#13095](matrix-org#13095)) - Remove documentation for the Delete Group Admin API which no longer exists. ([\matrix-org#13112](matrix-org#13112)) Deprecations and Removals ------------------------- - Remove the unspecced `DELETE /directory/list/room/{roomId}` endpoint, which hid rooms from the [public room directory](https://spec.matrix.org/v1.3/client-server-api/#listing-rooms). Instead, `PUT` to the same URL with a visibility of `"private"`. ([\matrix-org#13123](matrix-org#13123)) Internal Changes ---------------- - Add tests for cancellation of `GET /rooms/$room_id/members` and `GET /rooms/$room_id/state` requests. ([\matrix-org#12674](matrix-org#12674)) - Report login failures due to unknown third party identifiers in the same way as failures due to invalid passwords. This prevents an attacker from using the error response to determine if the identifier exists. Contributed by Daniel Aloni. ([\matrix-org#12738](matrix-org#12738)) - Merge the Complement testing Docker images into a single, multi-purpose image. ([\matrix-org#12881](matrix-org#12881), [\matrix-org#13075](matrix-org#13075)) - Simplify the database schema for `event_edges`. ([\matrix-org#12893](matrix-org#12893)) - Clean up the test code for client disconnection. ([\matrix-org#12929](matrix-org#12929)) - Remove code generating comments in configuration. ([\matrix-org#12941](matrix-org#12941)) - Add `Cross-Origin-Resource-Policy: cross-origin` header to content repository's thumbnail and download endpoints. ([\matrix-org#12944](matrix-org#12944)) - Replace noop background updates with `DELETE` delta. ([\matrix-org#12954](matrix-org#12954), [\matrix-org#13050](matrix-org#13050)) - Use lower isolation level when inserting read receipts to avoid serialization errors. Contributed by Nick @ Beeper. ([\matrix-org#12957](matrix-org#12957)) - Reduce the amount of state we pull from the DB. ([\matrix-org#12963](matrix-org#12963)) - Enable testing against PostgreSQL databases in Complement CI. ([\matrix-org#12965](matrix-org#12965), [\matrix-org#13034](matrix-org#13034)) - Fix an inaccurate comment. ([\matrix-org#12969](matrix-org#12969)) - Remove the `delete_device` method and always call `delete_devices`. ([\matrix-org#12970](matrix-org#12970)) - Use a GitHub form for issues rather than a hard-to-read, easy-to-ignore template. ([\matrix-org#12982](matrix-org#12982)) - Move [MSC3715](matrix-org/matrix-spec-proposals#3715) behind an experimental config flag. ([\matrix-org#12984](matrix-org#12984)) - Add type hints to tests. ([\matrix-org#12985](matrix-org#12985), [\matrix-org#13099](matrix-org#13099)) - Refactor macaroon tokens generation and move the unsubscribe link in notification emails to `/_synapse/client/unsubscribe`. ([\matrix-org#12986](matrix-org#12986)) - Fix documentation for running complement tests. ([\matrix-org#12990](matrix-org#12990)) - Faster joins: add issue links to the TODO comments in the code. ([\matrix-org#13004](matrix-org#13004)) - Reduce DB usage of `/sync` when a large number of unread messages have recently been sent in a room. ([\matrix-org#13005](matrix-org#13005), [\matrix-org#13096](matrix-org#13096), [\matrix-org#13118](matrix-org#13118)) - Replaced usage of PyJWT with methods from Authlib in `org.matrix.login.jwt`. Contributed by Hannes Lerchl. ([\matrix-org#13011](matrix-org#13011)) - Modernize the `contrib/graph/` scripts. ([\matrix-org#13013](matrix-org#13013)) - Remove redundant `room_version` parameters from event auth functions. ([\matrix-org#13017](matrix-org#13017)) - Decouple `synapse.api.auth_blocking.AuthBlocking` from `synapse.api.auth.Auth`. ([\matrix-org#13021](matrix-org#13021)) - Add type annotations to `synapse.storage.databases.main.devices`. ([\matrix-org#13025](matrix-org#13025)) - Set default `sync_response_cache_duration` to two minutes. ([\matrix-org#13042](matrix-org#13042)) - Rename CI test runs. ([\matrix-org#13046](matrix-org#13046)) - Increase timeout of complement CI test runs. ([\matrix-org#13048](matrix-org#13048)) - Refactor entry points so that they all have a `main` function. ([\matrix-org#13052](matrix-org#13052)) - Refactor the Dockerfile-workers configuration script to use Jinja2 templates in Synapse workers' Supervisord blocks. ([\matrix-org#13054](matrix-org#13054)) - Add headers to individual options in config documentation to allow for linking. ([\matrix-org#13055](matrix-org#13055)) - Make Complement CI logs easier to read. ([\matrix-org#13057](matrix-org#13057), [\matrix-org#13058](matrix-org#13058), [\matrix-org#13069](matrix-org#13069)) - Don't instantiate modules with keyword arguments. ([\matrix-org#13060](matrix-org#13060)) - Fix type checking errors against Twisted trunk. ([\matrix-org#13061](matrix-org#13061)) - Allow MSC3030 `timestamp_to_event` calls from anyone on world-readable rooms. ([\matrix-org#13062](matrix-org#13062)) - Add a CI job to check that schema deltas are in the correct folder. ([\matrix-org#13063](matrix-org#13063)) - Avoid rechecking event auth rules which are independent of room state. ([\matrix-org#13065](matrix-org#13065)) - Reduce the duplication of code that invokes the rate limiter. ([\matrix-org#13070](matrix-org#13070)) - Add a Subject Alternative Name to the certificate generated for Complement tests. ([\matrix-org#13071](matrix-org#13071)) - Add more tests for room upgrades. ([\matrix-org#13074](matrix-org#13074)) - Pin dependencies maintained by matrix.org to [semantic version](https://semver.org/) bounds. ([\matrix-org#13082](matrix-org#13082)) - Correctly report prometheus DB stats for `get_earliest_token_for_stats`. ([\matrix-org#13085](matrix-org#13085)) - Fix a long-standing bug where a finished logging context would be re-started when Synapse failed to persist an event from federation. ([\matrix-org#13089](matrix-org#13089)) - Simplify the alias deletion logic as an application service. ([\matrix-org#13093](matrix-org#13093)) - Add type annotations to `tests.test_server`. ([\matrix-org#13124](matrix-org#13124))
Fixes #13117
The idea is that we include the last read receipt in
event_push_summary
when calculating push, and only use the counts if it matches the current read receipt. This allows us to move deleting of push actions out of sending a receipt and into a background task.