Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sliding Sync: Speed up background updates to populate Sliding Sync tables #17676

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Sep 6, 2024

Speed up background updates to populate Sliding Sync tables

Follow-up to #17512

  • Read from the room_stats_state table for room_type, name, and is_encrypted
  • For joins, if the state hasn't changed since the last snapshot, use the last membership snapshot we inserted
  • concurrently_execute processing and inserting
  • Use batch/bulk SQL queries like simple_upsert_many_txn where we can

Dev notes

The quick and dirty script that Erik wrote to run on matrix.org, https://github.com/erikjohnston/synapse-fast-bg-update/blob/main/src/main.rs

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

can_use_last_snapshot = False
# Once we find one relevant state event that changed, no need to
# look any further
break
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is potential for a further optimization of using what we can from the last snapshot and fetching what's changed on top.

# they can't be confused.
*,
insertion_value_names: Collection[str] = [],
insertion_value_values: Collection[Iterable[Any]] = [],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding support for insertion only values on upsert.

Same as we already have for simple_upsert(insertion_values={})

@MadLittleMods MadLittleMods marked this pull request as ready for review September 10, 2024 04:54
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 10, 2024 04:54
@MadLittleMods MadLittleMods removed the request for review from a team September 10, 2024 18:18
@MadLittleMods MadLittleMods requested a review from a team September 10, 2024 19:23
…und-updates

Conflicts:
	synapse/storage/databases/main/events_bg_updates.py
	tests/storage/test_sliding_sync_tables.py
@@ -1927,13 +2023,13 @@ def _find_memberships_to_update_txn(
LEFT JOIN rooms AS r ON (c.room_id = r.room_id)
WHERE (c.room_id, c.user_id) > (?, ?)
AND (m.user_id IS NULL OR c.event_id != m.membership_event_id)
ORDER BY c.room_id ASC, c.user_id ASC
ORDER BY c.room_id ASC, e.stream_ordering ASC, c.user_id ASC
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still allow the background update to finish when there are too many members in a room to fit in a single batch?

We keep track of progress in the initial_phase on last_room_id/last_user_id. Given that each stream_ordering position will only correspond to a single membership/user_id, I think this works fine. I think this probably works even if we drop the c.user_id ASC

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do this, as this sorting must match the "bounds" we're using (i.e. (c.room_id, c.user_id) > (?, ?)), otherwise we could skup users, etc.

"Current state was updated after we gathered it to update "
+ "`sliding_sync_joined_rooms` in the background update. "
+ "Raising exception so we can just try again."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If current state has changed, won't we have updated the tables with the new information in the persist events path? If so we can just skip this room since we know its now up to date?

One thing we have to be careful of is if a background update fails 5 times in a row we will stop retrying the update.

@@ -1927,13 +2023,13 @@ def _find_memberships_to_update_txn(
LEFT JOIN rooms AS r ON (c.room_id = r.room_id)
WHERE (c.room_id, c.user_id) > (?, ?)
AND (m.user_id IS NULL OR c.event_id != m.membership_event_id)
ORDER BY c.room_id ASC, c.user_id ASC
ORDER BY c.room_id ASC, e.stream_ordering ASC, c.user_id ASC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do this, as this sorting must match the "bounds" we're using (i.e. (c.room_id, c.user_id) > (?, ?)), otherwise we could skup users, etc.

to_token=RoomStreamToken(
stream=membership_event_stream_ordering,
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could a large range of stuff. I think instead of looking at the state deltas, instead we can just look at the topological ordering of the info in room_stats_state and compare that to the membership info. i.e. if the room name was set at topological_ordering 50 then we can assume that any membership events with a higher topological ordering can reuse that room name.

I'm thinking of something like: "take max of topological ordering across the current state events for the state we care about, then compare against each topological ordering". We don't need this logic to be 100% accurate, as a) if we pull out the state groups for more events than strictly necessary that's fine, and b) there is a small edge case where this picks up a room name that had been state reset (but I think we can live with that). However, this will be a lot faster than trying to pull out loads of state deltas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually wondering if we should just skip calculating these values for joins, and have the "joined rooms" background update fill these values out when it gets to them? 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants