This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Faster joins: Refactor handling of servers in room #14954

Merged
merged 4 commits into from
Feb 3, 2023
1 change: 1 addition & 0 deletions changelog.d/14954.misc
@@ -0,0 +1 @@
Faster room joins: Refactor internal handling of servers in room to never store an empty list.
Member

Is this fixing a bug? A potential bug? Or just clean-up?

Contributor Author

It's somewhere between fixing a potential bug and a clean-up, probably closer to a clean-up.

28 changes: 18 additions & 10 deletions synapse/federation/federation_client.py
@@ -110,7 +110,8 @@ class SendJoinResult:
# True if 'state' elides non-critical membership events
partial_state: bool

# if 'partial_state' is set, a list of the servers in the room (otherwise empty)
# If 'partial_state' is set, a list of the servers in the room (otherwise empty).
# Always contains the server we joined off.
servers_in_room: List[str]


@@ -1152,23 +1153,30 @@ async def _execute(pdu: EventBase) -> None:
% (auth_chain_create_events,)
)

if response.members_omitted and not response.servers_in_room:
raise InvalidResponseError(
"members_omitted was set, but no servers were listed in the room"
)
servers_in_room = response.servers_in_room
if response.members_omitted:
if not servers_in_room:
raise InvalidResponseError(
"members_omitted was set, but no servers were listed in the room"
)

if response.members_omitted and not partial_state:
raise InvalidResponseError(
"members_omitted was set, but we asked for full state"
)
if destination not in servers_in_room:
# `servers_in_room` is supposed to be a complete list.
# Fix things up if the remote homeserver is badly behaved.
servers_in_room = [destination] + servers_in_room
Member

Should we be making this a set? What if the remote server lists the same server in servers_in_room 100 times?

Is it possible for destination to no longer be in the room at this point? (I'm guessing no because we just got a response from it?)

Contributor Author

That's a good question. We'd fail to insert the list into the database because of a unique constraint, and the join would fail.

Really, there's nothing stopping destination from leaving the room immediately after our join.
If it does leave, we'll try syncing state from the other servers in the list.

Contributor Author

Representing the server list as a set sounds like a good idea, so I did it in 163c68c.
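The point about the unique constraint can be illustrated with an in-memory SQLite table. This is a minimal sketch under assumptions: the table reuses the `partial_state_rooms_servers` name from `room.py`, but the `server_name` column and the `UNIQUE (room_id, server_name)` constraint are assumed rather than taken from Synapse's schema, and Synapse writes through its own `DatabasePool` rather than raw `sqlite3`.

```python
import sqlite3
from typing import Iterable


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per (room, server) pair, enforced by UNIQUE.
    conn.execute(
        "CREATE TABLE partial_state_rooms_servers ("
        " room_id TEXT NOT NULL,"
        " server_name TEXT NOT NULL,"
        " UNIQUE (room_id, server_name))"
    )
    return conn


def store_servers(conn: sqlite3.Connection, room_id: str, servers: Iterable[str]) -> None:
    # `with conn` commits on success and rolls back if any row fails.
    with conn:
        conn.executemany(
            "INSERT INTO partial_state_rooms_servers (room_id, server_name)"
            " VALUES (?, ?)",
            [(room_id, server) for server in servers],
        )


conn = make_db()
room_id = "!room:a.example"
reported = ["a.example", "b.example", "a.example"]  # duplicate entry

try:
    store_servers(conn, room_id, reported)
    first_attempt_failed = False
except sqlite3.IntegrityError:
    first_attempt_failed = True  # the whole insert is rolled back

# Deduplicating with a set first lets the insert succeed.
store_servers(conn, room_id, set(reported))
(row_count,) = conn.execute(
    "SELECT COUNT(*) FROM partial_state_rooms_servers"
).fetchone()
print(first_attempt_failed, row_count)  # True 2
```

Because the failed insert runs inside one transaction, none of its rows survive, which is consistent with the join failing outright rather than leaving a partial list behind.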


if not partial_state:
raise InvalidResponseError(
"members_omitted was set, but we asked for full state"
)
Comment on lines +1167 to +1170
Member

I'd recommend moving this above the destination check to have all the error checking up-front.

Contributor Author

Done in 163c68c.


return SendJoinResult(
event=event,
state=signed_state,
auth_chain=signed_auth,
origin=destination,
partial_state=response.members_omitted,
servers_in_room=response.servers_in_room or [],
servers_in_room=servers_in_room or [],
)

# MSC3083 defines additional error codes for room joins.
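With the error checks moved up front, as the reviewer suggested, and the server list held in a set, as adopted in 163c68c, the validation in this hunk could look roughly like the sketch below. It is a sketch only: `validate_servers_in_room` is a hypothetical standalone function, `InvalidResponseError` is a local stand-in for Synapse's exception, and the actual contents of 163c68c are not shown on this page.

```python
from typing import List, Optional, Set


class InvalidResponseError(Exception):
    """Stand-in for Synapse's exception of the same name."""


def validate_servers_in_room(
    destination: str,
    members_omitted: bool,
    partial_state: bool,
    servers_in_room: Optional[List[str]],
) -> Set[str]:
    """Hypothetical helper: error checks first, then the fix-up."""
    if not members_omitted:
        # Full-state response; the list is expected to be absent or empty.
        return set(servers_in_room or [])
    if not servers_in_room:
        raise InvalidResponseError(
            "members_omitted was set, but no servers were listed in the room"
        )
    if not partial_state:
        raise InvalidResponseError(
            "members_omitted was set, but we asked for full state"
        )
    # A set collapses duplicate entries, and adding `destination` fixes up a
    # remote homeserver that left itself out of the list.
    servers = set(servers_in_room)
    servers.add(destination)
    return servers


print(sorted(validate_servers_in_room("b.example", True, True, ["a.example", "a.example"])))
# ['a.example', 'b.example']
```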
2 changes: 1 addition & 1 deletion synapse/federation/sender/__init__.py
@@ -447,7 +447,7 @@ async def handle_event(event: EventBase) -> None:
)
)

if len(partial_state_destinations) > 0:
if partial_state_destinations is not None:
Contributor Author

@squahtx, Jan 31, 2023

I initially thought there was a bug here, where we would block on the full state of the room (below) when the server we joined off returned an empty list of servers in the room. But while writing this PR to fix the bug, I discovered that we already validate that the list is truthy, so the bug can't happen.

destinations = partial_state_destinations

if destinations is None:
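With this change, `get_partial_state_servers_at_join` returns `None` for full-state rooms and a non-empty list otherwise, so `is not None` is enough to tell the two cases apart; the old `len(...) > 0` check would raise `TypeError` if handed `None`. A minimal sketch of the branch, where `pick_destinations` and `fallback_hosts` are hypothetical names and the fallback stands in for the code path that blocks on the room's full state:

```python
from typing import Optional, Sequence


def pick_destinations(
    partial_state_destinations: Optional[Sequence[str]],
    fallback_hosts: Sequence[str],
) -> Sequence[str]:
    """Hypothetical: `None` means a full-state room; otherwise the list is non-empty."""
    if partial_state_destinations is not None:
        return partial_state_destinations
    # Stand-in for the full-state path that the real sender takes.
    return fallback_hosts


print(pick_destinations(["a.example"], ["x.example"]))  # ['a.example']
print(pick_destinations(None, ["x.example"]))  # ['x.example']
```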
1 change: 1 addition & 0 deletions synapse/handlers/device.py
@@ -859,6 +859,7 @@ async def handle_room_un_partial_stated(self, room_id: str) -> None:
known_hosts_at_join = await self.store.get_partial_state_servers_at_join(
room_id
)
assert known_hosts_at_join is not None
Member

Should this be an error? Usually assertions are used for programming errors, but this seems to be input validation?

Contributor Author

The assert can't (shouldn't?) ever be hit unless we've introduced a bug. This method is only run when we're about to finish syncing state. At that point the room is still partial stated, which implies we have a list of servers in it.

potentially_changed_hosts.difference_update(known_hosts_at_join)

potentially_changed_hosts.discard(self.server_name)
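The assertion documents an invariant rather than validating remote input, which matches the author's answer; it also narrows the new `Optional[Sequence[str]]` return type to `Sequence[str]` for type checkers such as mypy. Note that `assert` statements are skipped entirely when Python runs with `-O`, another reason to reserve them for conditions that only a bug could violate. A minimal sketch, using a hypothetical standalone `hosts_to_recheck` function in place of the logic inside `handle_room_un_partial_stated`:

```python
from typing import Optional, Sequence, Set


def hosts_to_recheck(
    potentially_changed_hosts: Set[str],
    known_hosts_at_join: Optional[Sequence[str]],
    server_name: str,
) -> Set[str]:
    # The room is still partial-stated at this point, so None here would
    # mean a bug elsewhere, not bad input from a remote server.
    # The assert also narrows Optional[Sequence[str]] to Sequence[str].
    assert known_hosts_at_join is not None
    remaining = set(potentially_changed_hosts)  # copy; the original mutates in place
    remaining.difference_update(known_hosts_at_join)
    remaining.discard(server_name)
    return remaining


print(sorted(hosts_to_recheck({"a.example", "b.example", "me.example"}, ["a.example"], "me.example")))
# ['b.example']
```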
3 changes: 2 additions & 1 deletion synapse/storage/controllers/state.py
@@ -569,10 +569,11 @@ async def get_current_hosts_in_room_or_partial_state_approximation(
is arbitrary for rooms with partial state.
"""
# We have to read this list first to mitigate races with un-partial stating.
# This will be empty for rooms with full state.
hosts_at_join = await self.stores.main.get_partial_state_servers_at_join(
room_id
)
if hosts_at_join is None:
hosts_at_join = ()

hosts_from_state = await self.stores.main.get_current_hosts_in_room(room_id)
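Normalizing `None` to an empty tuple lets the rest of the method treat full-state and partial-state rooms uniformly. The remainder of `get_current_hosts_in_room_or_partial_state_approximation` is not shown in this diff, so the sketch below assumes the approximation is simply the union of the two host lists; `approximate_hosts` is a hypothetical name.

```python
from typing import Collection, Optional, Sequence, Set


def approximate_hosts(
    hosts_at_join: Optional[Sequence[str]],
    hosts_from_state: Collection[str],
) -> Set[str]:
    """Hypothetical: combine the /send_join list with hosts seen in state."""
    if hosts_at_join is None:
        hosts_at_join = ()  # full-state room: no extra hosts to add
    return set(hosts_at_join) | set(hosts_from_state)


print(sorted(approximate_hosts(None, ["a.example"])))  # ['a.example']
print(sorted(approximate_hosts(["b.example"], ["a.example"])))  # ['a.example', 'b.example']
```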

24 changes: 18 additions & 6 deletions synapse/storage/databases/main/room.py
@@ -1192,16 +1192,26 @@ def get_rooms_for_retention_period_in_range_txn(
get_rooms_for_retention_period_in_range_txn,
)

@cached(iterable=True)
async def get_partial_state_servers_at_join(self, room_id: str) -> Sequence[str]:
async def get_partial_state_servers_at_join(
self, room_id: str
) -> Optional[Sequence[str]]:
"""Gets the list of servers in a partial state room at the time we joined it.

Returns:
The `servers_in_room` list from the `/send_join` response for partial state
rooms. May not be accurate or complete, as it comes from a remote
homeserver.
An empty list for full state rooms.
`None` for full state rooms.
"""
servers_in_room = await self._get_partial_state_servers_at_join(room_id)

if len(servers_in_room) == 0:
return None

return servers_in_room

@cached(iterable=True)
async def _get_partial_state_servers_at_join(self, room_id: str) -> Sequence[str]:
return await self.db_pool.simple_select_onecol(
"partial_state_rooms_servers",
keyvalues={"room_id": room_id},
@@ -1956,11 +1966,13 @@ async def store_partial_state_room(

Args:
room_id: the ID of the room
servers: other servers known to be in the room
servers: other servers known to be in the room. must include `joined_via`.
device_lists_stream_id: the device_lists stream ID at the time when we first
joined the room.
joined_via: the server name we requested a partial join from.
"""
assert joined_via in servers

await self.db_pool.runInteraction(
"store_partial_state_room",
self._store_partial_state_room_txn,
@@ -1997,7 +2009,7 @@ def _store_partial_state_room_txn(
)
self._invalidate_cache_and_stream(txn, self.is_partial_state_room, (room_id,))
self._invalidate_cache_and_stream(
txn, self.get_partial_state_servers_at_join, (room_id,)
txn, self._get_partial_state_servers_at_join, (room_id,)
)

async def write_partial_state_rooms_join_event_id(
@@ -2408,7 +2420,7 @@ def _clear_partial_state_room_txn(
)
self._invalidate_cache_and_stream(txn, self.is_partial_state_room, (room_id,))
self._invalidate_cache_and_stream(
txn, self.get_partial_state_servers_at_join, (room_id,)
txn, self._get_partial_state_servers_at_join, (room_id,)
)

DatabasePool.simple_insert_txn(
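The `room.py` change splits the lookup into a cached private method, which returns whatever rows exist, and an uncached public wrapper that maps an empty result to `None`. Because only the private method is cached, the invalidations in `_store_partial_state_room_txn` and `_clear_partial_state_room_txn` now target `_get_partial_state_servers_at_join`. Below is a minimal sketch of that pattern, with a plain dict standing in for Synapse's `@cached` decorator and its invalidation machinery; `FakeStore`, its dict-backed "table", and the simplified method signatures are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


class FakeStore:
    """Hypothetical stand-in: one dict replaces the table, another the cache."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = {}  # room_id -> servers
        self._cache: Dict[str, Sequence[str]] = {}

    def _get_partial_state_servers_at_join(self, room_id: str) -> Sequence[str]:
        # Cached layer: returns whatever rows exist, () if none.
        if room_id not in self._cache:
            self._cache[room_id] = tuple(self._rows.get(room_id, []))
        return self._cache[room_id]

    def get_partial_state_servers_at_join(self, room_id: str) -> Optional[Sequence[str]]:
        # Uncached wrapper: an empty result means a full-state room.
        servers = self._get_partial_state_servers_at_join(room_id)
        if len(servers) == 0:
            return None
        return servers

    def store_partial_state_room(self, room_id: str, servers: List[str], joined_via: str) -> None:
        assert joined_via in servers  # never store a list without the join server
        self._rows[room_id] = list(servers)
        # Invalidate the cached function; the wrapper holds no cache of its own.
        self._cache.pop(room_id, None)

    def clear_partial_state_room(self, room_id: str) -> None:
        self._rows.pop(room_id, None)
        self._cache.pop(room_id, None)


store = FakeStore()
room = "!room:a.example"
print(store.get_partial_state_servers_at_join(room))  # None (and () is now cached)
store.store_partial_state_room(room, ["a.example", "b.example"], joined_via="a.example")
print(store.get_partial_state_servers_at_join(room))  # ('a.example', 'b.example')
store.clear_partial_state_room(room)
print(store.get_partial_state_servers_at_join(room))  # None
```

Had the wrapper itself been cached, invalidating only the private method would leave a stale `None` behind after the first lookup, which is why the cache has to live on exactly one layer.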