Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add some miscellaneous comments around sync #13474

Merged
merged 6 commits into from
Aug 10, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13474.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add some miscellaneous comments to document sync, especially around `compute_state_delta`.
48 changes: 42 additions & 6 deletions synapse/handlers/sync.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,17 @@
# limitations under the License.
import itertools
import logging
from typing import TYPE_CHECKING, Any, Dict, FrozenSet, List, Optional, Set, Tuple
from typing import (
TYPE_CHECKING,
Any,
Dict,
FrozenSet,
List,
Optional,
Sequence,
Set,
Tuple,
)

import attr
from prometheus_client import Counter
Expand Down Expand Up @@ -89,7 +99,7 @@ class SyncConfig:
@attr.s(slots=True, frozen=True, auto_attribs=True)
class TimelineBatch:
prev_batch: StreamToken
events: List[EventBase]
events: Sequence[EventBase]
limited: bool
# A mapping of event ID to the bundled aggregations for the above events.
# This is only calculated if limited is true.
Expand Down Expand Up @@ -852,24 +862,35 @@ async def compute_state_delta(
now_token: StreamToken,
full_state: bool,
) -> MutableStateMap[EventBase]:
"""Works out the difference in state between the start of the timeline
and the previous sync.
"""Works out the difference in state between the end of the previous sync and
the start of the timeline.

Args:
room_id:
batch: The timeline batch for the room that will be sent to the user.
sync_config:
since_token: Token of the end of the previous batch. May be None.
since_token: Token of the end of the previous batch. May be `None`.
now_token: Token of the end of the current batch.
full_state: Whether to force returning the full state.
`lazy_load_members` still applies when `full_state` is `True`.

Returns:
The state to return in the sync response for the room.

Clients will overlay this onto the state at the end of the previous sync to
arrive at the state at the start of the timeline.

Clients will then overlay state events in the timeline to arrive at the
state at the end of the timeline, in preparation for the next sync.
"""
# TODO(mjark) Check if the state events were received by the server
# after the previous sync, since we need to include those state
# updates even if they occurred logically before the previous event.
# TODO(mjark) Check for new redactions in the state events.

with Measure(self.clock, "compute_state_delta"):

# The memberships needed for events in the timeline.
# Only calculated when `lazy_load_members` is on.
members_to_fetch = None

lazy_load_members = sync_config.filter_collection.lazy_load_members()
Expand Down Expand Up @@ -897,12 +918,20 @@ async def compute_state_delta(
else:
state_filter = StateFilter.all()

# The contribution to the room state from state events in the timeline.
# Only contains the last event for any given state key.
timeline_state = {
(event.type, event.state_key): event.event_id
for event in batch.events
if event.is_state()
}

# Now calculate the state to return in the sync response for the room.
# This is more or less the change in state between the end of the previous
# sync's timeline and the start of the current sync's timeline.
# See the docstring above for details.
state_ids: StateMap[str]

if full_state:
if batch:
current_state_ids = (
Expand Down Expand Up @@ -1010,6 +1039,13 @@ async def compute_state_delta(
),
)

# At this point, `state_ids` includes the memberships of all event senders
# in the timeline when `lazy_load_members` is enabled. This is because we
# may not have sent the memberships in a previous sync.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# At this point, `state_ids` includes the memberships of all event senders
# in the timeline when `lazy_load_members` is enabled. This is because we
# may not have sent the memberships in a previous sync.
# At this point, if `lazy_load_members` is enabled, `state_ids` includes
# the memberships of all event senders in the timeline. This is because we
# may not have sent the memberships in a previous sync.

I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for all three sync types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so?

Assuming the docstring for the return value is right, state_ids (which is almost the return value) has to have that property after all three branches, otherwise lazy loading would be broken (this reasoning is a little backwards).


It's a lot more confusing to try to reason forwards:

For an initial sync, state_ids must include the memberships of event senders, otherwise it's not an initial sync?
Less handwavey:
current_state_ids contains the memberships of event senders at the end of the timeline, and state_ids initially contains the memberships of event senders as seen at the start of the timeline. The branch for empty timelines can be ignored because there are no event senders in that case.
_calculate_state returns (timeline end state | timeline start state) - (previous sync state - memberships in timeline start) - state in timeline. The previous sync state is {}, so _calculate_state returns the memberships of event senders from the state at the start of the timeline.

For a gappy sync:
state_at_timeline_start contains the memberships of event senders at the start of the timeline
state_at_previous_sync contains all memberships at the end of the previous sync
current_state_ids contains all memberships at the end of the timeline.
_calculate_state returns (timeline end state | timeline start state) - (previous sync state - memberships in timeline start) - state in timeline.
The previous sync state - memberships in timeline start term excludes the memberships of event senders.
(this branch is a mess)

For an incremental sync, we fetch the memberships of event senders explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat tempted to rename some of these variables...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lot more confusing to try to reason forwards:

ok, yes, that sounds right - thank you for ploughing through that!


# When `include_redundant_members` is on, we send all the lazy-loaded
# memberships of event senders. Otherwise we make an effort to prune the set
# of memberships we send:
squahtx marked this conversation as resolved.
Show resolved Hide resolved
if lazy_load_members and not include_redundant_members:
cache_key = (sync_config.user.to_string(), sync_config.device_id)
cache = self.get_lazy_loaded_members_cache(cache_key)
Expand Down
4 changes: 2 additions & 2 deletions synapse/visibility.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,8 @@ async def filter_events_for_client(
* the user is not currently a member of the room, and:
* the user has not been a member of the room since the given
events
always_include_ids: set of event ids to specifically
include (unless sender is ignored)
always_include_ids: set of event ids to specifically include, if present
in events (unless sender is ignored)
filter_send_to_client: Whether we're checking an event that's going to be
sent to a client. This might not always be the case since this function can
also be called to check whether a user can see the state at a given point.
Expand Down