This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add developer documentation to explain room DAG concepts like outliers and state_groups #10464

Merged on Aug 3, 2021 (15 commits).
1 change: 1 addition & 0 deletions changelog.d/10464.doc
@@ -0,0 +1 @@
Add developer FAQ to explain `outliers` and `state_groups`.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -69,6 +69,7 @@
- [Code Style](code_style.md)
- [Git Usage](dev/git.md)
- [Testing]()
- [Developer FAQ](development/faq.md)
- [OpenTracing](opentracing.md)
- [Database Schemas](development/database_schema.md)
- [Synapse Architecture]()
17 changes: 17 additions & 0 deletions docs/development/faq.md
@@ -0,0 +1,17 @@
# Developer FAQ
MadLittleMods marked this conversation as resolved.

## What is an `outlier`?

An `outlier` is an arbitrary floating event in the DAG (as opposed to being
in line with the current DAG). It also means that we don't have the state events
backfilled on the homeserver and we trust the event's *claimed* auth events rather
than those we calculate and verify to be correct.

An event can be unmarked as an `outlier` once we fetch all of its `prev_events` (you will see some `ex_outlier` code around this).
Member:

I think it's slightly different. The thing that makes an outlier an outlier is that we haven't figured out the state for the room at that point in the DAG.

We won't necessarily have its prev_events in the database, but it's entirely possible that we will. Conversely, it's possible for us not to have all the prev_events for a non-outlier event (in which case, that event would be a "backward extremity").

The process of transforming an outlier to a non-outlier involves figuring out the room state at that point in the DAG - either by having the state at all the prev_events and resolving the state between them, or simply by asking another server.
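To make the two routes just described concrete, here is a minimal Python sketch of turning an outlier into a non-outlier. Everything in it (`Event`, `deoutlier`, `state_after`, and the injected `resolve` and `ask_remote` callables) is hypothetical and is not Synapse's API: `resolve` stands in for real state resolution, and `ask_remote` stands in for asking another server for the state at the event.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Room state: (event_type, state_key) -> event_id of the current state event.
StateMap = Dict[Tuple[str, str], str]


@dataclass
class Event:
    # Hypothetical minimal event model, not Synapse's real event class.
    event_id: str
    prev_event_ids: List[str]
    outlier: bool = True


def deoutlier(
    event: Event,
    state_after: Dict[str, StateMap],
    resolve: Callable[[List[StateMap]], StateMap],
    ask_remote: Callable[[str], StateMap],
) -> StateMap:
    """Work out the room state at `event` and stop treating it as an outlier.

    state_after maps event IDs to the room state *after* that event, for the
    events whose state we already know (hypothetical in-memory stand-in).
    """
    known = [state_after[p] for p in event.prev_event_ids if p in state_after]
    if event.prev_event_ids and len(known) == len(event.prev_event_ids):
        # We know the state at every prev_event: resolve between them.
        state = resolve(known)
    else:
        # At least one prev_event's state is unknown: ask another server.
        state = ask_remote(event.event_id)
    # Once the state at this point in the DAG is known, it is no longer an outlier.
    event.outlier = False
    return state
```

For a quick experiment, passing a naive dict merge as `resolve` is enough to exercise the control flow, but such a merge is not Matrix state resolution.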

auth events are a whole extra level of complexity, but even for an outlier we should have all the events in the auth chain.
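The point that even an outlier should have its whole auth chain can be expressed as a graph walk. A minimal sketch, assuming a hypothetical in-memory `auth_events` dict (event ID -> that event's auth event IDs) in place of the database; `missing_auth_events` is an illustrative name, not a Synapse function.

```python
from typing import Dict, List, Set


def missing_auth_events(event_id: str, auth_events: Dict[str, List[str]]) -> Set[str]:
    """Walk the auth chain of `event_id` and return auth event IDs we don't hold.

    `auth_events` maps every event we hold locally to its auth event IDs
    (a hypothetical stand-in for the database).
    """
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = list(auth_events.get(event_id, []))
    while stack:
        eid = stack.pop()
        if eid in seen:
            continue
        seen.add(eid)
        if eid not in auth_events:
            # A gap in the auth chain: we would need to fetch this event.
            missing.add(eid)
            continue
        # Keep walking up the chain through this event's own auth events.
        stack.extend(auth_events[eid])
    return missing
```

An empty result means the whole auth chain is present locally, which is the expectation stated above even when the event itself is an outlier.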

> Normally, our calculated auth_events based on the state of the room
> at the event's position in the DAG, though occasionally (eg if the
> event is an outlier), may be the auth events claimed by the remote
> server.

This is quite a specific bit of code relating to one part of processing incoming events (and to be honest, I'm not convinced it does the right thing). I would leave auth events out of this explanation, tbh.

Contributor Author (@MadLittleMods, Jul 28, 2021):

Thanks for all the info @richvdh 😀

These seem so closely coupled and aren't mutually exclusive in real-life situations, so it's hard for me to make a clear description 🤯

I've updated with something more in line with your description but feel free to rewrite.

I've mainly been working with floating outliers so I don't think I had the complete picture.


I feel like I don't understand where an outlier becomes an ex_outlier.

Say we fetch a missing auth event for an event; we mark that auth event as an outlier. Then, at what point later do we get the event again as a non-outlier? From your statement, it's not when we have fetched all of the prev_events, as those could already all be persisted in the db.

There is `_update_outliers_txn`, which handles the `ex_outlier` stuff, but it's confusing where/when we will come across the same outlier event again while persisting.

Based on the DAG below, as a remote federating server, my assumption is that we process event D which fetches the state_event as an outlier. Then as we backfill more, and process state_event directly, it becomes an ex_outlier.

[DAG diagram: Mermaid live editor playground link]

I'm still a bit wary of a situation where we could have all of the prev_events for the state_event but still mark it as an outlier. Perhaps with gaps in the DAG and the event hitting it just right 🤷

Member:

> I've mainly been working with floating outliers so I don't think I had the complete picture.

I think I might have misled you here. From Synapse's point of view, all outliers are considered to be floating, and in the majority of cases we indeed won't have their prev events. However, it's possible for the stars to align in a way that we do actually have their prev events in the database - at least as outliers themselves.

To take a simple example:

Two of the auth events for a join event in an invite-only room are the join_rules event and an invite event for the joiner. So let's suppose we fetch both of those events as outliers. Let's also suppose that the invite was issued immediately after changing the join rules - so the join_rules is the only prev-event of the invite event.

join_rules <-  invite
   ^             ^
   ............................. join

In that case, it so happens that we have all the prev-events of the invite event - but we're still considering it an outlier.
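The join_rules/invite example can be modelled in a few lines of Python to show that having every prev_event in the database does not by itself make an event a non-outlier. `StoredEvent`, `have_all_prev_events`, and the `$earlier` prev event of the join_rules event are hypothetical, purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredEvent:
    # Hypothetical stand-in for a row in the events table.
    event_id: str
    prev_event_ids: List[str]
    outlier: bool


def have_all_prev_events(event: StoredEvent, db: Dict[str, StoredEvent]) -> bool:
    return all(p in db for p in event.prev_event_ids)


db: Dict[str, StoredEvent] = {}

# While authorising the join, we fetch two of its auth events as outliers.
join_rules = StoredEvent("$join_rules", prev_event_ids=["$earlier"], outlier=True)
invite = StoredEvent("$invite", prev_event_ids=["$join_rules"], outlier=True)
for ev in (join_rules, invite):
    db[ev.event_id] = ev

# The invite's only prev_event happens to be in the database...
print(have_all_prev_events(invite, db))  # True
# ...but we still haven't worked out the room state at the invite,
# so it remains an outlier.
print(invite.outlier)  # True
```

The `outlier` flag here is deliberately independent of which prev_events are present, mirroring the point that outlier-ness is about unknown state rather than missing prev_events.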

So I don't want you to imagine there is a distinction between "floating outliers" and "regular outliers" - they are all just outliers in terms of how we handle them in Synapse.

> Say we fetch a missing auth event for an event; we mark that auth event as an outlier. Then, at what point later do we get the event again as a non-outlier? From your statement, it's not when we have fetched all of the prev_events, as those could already all be persisted in the db.

right. It becomes a non-outlier in the situation where we process it as a regular event as part of the timeline - normally via backfill.
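A rough sketch of the outlier to ex-outlier transition during persistence, assuming a hypothetical in-memory `db` dict and `ex_outliers` list in place of the real tables. Per the discussion in this thread, this is the kind of case `_update_outliers_txn` deals with in Synapse, but `persist_events` and `Row` below are illustrative only and leave out details such as recording the state at the event.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Row:
    # Hypothetical stand-in for a stored event row.
    event_id: str
    outlier: bool


def persist_events(
    db: Dict[str, Row], batch: List[Tuple[str, bool]], ex_outliers: List[str]
) -> List[str]:
    """Persist (event_id, outlier) pairs and return the IDs newly inserted.

    An event already stored as an outlier that arrives again as a regular
    timeline event (e.g. via backfill) is flipped in place and recorded as
    an ex-outlier instead of being inserted a second time.
    """
    inserted: List[str] = []
    for event_id, outlier in batch:
        existing = db.get(event_id)
        if existing is None:
            db[event_id] = Row(event_id, outlier)
            inserted.append(event_id)
        elif existing.outlier and not outlier:
            existing.outlier = False
            ex_outliers.append(event_id)
        # Otherwise it's a duplicate of what we already hold: nothing to do.
    return inserted
```

In the scenario discussed here, `state_event` is first stored as an outlier while processing `D`; a later backfill that persists it as a regular event takes the `elif` branch.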

> Based on the DAG below, as a remote federating server, my assumption is that we process event D which fetches the state_event as an outlier. Then as we backfill more, and process state_event directly, it becomes an ex_outlier.

exactly so, yes.

> I'm still a bit wary of a situation where we could have all of the prev_events for the state_event but still mark it as an outlier. Perhaps with gaps in the DAG and the event hitting it just right 🤷

yup. Imagine there is another fork in the DAG pointing to A:

When we receive Y, we'll backfill A. So then we have state_event's prev_events - but state_event is still an outlier at this point.

Contributor Author:

Thanks for the clarification here. I've added a paragraph explaining that there is no distinction.

I like these examples and edge cases. I'm tempted to add them into the docs but I think I'll save that for another iteration.


MadLittleMods marked this conversation as resolved.

## What is a `state_group`?

For every non-outlier event we need to know the state at that event. Instead of storing the full state for each event in the DB (i.e. an `event_id -> state` mapping), which is *very* space inefficient when state doesn't change, we instead assign each different set of state a "state group" and then have mappings of `event_id -> state_group` and `state_group -> state`.
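A toy in-memory model of the `event_id -> state_group` and `state_group -> state` mappings, assuming state is a dict keyed by `(event_type, state_key)`. `StateGroupStore` and its methods are hypothetical; in particular, reusing a group by looking up identical state sets is a simplification, and a real store may decide reuse differently or store groups as deltas against earlier groups.

```python
from typing import Dict, FrozenSet, Tuple

StateKey = Tuple[str, str]  # (event_type, state_key)


class StateGroupStore:
    """Toy model of the event_id -> state_group -> state indirection."""

    def __init__(self) -> None:
        self._next_group = 0
        self.event_to_group: Dict[str, int] = {}
        self.group_to_state: Dict[int, Dict[StateKey, str]] = {}
        # Lookup so that identical state sets share a single group.
        self._state_to_group: Dict[FrozenSet[Tuple[StateKey, str]], int] = {}

    def store_state_for_event(self, event_id: str, state: Dict[StateKey, str]) -> int:
        key = frozenset(state.items())
        group = self._state_to_group.get(key)
        if group is None:
            # First time we've seen this exact state: allocate a new group.
            group = self._next_group
            self._next_group += 1
            self.group_to_state[group] = dict(state)
            self._state_to_group[key] = group
        # Each event only stores a small pointer to its group.
        self.event_to_group[event_id] = group
        return group

    def get_state_for_event(self, event_id: str) -> Dict[StateKey, str]:
        return self.group_to_state[self.event_to_group[event_id]]
```

Two events stored with the same state share one group, so the full state is kept only once; only an event whose state differs allocates a new group.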

MadLittleMods marked this conversation as resolved.