This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Inconsistent timeline - server was offline for a few hours #11499

Closed
chagai95 opened this issue Dec 3, 2021 · 8 comments

Comments

@chagai95
Contributor

chagai95 commented Dec 3, 2021

Description

The timeline is inconsistent. I showcased it here:
element-hq/element-web#20024

Steps to reproduce

  • Server was offline for a few hours

Version information

  • Homeserver:

2 servers involved:
mx.chagai.website:
{"server_version":"1.43.0","python_version":"3.8.12"}
matrix-client.matrix.org:
{"server_version":"1.47.1 (b=matrix-org-hotfixes,10e34aaa6)","python_version":"3.7.3"}

  • Version:

  • Install method:

Ansible

  • Platform:

Docker, systemd, CentOS 7

@reivilibre
Contributor

reivilibre commented Dec 9, 2021

So as far as I know, you're right in saying this is Synapse's 'fault' :)

With that said, I can't remember if this is actually a desirable feature for some reason.

(One obvious note is that it makes it hard to miss the messages — though it does seem like it can be confusing if messages arrive out of context.)

Will find out if there's anything we should be doing about this. It may well be something that the spec should grow to be able to tell clients that an event is out of context....? Just a thought.

@reivilibre
Contributor

Hi again,

Some more context is that the decision of how events should be ordered has some trade-offs. (Trying to flatten the history of pretty much any distributed system into a straight line is challenging, because some events may not have any causal ordering between them. That's probably why Matrix is built on a DAG, I would guess).

On one hand, ordering the messages by time of receipt (basically how it's done now) is simple and means that users get a chance to notice messages which could otherwise get missed by being slotted into the history (which may be well off-screen by that point).

The spec might have some more to say about why events should be ordered by time of receipt on your homeserver, but thinking about it makes one wonder what the alternatives could be and what trade-offs would have to be made if using them?

As an example, sorting by timestamp is troublesome because different servers will have different clocks for generating time (the clock drift problem).
Even doing something more complicated might be prone to issues like malicious homeservers trying to insert new events at an older point in the timeline. At least the current approach is good for ensuring that you know the order of receipt on your own homeserver (loosely: the only one that you can trust!), because it's the same as the order of display.
(I can't off the top of my head think of any schemes that are perfect, and I doubt they exist.)
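
To make the clock-drift point concrete, here is a minimal sketch (not Synapse code) comparing the two orderings. `origin_server_ts` is the timestamp field that Matrix events carry; the `Event` class, the `arrival_index` field, and the server names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    sender_server: str
    origin_server_ts: int  # milliseconds, as claimed by the sending server's clock
    arrival_index: int     # hypothetical: order in which our homeserver received it


# Assumption for this example: b.example's clock runs 5 minutes slow, so its
# reply (sent 10 seconds after the question) carries an *earlier* timestamp.
events = [
    Event("$question", "a.example", 1_000_000, 0),
    Event("$reply", "b.example", 1_000_000 + 10_000 - 300_000, 1),
]


def by_arrival(evs):
    """Order events the way our own homeserver saw them arrive."""
    return sorted(evs, key=lambda e: e.arrival_index)


def by_timestamp(evs):
    """Order events by the (untrusted) timestamp each sender attached."""
    return sorted(evs, key=lambda e: e.origin_server_ts)


arrival_ids = [e.event_id for e in by_arrival(events)]
timestamp_ids = [e.event_id for e in by_timestamp(events)]
print(arrival_ids)
print(timestamp_ids)
```

With these made-up values, ordering by arrival keeps the question before the reply, while ordering by timestamp puts the reply first, simply because one server's clock is off.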

In one of your issues you make a point about eventual consistency; it's worth bearing in mind that the DAG is still eventually consistent, as is the state at each of the events in the DAG.

I will note for some degree of completeness that there is an unspecced endpoint for clients to get a more 'raw' view of the events and the DAG, which may allow an interested client to define their own ordering. If this turned out to be useful, it could perhaps even be specced, or perhaps some other way of communicating the information needed to clients could be specced in order to soften the flaws in the current ordering (mostly this problem of messages arriving out of context).

@chagai95
Contributor Author

Hey, thanks for the extensive answer, didn't understand all the detail but I still appreciate the detail. How about flagging the messages as messages that arrived when the server was offline? Then you'd at least notice that something went wrong?

@reivilibre
Contributor

> Hey, thanks for the extensive answer, didn't understand all the detail but I still appreciate the detail. How about flagging the messages as messages that arrived when the server was offline? Then you'd at least notice that something went wrong?

That sounds like a sensible idea to me.
If you look at your timeline screenshot, you'll notice that the timestamps are out of order; maybe clients can already use that to make it a bit more obvious that the timeline isn't in order.

Unsure if there's enough information or not for the clients to mark them reliably and consistently — would need a client dev's perspective on whether we ought to think about a clearer way to tell clients about this case (which would need protocol changes/specification work).
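
As a rough illustration of what such client-side marking could look like, here is a minimal sketch that flags events whose `origin_server_ts` is noticeably older than the newest timestamp already seen in the timeline. It is only a heuristic: the timestamp is set by the sending server and cannot be trusted, and the function name, the `tolerance_ms` parameter, and the sample data are all hypothetical.

```python
def flag_out_of_context(timeline, tolerance_ms=60_000):
    """Return the IDs of events that appear later in the timeline than
    an event with a newer timestamp (beyond the given tolerance).

    `timeline` is a list of dicts in display order, each with
    `event_id` and `origin_server_ts` (milliseconds) keys.
    """
    flagged = []
    newest_ts = None
    for event in timeline:
        ts = event["origin_server_ts"]
        # An event much older than something already shown arrived "late".
        if newest_ts is not None and ts < newest_ts - tolerance_ms:
            flagged.append(event["event_id"])
        if newest_ts is None or ts > newest_ts:
            newest_ts = ts
    return flagged


# Made-up timeline in arrival order; "$late" was sent hours earlier
# but only arrived after the server came back online.
timeline = [
    {"event_id": "$a", "origin_server_ts": 1_638_500_000_000},
    {"event_id": "$b", "origin_server_ts": 1_638_500_030_000},
    {"event_id": "$late", "origin_server_ts": 1_638_490_000_000},
    {"event_id": "$c", "origin_server_ts": 1_638_500_060_000},
]
flagged = flag_out_of_context(timeline)
print(flagged)
```

A check like this would catch the case shown in the screenshot, but it would also misfire on servers with badly drifting clocks, which is part of why a spec-level signal might be cleaner.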

@callahad
Contributor

I'm going to go ahead and close this since it's not a bug in Synapse per se, but a subjective decision made in grappling with how to present changes in the event graph to clients.

If we were to consider doing anything differently, I'd want that to arise in response to a specific push from client developers.

@chagai95
Contributor Author

> The timeline content & order is provided by the server. During server outages, old messages may come down /sync as new ones and the client has no way to tell them apart.

element-hq/element-web#20024 (comment)

@t3chguy @reivilibre not sure where you prefer to discuss this but it seems like the server devs think the client devs might have enough info and the client devs think they don't.

@callahad
Contributor

We should also check whether this behavior is defined by the Matrix Spec, per @reivilibre 's comment:

> The spec might have some more to say about why events should be ordered by time of receipt on your homeserver

@callahad
Contributor

When discussing /sync, the spec does state: "Events are ordered in this API according to the arrival time of the event on the homeserver." (cite)
