An event was sent twice down /sync (and down application service txns?) consistently across clients; ghost notifications #14539
Comments
Thank you for your thorough report.
On ghost notifications: it's difficult to say what could be the cause without timestamps or logs to start digging into. I think if someone edits a message you are named or
On event duplication: it's not immediately obvious to me that something has gone wrong in image 1. What makes you say that the two messages from 10th October 2022 are duplicated, rather than appearing out of order? Are you seeing the same message/state event take place with two different event IDs? It's possible that
I also note that Element-Web and the mobile client (Element Android?) show a consistent timeline, which suggests that Synapse is presenting the same timeline via /sync to the two clients. I don't fully follow what's happened in image 2 either. Are
That's very unfortunate. Please reach out to us if you end up in that situation; it's generally very unsafe to manually operate on Synapse's database (I hope you found all the tables that reference a room ID).
Before I get into detail, I want to thank you for your time and effort. I also wanted to mention that, I think, the issues "duplicates" and "ghost notifications" are somehow related and have the same origin. This is why I only created one issue.
That was my first thought, too. But with the option to show hidden events in timeline and developer mode enabled, I should see a
I should have done that, right. I'll remember it for the next time.
"Duplicated" is probably the wrong word, as those events have the same event ids as those shown previously in the timeline. I used the term "duplicated" because they were shown multiple times at different locations in the timeline, at the same time.
Yes, this is correct. bekon232 and Szwendacz are on the same homeserver, but not on mine. I don't think it should make a difference, but it may be worth mentioning that there are > 10k other users in the room from 524 different servers. The room is not bridged. As far as I know, these are the only messages bekon232 and Szwendacz created in any room that any of my users is in. "As far as I know" because I have retention enabled in the
I haven't checked the logs specifically for their homeserver, but I do regular updates, and I have updated and restarted synapse during that process.
I haven't looked into the details of federation on matrix yet, but I can say, the first occurrence of those messages in the timeline were fairly close to the
Yes, the timeline seemed to be consistent between element-web, element-desktop and element-android (yes, the mobile client was element-android) and hydrogen. Even after clearing the cache on element-web and the mobile client, I could still see those messages in the timeline, at least in the case image 1 was taken. But I have also seen that those messages vanish after clearing the cache.
I should have mentioned that. Michael (me) and Administrator (mjolnir, moderation bot) and the person who made the screenshot (also me, from the "Michael Account") are on the same homeserver. They are in another room with 6 more users, some from other homeservers. Sunday, which was one or two days before the screenshot was made, Michael sent a message at 13:24 to the room: "mjolnir room remove [...]" and Administrator answered with the message: "⚠ | Failed to leave [...]". Almost fifty minutes later, I restarted the Administrator (mjolnir) and it sent four additional messages to the room at 14:09.
I think so. Everything is up and running again, and I could join the room again afterwards. Besides some missing
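For anyone wanting to verify the distinction made earlier in this comment (the repeated events carry the same event IDs and merely show up at several positions), here is a minimal Python sketch. It assumes you have already extracted the list of events from a /sync response, which the client-server API places under `rooms.join.<room_id>.timeline.events`. The function name and the sample events are hypothetical and only serve as an illustration.

```python
from collections import defaultdict


def find_repeated_event_ids(timeline):
    """Map each event ID that occurs more than once to its positions.

    `timeline` is a list of event dicts, e.g. the `timeline.events`
    array of one joined room taken from a /sync response.
    """
    positions = defaultdict(list)
    for index, event in enumerate(timeline):
        event_id = event.get("event_id")
        if event_id is not None:
            positions[event_id].append(index)
    # Keep only the IDs that were seen at more than one position.
    return {eid: idxs for eid, idxs in positions.items() if len(idxs) > 1}


# Hypothetical sample data, not taken from the affected rooms.
sample_timeline = [
    {"event_id": "$a", "type": "m.room.message"},
    {"event_id": "$b", "type": "m.room.message"},
    {"event_id": "$a", "type": "m.room.message"},
]

print(find_repeated_event_ids(sample_timeline))  # {'$a': [0, 2]}
```

An empty result would suggest that the events only look alike but have different event IDs; a non-empty result matches what is described here, the same event delivered at more than one timeline position.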
That's an amber flag: retention is known to be buggy (e.g. element-hq/element-web#13476). Are you still seeing the duplicate event pattern today (even if sporadically)? If so, I'd suggest disabling retention to see if that stops the problem.
That is extremely surprising (and somewhat alarming). Are you able to share the room ID so we can cross-reference with matrix.org's logs and database? I can be reached privately at
It has a
I've noticed minor issues pretty quickly after I turned it on a year ago or so.
I've thought about that, too. However, I first wanted to find out whether the Purge History API could fix or mitigate the problem, as it usually does with retention-related issues, and it actually did. Since I purged the room history down to the second-to-last event, I haven't noticed any of the issues I described.
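As a rough illustration of the purge mentioned above, the following Python sketch builds (but does not send) a request for Synapse's Purge History admin API, `POST /_synapse/admin/v1/purge_history/<room_id>/<event_id>`. The base URL, access token, room ID and event ID are placeholders, and the helper name is hypothetical; this is not the CLI tool referred to in this thread.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_purge_history_request(base_url, access_token, room_id, event_id,
                                delete_local_events=False):
    """Build a Purge History request that purges history up to `event_id`."""
    # Room and event IDs contain characters such as "!", ":" and "$",
    # so they are percent-encoded before being placed in the path.
    path = "/_synapse/admin/v1/purge_history/{}/{}".format(
        quote(room_id, safe=""), quote(event_id, safe="")
    )
    body = json.dumps({"delete_local_events": delete_local_events})
    return Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


# Placeholder values only; replace them with a real server admin token and IDs.
request = build_purge_history_request(
    "https://synapse.example", "ADMIN_TOKEN", "!room:example.org", "$event"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. The purge runs in the background, so the response contains a purge ID rather than a final result.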
Sure. The room in image 1 is About the room in image 2: While going through the room history on
Oh, right, it has... This is embarrassing now 😄. Either it was added after I added the functionality to my CLI tool, or I've missed it twice in a row. Thanks for the hint!
I've been following this thread since it was opened. Both of my servers are experiencing the same issue. I'm happy to share log data as needed. I do have retention enabled on both servers with server-specified policy and room policies.
Will continue to be a fly on the wall while y'all work through this thread.
Related to #11447
I've handed in another After sending the
I think the console logs are in the zip file. If not, I've saved them before reloading the client. Room:
This problem seems to be getting worse as time goes by. I opened up a few rooms hosted on servers affected by this issue (using the Element web app), and proceeded to get hit with approximately 30 notifications per chat message that I was receiving notifications for. It was also sending me notifications for quite a few chat messages even though these messages had been sent quite a while ago, resulting in probably in excess of 200 notifications having to be closed. (And for some reason Chrome made me close every single notification manually or wait for them to disappear in small batches by themselves, but that's a Chrome problem or a problem with my setup.)
Might be related to element-hq/element-web#897 and element-hq/element-meta#1491.
This has gotten bad lately. I'm getting this very frequently when I'm mentioned (at least in one room in particular; not sure if it's happening in all rooms yet).
Description
Ghost notifications
Since synapse v1.70 or v1.71, I've noticed randomly occurring "ghost notifications" on element-web, element-desktop, element-android and hydrogen. They usually come in small, fast bursts of two to four notifications (I hear an audible sound telling me about new events in a room). When I check for new events, there aren't any. Nothing has changed. I have the options "Developer mode" and "Show hidden events in timeline" enabled in element, so I would assume I'd see any changes to a room.
Event duplication
Around the same time, I noticed some events, some from weeks or months ago, being randomly duplicated and pushed as new ones. This happens independently of the "ghost notification" issue. At first, I only noticed this in one particular room, where the same messages were duplicated a few times over a couple of days (example: see "Image 1" below), possibly leading to an unstable room (see "Possible implications"). Today, the same thing happened in our mjolnir control room, where mjolnir (configured as a client and not as an appservice) executed the duplicated command (example: see "Image 2" below). I'm actually not certain that my homeserver is the one the issues originate from, as the duplicated events are visible on other instances too, including matrix.org and maybe others as well.
Image 1: Duplicated messages in the python room
Image 2: Mjolnir bot reacts to duplicated events
The "failed to leave" error from mjolnir is unrelated (and already fixed). The duplication errors happened before the room it was trying to leave had issues, too, and might have led to that room becoming unstable on my homeserver.
Possible implications
After this happened a few times in the Python room #python:matrix.org, the room became unstable on my end, and I was unable to send events to that room any more because of missing state events. I also wasn't able to remove the room with the admin API, so I had to manually remove the room from the database to gain access again.
(Because I had to roll back a few times while I was trying to figure out how to remove the room, I accidentally deleted the logs as well.)
Steps to reproduce
Homeserver
matrix.org; michaelsasser.org
Synapse Version
matrixdotorg/synapse; tags: v1.71.0, v1.72.0, possibly v1.70.0
Installation Method
Docker (matrixdotorg/synapse)
Database
postgres:15.0-alpine on the same machine. It was recently restored to upgrade to v15 (not sure if it happened before that); it was also recently restored while trying to remove the room, but the issues had happened before that.
Workers
Multiple workers
Platform
Environment: CX31 Hetzner Cloud
Using the Ansible playbook: spantaleev/matrix-docker-ansible-deploy
Debian: Linux matrix 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux
Docker: Docker version 20.10.21, build baeda1f
Containerd: containerd containerd.io 1.6.10 770bd0108c32f3fb5c73ae1264f7e503fe7b2661
Configuration
Playbook
Using the Ansible playbook spantaleev/matrix-docker-ansible-deploy, which contains multiple files (result below):
Playbook/config
Results in:
homeserver.yaml
Workers
Relevant log output
Anything else that would be useful to know?
Full log from the time the duplication (see "Image 2") happened: message_duplication.log
All times & dates are UTC+1.