This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unexpectedly found that events don't have chain IDs #10083

Closed
daenney opened this issue May 27, 2021 · 8 comments
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@daenney
Contributor

daenney commented May 27, 2021

Description

synapse.storage.databases.main.event_federation - 369 - INFO - persist_events-17250 - Unexpectedly found that events don't have chain IDs in room !OGEhHVWSdvArJzumhm:matrix.org: {'$e3j8FpK-lcibZO57q50gANeLSK0dptpF6fCUmBF3rKU', '$uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA', '$Om8XNzoCfBmCG36Zx24Y9WFHRy2RPVQDDBbN8CvSq74', '$7_ZXqJstZZrXU1S-2jgKaEOpgrvWEXdU_NxQ6mxC4_8'} 

And logs like:

PUT-638717-!OGEhHVWSdvArJzumhm:matrix.org-$OOjkuWpVS-iTT7joyJUx3UDTpKyPDPbdg2YvyfltkWY - auth_event id $GJ4kBt8ivplflUSXy6NB_tYANh98Pi_5hZ2skUeq0v4 for event $uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA is missing

Steps to reproduce

  • Look in the logs 😞
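Collecting the affected event IDs from the logs can be scripted. A minimal sketch, assuming the message format is exactly as in the line quoted in the description (it may differ between Synapse versions); `parse_missing_chain_ids` is a hypothetical helper, not part of Synapse:

```python
import re

# Pattern based on the INFO message quoted in this issue. This is an
# assumption about the log format, not a documented Synapse interface.
CHAIN_ID_RE = re.compile(
    r"Unexpectedly found that events don't have chain IDs in room "
    r"(?P<room>\S+): \{(?P<events>[^}]*)\}"
)


def parse_missing_chain_ids(line):
    """Return (room_id, set_of_event_ids) for a matching log line, else None."""
    match = CHAIN_ID_RE.search(line)
    if match is None:
        return None
    # The event IDs are logged as a Python set literal of quoted strings.
    event_ids = {
        part.strip().strip("'")
        for part in match.group("events").split(",")
        if part.strip()
    }
    return match.group("room"), event_ids
```

Feeding each line of the homeserver log through this helper and taking the union of the returned sets gives the list of events to investigate per room.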

Version information

  • Homeserver: Other, g33k.se

If not matrix.org:

  • Version: 1.34.0

  • Install method: Debian packages from Matrix.org

  • Platform: Ubuntu 20.04

@daenney
Contributor Author

daenney commented May 27, 2021

Other folks have noticed this too, such as @Half-Shot and @MatMaul, and a similar issue has been reported for EMS hosts by @benbz. @richvdh mentioned it's likely related to #9595

@anoadragon453 anoadragon453 added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jun 1, 2021
@richvdh
Member

richvdh commented Jun 7, 2021

This also causes send_join requests to take tens of minutes of CPU time.

@richvdh richvdh added S-Major Major functionality / product severely impaired, no satisfactory workaround. and removed S-Minor Blocks non-critical functionality, workarounds exist. labels Jun 7, 2021
@richvdh
Member

richvdh commented Aug 4, 2021

@daenney could you do me a favour and run select * from rejections where event_id='$GJ4kBt8ivplflUSXy6NB_tYANh98Pi_5hZ2skUeq0v4';

That should help confirm if it's due to #9595.
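When several events are involved, the same lookup can be done for a batch of IDs, covering both an event and its missing auth event. A rough sketch using an in-memory SQLite table as a stand-in: the column names (`event_id`, `reason`, `last_check`) are assumed, the table definition is not Synapse's real schema, and `find_rejections` and the `$event_a` / `$event_b` IDs are hypothetical:

```python
import sqlite3


def find_rejections(conn, event_ids):
    """Return {event_id: reason} for the given IDs that appear in `rejections`."""
    event_ids = list(event_ids)
    if not event_ids:
        return {}
    # One "?" placeholder per ID keeps the query parameterised.
    placeholders = ", ".join("?" for _ in event_ids)
    rows = conn.execute(
        f"SELECT event_id, reason FROM rejections WHERE event_id IN ({placeholders})",
        event_ids,
    ).fetchall()
    return dict(rows)


# Stand-in table for illustration only (assumed columns, placeholder data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rejections (event_id TEXT, reason TEXT, last_check INTEGER)"
)
conn.execute(
    "INSERT INTO rejections VALUES (?, ?, ?)", ("$event_a", "auth_error", 0)
)
found = find_rejections(conn, ["$event_a", "$event_b"])
```

IDs missing from the returned mapping have no row in `rejections`, which is the "0 rows" case.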

@daenney
Contributor Author

daenney commented Aug 5, 2021

0 rows retrieved in 63 ms is what that query got me.

@richvdh
Member

richvdh commented Aug 5, 2021

hrm, so it's a slightly different cause to #9595 :/

@bdelwood

bdelwood commented Aug 5, 2021

I am seeing the same issue that's killing Matrix HQ for me.

Here are some of the relevant log lines

2021-08-01 17:41:22,351 - synapse.storage.databases.main.event_federation - 386 - INFO - GET-6464 - Unexpectedly found that events don't have chain IDs in room !OGEhHVWSdvArJzumhm:matrix.org: {'$RrDaHIy4PU1_btzUtywDCxPLYlRGu91Bvr3QUJEyTYM', '$Om8XNzoCfBmCG36Zx24Y9WFHRy2RPVQDDBbN8CvSq74', '$7_ZXqJstZZrXU1S-2jgKaEOpgrvWEXdU_NxQ6mxC4_8', '$e3j8FpK-lcibZO57q50gANeLSK0dptpF6fCUmBF3rKU', '$uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA'}
2021-08-01 17:42:19,602 - synapse.state.v2 - 534 - WARNING - GET-6464 - auth_event id $GJ4kBt8ivplflUSXy6NB_tYANh98Pi_5hZ2skUeq0v4 for event $uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA is missing

That query gave me

synapse=# select * from rejections where event_id='$uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA';
                   event_id                   |   reason   |  last_check
----------------------------------------------+------------+---------------
 $uIBJZEYyEtATjznpOiBVPlazSR_PhVKurtsLpTCEmDA | auth_error | 1625870179553

Version: Synapse 1.39.0
Install method: matrixdotorg/synapse:latest; Docker v20.10.8
Platform: Arch, kernel 5.13.7

@daenney
Contributor Author

daenney commented Aug 6, 2021

Oh, right. My HS doesn't have anyone in HQ any more (everyone moved out during the spam thing). I wonder if that might've led to my query returning nothing.

richvdh added a commit that referenced this issue Oct 18, 2021
Currently, when we receive an event whose auth_events differ from those we expect, we state-resolve between the two state sets, and check that the event passes auth based on the resolved state.

This means that it's possible for us to accept events which don't pass auth at their declared auth_events (or where the auth events themselves were rejected), leading to problems down the line like #10083.

This change means we will:

 * ignore any events where we cannot find the auth events
 * reject any events whose auth events were rejected
 * reject any events which do not pass auth at their declared auth_events.

Together with a whole raft of previous work, this is a partial fix to #9595.

Fixes #6643.

Based on #11009.
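
The three rules listed in the commit message above can be sketched roughly as follows. This is an illustration only, not Synapse's actual code: `Event`, `Outcome`, `check_against_declared_auth`, and the `passes_auth` callback are all hypothetical names, and the real auth rules are far more involved than a single callback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Set


class Outcome(Enum):
    ACCEPT = "accept"
    IGNORE = "ignore"
    REJECT = "reject"


@dataclass
class Event:
    event_id: str
    auth_event_ids: List[str]


def check_against_declared_auth(
    event: Event,
    known_events: Dict[str, Event],
    rejected_ids: Set[str],
    passes_auth: Callable[[Event, List[Event]], bool],
) -> Outcome:
    """Apply the three rules from the commit message, in order (hypothetical)."""
    # Rule 1: ignore events whose auth events cannot be found.
    if any(aid not in known_events for aid in event.auth_event_ids):
        return Outcome.IGNORE
    # Rule 2: reject events whose auth events were themselves rejected.
    if any(aid in rejected_ids for aid in event.auth_event_ids):
        return Outcome.REJECT
    # Rule 3: reject events that fail auth at their declared auth_events.
    auth_events = [known_events[aid] for aid in event.auth_event_ids]
    if not passes_auth(event, auth_events):
        return Outcome.REJECT
    return Outcome.ACCEPT
```

The point of the ordering is that a missing auth event is treated as "cannot decide" rather than as a failure, while a rejected auth event or a failed check at the declared auth events leads to rejection.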
@richvdh
Member

richvdh commented Oct 28, 2021

I'm reasonably confident that the root cause of this is #9595, so am going to close it.

As noted on #9595, hopefully the bleeding is stopped, but existing databases may still have problems.

@richvdh richvdh closed this as completed Oct 28, 2021