This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Postgres port script breaks the database after synapse 1.26, doesn't allow creating new DMs. #9382

Closed
krithin opened this issue Feb 11, 2021 · 5 comments · Fixed by #9449
Labels
S-Major: Major functionality / product severely impaired, no satisfactory workaround.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@krithin
Contributor

krithin commented Feb 11, 2021

Description

I migrated from SQLite to Postgres using synapse_port_db. The script ran successfully, apart from the issues I mentioned in #9344. A few days later, however, a new user on a different homeserver tried to start a DM with me, and I could not accept the invite: it failed with "Failed to join room: Internal Server Error". My Synapse logs show the underlying error: duplicate key value violates unique constraint "event_auth_chains_c_seq_index".

Searching the history of #synapse:matrix.org for event_auth_chains_c_seq_index turned up a couple more cases where people hit that error and could not create new rooms or DMs after migrating to Postgres, so I know it's not just me. The advice they were given was to wipe their server and start fresh, but I think a data-loss bug (or a database-state bug whose only remedy is data loss) is pretty unacceptable for a messaging service.

This is particularly pernicious because the Postgres migration script completes successfully, and it may be days later, when someone tries to create a new room, that you discover the database is broken. It would be less of a problem if there were a clear, documented workaround (as there is for the bugs in #9344), but for this bug the people in #synapse:matrix.org do not know of a nondestructive fix.

Steps to reproduce

  • Migrate from SQLite to Postgres with synapse_port_db on Synapse 1.26.
  • Have another user try to start a new DM room with you.
  • Observe that accepting the invite fails with "Failed to join room: Internal Server Error".

Version information

  • Homeserver:

If not matrix.org:

  • Version: {"server_version":"1.26.0","python_version":"3.8.5"}

  • Install method: apt: matrix-synapse-py3/unknown,now 1.26.0+focal1 amd64 [installed]

  • Platform: Ubuntu 20.04.2 LTS

@krithin
Contributor Author

krithin commented Feb 11, 2021

Reading through the code introduced in #8868 I'm starting to suspect this is an effect of another improperly-seeded sequence in the postgres migration script; unfortunately the checks that flagged other problem sequences in the migration script didn't catch this one.

If that hunch is right, I think running this query by hand should fix this problem:

select setval('event_auth_chain_id', (select max(chain_id) from event_auth_chains));

Would anyone who knows the event auth chain schema better than I do like to confirm that fiddling with the database directly there won't mess something else up?
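For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of how a sequence that was never advanced past the ported rows produces a unique-constraint violation, and why bumping it to the table's maximum clears it. It uses an in-memory SQLite table and a toy Python counter in place of a real Postgres sequence. The FakeSequence class, the insert_chain helper, the simplified table layout (including the sequence_number column), and the sample event IDs are all assumptions made for illustration, not Synapse code.

```python
import sqlite3


class FakeSequence:
    """Toy stand-in for a Postgres sequence: a counter independent of any table."""

    def __init__(self, last_value=0):
        self.last_value = last_value

    def nextval(self):
        # Advance the counter and hand out the new value.
        self.last_value += 1
        return self.last_value

    def setval(self, value):
        # Jump the counter to a chosen position, like setval() in Postgres.
        self.last_value = value


def insert_chain(conn, seq, event_id):
    """Allocate a chain ID from the sequence and insert a row that uses it."""
    chain_id = seq.nextval()
    conn.execute(
        "INSERT INTO event_auth_chains (event_id, chain_id, sequence_number)"
        " VALUES (?, ?, 1)",
        (event_id, chain_id),
    )
    return chain_id


conn = sqlite3.connect(":memory:")
# Simplified, assumed layout: one row per event, unique on (chain_id, sequence_number).
conn.execute(
    "CREATE TABLE event_auth_chains ("
    " event_id TEXT PRIMARY KEY,"
    " chain_id INTEGER NOT NULL,"
    " sequence_number INTEGER NOT NULL,"
    " UNIQUE (chain_id, sequence_number))"
)
# Rows copied over by the port; the sequence below is never told about them.
conn.executemany(
    "INSERT INTO event_auth_chains VALUES (?, ?, 1)",
    [("$a", 1), ("$b", 2), ("$c", 3)],
)

seq = FakeSequence()  # still at its initial position after the port
try:
    insert_chain(conn, seq, "$new")
    collided = False
except sqlite3.IntegrityError:
    # The sequence handed out chain_id 1, which an existing row already uses.
    collided = True

# The manual fix: move the sequence up to the highest chain_id in the table.
(max_id,) = conn.execute("SELECT MAX(chain_id) FROM event_auth_chains").fetchone()
seq.setval(max_id)
new_id = insert_chain(conn, seq, "$new")
```

The point of the sketch is only that the sequence and the table are separate pieces of state: copying rows does not move the sequence, so something has to seed it explicitly.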

@erikjohnston
Member

Thanks for digging into this. It looks like we forgot to call .check_consistency on the sequence at startup, so none of the checks are happening 🤦

I can confirm that it is safe to increase the values of any of our sequences (while the server is offline).

@erikjohnston added the S-Major and T-Defect labels Feb 11, 2021
@richvdh
Member

richvdh commented Feb 11, 2021

related: #9344

@krithin
Contributor Author

krithin commented Feb 11, 2021

Ran that setval query to update the sequence and creating new rooms works fine now. Thanks for your help!

@ewsandor

I experienced this issue recently on 1.28.0 and reverted to a backup of my sqlite db.

I recently retried with 1.29.0, as the release notes mentioned both #9449 and #9470. This time, I received an error for 'event_auth_chain_id' on incremental runs of synapse_port_db. To me, this suggests that #9470 is working, but there may still be a case that #9449 does not cover.

@krithin's workaround did seem to work, and I have since migrated.

Has anyone else experienced this after updating to 1.29.0?

My installation is on Debian 10.8 via the Debian package.

2021-03-14 19:08:31,659 - synapse.storage.util.sequence - 179 - WARNING - Postgres sequence event_auth_chain_id is behind table event_auth_chains: 41232 < 41233
2021-03-14 19:08:31,660 - synapse_port_db - 713 - ERROR - 
Traceback (most recent call last):
  File "/usr/bin/synapse_port_db", line 582, in run
    self.hs_config.get_single_database()
  File "/usr/bin/synapse_port_db", line 531, in build_db_store
    store = Store(DatabasePool(hs, db_config, engine), db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/client_ips.py", line 35, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/deviceinbox.py", line 542, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/devices.py", line 912, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events_bg_updates.py", line 58, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/media_repository.py", line 45, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/registration.py", line 1053, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/registration.py", line 79, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/cache.py", line 43, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/room.py", line 963, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/roommember.py", line 853, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/search.py", line 93, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/state/bg_updates.py", line 184, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/state.py", line 321, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/roommember.py", line 55, in __init__
    super().__init__(database, db_conn, hs)
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/databases/main/events_worker.py", line 172, in __init__
    id_column="chain_id",
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/util/sequence.py", line 289, in build_sequence_generator
    positive=positive,
  File "/opt/venvs/matrix-synapse/lib/python3.7/site-packages/synapse/storage/util/sequence.py", line 183, in check_consistency
    % {"seq": self._sequence_name, "table": table, "max_id_sql": table_sql}
synapse.storage.engines._base.IncorrectDatabaseSetup: 
Postgres sequence 'event_auth_chain_id' is inconsistent with associated
table 'event_auth_chains'. This can happen if Synapse has been downgraded and
then upgraded again, or due to a bad migration.

To fix this error, shut down Synapse (including any and all workers)
and run the following SQL:

    SELECT setval('event_auth_chain_id', (
        SELECT GREATEST(MAX(chain_id), 0) FROM event_auth_chains
    ));

See docs/postgres.md for more information.
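The warning and error above come from a startup check that compares the sequence's position with the largest ID in the table. A rough sketch of that comparison, plus a helper that renders the repair statement quoted in the error message, might look like the following. The function names sequence_is_behind and repair_sql are hypothetical, the is_called handling reflects general Postgres sequence behaviour rather than anything shown in this thread, and is_called=True is assumed for the sample values taken from the log. None of this is Synapse's actual implementation.

```python
import re

# Conservative pattern for plain SQL identifiers, so names can be interpolated safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def sequence_is_behind(last_value, is_called, table_max):
    """Return True if the sequence could hand out an ID the table already holds."""
    # When is_called is false, Postgres returns last_value itself from the next
    # nextval() call, so the last ID actually issued is one lower.
    if not is_called:
        last_value -= 1
    return table_max > last_value


def repair_sql(sequence, table, column):
    """Render the setval statement in the shape shown in the error message."""
    for name in (sequence, table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT setval('{sequence}', ("
        f"SELECT GREATEST(MAX({column}), 0) FROM {table}"
        "));"
    )


# Values from the warning line: sequence at 41232, table maximum at 41233.
behind = sequence_is_behind(last_value=41232, is_called=True, table_max=41233)
statement = repair_sql("event_auth_chain_id", "event_auth_chains", "chain_id")
```

As the error message says, the generated statement should only be run while Synapse and all of its workers are shut down.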
