Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

restarting multiple workers at once risks applying database migrations multiple times #8006

Closed
richvdh opened this issue Jul 31, 2020 · 14 comments
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented Jul 31, 2020

each worker independently checks if the schema is up-to-date and applies the migrations if not. At best this fails with exceptions; at worst it could result in data corruption

@richvdh
Copy link
Member Author

richvdh commented Jul 31, 2020

I thought we used to have a thing that stopped migrations from running on anything other than the main process. In any case empirically it doesn't work any more.

@richvdh
Copy link
Member Author

richvdh commented Jul 31, 2020

if the solution to this ends up being any sort of database-level locking, it would be nice to consider #6467 at the same time.

@erikjohnston erikjohnston added z-bug (Deprecated Label) z-p2 (Deprecated Label) labels Aug 4, 2020
@erikjohnston
Copy link
Member

I think we should be able to just not run the prepare database step if we're on not-master?

@clokep
Copy link
Member

clokep commented Aug 4, 2020

I think we should be able to just not run the prepare database step if we're on not-master?

Is there a risk of those workers starting and then erroring because the migrations haven't finished yet?

@erikjohnston
Copy link
Member

Hmm, true. We (matrix.org) do try and ensure master starts up first, but I'm not sure that's documented anywhere

@richvdh
Copy link
Member Author

richvdh commented Aug 4, 2020

Is there a risk of those workers starting and then erroring because the migrations haven't finished yet?

that would be preferable to the current situation imho. But I also wouldn't object to the other workers going into a sleep/check loop until the db got upgraded.

Hmm, true. We (matrix.org) do try and ensure master starts up first, but I'm not sure that's documented anywhere

I'm also not convinced it's true that we wait for the master to restart before we restart the workers on the other server.

@erikjohnston
Copy link
Member

Hmm, true. We (matrix.org) do try and ensure master starts up first, but I'm not sure that's documented anywhere

I'm also not convinced it's true that we wait for the master to restart before we restart the workers on the other server.

We have historically said that we restart the server with master on before the other servers when upgrading. That advice has probably gotten a bit lost over time too

@richvdh
Copy link
Member Author

richvdh commented Aug 20, 2020

@reivilibre suggests maybe we could lock the schema version table while we do the migration

@erikjohnston
Copy link
Member

applied_schema_deltas has a unique index so applying the same delta concurrently should cause one of the transactions to fail (and thus rollback the delta changes).

@richvdh
Copy link
Member Author

richvdh commented Aug 20, 2020

applied_schema_deltas has a unique index so applying the same delta concurrently should cause one of the transactions to fail (and thus rollback the delta changes).

well this might be true if we actually made the upgrades respect transactions correctly (cf #6467)

@erikjohnston
Copy link
Member

Well that feels like a prerequisite to this if we're not correctly wrapping things in transactions?

@richvdh
Copy link
Member Author

richvdh commented Aug 20, 2020

depends, but sure it would be a good thing to fix :)

@richvdh
Copy link
Member Author

richvdh commented Sep 6, 2020

Looking at this more closely, because psycopg starts a new transaction for every statement, concurrent attempts to upgrade the database will probably fail for the reason Erik gave previously.

I still think it's confusing, though.

@richvdh
Copy link
Member Author

richvdh commented Sep 7, 2020

fixed by #8266

@richvdh richvdh closed this as completed Sep 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

3 participants