restarting multiple workers at once risks applying database migrations multiple times #8006
I thought we used to have a thing that stopped migrations from running on anything other than the main process. In any case, empirically it doesn't work any more.
If the solution to this ends up being any sort of database-level locking, it would be nice to consider #6467 at the same time.
I think we should be able to just not run the prepare-database step if we're not on master?
Is there a risk of those workers starting and then erroring because the migrations haven't finished yet?
Hmm, true. We (matrix.org) do try and ensure master starts up first, but I'm not sure that's documented anywhere.
That would be preferable to the current situation imho. But I also wouldn't object to the other workers going into a sleep/check loop until the db got upgraded.
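For illustration, the sleep/check loop idea might look something like the sketch below. This is a minimal sketch, assuming a `schema_version` table with a `version` column and a psycopg2 connection; the `SCHEMA_VERSION` constant and the timeout values are made up.

```python
import time

import psycopg2

SCHEMA_VERSION = 58  # hypothetical: the schema version this build expects


def wait_for_schema(conn, timeout=300, interval=5):
    """Poll until the database schema reaches SCHEMA_VERSION, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with conn.cursor() as cur:
            cur.execute("SELECT version FROM schema_version")
            row = cur.fetchone()
        conn.rollback()  # end the implicit transaction between polls
        if row is not None and row[0] >= SCHEMA_VERSION:
            return  # master has finished migrating; safe to start up
        time.sleep(interval)
    raise RuntimeError("timed out waiting for database migrations to complete")
```

A worker would call something like this instead of running the migrations itself, which turns the "restart master first" convention into something the code actually enforces.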
I'm also not convinced it's true that we wait for the master to restart before we restart the workers on the other server.
We have historically said that we restart the server master runs on before the other servers when upgrading. That advice has probably gotten a bit lost over time too.
@reivilibre suggests maybe we could lock the schema version table while we do the migration.
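As a rough sketch of that idea (not what Synapse actually does): take an exclusive lock on the schema version table inside the same transaction as the upgrade, so a second process blocks until the first commits and then sees the new version. The table and column names, `SCHEMA_VERSION`, and `run_migrations` are illustrative assumptions.

```python
import psycopg2

SCHEMA_VERSION = 58  # hypothetical target version for this build


def run_migrations(cur, current):
    """Hypothetical stand-in for the real per-version upgrade scripts."""


def upgrade_with_lock(conn):
    # One transaction around the lock, the version check, and the
    # migrations; psycopg2's `with conn:` commits on success.
    with conn:
        with conn.cursor() as cur:
            # ACCESS EXCLUSIVE is held until commit, so a concurrent
            # upgrader blocks here and re-reads the version afterwards.
            cur.execute("LOCK TABLE schema_version IN ACCESS EXCLUSIVE MODE")
            cur.execute("SELECT version FROM schema_version")
            (current,) = cur.fetchone()
            if current < SCHEMA_VERSION:
                run_migrations(cur, current)
                cur.execute(
                    "UPDATE schema_version SET version = %s", (SCHEMA_VERSION,)
                )
```

Note the lock only helps if the whole upgrade really does run inside that one transaction, which is exactly the #6467 point raised below.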
Well, this might be true if we actually made the upgrades respect transactions correctly (cf. #6467).
Well, that feels like a prerequisite to this if we're not correctly wrapping things in transactions?
Depends, but sure, it would be a good thing to fix :)
Looking at this more closely, because psycopg starts a new transaction for every statement, concurrent attempts to upgrade the database will probably fail for the reason Erik gave previously. I still think it's confusing, though.
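For what it's worth, a small illustration of the implicit transactions being referred to (the connection string is a placeholder): psycopg2 has autocommit off by default, so a transaction is opened on the first statement and nothing is visible to other connections until `commit()`.

```python
import psycopg2

conn = psycopg2.connect("dbname=synapse")  # placeholder DSN

cur = conn.cursor()
# Autocommit is off by default, so this implicitly opens a transaction;
# other connections keep seeing the old version until commit().
cur.execute("UPDATE schema_version SET version = 59")
# A concurrent upgrader trying to update the same row blocks on our
# transaction here, which is why simultaneous upgrade attempts tend to
# fail or stall rather than silently interleave.
conn.commit()
```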
Fixed by #8266.
Each worker independently checks whether the schema is up to date and applies the migrations if not. At best this fails with exceptions; at worst it could result in data corruption.
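The race in miniature, as a sketch (the helper names and version numbers are hypothetical stand-ins for the real prepare-database code):

```python
SCHEMA_VERSION = 58  # hypothetical version expected by this build


def get_schema_version(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT version FROM schema_version")
        return cur.fetchone()[0]


def apply_migrations(conn, current):
    """Hypothetical stand-in for running the upgrade scripts."""


def prepare_database(conn):
    current = get_schema_version(conn)   # workers A and B both read 57
    if current < SCHEMA_VERSION:         # both observe 57 < 58
        apply_migrations(conn, current)  # both run the same DDL
```

Nothing synchronises the check with the apply, so the outcome depends on which statements happen to conflict at the database level.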