
Move event stream handling out of slave store. #7491

Merged: erikjohnston merged 9 commits from erikj/store_shuffle_2 into develop on May 15, 2020

Conversation

@erikjohnston (Member) commented on May 13, 2020:

This allows us to have the logic on both master and workers, which is necessary to move event persistence off master.

We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.

Based on #7490
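
To make the ID-generator consolidation concrete: below is a minimal, self-contained sketch of the idea. The two classes are toy stand-ins for Synapse's StreamIdGenerator and SlavedIdTracker, and make_events_id_gen is an invented helper name; only the selection logic mirrors the PR.

    class StreamIdGenerator:
        """Stub: allocates new stream IDs (only the events writer may do this)."""
        def __init__(self, db_conn, table, column):
            self._current = 0

        def get_next(self):
            self._current += 1
            return self._current

    class SlavedIdTracker:
        """Stub: follows the stream token as it advances over replication."""
        def __init__(self, db_conn, table, column):
            self._current = 0

        def advance(self, token):
            self._current = max(self._current, token)

    def make_events_id_gen(db_conn, worker_app):
        # With instantiation living in the base worker store, "writer vs.
        # follower" becomes a single conditional rather than a master/worker
        # class split. At this point in the PR the check is still
        # "worker_app is None" (i.e. "am I the master?"); see the review below.
        if worker_app is None:
            return StreamIdGenerator(db_conn, "events", "stream_ordering")
        return SlavedIdTracker(db_conn, "events", "stream_ordering")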

@erikjohnston force-pushed the erikj/store_shuffle_2 branch 2 times, most recently from 2408b58 to 0054b10 (May 13, 2020 15:02)
@erikjohnston requested a review from a team (May 13, 2020 15:08)
@erikjohnston force-pushed the erikj/store_shuffle_2 branch from 0054b10 to e7f5ac4 (May 14, 2020 16:10)
@richvdh (Member) left a comment:

it seems generally plausible, but unfortunately I appear to have Opinions.

    if hs.config.worker.worker_app is None:
        self._push_rules_stream_id_gen = ChainedIdGenerator(
            self._stream_id_gen, db_conn, "push_rules_stream", "stream_id"
        )  # type: Union[ChainedIdGenerator, SlavedIdTracker]
@richvdh (Member) commented:

seems like if this is a thing we need, we should consider declaring a protocol or interface type for it

@erikjohnston (Member, author) commented:

Yeah, I briefly thought about it, then screamed a bit. We could make a protocol that encompasses both, but I'm not sure that would be all that helpful if half the functions are stubbed out in both. In general the typing here is a bit of a mess, and I think we should do something about it properly.

@erikjohnston (Member, author) commented:

(I'm happy to add a protocol here, I'm just not sure of its usefulness in practice.)

@richvdh (Member) commented:

fair enough. I do think there needs to be some sort of class here, if only because we'll need it to get mypy working properly. However, I'm happy for it to be punted for now.

@erikjohnston (Member, author) commented:

Agreed, though we'll possibly need to do some refactoring of some form to make the types make sense.
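
For reference, a sketch of the kind of Protocol being discussed, assuming the shared surface is just token reads and replication advances; the real classes have more methods, which is exactly the "half the functions are stubbed out" concern above. (Protocol lives in typing from Python 3.8, or typing_extensions before that.)

    from typing import Protocol

    class IdGenerator(Protocol):
        """Hypothetical common interface for ChainedIdGenerator / SlavedIdTracker."""

        def get_current_token(self) -> int:
            """Return the latest stream token this instance knows about."""
            ...

        def advance(self, token: int) -> None:
            """Advance the token after seeing it over replication."""
            ...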

Comment on lines +208 to +216

    def advance(self, token: int):
        """Stub implementation for advancing the token when receiving updates
        over replication; raises an exception as this instance should be the
        only source of updates.
        """
        raise Exception(
            "Attempted to advance token on source for table %r" % (self._table,)
        )
@richvdh (Member) commented:

again: it feels like there should be a Protocol or abstract base class that we're implementing here.

synapse/storage/data_stores/main/push_rule.py (resolved)
Comment on lines +83 to +90

    if hs.config.worker.worker_app is None:
        self._push_rules_stream_id_gen = ChainedIdGenerator(
            self._stream_id_gen, db_conn, "push_rules_stream", "stream_id"
        )  # type: Union[ChainedIdGenerator, SlavedIdTracker]
    else:
        self._push_rules_stream_id_gen = SlavedIdTracker(
            db_conn, "push_rules_stream", "stream_id"
        )
@richvdh (Member) commented:

it's not obvious to me why this code needs to move (at all, but particularly in the same PR as moving the events stream)?

@erikjohnston (Member, author) commented:

It needed to move either way (as we hadn't previously called super().__init__()), and broadly it felt easier/better to move it here than below the super().__init__() call.
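
For readers unfamiliar with why position relative to super().__init__() matters here: attributes set before the call are visible to base-class __init__ bodies that run later in the cooperative chain. A self-contained toy illustration, with invented names:

    class Base:
        def __init__(self):
            # Runs after Derived has already set self.tracker, so it can use it.
            print("base __init__ sees:", self.tracker)

    class Derived(Base):
        def __init__(self):
            self.tracker = "push_rules_stream id gen"  # set before the super() call
            super().__init__()

    Derived()  # prints: base __init__ sees: push_rules_stream id gen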

@@ -74,6 +76,26 @@ class EventsWorkerStore(SQLBaseStore):
     def __init__(self, database: Database, db_conn, hs):
         super(EventsWorkerStore, self).__init__(database, db_conn, hs)

         if hs.config.worker_app is None:
@richvdh (Member) commented:

I kinda hate this pattern; it's everywhere and it feels magical. It feels like there should be an hs.am_I_the_source_for_the_events_stream() method or something. Maybe that's one to punt to a different PR where we can try to kill off other instances of the same thing, though.

@richvdh (Member) commented:

(or maybe you have a better plan for this anyway)

@erikjohnston (Member, author) commented:

This is how we currently do it, but yes, configuration for this sort of thing is incoming :)
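
A sketch of the kind of explicit helper being asked for, standing in for the "incoming" configuration; the class, field, and method names here are invented for illustration, not Synapse's actual API:

    class WorkerConfig:
        """Hypothetical config object; `events_writer` is an invented field."""

        def __init__(self, instance_name="master", events_writer="master"):
            self.instance_name = instance_name
            self.events_writer = events_writer

        def should_write_events(self) -> bool:
            """True iff this process is the single writer for the events stream."""
            return self.events_writer == self.instance_name

A store could then express intent rather than deployment detail, e.g. checking hs.config.worker.should_write_events() instead of hs.config.worker_app is None.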

@@ -66,7 +71,24 @@ def get_all_updated_caches_txn(txn):
         )

     def process_replication_rows(self, stream_name, instance_name, token, rows):
-        if stream_name == "caches":
+        if stream_name == "events":
+            self._stream_id_gen.advance(token)
@richvdh (Member) commented:

CacheInvalidationWorkerStore seems like a funny place for this. Shouldn't it be in EventsWorkerStore, if that's where _stream_id_gen lives?

(similarly for _backfill_id_gen)

@erikjohnston (Member, author) commented:

Whoops, yes, have moved those. I've only moved the advancing of the tokens, though; I'm not totally sure where the cache invalidation should live.

    if stream_name == "events":
        self._stream_id_gen.advance(token)
        for row in rows:
            self._process_event_stream_row(token, row)
@richvdh (Member) commented:

moving this in here runs the risk of changing the order in which we carry out the operations when processing a replication row (since it basically relies on the MRO of process_replication_rows). Have you had a think about that?

@erikjohnston (Member, author) commented:

All the store stuff should just be invalidating caches or advancing tokens, so the order doesn't matter (certainly I don't think we've thought about MRO before). The store's process_replication_rows gets called before we otherwise handle rows, too, so I think that is fine.
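
To make the MRO point concrete: each store mixin handles its own streams in process_replication_rows and delegates the rest up the chain via super(), so the order the handlers run in is the class's method resolution order. A self-contained toy, with invented class bodies (the real stores do far more than print):

    class BaseStore:
        def process_replication_rows(self, stream_name, instance_name, token, rows):
            pass  # end of the cooperative super() chain

    class EventsWorkerStore(BaseStore):
        def process_replication_rows(self, stream_name, instance_name, token, rows):
            if stream_name == "events":
                print("advance events token to", token)
            super().process_replication_rows(stream_name, instance_name, token, rows)

    class CacheInvalidationWorkerStore(BaseStore):
        def process_replication_rows(self, stream_name, instance_name, token, rows):
            if stream_name == "caches":
                print("invalidate caches up to", token)
            super().process_replication_rows(stream_name, instance_name, token, rows)

    class DataStore(EventsWorkerStore, CacheInvalidationWorkerStore):
        # MRO: DataStore -> EventsWorkerStore -> CacheInvalidationWorkerStore
        # -> BaseStore, so the events handler runs before the caches handler.
        pass

    DataStore().process_replication_rows("events", "master", 42, [])

As long as every handler only advances a token or invalidates a cache, that ordering is harmless, which is the argument being made above.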

@erikjohnston requested a review from richvdh (May 15, 2020 14:51)
@richvdh (Member) left a comment:

lgtm then

@erikjohnston merged commit 1f36ff6 into develop on May 15, 2020
@erikjohnston deleted the erikj/store_shuffle_2 branch on May 15, 2020 15:44
phil-flex pushed a commit to phil-flex/synapse that referenced this pull request on Jun 16, 2020.