Add experimental support for sharding event persister. #8170

erikjohnston · 2020-08-26T09:34:35Z

This is not ready for production yet. Caveats:

1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
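A toy model can make the second caveat concrete. This is a sketch under stated assumptions, not Synapse's code: it assumes each writer reports the highest stream ID it has finished persisting, and that readers may only advance to the minimum of those positions, since a slow or idle writer may still have IDs in flight just above its reported position. The class and writer names (`MultiWriterPositions`, `event_persister1`, `event_persister2`) are hypothetical.

```python
from typing import Dict, Iterable


class MultiWriterPositions:
    """Hypothetical toy tracker of per-writer stream positions.

    Each writer reports the highest stream ID it has fully persisted.
    Readers can only safely advance to the minimum across writers, because
    a writer may still have IDs in flight above its reported position.
    """

    def __init__(self, writers: Iterable[str]) -> None:
        # Every writer starts at position 0 (nothing persisted yet).
        self._positions: Dict[str, int] = {w: 0 for w in writers}

    def advance(self, writer: str, position: int) -> None:
        # Positions only ever move forward.
        if position > self._positions[writer]:
            self._positions[writer] = position

    def get_persisted_position(self) -> int:
        # Everything at or below this ID is known to be persisted.
        return min(self._positions.values())


positions = MultiWriterPositions(["event_persister1", "event_persister2"])
positions.advance("event_persister1", 9)
positions.advance("event_persister2", 2)
print(positions.get_persisted_position())  # 2: held back by the idle writer
```

Even though the first writer has persisted up to 9, readers see 2 until the second writer reports progress, which is exactly the stall described above.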

Probably worth looking at this commit by commit. There is a FIXME in "Implement config..." that is dealt with by the federation handler refactor.

clokep

I didn't see anything glaringly incorrect, but did leave quite a few comments. I'm guessing documentation won't be updated until this is not experimental?

synapse/config/_base.py

synapse/config/workers.py

clokep · 2020-09-01T18:02:21Z

synapse/handlers/federation.py

@@ -923,7 +923,8 @@ async def backfill(self, dest, room_id, limit, extremities):
                )
            )

-        await self._handle_new_events(dest, ev_infos, backfilled=True)
+        if ev_infos:


Is this if-statement just an optimization?

Oh, err, somewhere along the line I think something got very unhappy about empty lists. I forget the details now, or whether it's even necessary 😕

synapse/handlers/federation.py

clokep · 2020-09-01T18:09:43Z

synapse/handlers/federation.py

+        instance = self.config.worker.events_shard_config.get_instance(room_id)
+        if instance != self._instance_name:


Edit: I wrote the below before realizing that instance is also used in the _send_events call below. I thought it was worth leaving, though, in case it jiggles a good idea loose:

It looks like this pattern is used quite a bit. The comments for ShardedWorkerHandlingConfig say to prefer should_handle, which seems like it could be used here:

if self.config.worker.events_shard_config.should_handle(self._instance_name, room_id):

Although I think some of this could be simplified further if ShardedWorkerHandlingConfig knew what the current instance was (maybe should_handle wouldn't need the instance passed in?).

nods. The problem with giving ShardedWorkerHandlingConfig the current instance name is that I don't think we know what the instance name is during config parsing (outside of the worker config parsing).

Actually, I think we do need to use should_handle first to technically conform to the docs of ShardedWorkerHandlingConfig.

Though I guess the fact that we go on to do a HTTP replication hit means that get_instance has to work.

I think it is OK to use get_instance here, that and should_handle should have matching logic after all!
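A rough sketch of how get_instance and should_handle can be kept in agreement by defining one in terms of the other, plus one way around the instance-name problem: bind the name in a separate object once it is known, after config parsing. Assumptions: routing by hashing the room ID (sha256, modulo the number of writers) is chosen purely for illustration, and `ShardedConfigSketch` and `LocalShardView` are hypothetical names, not Synapse's `ShardedWorkerHandlingConfig` API.

```python
import hashlib
from typing import List


class ShardedConfigSketch:
    """Hypothetical stand-in for a sharded-writer config (not Synapse's class).

    Routes each key (e.g. a room ID) to one of ``instances`` by hashing it.
    """

    def __init__(self, instances: List[str]) -> None:
        if not instances:
            raise ValueError("need at least one writer instance")
        self.instances = instances

    def get_instance(self, key: str) -> str:
        # Deterministic, so every process picks the same writer for a key.
        digest = hashlib.sha256(key.encode("utf8")).digest()
        index = int.from_bytes(digest, byteorder="little") % len(self.instances)
        return self.instances[index]

    def should_handle(self, instance_name: str, key: str) -> bool:
        # Defined via get_instance, so the two can never disagree.
        return self.get_instance(key) == instance_name


class LocalShardView:
    """Hypothetical wrapper created once this process's instance name is
    known (i.e. after worker config parsing), so callers need not pass it."""

    def __init__(self, config: ShardedConfigSketch, instance_name: str) -> None:
        self._config = config
        self._instance_name = instance_name

    def is_mine(self, key: str) -> bool:
        return self._config.should_handle(self._instance_name, key)


config = ShardedConfigSketch(["event_persister1", "event_persister2"])
view = LocalShardView(config, "event_persister1")
room_id = "!abc:example.com"
if view.is_mine(room_id):
    print("persist locally")
else:
    # get_instance is still needed to know where to send the replication request.
    print(f"forward to {config.get_instance(room_id)} over replication")
```

Keeping the instance name out of the parsed config sidesteps the issue that it may not be known at parse time, while still giving handlers a single-argument check.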

synapse/handlers/room.py

synapse/storage/util/id_generators.py

synapse/storage/databases/main/events_worker.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

erikjohnston · 2020-09-02T10:47:37Z

I'm guessing documentation won't be updated until this is not experimental?

Yeah, I really, really don't want people using this at all. Not even in tests (I have a branch with working SyTests, but they run so slowly).

clokep · 2020-09-02T11:44:25Z

@erikjohnston So I think this looks good! Not sure if you're looking for a ✅ or want to add tests?

clokep · 2020-09-02T12:19:35Z

I think this is related to #7986.

erikjohnston · 2020-09-02T12:32:27Z

@erikjohnston So I think this looks good! Not sure if you're looking for a ✅ or want to add tests?

So I was waiting for SyTests to land, but it turns out that with the current implementation they take forever. This is because get_persisted_position doesn't always advance, due to gaps in the events stream. That can be fixed by making each worker periodically send out updated positions, but a) that makes the tests slow and b) it's not something I necessarily want to add.
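A toy model of the "periodically send out updated positions" fix, as a hedged sketch rather than anything from this PR. Assumptions, all hypothetical: stream IDs come from one shared counter, a writer's position is the highest ID below which it has nothing in flight, and a `heartbeat` lets an idle writer jump its position to the current maximum allocated ID, since any ID it is handed later must be larger. `WriterPositionsWithHeartbeat` and its methods are invented names.

```python
from typing import Dict, Iterable, Set


class WriterPositionsWithHeartbeat:
    """Hypothetical toy model of per-writer positions with periodic updates."""

    def __init__(self, writers: Iterable[str]) -> None:
        self._positions: Dict[str, int] = {w: 0 for w in writers}
        self._in_flight: Dict[str, Set[int]] = {w: set() for w in self._positions}
        self._max_allocated = 0

    def allocate(self, writer: str) -> int:
        # All writers draw from one shared sequence, like a DB sequence.
        self._max_allocated += 1
        self._in_flight[writer].add(self._max_allocated)
        return self._max_allocated

    def persisted(self, writer: str, stream_id: int) -> None:
        pending = self._in_flight[writer]
        pending.discard(stream_id)
        # Only claim up to just below this writer's oldest unfinished ID.
        new_pos = min(pending) - 1 if pending else stream_id
        self._positions[writer] = max(self._positions[writer], new_pos)

    def heartbeat(self, writer: str) -> None:
        # An idle writer can only be handed IDs above the current max,
        # so it may safely report that max as its position.
        if not self._in_flight[writer]:
            self._positions[writer] = max(self._positions[writer], self._max_allocated)

    def get_persisted_position(self) -> int:
        return min(self._positions.values())


s = WriterPositionsWithHeartbeat(["w1", "w2"])
for _ in range(3):
    s.persisted("w1", s.allocate("w1"))
print(s.get_persisted_position())  # 0: w2 has never written
s.heartbeat("w2")
print(s.get_persisted_position())  # 3
```

The trade-off in this model is that readers are only unblocked as often as heartbeats are sent, so the heartbeat interval bounds how long an idle writer can stall the stream.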

Given this is completely undocumented, I'm tempted to merge as is, and then require tests for the next phase (which is augmenting the get_persisted_position stuff with something a bit smarter).

* commit '0d4f614fd': Refactor `_get_e2e_device_keys_for_federation_query_txn` (#8225) Add experimental support for sharding event persister. (#8170) Add /user/{user_id}/shared_rooms/ api (#7785) Do not try to store invalid data in the stats table (#8226) Convert the main methods run by the reactor to async. (#8213)

erikjohnston force-pushed the erikj/add_stream_token_type branch 7 times, most recently from 0df5060 to e151ff2 on September 1, 2020 12:56

erikjohnston requested a review from a team September 1, 2020 13:12

erikjohnston force-pushed the erikj/add_stream_token_type branch 2 times, most recently from 9c8eb15 to 609bc17 on September 1, 2020 14:45

erikjohnston added 4 commits September 1, 2020 16:39

Add multiwriter for events

67bfbb6

Implement config and routing for multiple event writers

164450e

Newsfile

25e78f7

Thread through room_id in federation handler

ac494a8

erikjohnston force-pushed the erikj/add_stream_token_type branch from 609bc17 to ac494a8 on September 1, 2020 15:40

clokep reviewed Sep 1, 2020

erikjohnston and others added 2 commits September 2, 2020 10:29

Apply suggestions from code review

da98a2b

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

Add/modify comments

d54d6e3

erikjohnston requested a review from clokep September 2, 2020 10:46

clokep approved these changes Sep 2, 2020

erikjohnston merged commit 82c1ee1 into develop Sep 2, 2020

erikjohnston deleted the erikj/add_stream_token_type branch September 2, 2020 14:48

babolivier added a commit that referenced this pull request Sep 3, 2020

Revert "Add experimental support for sharding event persister. (#8170)"

3576c87

This reverts commit 82c1ee1.

babolivier mentioned this pull request Sep 3, 2020

Revert "Add experimental support for sharding event persister. (#8170)" #8242

Merged

richvdh mentioned this pull request Sep 3, 2020

Newly left rooms appear in the leave section of gapped sync seems flakey matrix-org/sytest#948

Closed

babolivier added a commit that referenced this pull request Sep 4, 2020

Revert "Add experimental support for sharding event persister. (#8170)…

9f8abdc

…" (#8242) * Revert "Add experimental support for sharding event persister. (#8170)" This reverts commit 82c1ee1. * Changelog

erikjohnston mentioned this pull request Sep 11, 2020

Add experimental support for sharding event persister. Again. #8294

Merged

babolivier pushed a commit that referenced this pull request Sep 1, 2021

Merge commit '9f8abdcc3' into anoa/dinsic_release_1_21_x

581445c

* commit '9f8abdcc3': Revert "Add experimental support for sharding event persister. (#8170)" (#8242)