Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make event persisters periodically announce position over replication. #8499

Merged
merged 10 commits into from
Oct 12, 2020

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Oct 8, 2020

Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. get_current_token()) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.

This PR changes it so that the MultiWriterIDGenerator keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.

Based on #8496.

@erikjohnston erikjohnston requested review from a team and removed request for a team October 8, 2020 11:26
@erikjohnston erikjohnston force-pushed the erikj/periodic_poke branch 2 times, most recently from 7a8091e to 0061bca Compare October 9, 2020 09:58
This allows us to send a `POSITION` command to inform instances that the
stream position has advanced without instances thinking they've missed
updates.

This is so that sharded event persisters can update their local
positions in the events stream to the lastest position and periodically
tell other instances about it.

This helps ensure that the calculated minimum persisted position of the
events stream advances even if only one event persister is writing.
Background processes that read off the event stream use the minimum
persisted position, and so keeping that up to date ensures that those
background processes don't get stuck.
@erikjohnston erikjohnston requested a review from a team October 9, 2020 14:07
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally familiar with everything replication, but I get the concept of this PR after some discussion in our room.

docs/tcp_replication.md Outdated Show resolved Hide resolved
synapse/replication/tcp/commands.py Outdated Show resolved Hide resolved
Comment on lines 180 to 182
logger.info(
"Sending position: %s -> %s", stream.NAME, current_token
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something for this PR, but I wonder whether these at INFO are a bit noisy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably, its just really useful to figure out when things go wrong.

Comment on lines +188 to +191
# XXX: We only do this for the EventStream as it
# turns out that e.g. account data streams share
# their "current token" with each other, meaning
# that it is *not* safe to send a POSITION.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly is it not safe to share positions with each other? Can you only have one worker advancing the account data stream?

Are you only doing the eventstream here out of caution? Would this be a problem for streams other than account?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take the account data streams as an example, say we persist a bunch of room account data rows at 3, 4, 5, then if before we handle the room account data stream we send a POSITION global_account_data 2 5. The workers will update the shared account data current token to 5, so when they finally do receive RDATA room_account_data 3 ... etc they will compare the stream ID with the current token (i.e. 3 vs 5) and drop the RDATA as old.

Really, we should have one replication stream per stream ID generator, and I plan to do that but didn't want to change the world in this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense. For the event persisters this is fine as an event only needs to be persisted once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeeeah, specifically it's fine as we're using an ID gen that supports sharded streams (rather than reusing the same ID gen across different replication streams)

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
@erikjohnston erikjohnston merged commit 8de3703 into develop Oct 12, 2020
@erikjohnston erikjohnston deleted the erikj/periodic_poke branch October 12, 2020 14:51
erikjohnston added a commit that referenced this pull request Oct 22, 2020
Synapse 1.22.0rc1 (2020-10-22)
==============================

Features
--------

- Add a configuration option for always using the "userinfo endpoint" for OpenID Connect. This fixes support for some identity providers, e.g. GitLab. Contributed by Benjamin Koch. ([\#7658](#7658))
- Add ability for `ThirdPartyEventRules` modules to query and manipulate whether a room is in the public rooms directory. ([\#8292](#8292), [\#8467](#8467))
- Add support for olm fallback keys ([MSC2732](matrix-org/matrix-spec-proposals#2732)). ([\#8312](#8312), [\#8501](#8501))
- Add support for running background tasks in a separate worker process. ([\#8369](#8369), [\#8458](#8458), [\#8489](#8489), [\#8513](#8513), [\#8544](#8544), [\#8599](#8599))
- Add support for device dehydration ([MSC2697](matrix-org/matrix-spec-proposals#2697)). ([\#8380](#8380))
- Add support for [MSC2409](matrix-org/matrix-spec-proposals#2409), which allows sending typing, read receipts, and presence events to appservices. ([\#8437](#8437), [\#8590](#8590))
- Change default room version to "6", per [MSC2788](matrix-org/matrix-spec-proposals#2788). ([\#8461](#8461))
- Add the ability to send non-membership events into a room via the `ModuleApi`. ([\#8479](#8479))
- Increase default upload size limit from 10M to 50M. Contributed by @Akkowicz. ([\#8502](#8502))
- Add support for modifying event content in `ThirdPartyRules` modules. ([\#8535](#8535), [\#8564](#8564))

Bugfixes
--------

- Fix a longstanding bug where invalid ignored users in account data could break clients. ([\#8454](#8454))
- Fix a bug where backfilling a room with an event that was missing the `redacts` field would break. ([\#8457](#8457))
- Don't attempt to respond to some requests if the client has already disconnected. ([\#8465](#8465))
- Fix message duplication if something goes wrong after persisting the event. ([\#8476](#8476))
- Fix incremental sync returning an incorrect `prev_batch` token in timeline section, which when used to paginate returned events that were included in the incremental sync. Broken since v0.16.0. ([\#8486](#8486))
- Expose the `uk.half-shot.msc2778.login.application_service` to clients from the login API. This feature was added in v1.21.0, but was not exposed as a potential login flow. ([\#8504](#8504))
- Fix error code for `/profile/{userId}/displayname` to be `M_BAD_JSON`. ([\#8517](#8517))
- Fix a bug introduced in v1.7.0 that could cause Synapse to insert values from non-state `m.room.retention` events into the `room_retention` database table. ([\#8527](#8527))
- Fix not sending events over federation when using sharded event writers. ([\#8536](#8536))
- Fix a long standing bug where email notifications for encrypted messages were blank. ([\#8545](#8545))
- Fix increase in the number of `There was no active span...` errors logged when using OpenTracing. ([\#8567](#8567))
- Fix a bug that prevented errors encountered during execution of the `synapse_port_db` from being correctly printed. ([\#8585](#8585))
- Fix appservice transactions to only include a maximum of 100 persistent and 100 ephemeral events. ([\#8606](#8606))

Updates to the Docker image
---------------------------

- Added multi-arch support (arm64,arm/v7) for the docker images. Contributed by @maquis196. ([\#7921](#7921))
- Add support for passing commandline args to the synapse process. Contributed by @samuel-p. ([\#8390](#8390))

Improved Documentation
----------------------

- Update the directions for using the manhole with coroutines. ([\#8462](#8462))
- Improve readme by adding new shield.io badges. ([\#8493](#8493))
- Added note about docker in manhole.md regarding which ip address to bind to. Contributed by @maquis196. ([\#8526](#8526))
- Document the new behaviour of the `allowed_lifetime_min` and `allowed_lifetime_max` settings in the room retention configuration. ([\#8529](#8529))

Deprecations and Removals
-------------------------

- Drop unused `device_max_stream_id` table. ([\#8589](#8589))

Internal Changes
----------------

- Check for unreachable code with mypy. ([\#8432](#8432))
- Add unit test for event persister sharding. ([\#8433](#8433))
- Allow events to be sent to clients sooner when using sharded event persisters. ([\#8439](#8439), [\#8488](#8488), [\#8496](#8496), [\#8499](#8499))
- Configure `public_baseurl` when using demo scripts. ([\#8443](#8443))
- Add SQL logging on queries that happen during startup. ([\#8448](#8448))
- Speed up unit tests when using PostgreSQL. ([\#8450](#8450))
- Remove redundant database loads of stream_ordering for events we already have. ([\#8452](#8452))
- Reduce inconsistencies between codepaths for membership and non-membership events. ([\#8463](#8463))
- Combine `SpamCheckerApi` with the more generic `ModuleApi`. ([\#8464](#8464))
- Additional testing for `ThirdPartyEventRules`. ([\#8468](#8468))
- Add `-d` option to `./scripts-dev/lint.sh` to lint files that have changed since the last git commit. ([\#8472](#8472))
- Unblacklist some sytests. ([\#8474](#8474))
- Include the log level in the phone home stats. ([\#8477](#8477))
- Remove outdated sphinx documentation, scripts and configuration. ([\#8480](#8480))
- Clarify error message when plugin config parsers raise an error. ([\#8492](#8492))
- Remove the deprecated `Handlers` object. ([\#8494](#8494))
- Fix a threadsafety bug in unit tests. ([\#8497](#8497))
- Add user agent to user_daily_visits table. ([\#8503](#8503))
- Add type hints to various parts of the code base. ([\#8407](#8407), [\#8505](#8505), [\#8507](#8507), [\#8547](#8547), [\#8562](#8562), [\#8609](#8609))
- Remove unused code from the test framework. ([\#8514](#8514))
- Apply some internal fixes to the `HomeServer` class to make its code more idiomatic and statically-verifiable. ([\#8515](#8515))
- Factor out common code between `RoomMemberHandler._locally_reject_invite` and `EventCreationHandler.create_event`. ([\#8537](#8537))
- Improve database performance by executing more queries without starting transactions. ([\#8542](#8542))
- Rename `Cache` to `DeferredCache`, to better reflect its purpose. ([\#8548](#8548))
- Move metric registration code down into `LruCache`. ([\#8561](#8561), [\#8591](#8591))
- Replace `DeferredCache` with the lighter-weight `LruCache` where possible. ([\#8563](#8563))
- Add virtualenv-generated folders to `.gitignore`. ([\#8566](#8566))
- Add `get_immediate` method to `DeferredCache`. ([\#8568](#8568))
- Fix mypy not properly checking across the codebase, additionally, fix a typing assertion error in `handlers/auth.py`. ([\#8569](#8569))
- Fix `synmark` benchmark runner. ([\#8571](#8571))
- Modify `DeferredCache.get()` to return `Deferred`s instead of `ObservableDeferred`s. ([\#8572](#8572))
- Adjust a protocol-type definition to fit `sqlite3` assertions. ([\#8577](#8577))
- Support macOS on the `synmark` benchmark runner. ([\#8578](#8578))
- Update `mypy` static type checker to 0.790. ([\#8583](#8583), [\#8600](#8600))
- Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. ([\#8587](#8587))
- Remove extraneous unittest logging decorators from unit tests. ([\#8592](#8592))
- Minor optimisations in caching code. ([\#8593](#8593), [\#8594](#8594))
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Oct 30, 2020
Synapse 1.22.0 (2020-10-27)
===========================

No significant changes.


Synapse 1.22.0rc2 (2020-10-26)
==============================

Bugfixes
--------

- Fix bugs where ephemeral events were not sent to appservices. Broke in v1.22.0rc1. ([\#8648](matrix-org/synapse#8648), [\#8656](matrix-org/synapse#8656))
- Fix `user_daily_visits` table to not have duplicate rows per user/device due to multiple user agents. Broke in v1.22.0rc1. ([\#8654](matrix-org/synapse#8654))

Synapse 1.22.0rc1 (2020-10-22)
==============================

Features
--------

- Add a configuration option for always using the "userinfo endpoint" for OpenID Connect. This fixes support for some identity providers, e.g. GitLab. Contributed by Benjamin Koch. ([\#7658](matrix-org/synapse#7658))
- Add ability for `ThirdPartyEventRules` modules to query and manipulate whether a room is in the public rooms directory. ([\#8292](matrix-org/synapse#8292), [\#8467](matrix-org/synapse#8467))
- Add support for olm fallback keys ([MSC2732](matrix-org/matrix-spec-proposals#2732)). ([\#8312](matrix-org/synapse#8312), [\#8501](matrix-org/synapse#8501))
- Add support for running background tasks in a separate worker process. ([\#8369](matrix-org/synapse#8369), [\#8458](matrix-org/synapse#8458), [\#8489](matrix-org/synapse#8489), [\#8513](matrix-org/synapse#8513), [\#8544](matrix-org/synapse#8544), [\#8599](matrix-org/synapse#8599))
- Add support for device dehydration ([MSC2697](matrix-org/matrix-spec-proposals#2697)). ([\#8380](matrix-org/synapse#8380))
- Add support for [MSC2409](matrix-org/matrix-spec-proposals#2409), which allows sending typing, read receipts, and presence events to appservices. ([\#8437](matrix-org/synapse#8437), [\#8590](matrix-org/synapse#8590))
- Change default room version to "6", per [MSC2788](matrix-org/matrix-spec-proposals#2788). ([\#8461](matrix-org/synapse#8461))
- Add the ability to send non-membership events into a room via the `ModuleApi`. ([\#8479](matrix-org/synapse#8479))
- Increase default upload size limit from 10M to 50M. Contributed by @Akkowicz. ([\#8502](matrix-org/synapse#8502))
- Add support for modifying event content in `ThirdPartyRules` modules. ([\#8535](matrix-org/synapse#8535), [\#8564](matrix-org/synapse#8564))


Bugfixes
--------

- Fix a longstanding bug where invalid ignored users in account data could break clients. ([\#8454](matrix-org/synapse#8454))
- Fix a bug where backfilling a room with an event that was missing the `redacts` field would break. ([\#8457](matrix-org/synapse#8457))
- Don't attempt to respond to some requests if the client has already disconnected. ([\#8465](matrix-org/synapse#8465))
- Fix message duplication if something goes wrong after persisting the event. ([\#8476](matrix-org/synapse#8476))
- Fix incremental sync returning an incorrect `prev_batch` token in timeline section, which when used to paginate returned events that were included in the incremental sync. Broken since v0.16.0. ([\#8486](matrix-org/synapse#8486))
- Expose the `uk.half-shot.msc2778.login.application_service` to clients from the login API. This feature was added in v1.21.0, but was not exposed as a potential login flow. ([\#8504](matrix-org/synapse#8504))
- Fix error code for `/profile/{userId}/displayname` to be `M_BAD_JSON`. ([\#8517](matrix-org/synapse#8517))
- Fix a bug introduced in v1.7.0 that could cause Synapse to insert values from non-state `m.room.retention` events into the `room_retention` database table. ([\#8527](matrix-org/synapse#8527))
- Fix not sending events over federation when using sharded event writers. ([\#8536](matrix-org/synapse#8536))
- Fix a long standing bug where email notifications for encrypted messages were blank. ([\#8545](matrix-org/synapse#8545))
- Fix increase in the number of `There was no active span...` errors logged when using OpenTracing. ([\#8567](matrix-org/synapse#8567))
- Fix a bug that prevented errors encountered during execution of the `synapse_port_db` from being correctly printed. ([\#8585](matrix-org/synapse#8585))
- Fix appservice transactions to only include a maximum of 100 persistent and 100 ephemeral events. ([\#8606](matrix-org/synapse#8606))


Updates to the Docker image
---------------------------

- Added multi-arch support (arm64,arm/v7) for the docker images. Contributed by @maquis196. ([\#7921](matrix-org/synapse#7921))
- Add support for passing commandline args to the synapse process. Contributed by @samuel-p. ([\#8390](matrix-org/synapse#8390))


Improved Documentation
----------------------

- Update the directions for using the manhole with coroutines. ([\#8462](matrix-org/synapse#8462))
- Improve readme by adding new shield.io badges. ([\#8493](matrix-org/synapse#8493))
- Added note about docker in manhole.md regarding which ip address to bind to. Contributed by @maquis196. ([\#8526](matrix-org/synapse#8526))
- Document the new behaviour of the `allowed_lifetime_min` and `allowed_lifetime_max` settings in the room retention configuration. ([\#8529](matrix-org/synapse#8529))


Deprecations and Removals
-------------------------

- Drop unused `device_max_stream_id` table. ([\#8589](matrix-org/synapse#8589))


Internal Changes
----------------

- Check for unreachable code with mypy. ([\#8432](matrix-org/synapse#8432))
- Add unit test for event persister sharding. ([\#8433](matrix-org/synapse#8433))
- Allow events to be sent to clients sooner when using sharded event persisters. ([\#8439](matrix-org/synapse#8439), [\#8488](matrix-org/synapse#8488), [\#8496](matrix-org/synapse#8496), [\#8499](matrix-org/synapse#8499))
- Configure `public_baseurl` when using demo scripts. ([\#8443](matrix-org/synapse#8443))
- Add SQL logging on queries that happen during startup. ([\#8448](matrix-org/synapse#8448))
- Speed up unit tests when using PostgreSQL. ([\#8450](matrix-org/synapse#8450))
- Remove redundant database loads of stream_ordering for events we already have. ([\#8452](matrix-org/synapse#8452))
- Reduce inconsistencies between codepaths for membership and non-membership events. ([\#8463](matrix-org/synapse#8463))
- Combine `SpamCheckerApi` with the more generic `ModuleApi`. ([\#8464](matrix-org/synapse#8464))
- Additional testing for `ThirdPartyEventRules`. ([\#8468](matrix-org/synapse#8468))
- Add `-d` option to `./scripts-dev/lint.sh` to lint files that have changed since the last git commit. ([\#8472](matrix-org/synapse#8472))
- Unblacklist some sytests. ([\#8474](matrix-org/synapse#8474))
- Include the log level in the phone home stats. ([\#8477](matrix-org/synapse#8477))
- Remove outdated sphinx documentation, scripts and configuration. ([\#8480](matrix-org/synapse#8480))
- Clarify error message when plugin config parsers raise an error. ([\#8492](matrix-org/synapse#8492))
- Remove the deprecated `Handlers` object. ([\#8494](matrix-org/synapse#8494))
- Fix a threadsafety bug in unit tests. ([\#8497](matrix-org/synapse#8497))
- Add user agent to user_daily_visits table. ([\#8503](matrix-org/synapse#8503))
- Add type hints to various parts of the code base. ([\#8407](matrix-org/synapse#8407), [\#8505](matrix-org/synapse#8505), [\#8507](matrix-org/synapse#8507), [\#8547](matrix-org/synapse#8547), [\#8562](matrix-org/synapse#8562), [\#8609](matrix-org/synapse#8609))
- Remove unused code from the test framework. ([\#8514](matrix-org/synapse#8514))
- Apply some internal fixes to the `HomeServer` class to make its code more idiomatic and statically-verifiable. ([\#8515](matrix-org/synapse#8515))
- Factor out common code between `RoomMemberHandler._locally_reject_invite` and `EventCreationHandler.create_event`. ([\#8537](matrix-org/synapse#8537))
- Improve database performance by executing more queries without starting transactions. ([\#8542](matrix-org/synapse#8542))
- Rename `Cache` to `DeferredCache`, to better reflect its purpose. ([\#8548](matrix-org/synapse#8548))
- Move metric registration code down into `LruCache`. ([\#8561](matrix-org/synapse#8561), [\#8591](matrix-org/synapse#8591))
- Replace `DeferredCache` with the lighter-weight `LruCache` where possible. ([\#8563](matrix-org/synapse#8563))
- Add virtualenv-generated folders to `.gitignore`. ([\#8566](matrix-org/synapse#8566))
- Add `get_immediate` method to `DeferredCache`. ([\#8568](matrix-org/synapse#8568))
- Fix mypy not properly checking across the codebase, additionally, fix a typing assertion error in `handlers/auth.py`. ([\#8569](matrix-org/synapse#8569))
- Fix `synmark` benchmark runner. ([\#8571](matrix-org/synapse#8571))
- Modify `DeferredCache.get()` to return `Deferred`s instead of `ObservableDeferred`s. ([\#8572](matrix-org/synapse#8572))
- Adjust a protocol-type definition to fit `sqlite3` assertions. ([\#8577](matrix-org/synapse#8577))
- Support macOS on the `synmark` benchmark runner. ([\#8578](matrix-org/synapse#8578))
- Update `mypy` static type checker to 0.790. ([\#8583](matrix-org/synapse#8583), [\#8600](matrix-org/synapse#8600))
- Re-organize the structured logging code to separate the TCP transport handling from the JSON formatting. ([\#8587](matrix-org/synapse#8587))
- Remove extraneous unittest logging decorators from unit tests. ([\#8592](matrix-org/synapse#8592))
- Minor optimisations in caching code. ([\#8593](matrix-org/synapse#8593), [\#8594](matrix-org/synapse#8594))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants