This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix federation stall on concurrent access errors #9639

Merged

Conversation

ShadowJonathan
Contributor

@ShadowJonathan ShadowJonathan commented Mar 17, 2021

This PR fixes #9635 by ensuring that _store_destination_rooms_entries_txn (now just store_destination_rooms_entries) raises psycopg2.errors.SerializationFailure far less often, so that the process_event_queue_for_federation loop no longer gets stuck.

It achieves this by replacing the txn.execute_batch call (which does many INSERTs) with one simple_upsert_many call (which effectively does "one" INSERT).

This decreases the load on the database enough that SerializationFailure errors don't happen again (or at least not as often).
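As a rough illustration of the "many INSERTs versus one INSERT" point, here is a minimal sketch using Python's built-in sqlite3 module rather than Synapse's database layer. The `upsert_many` helper, the simplified table definition, and the example server names are assumptions made for this example, and the `ON CONFLICT` syntax needs SQLite 3.24 or later. It is not the code from this PR.

```python
import sqlite3


def upsert_many(conn, destinations, room_id, stream_ordering):
    """Upsert one row per destination using a single multi-row statement.

    Hypothetical helper for illustration; Synapse's simple_upsert_many
    has a different signature and implementation.
    """
    rows = [(d, room_id, stream_ordering) for d in destinations]
    if not rows:
        return
    # One "(?, ?, ?)" group per row, so the whole batch is one statement.
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    sql = (
        "INSERT INTO destination_rooms (destination, room_id, stream_ordering) "
        f"VALUES {placeholders} "
        "ON CONFLICT (destination, room_id) "
        "DO UPDATE SET stream_ordering = excluded.stream_ordering"
    )
    params = [value for row in rows for value in row]
    conn.execute(sql, params)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE destination_rooms ("
    " destination TEXT NOT NULL,"
    " room_id TEXT NOT NULL,"
    " stream_ordering INTEGER NOT NULL,"
    " PRIMARY KEY (destination, room_id))"
)

# First event creates two rows; the second updates one row and adds another.
upsert_many(conn, ["a.example", "b.example"], "!room:example.org", 10)
upsert_many(conn, ["b.example", "c.example"], "!room:example.org", 11)

result = conn.execute(
    "SELECT destination, stream_ordering FROM destination_rooms ORDER BY destination"
).fetchall()
print(result)
```

Each call issues one statement regardless of how many destinations are in the room, which is the property the description above relies on.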

What I thought this PR would do changed over the course of the conversation. The previous content of the PR is kept below; it is faulty, but is here for historical reference.

Previous content

This PR fixes #9635 well enough by ensuring that the _store_destination_rooms_entries_txn db subroutine gets retried a few times when it fails with psycopg2.errors.SerializationFailure.

This is fine, as this failure is simply a rollback of the transaction because another transaction changed the view of the table concurrently. That means the "other" transaction has succeeded and this one has failed, so this one can be retried indefinitely until it "succeeds" while the other "loses". In the very worst case, this blocks until the destinations table is less congested.

This PR will change the methods involved to use SerializationFailure-tolerant ones, such as simple_upsert_many, that will retry up to about 5 times until they succeed. This means that every db interaction has a considerably higher chance of succeeding compared to the previous behaviour (probably needing only one or two "retries" before it succeeds in most scenarios).
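The retry idea described here can be sketched as follows. This is only an illustration of the pattern, not Synapse's retry logic: `SerializationFailure` is a local stand-in class for psycopg2.errors.SerializationFailure, and `run_with_retries` and `flaky_upsert` are hypothetical names invented for the example.

```python
class SerializationFailure(Exception):
    """Local stand-in for psycopg2.errors.SerializationFailure (assumption)."""


def run_with_retries(func, max_attempts=5):
    """Call func, retrying when it raises SerializationFailure.

    Re-raises the error once max_attempts calls have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except SerializationFailure:
            if attempt == max_attempts:
                raise


calls = {"count": 0}


def flaky_upsert():
    """Simulated upsert that loses to a concurrent writer twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise SerializationFailure("concurrent update")
    return "stored"


outcome = run_with_retries(flaky_upsert)
print(outcome, calls["count"])
```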

And to relieve some of that congestion, the second fix this PR gives is making sure every handle_room_events background task only instructs its (eventual) _send_pdu calls to call self.store.store_destination_rooms_entries when it is handling the final event in the (per-room) event list.

Again, this is fine, as destinations_rooms is used to calculate which events need to be caught-up with, and in the worst case (when the federation sender shuts down or aborts in the middle of a handle_room_events task), it'll cause the catch-up to notify some servers of more events (likely 1 more event), but it won't drop any events that need sending.

This'll also cause federation to deliver to large rooms such as #matrix:matrix.org more efficiently, as the store_destination_rooms_entries call is only done once, on the last event that is being caught up with. If that call is expensive and federation events start "backing up", it'll be able to keep pace by calling this only once per room in the batch of 100 events that are submitted to per-destination queues. This'll increase performance substantially on smaller servers.

Edit: Some of the principles this PR was based on (for destination_rooms) were likely false, so this PR will only be about the fix, and the optimisation will be tried in another PR.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
  • Pull request includes a sign off
  • Code style is correct (run the linters)

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>

@clokep clokep requested a review from a team March 17, 2021 13:41
@ShadowJonathan
Contributor Author

The performance increase comes from _send_pdu previously calling store_destination_rooms_entries once per event, which can be costly for large rooms.

Now, it calls it only once per room per process_event_queue_for_federation batch (max 100). Once local users put pressure on large rooms, process_event_queue_for_federation will start to process larger batches, which then call store_destination_rooms_entries less often, so it is able to "catch up" more efficiently per batch until an equilibrium is reached.
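To make the proposed saving concrete, here is a small sketch that counts how many store calls a batch would need under each scheme. The `count_store_calls` helper, the room IDs, and the event distribution are all made up for illustration; this does not model Synapse's actual batching code, and the optimisation itself was later dropped from this PR.

```python
def count_store_calls(batch):
    """Compare per-event and per-room store calls for one batch.

    batch is a list of (room_id, stream_ordering) pairs (hypothetical shape).
    Returns (calls if storing per event, calls if storing once per room,
    latest stream_ordering seen per room).
    """
    latest = {}
    for room_id, stream_ordering in batch:
        previous = latest.get(room_id, stream_ordering)
        latest[room_id] = max(previous, stream_ordering)
    return len(batch), len(latest), latest


# A full batch of 100 events: 90 in one busy room, 10 in a quiet one.
batch = [("!big:example.org", i) for i in range(1, 91)]
batch += [("!small:example.org", i) for i in range(91, 101)]

per_event, per_room, latest = count_store_calls(batch)
print(per_event, per_room, latest)
```

In this made-up batch the per-room scheme needs 2 store calls instead of 100, each carrying the highest stream_ordering seen for that room.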

@erikjohnston
Member

Oh, this is very interesting. I'm surprised that it's so contested and is introducing delays. My only concern with your approach is that it means not keeping that table up to date, and I'm not sure what the consequences of that are.

We recently introduced a change which handles the SerializationFailure by doing upserts outside of transactions, so I wonder if just splitting store_destination_rooms_entries_txn into two would magically help here, e.g. something like:

    async def store_destination_rooms_entries(
        self,
        destinations: Iterable[str],
        room_id: str,
        stream_ordering: int,
    ) -> None:
        """
        Updates or creates `destination_rooms` entries in batch for a single event.

        Args:
            destinations: list of destinations
            room_id: the room_id of the event
            stream_ordering: the stream_ordering of the event
        """

        await self.db_pool.simple_upsert_many(
            table="destinations",
            key_names=("destination",),
            key_values=[(d,) for d in destinations],
            value_names=[],
            value_values=[],
            desc="store_destination_rooms_entries_dests",
        )

        rows = [(destination, room_id) for destination in destinations]
        await self.db_pool.simple_upsert_many(
            table="destination_rooms",
            key_names=("destination", "room_id"),
            key_values=rows,
            value_names=["stream_ordering"],
            value_values=[(stream_ordering,)] * len(rows),
            desc="store_destination_rooms_entries_rooms",
        )

If that works, then yay we can avoid some of the complexity of this PR.

@ShadowJonathan
Contributor Author

ShadowJonathan commented Mar 18, 2021

That fixes #9635, but it doesn't deliver the performance improvement I've noticed (which I still kinda wanna keep), about which I've hypothesised the following:

Again, this is fine, as destinations_rooms is used to calculate which events need to be caught-up with, and in the worst case (when the federation sender shuts down or aborts in the middle of a handle_room_events task), it'll cause the catch-up to notify some servers of more events (likely 1 more event), but it won't drop any events that need sending.

That is only the case when handle_room_events gets an exception thrown, which is then thrown into store_destination_rooms_entries, is caught, and effectively resets transaction sending to (or leaves it at) the current stream_id in the database.

(I've looked at where destination_rooms is referenced in the codebase. It is only used in _get_catch_up_room_event_ids_txn and _get_catch_up_outstanding_destinations_txn, whose results ultimately go into the following (snippets):

_get_catch_up_room_event_ids_txn:

    # get at most 50 catchup room/PDUs
    while True:
        event_ids = await self._store.get_catch_up_room_event_ids(
            self._destination,
            self._last_successful_stream_ordering,
        )

_get_catch_up_outstanding_destinations_txn:

    async def _wake_destinations_needing_catchup(self):
        """
        Wakes up destinations that need catch-up and are not currently being
        backed off from.
        In order to reduce load spikes, adds a delay between each destination.
        """
        last_processed = None  # type: Optional[str]
        while True:
            destinations_to_wake = (
                await self.store.get_catch_up_outstanding_destinations(last_processed)
            )

Missing an update (because of a crash) isn't a huge deal, as it would only result in the federation sender sending a few more events the next time round.)
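For readers unfamiliar with how a table like destination_rooms can drive catch-up, here is a deliberately simplified in-memory sketch. It assumes each row maps (destination, room_id) to the stream_ordering of the latest event, and that a room needs catch-up for a destination when that value is newer than the last successfully sent stream ordering. The `catch_up_rooms` function and the data are hypothetical; Synapse's real queries differ, and this sketch does not settle whether skipping updates is safe.

```python
def catch_up_rooms(destination_rooms, destination, last_successful_stream_ordering, limit=50):
    """Return up to `limit` room IDs that look behind for one destination.

    destination_rooms maps (destination, room_id) -> stream_ordering
    (an assumed, simplified stand-in for the database table).
    """
    pending = [
        (stream_ordering, room_id)
        for (dest, room_id), stream_ordering in destination_rooms.items()
        if dest == destination and stream_ordering > last_successful_stream_ordering
    ]
    pending.sort()  # oldest outstanding room first
    return [room_id for _, room_id in pending[:limit]]


table = {
    ("remote.example", "!a:example.org"): 5,
    ("remote.example", "!b:example.org"): 12,
    ("remote.example", "!c:example.org"): 9,
    ("other.example", "!a:example.org"): 20,
}

rooms = catch_up_rooms(table, "remote.example", last_successful_stream_ordering=8)
print(rooms)
```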

@erikjohnston
Member

I'm sort of wondering whether that will speed things up. The DB hits should be fast, and we'll be doing quite a lot of other DB hits during processing, so I'm slightly surprised if the fixed version would actually cause a slow down. I can completely believe that when things were getting contested, and retried, etc that would cause it to be a source of slowness.

@ShadowJonathan
Contributor Author

ShadowJonathan commented Mar 18, 2021

I'm sort of wondering whether that will speed things up.

The federation senders now do an insert to destinations (for certainty that the foreign key exists), and an update to destination_rooms, this needs to be done concurrently, for every event, which is likely to result in many contentions for a slow postgres database with many federated events to multiple rooms concurrently.

The DB hits should be fast [...]

In my experience with #matrix:matrix.org, they were not (for a small homeserver), as destinations and destination_rooms accesses/updates (on every event, for all 2k homeservers) are slow (and resulted in that concurrent access bug). I think this optimization (at least) will offer a lot to smaller and/or slower homeservers (as those will get the best performance bonus from this).

@erikjohnston
Member

The federation senders now do an insert to destinations (for certainty that the foreign key exists), and an update to destination_rooms, this needs to be done concurrently, for every event, which is likely to result in many contentions for a slow postgres database with many federated events to multiple rooms concurrently.

The insert into destinations shouldn't be bad in the contended case, as it's just a DO NOTHING. My understanding is that destination_rooms shouldn't have concurrent writes to the same room, so it also shouldn't be contended.
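A minimal sketch of what a "DO NOTHING" insert does, again using sqlite3 (SQLite 3.24 or later) instead of Postgres. The `ensure_destination` helper and the reduced two-column table are assumptions for this example; the point is only that inserting an existing key leaves the existing row untouched rather than rewriting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, illustrative schema; the real destinations table has more columns.
conn.execute(
    "CREATE TABLE destinations (destination TEXT PRIMARY KEY, retry_interval INTEGER)"
)
conn.execute("INSERT INTO destinations VALUES ('remote.example', 60)")


def ensure_destination(conn, destination):
    """Insert the destination if it is missing; leave existing rows alone."""
    conn.execute(
        "INSERT INTO destinations (destination) VALUES (?) "
        "ON CONFLICT (destination) DO NOTHING",
        (destination,),
    )


ensure_destination(conn, "remote.example")  # already present: no change
ensure_destination(conn, "new.example")     # missing: row is created

rows = conn.execute(
    "SELECT destination, retry_interval FROM destinations ORDER BY destination"
).fetchall()
print(rows)
```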

The thing is that while we do do this for every event, we also do a lot of reads and writes to the DB for each transaction that takes place, which is often per event per destination, so even if we're doing a bunch of batching up my gut instinct would be they'd be dwarfed by all the other stuff.

Though given this is an issue occurring while your federation sender is busy catching up, it's entirely possible that all the performance characteristics are different from the "normal" case.

@ShadowJonathan
Contributor Author

ShadowJonathan commented Mar 18, 2021

My understanding is that destination_rooms shouldn't have concurrent writes to the same room, so also shouldn't be contended.

It is, because a batch from process_event_queue_for_federation is separated by the room that each event belongs to, and handle_room_events gets called for each of these per-room batches, concurrently.

When the homeservers in each room overlap, it gets contested, so a room like #matrix:matrix.org causes a holdup on the database, and any other small room that shares homeservers with #matrix:matrix.org (very likely) gets no serialised access, no concurrent update.
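One way to see where the two views in this exchange differ is to look at which row keys two concurrently handled rooms would write. The sketch below assumes, as in the upsert snippet earlier in the thread, that destinations is keyed by destination and destination_rooms by (destination, room_id). The `overlapping_keys` helper and the room membership data are invented for illustration.

```python
def overlapping_keys(rooms):
    """Find row keys that concurrently handled rooms would both write.

    rooms maps room_id -> set of destination server names (hypothetical data).
    Returns (destinations written by more than one room,
             whether every destination_rooms key is unique).
    """
    room_ids = sorted(rooms)
    shared_destinations = set()
    for index, first in enumerate(room_ids):
        for second in room_ids[index + 1:]:
            shared_destinations |= rooms[first] & rooms[second]
    # destination_rooms keys include the room ID, so they differ per room.
    pair_keys = [(dest, room_id) for room_id in room_ids for dest in rooms[room_id]]
    return shared_destinations, len(pair_keys) == len(set(pair_keys))


rooms = {
    "!big:example.org": {"a.example", "b.example", "c.example"},
    "!small:example.org": {"b.example", "d.example"},
}

shared, pairs_unique = overlapping_keys(rooms)
print(shared, pairs_unique)
```

Under these assumptions the overlap appears in the destinations keys (here `b.example`), while the (destination, room_id) keys never collide across rooms.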

And even for large rooms like #matrix:matrix.org, a call to update destination_rooms is very expensive (for slow databases, 2k destination_rooms entries updated on every event), it would be useful to minimise this as much as possible regardless of the concurrency problem.

@ShadowJonathan
Contributor Author

ShadowJonathan commented Mar 19, 2021

...wait, I'm realising I might have it wrong about destination_rooms, I'll need to think about this for a moment.

Edit: I think I've misunderstood destination_rooms, and I've noted some other bugs that could happen (in regards to what happens if the last handle_event call wouldn't call _send_pdu for whatever reason). I'll probably need to think about how I'll rework this; I'll do that after this weekend.

@ShadowJonathan
Contributor Author

Yeah, I think I'll remove the optimization and split that into another PR, as it currently has some problems that need to be hashed out individually. I'll make this PR just about the fix, then.

@ShadowJonathan ShadowJonathan marked this pull request as draft March 20, 2021 10:48
@ShadowJonathan
Contributor Author

(I'm sorry for all the drama, this is still one of the first times I've looked at this code, gaining more understanding the longer I looked at it 😅)

@ShadowJonathan ShadowJonathan changed the title Fix #9635 and improve federation sending performance Fix #9635 Mar 20, 2021
@ShadowJonathan ShadowJonathan marked this pull request as ready for review March 20, 2021 17:45
@erikjohnston
Member

Yeah, I think I'll remove the optimization and split that into another PR, as it currently has some problems that need to be hashed out individually. I'll make this PR just about the fix, then.

No worries! Your initial stab was perfectly reasonable, and we do similar things elsewhere. It's just one of those things that's best to avoid if possible, and I happen to have been staring at a lot of this sort of stuff recently 😅

@clokep
Member

clokep commented Mar 22, 2021

Can we get an improved title on this PR? I find it best to put the issue that it fixes in the description and not in the title and have the title be a short bit on what the change actually is.

@ShadowJonathan ShadowJonathan changed the title Fix #9635 Fix #9635 (federation stall on concurrent access errors) Mar 22, 2021
@clokep clokep changed the title Fix #9635 (federation stall on concurrent access errors) Fix federation stall on concurrent access errors Mar 22, 2021
@erikjohnston erikjohnston merged commit 0caf2a3 into matrix-org:develop Mar 23, 2021
@richvdh
Member

richvdh commented Mar 26, 2021

It might be helpful to update the description of the PR too. AFAICT it ended up being quite different to what was written originally.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Apr 28, 2021
Synapse 1.32.2 (2021-04-22)
===========================

This release includes a fix for a regression introduced in 1.32.0.

Bugfixes
--------

- Fix a regression in Synapse 1.32.0 and 1.32.1 which caused `LoggingContext` errors in plugins. ([\#9857](matrix-org/synapse#9857))


Synapse 1.32.1 (2021-04-21)
===========================

This release fixes [a regression](matrix-org/synapse#9853)
in Synapse 1.32.0 that caused connected Prometheus instances to become unstable.

However, as this release is still subject to the `LoggingContext` change in 1.32.0,
it is recommended to remain on or downgrade to 1.31.0.

Bugfixes
--------

- Fix a regression in Synapse 1.32.0 which caused Synapse to report large numbers of Prometheus time series, potentially overwhelming Prometheus instances. ([\#9854](matrix-org/synapse#9854))


Synapse 1.32.0 (2021-04-20)
===========================

**Note:** This release introduces [a regression](matrix-org/synapse#9853)
that can overwhelm connected Prometheus instances. This issue was not present in
1.32.0rc1. If affected, it is recommended to downgrade to 1.31.0 in the meantime, and
follow [these instructions](matrix-org/synapse#9854 (comment))
to clean up any excess write-ahead logs.

**Note:** This release also mistakenly included a change that may have affected Synapse
modules that import `synapse.logging.context.LoggingContext`, such as
[synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider).
This will be fixed in a later Synapse version.

**Note:** This release requires Python 3.6+ and Postgres 9.6+ or SQLite 3.22+.

This release removes the deprecated `GET /_synapse/admin/v1/users/<user_id>` admin API. Please use the [v2 API](https://github.com/matrix-org/synapse/blob/develop/docs/admin_api/user_admin_api.rst#query-user-account) instead, which has improved capabilities.

This release requires Application Services to use type `m.login.application_service` when registering users via the `/_matrix/client/r0/register` endpoint to comply with the spec. Please ensure your Application Services are up to date.

If you are using the `packages.matrix.org` Debian repository for Synapse packages,
note that we have recently updated the expiry date on the gpg signing key. If you see an
error similar to `The following signatures were invalid: EXPKEYSIG F473DD4473365DE1`, you
will need to get a fresh copy of the keys. You can do so with:

```sh
sudo wget -O /usr/share/keyrings/matrix-org-archive-keyring.gpg https://packages.matrix.org/debian/matrix-org-archive-keyring.gpg
```

Bugfixes
--------

- Fix the log lines of nested logging contexts. Broke in 1.32.0rc1. ([\#9829](matrix-org/synapse#9829))


Synapse 1.32.0rc1 (2021-04-13)
==============================

Features
--------

- Add a Synapse module for routing presence updates between users. ([\#9491](matrix-org/synapse#9491))
- Add an admin API to manage ratelimit for a specific user. ([\#9648](matrix-org/synapse#9648))
- Include request information in structured logging output. ([\#9654](matrix-org/synapse#9654))
- Add `order_by` to the admin API `GET /_synapse/admin/v2/users`. Contributed by @dklimpel. ([\#9691](matrix-org/synapse#9691))
- Replace the `room_invite_state_types` configuration setting with `room_prejoin_state`. ([\#9700](matrix-org/synapse#9700))
- Add experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9717](matrix-org/synapse#9717), [\#9735](matrix-org/synapse#9735))
- Update experimental support for Spaces: include `m.room.create` in the room state sent with room-invites. ([\#9710](matrix-org/synapse#9710))
- Synapse now requires Python 3.6 or later. It also requires Postgres 9.6 or later or SQLite 3.22 or later. ([\#9766](matrix-org/synapse#9766))


Bugfixes
--------

- Prevent `synapse_forward_extremities` and `synapse_excess_extremity_events` Prometheus metrics from initially reporting zero-values after startup. ([\#8926](matrix-org/synapse#8926))
- Fix recently added ratelimits to correctly honour the application service `rate_limited` flag. ([\#9711](matrix-org/synapse#9711))
- Fix longstanding bug which caused `duplicate key value violates unique constraint "remote_media_cache_thumbnails_media_origin_media_id_thumbna_key"` errors. ([\#9725](matrix-org/synapse#9725))
- Fix bug where sharded federation senders could get stuck repeatedly querying the DB in a loop, using lots of CPU. ([\#9770](matrix-org/synapse#9770))
- Fix duplicate logging of exceptions thrown during federation transaction processing. ([\#9780](matrix-org/synapse#9780))


Updates to the Docker image
---------------------------

- Move opencontainers labels to the final Docker image such that users can inspect them. ([\#9765](matrix-org/synapse#9765))


Improved Documentation
----------------------

- Make the `allowed_local_3pids` regex example in the sample config stricter. ([\#9719](matrix-org/synapse#9719))


Deprecations and Removals
-------------------------

- Remove old admin API `GET /_synapse/admin/v1/users/<user_id>`. ([\#9401](matrix-org/synapse#9401))
- Make `/_matrix/client/r0/register` expect a type of `m.login.application_service` when an Application Service registers a user, to align with [the relevant spec](https://spec.matrix.org/unstable/application-service-api/#server-admin-style-permissions). ([\#9548](matrix-org/synapse#9548))


Internal Changes
----------------

- Replace deprecated `imp` module with successor `importlib`. Contributed by Cristina Muñoz. ([\#9718](matrix-org/synapse#9718))
- Experiment with GitHub Actions for CI. ([\#9661](matrix-org/synapse#9661))
- Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9682](matrix-org/synapse#9682))
- Update `scripts-dev/complement.sh` to use a local checkout of Complement, allow running a subset of tests and have it use Synapse's Complement test blacklist. ([\#9685](matrix-org/synapse#9685))
- Improve Jaeger tracing for `to_device` messages. ([\#9686](matrix-org/synapse#9686))
- Add release helper script for automating part of the Synapse release process. ([\#9713](matrix-org/synapse#9713))
- Add type hints to expiring cache. ([\#9730](matrix-org/synapse#9730))
- Convert various testcases to `HomeserverTestCase`. ([\#9736](matrix-org/synapse#9736))
- Start linting mypy with `no_implicit_optional`. ([\#9742](matrix-org/synapse#9742))
- Add missing type hints to federation handler and server. ([\#9743](matrix-org/synapse#9743))
- Check that a `ConfigError` is raised, rather than simply `Exception`, when appropriate in homeserver config file generation tests. ([\#9753](matrix-org/synapse#9753))
- Fix incompatibility with `tox` 2.5. ([\#9769](matrix-org/synapse#9769))
- Enable Complement tests for [MSC2946](matrix-org/matrix-spec-proposals#2946): Spaces Summary API. ([\#9771](matrix-org/synapse#9771))
- Use mock from the standard library instead of a separate package. ([\#9772](matrix-org/synapse#9772))
- Update Black configuration to target Python 3.6. ([\#9781](matrix-org/synapse#9781))
- Add option to skip unit tests when building Debian packages. ([\#9793](matrix-org/synapse#9793))


Synapse 1.31.0 (2021-04-06)
===========================

**Note:** As announced in v1.25.0, and in line with the deprecation policy for platform dependencies, this is the last release to support Python 3.5 and PostgreSQL 9.5. Future versions of Synapse will require Python 3.6+ and PostgreSQL 9.6+, as per our [deprecation policy](docs/deprecation_policy.md).

This is also the last release that the Synapse team will be publishing packages for Debian Stretch and Ubuntu Xenial.


Improved Documentation
----------------------

- Add a document describing the deprecation policy for platform dependencies. ([\#9723](matrix-org/synapse#9723))


Internal Changes
----------------

- Revert using `dmypy run` in lint script. ([\#9720](matrix-org/synapse#9720))
- Pin flake8-bugbear's version. ([\#9734](matrix-org/synapse#9734))


Synapse 1.31.0rc1 (2021-03-30)
==============================

Features
--------

- Add support to OpenID Connect login for requiring attributes on the `userinfo` response. Contributed by Hubbe King. ([\#9609](matrix-org/synapse#9609))
- Add initial experimental support for a "space summary" API. ([\#9643](matrix-org/synapse#9643), [\#9652](matrix-org/synapse#9652), [\#9653](matrix-org/synapse#9653))
- Add support for the busy presence state as described in [MSC3026](matrix-org/matrix-spec-proposals#3026). ([\#9644](matrix-org/synapse#9644))
- Add support for credentials for proxy authentication in the `HTTPS_PROXY` environment variable. ([\#9657](matrix-org/synapse#9657))


Bugfixes
--------

- Fix a longstanding bug that could cause issues when editing a reply to a message. ([\#9585](matrix-org/synapse#9585))
- Fix the `/capabilities` endpoint to return `m.change_password` as disabled if the local password database is not used for authentication. Contributed by @dklimpel. ([\#9588](matrix-org/synapse#9588))
- Check if local passwords are enabled before setting them for the user. ([\#9636](matrix-org/synapse#9636))
- Fix a bug where federation sending can stall due to `concurrent access` database exceptions when it falls behind. ([\#9639](matrix-org/synapse#9639))
- Fix a bug introduced in Synapse 1.30.1 which meant the suggested `pip` incantation to install an updated `cryptography` was incorrect. ([\#9699](matrix-org/synapse#9699))


Updates to the Docker image
---------------------------

- Speed up Docker builds and make it nicer to test against Complement while developing (install all dependencies before copying the project). ([\#9610](matrix-org/synapse#9610))
- Include [opencontainers labels](https://github.com/opencontainers/image-spec/blob/master/annotations.md#pre-defined-annotation-keys) in the Docker image. ([\#9612](matrix-org/synapse#9612))


Improved Documentation
----------------------

- Clarify that `register_new_matrix_user` is present also when installed via non-pip package. ([\#9074](matrix-org/synapse#9074))
- Update source install documentation to mention platform prerequisites before the source install steps. ([\#9667](matrix-org/synapse#9667))
- Improve worker documentation for fallback/web auth endpoints. ([\#9679](matrix-org/synapse#9679))
- Update the sample configuration for OIDC authentication. ([\#9695](matrix-org/synapse#9695))


Internal Changes
----------------

- Preparatory steps for removing redundant `outlier` data from `event_json.internal_metadata` column. ([\#9411](matrix-org/synapse#9411))
- Add type hints to the caching module. ([\#9442](matrix-org/synapse#9442))
- Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9499](matrix-org/synapse#9499), [\#9659](matrix-org/synapse#9659))
- Add additional type hints to the Homeserver object. ([\#9631](matrix-org/synapse#9631), [\#9638](matrix-org/synapse#9638), [\#9675](matrix-org/synapse#9675), [\#9681](matrix-org/synapse#9681))
- Only save remote cross-signing and device keys if they're different from the current ones. ([\#9634](matrix-org/synapse#9634))
- Rename storage function to fix spelling and not conflict with another function's name. ([\#9637](matrix-org/synapse#9637))
- Improve performance of federation catch up by sending the latest events in the room to the remote, rather than just the last event sent by the local server. ([\#9640](matrix-org/synapse#9640), [\#9664](matrix-org/synapse#9664))
- In the `federation_client` commandline client, stop automatically adding the URL prefix, so that servlets on other prefixes can be tested. ([\#9645](matrix-org/synapse#9645))
- In the `federation_client` commandline client, handle inline `signing_key`s in `homeserver.yaml`. ([\#9647](matrix-org/synapse#9647))
- Fixed some antipattern issues to improve code quality. ([\#9649](matrix-org/synapse#9649))
- Add a storage method for pulling all current user presence state from the database. ([\#9650](matrix-org/synapse#9650))
- Import `HomeServer` from the proper module. ([\#9665](matrix-org/synapse#9665))
- Increase default join ratelimiting burst rate. ([\#9674](matrix-org/synapse#9674))
- Add type hints to third party event rules and visibility modules. ([\#9676](matrix-org/synapse#9676))
- Bump mypy-zope to 0.2.13 to fix "Cannot determine consistent method resolution order (MRO)" errors when running mypy a second time. ([\#9678](matrix-org/synapse#9678))
- Use interpreter from `$PATH` via `/usr/bin/env` instead of absolute paths in various scripts. ([\#9689](matrix-org/synapse#9689))
- Make it possible to use `dmypy`. ([\#9692](matrix-org/synapse#9692))
- Suppress "CryptographyDeprecationWarning: int_from_bytes is deprecated". ([\#9698](matrix-org/synapse#9698))
- Use `dmypy run` in lint script for improved performance in type-checking while developing. ([\#9701](matrix-org/synapse#9701))
- Fix undetected mypy error when using Python 3.6. ([\#9703](matrix-org/synapse#9703))
- Fix type-checking CI on develop. ([\#9709](matrix-org/synapse#9709))


Synapse 1.30.1 (2021-03-26)
===========================

This release is identical to Synapse 1.30.0, with the exception of explicitly
setting a minimum version of Python's Cryptography library to ensure that users
of Synapse are protected from the recent [OpenSSL security advisories](https://mta.openssl.org/pipermail/openssl-announce/2021-March/000198.html),
especially CVE-2021-3449.

Note that Cryptography defaults to bundling its own statically linked copy of
OpenSSL, which means that you may not be protected by your operating system's
security updates.

It's also worth noting that Cryptography no longer supports Python 3.5, so
admins deploying to older environments may not be protected against this or
future vulnerabilities. Synapse will be dropping support for Python 3.5 at the
end of March.


Updates to the Docker image
---------------------------

- Ensure that the docker container has up to date versions of openssl. ([\#9697](matrix-org/synapse#9697))


Internal Changes
----------------

- Enforce that `cryptography` dependency is up to date to ensure it has the most recent openssl patches. ([\#9697](matrix-org/synapse#9697))


Synapse 1.30.0 (2021-03-22)
===========================

Note that this release deprecates the ability for appservices to
call `POST /_matrix/client/r0/register`  without the body parameter `type`. Appservice
developers should use a `type` value of `m.login.application_service` as
per [the spec](https://matrix.org/docs/spec/application_service/r0.1.2#server-admin-style-permissions).
In future releases, calling this endpoint with an access token - but without a `m.login.application_service`
type - will fail.


No significant changes.


Synapse 1.30.0rc1 (2021-03-16)
==============================

Features
--------

- Add prometheus metrics for number of users successfully registering and logging in. ([\#9510](matrix-org/synapse#9510), [\#9511](matrix-org/synapse#9511), [\#9573](matrix-org/synapse#9573))
- Add `synapse_federation_last_sent_pdu_time` and `synapse_federation_last_received_pdu_time` prometheus metrics, which monitor federation delays by reporting the timestamps of messages sent and received to a set of remote servers. ([\#9540](matrix-org/synapse#9540))
- Add support for generating JSON Web Tokens dynamically for use as OIDC client secrets. ([\#9549](matrix-org/synapse#9549))
- Optimise handling of incomplete room history for incoming federation. ([\#9601](matrix-org/synapse#9601))
- Finalise support for allowing clients to pick an SSO Identity Provider ([MSC2858](matrix-org/matrix-spec-proposals#2858)). ([\#9617](matrix-org/synapse#9617))
- Tell spam checker modules about the SSO IdP a user registered through if one was used. ([\#9626](matrix-org/synapse#9626))


Bugfixes
--------

- Fix long-standing bug when generating thumbnails for some images with transparency: `TypeError: cannot unpack non-iterable int object`. ([\#9473](matrix-org/synapse#9473))
- Purge chain cover indexes for events that were purged prior to Synapse v1.29.0. ([\#9542](matrix-org/synapse#9542), [\#9583](matrix-org/synapse#9583))
- Fix bug where federation requests were not correctly retried on 5xx responses. ([\#9567](matrix-org/synapse#9567))
- Fix re-activating an account via the admin API when local passwords are disabled. ([\#9587](matrix-org/synapse#9587))
- Fix a bug introduced in Synapse 1.20 which caused incoming federation transactions to stack up, causing slow recovery from outages. ([\#9597](matrix-org/synapse#9597))
- Fix a bug introduced in v1.28.0 where the OpenID Connect callback endpoint could error with a `MacaroonInitException`. ([\#9620](matrix-org/synapse#9620))
- Fix Internal Server Error on `GET /_synapse/client/saml2/authn_response` request. ([\#9623](matrix-org/synapse#9623))


Updates to the Docker image
---------------------------

- Make use of an improved malloc implementation (`jemalloc`) in the docker image. ([\#8553](matrix-org/synapse#8553))


Improved Documentation
----------------------

- Add relayd entry to reverse proxy example configurations. ([\#9508](matrix-org/synapse#9508))
- Improve the SAML2 upgrade notes for 1.27.0. ([\#9550](matrix-org/synapse#9550))
- Link to the "List user's media" admin API from the media admin API docs. ([\#9571](matrix-org/synapse#9571))
- Clarify the spam checker modules documentation example to mention that `parse_config` is a required method. ([\#9580](matrix-org/synapse#9580))
- Clarify the sample configuration for `stats` settings. ([\#9604](matrix-org/synapse#9604))


Deprecations and Removals
-------------------------

- The `synapse_federation_last_sent_pdu_age` and `synapse_federation_last_received_pdu_age` prometheus metrics have been removed. They are replaced by `synapse_federation_last_sent_pdu_time` and `synapse_federation_last_received_pdu_time`. ([\#9540](matrix-org/synapse#9540))
- Registering an Application Service user without using the `m.login.application_service` login type will be unsupported in an upcoming Synapse release. ([\#9559](matrix-org/synapse#9559))


Internal Changes
----------------

- Add tests to ResponseCache. ([\#9458](matrix-org/synapse#9458))
- Add type hints to purge room and server notice admin API. ([\#9520](matrix-org/synapse#9520))
- Add extra logging to ObservableDeferred when callbacks throw exceptions. ([\#9523](matrix-org/synapse#9523))
- Fix incorrect type hints. ([\#9528](matrix-org/synapse#9528), [\#9543](matrix-org/synapse#9543), [\#9591](matrix-org/synapse#9591), [\#9608](matrix-org/synapse#9608), [\#9618](matrix-org/synapse#9618))
- Add an additional test for purging a room. ([\#9541](matrix-org/synapse#9541))
- Add a `.git-blame-ignore-revs` file with the hashes of auto-formatting. ([\#9560](matrix-org/synapse#9560))
- Increase the threshold before which outbound federation to a server goes into "catch up" mode, which is expensive for the remote server to handle. ([\#9561](matrix-org/synapse#9561))
- Fix spurious errors reported by the `config-lint.sh` script. ([\#9562](matrix-org/synapse#9562))
- Fix type hints and tests for BlacklistingAgentWrapper and BlacklistingReactorWrapper. ([\#9563](matrix-org/synapse#9563))
- Do not have mypy ignore type hints from unpaddedbase64. ([\#9568](matrix-org/synapse#9568))
- Improve efficiency of calculating the auth chain in large rooms. ([\#9576](matrix-org/synapse#9576))
- Convert `synapse.types.Requester` to an `attrs` class. ([\#9586](matrix-org/synapse#9586))
- Add logging for redis connection setup. ([\#9590](matrix-org/synapse#9590))
- Improve logging when processing incoming transactions. ([\#9596](matrix-org/synapse#9596))
- Remove unused `stats.retention` setting, and emit a warning if stats are disabled. ([\#9604](matrix-org/synapse#9604))
- Prevent attempting to bundle aggregations for state events in /context APIs. ([\#9619](matrix-org/synapse#9619))
Development

Successfully merging this pull request may close these issues.

Federation catchup stalls due to "concurrent access" exception thrown in the process_event_queue task
4 participants