Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Check appservice user interest against the local users instead of all users (get_users_in_room mis-use) #13958

Merged

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Sep 29, 2022

Check appservice user interest against the local users instead of all users (get_users_in_room mis-use).

get_local_users_in_room is way more performant since it looks at a single table (local_current_membership) and is looking through way less data since it only worries about the local users in the room instead of everyone in the room across the federation.

Fix #13942
(probably fixes, we don't know for sure but it's our hunch atm)

Related to matrix-org/matrix-spec#1272

It has also become a spec-compliance issue and this PR will align our behavior to MSC3905

Related to the following PRs where we also cleaned up some get_users_in_room mis-uses:

See #13575 (comment) for the original exploration around finding get_users_in_room mis-uses.

Problem

#13575 made the get_users_in_room SQL query a little more complicated by adding an extra join statement to the query which means it takes more time.

This extra database time can show up as a hot-spot for homeservers that make heavy use of application services and bridged users (like Element One), see #13942

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

… users

`get_local_users_in_room` is way more performant since it looks at a single
table (`local_current_membership`) and is looking through way less
data since it only worries about the local users in the room instead
of everyone in the room across the federation.

Fix #13942

Related to:

 - #13605
 - #13608
 - #13606
@MadLittleMods MadLittleMods added A-Application-Service Related to AS support A-Performance Performance, both client-facing and admin-facing T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Sep 29, 2022
Comment on lines +178 to +185
# In the future, we can consider re-using
# `store.get_app_service_users_in_room` which is very similar to this
# function but has a slightly worse performance than this because we
# have an early escape-hatch if we find a single user that the
# appservice is interested in. The juice would be worth the squeeze if
# `store.get_app_service_users_in_room` was used in more places besides
# an experimental MSC. But for now we can avoid doing more work and
# barely using it later.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context on this comment, see #13958 (comment)

@@ -0,0 +1 @@
Fix performance bottleneck with heavy appservice and bridged users in Synapse by checking appservice user interest against the local users instead of all users.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone curious, there might be a few more get_users_in_room mis-uses in the sync and presence code for someone who has more domain knowledge on what those things should exactly do.

For example #13967 (feel free to pick it up)

@MadLittleMods MadLittleMods marked this pull request as ready for review September 30, 2022 01:05
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 30, 2022 01:05
synapse/storage/databases/main/roommember.py Outdated Show resolved Hide resolved
# `store.get_app_service_users_in_room` was used in more places besides
# an experimental MSC. But for now we can avoid doing more work and
# barely using it later.
local_user_ids = await store.get_local_users_in_room(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the behavior of is_interested_in_room slightly. Previously, we could match remote users when the appservice's user regex didn't include a hostname (all of our test cases and examples in the documentation don't). Was that just a bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was that just a bug?

I don't see how an appservice could control remote users, so I think so...but might be worth asking someone with more info?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really a question of controlling the users, but more of whether the application service wants to receive events and updates related to those users, which an application service may well want to do for events coming from non-local users.

So I think this is a change we wouldn't want to go ahead with.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I hadn't considered this use case.

For my own reference, the spec doesn't mention any restriction either: https://spec.matrix.org/v1.1/application-service-api/#registration

I've updated the comments to clarify local and remote so it's obvious if this is attempted again. We do have tests around this kinda but because they were just mocking get_users_in_room so they happily passed when I updated it to get_local_users_in_room as well 🤦‍♂️. I've added some actual homeserver integration tests and verified that they fail if we use get_local_users_in_room here ✅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #14000 to introduce these clarification changes with so we can leave the diff in this PR for reference of what we didn't want to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possibly, though the historical context (which may be lost to time) is still very prevelant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though the historical context (which may be lost to time) is still very prevelant here.

Is there something specific to point to here? Or just from your memory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implied parts of "r0 was a nightmare" above :p (mostly my memory, but also the memory of others from around that time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a MSC for this: MSC3905


And prodding this point in case we can merge earlier,

That being said, the changes here are valid - simply only returning events targeted at local users covers the same use cases.

I tend to agree with @anoadragon453 here (I think we agree 😬). The use cases of seeing events from remote users (like moderation) is already supported by matching the entire room with the rooms namespace regex so this PR should be ok to merge.

We have to rip the band-aid off at some point whether it before or after matrix-org/matrix-spec-proposals#3905 merges. Maybe there is a way to detect people relying on the old behavior? -> #13958 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSC3905 is merged and this behavior is ok to change now ✅

@MadLittleMods MadLittleMods reopened this Oct 4, 2022
Comment on lines +428 to +446
@parameterized.expand(
[
("@local_as_user:test", True),
# Defining remote users in an application service user namespace regex is a
# footgun since the appservice might assume that it'll receive all events
# sent by that remote user, but it will only receive events in rooms that
# are shared with a local user. So we just remove this footgun possibility
# entirely and we won't notify the application service based on remote
# users.
("@remote_as_user:remote", False),
]
)
def test_match_interesting_room_members(
self, interesting_user: str, should_notify: bool
):
"""
Test to make sure that a interesting user (local or remote) in the room is
notified as expected when someone else in the room sends a message.
"""
Copy link
Contributor Author

@MadLittleMods MadLittleMods Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note to previous reviewers, this parameterized test is new!

Originally from #14000 but this integration test actually passes and fails based on Synapse's behavior instead of mocked functions.

@MadLittleMods MadLittleMods removed the request for review from anoadragon453 October 24, 2022 20:52
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wanted to make sure that an appservice registering interest in * users would still receive an event that was sent from a local user to a remote user. As in, the appservice would still receive events sent from an interesting user.

And here I believe that is the case, though it may be nice to check send_mock was called or not called both when a message is sent from alice and the parameterized user. Just to make sure.

synapse/appservice/__init__.py Show resolved Hide resolved
synapse/appservice/__init__.py Show resolved Hide resolved
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

docs/upgrade.md Outdated Show resolved Hide resolved
@anoadragon453 anoadragon453 enabled auto-merge (squash) October 27, 2022 17:49
@anoadragon453 anoadragon453 merged commit aa70556 into develop Oct 27, 2022
@anoadragon453 anoadragon453 deleted the madlittlemods/13942-appservice-get_users_in_room-mis-uses branch October 27, 2022 18:29
@MadLittleMods MadLittleMods added the A-Spec-Compliance places where synapse does not conform to the spec label Oct 27, 2022
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @squahtx, @clokep, @anoadragon453, and @turt2live 🦨

bradtgmurray added a commit to beeper/synapse-legacy-fork that referenced this pull request Nov 15, 2022
Synapse 1.71.0 (2022-11-08)
===========================

Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.

**Note:** in line with our [deprecation policy](https://matrix-org.github.io/synapse/latest/deprecation_policy.html) for platform dependencies, this will be the last release to support PostgreSQL 10, which reaches upstream end-of-life on November 10th, 2022. Future releases of Synapse will require PostgreSQL 11+.

No significant changes since 1.71.0rc2.

Synapse 1.71.0rc2 (2022-11-04)
==============================

Improved Documentation
----------------------

- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\matrix-org#14358](matrix-org#14358), [\matrix-org#14360](matrix-org#14360))

Deprecations and Removals
-------------------------

- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\matrix-org#14353](matrix-org#14353))

Internal Changes
----------------

- Run unit tests against Python 3.11. ([\matrix-org#13812](matrix-org#13812))

Synapse 1.71.0rc1 (2022-11-01)
==============================

Features
--------

- Support back-channel logouts from OpenID Connect providers. ([\matrix-org#11414](matrix-org#11414))
- Allow use of Postgres and SQLlite full-text search operators in search queries. ([\matrix-org#11635](matrix-org#11635), [\matrix-org#14310](matrix-org#14310), [\matrix-org#14311](matrix-org#14311))
- Implement [MSC3664](matrix-org/matrix-spec-proposals#3664), Pushrules for relations. Contributed by Nico. ([\matrix-org#11804](matrix-org#11804))
- Improve aesthetics of HTML templates. Note that these changes do not retroactively apply to templates which have been [customised](https://matrix-org.github.io/synapse/latest/templates.html#templates) by server admins. ([\matrix-org#13652](matrix-org#13652))
- Enable write-ahead logging for SQLite installations. Contributed by [@asymmetric](https://github.com/asymmetric). ([\matrix-org#13897](matrix-org#13897))
- Show erasure status when [listing users](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#query-user-account) in the Admin API. ([\matrix-org#14205](matrix-org#14205))
- Provide a specific error code when a `/sync` request provides a filter which doesn't represent a JSON object. ([\matrix-org#14262](matrix-org#14262))

Bugfixes
--------

- Fix a long-standing bug where the `update_synapse_database` script could not be run with multiple databases. Contributed by @thefinn93 @ Beeper. ([\matrix-org#13422](matrix-org#13422))
- Fix a bug which prevented setting an avatar on homeservers which have an explicit port in their `server_name` and have `max_avatar_size` and/or `allowed_avatar_mimetypes` configuration. Contributed by @ashfame. ([\matrix-org#13927](matrix-org#13927))
- Check appservice user interest against the local users instead of all users in the room to align with [MSC3905](matrix-org/matrix-spec-proposals#3905). ([\matrix-org#13958](matrix-org#13958))
- Fix a long-standing bug where Synapse would accidentally include extra information in the response to [`PUT /_matrix/federation/v2/invite/{roomId}/{eventId}`](https://spec.matrix.org/v1.4/server-server-api/#put_matrixfederationv2inviteroomideventid). ([\matrix-org#14064](matrix-org#14064))
- Fix a bug introduced in Synapse 1.64.0 where presence updates could be missing from `/sync` responses. ([\matrix-org#14243](matrix-org#14243))
- Fix a bug introduced in Synapse 1.60.0 which caused an error to be logged when Synapse received a SIGHUP signal if debug logging was enabled. ([\matrix-org#14258](matrix-org#14258))
- Prevent history insertion ([MSC2716](matrix-org/matrix-spec-proposals#2716)) during an partial join ([MSC3706](matrix-org/matrix-spec-proposals#3706)). ([\matrix-org#14291](matrix-org#14291))
- Fix a bug introduced in Synapse 1.34.0 where device names would be returned via a federation user key query request when `allow_device_name_lookup_over_federation` was set to `false`. ([\matrix-org#14304](matrix-org#14304))
- Fix a bug introduced in Synapse 0.34.0 where logs could include error spam when background processes are measured as taking a negative amount of time. ([\matrix-org#14323](matrix-org#14323))
- Fix a bug introduced in Synapse 1.70.0 where clients were unable to PUT new [dehydrated devices](matrix-org/matrix-spec-proposals#2697). ([\matrix-org#14336](matrix-org#14336))

Improved Documentation
----------------------

- Explain how to disable the use of [`trusted_key_servers`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#trusted_key_servers). ([\matrix-org#13999](matrix-org#13999))
- Add workers settings to [configuration manual](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#individual-worker-configuration). ([\matrix-org#14086](matrix-org#14086))
- Correct the name of the config option [`encryption_enabled_by_default_for_room_type`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#encryption_enabled_by_default_for_room_type). ([\matrix-org#14110](matrix-org#14110))
- Update docstrings of `SynapseError` and `FederationError` to bettter describe what they are used for and the effects of using them are. ([\matrix-org#14191](matrix-org#14191))

Internal Changes
----------------

- Remove unused `@lru_cache` decorator. ([\matrix-org#13595](matrix-org#13595))
- Save login tokens in database and prevent login token reuse. ([\matrix-org#13844](matrix-org#13844))
- Refactor OIDC tests to better mimic an actual OIDC provider. ([\matrix-org#13910](matrix-org#13910))
- Fix type annotation causing import time error in the Complement forking launcher. ([\matrix-org#14084](matrix-org#14084))
- Refactor [MSC3030](matrix-org/matrix-spec-proposals#3030) `/timestamp_to_event` endpoint to loop over federation destinations with standard pattern and error handling. ([\matrix-org#14096](matrix-org#14096))
- Add initial power level event to batch of bulk persisted events when creating a new room. ([\matrix-org#14228](matrix-org#14228))
- Refactor `/key/` endpoints to use `RestServlet` classes. ([\matrix-org#14229](matrix-org#14229))
- Switch to using the `matrix-org/backend-meta` version of `triage-incoming` for new issues in CI. ([\matrix-org#14230](matrix-org#14230))
- Build wheels on macos 11, not 10.15. ([\matrix-org#14249](matrix-org#14249))
- Add debugging to help diagnose lost device list updates. ([\matrix-org#14268](matrix-org#14268))
- Add Rust cache to CI for `trial` runs. ([\matrix-org#14287](matrix-org#14287))
- Improve type hinting of `RawHeaders`. ([\matrix-org#14303](matrix-org#14303))
- Use Poetry 1.2.0 in the Twisted Trunk CI job. ([\matrix-org#14305](matrix-org#14305))

<details>
<summary>Dependency updates</summary>

Runtime:

- Bump anyhow from 1.0.65 to 1.0.66. ([\matrix-org#14278](matrix-org#14278))
- Bump jinja2 from 3.0.3 to 3.1.2. ([\matrix-org#14271](matrix-org#14271))
- Bump prometheus-client from 0.14.0 to 0.15.0. ([\matrix-org#14274](matrix-org#14274))
- Bump psycopg2 from 2.9.4 to 2.9.5. ([\matrix-org#14331](matrix-org#14331))
- Bump pysaml2 from 7.1.2 to 7.2.1. ([\matrix-org#14270](matrix-org#14270))
- Bump sentry-sdk from 1.5.11 to 1.10.1. ([\matrix-org#14330](matrix-org#14330))
- Bump serde from 1.0.145 to 1.0.147. ([\matrix-org#14277](matrix-org#14277))
- Bump serde_json from 1.0.86 to 1.0.87. ([\matrix-org#14279](matrix-org#14279))

Tooling and CI:

- Bump black from 22.3.0 to 22.10.0. ([\matrix-org#14328](matrix-org#14328))
- Bump flake8-bugbear from 21.3.2 to 22.9.23. ([\matrix-org#14042](matrix-org#14042))
- Bump peaceiris/actions-gh-pages from 3.8.0 to 3.9.0. ([\matrix-org#14276](matrix-org#14276))
- Bump peaceiris/actions-mdbook from 1.1.14 to 1.2.0. ([\matrix-org#14275](matrix-org#14275))
- Bump setuptools-rust from 1.5.1 to 1.5.2. ([\matrix-org#14273](matrix-org#14273))
- Bump twine from 3.8.0 to 4.0.1. ([\matrix-org#14332](matrix-org#14332))
- Bump types-opentracing from 2.4.7 to 2.4.10. ([\matrix-org#14133](matrix-org#14133))
- Bump types-requests from 2.28.11 to 2.28.11.2. ([\matrix-org#14272](matrix-org#14272))
</details>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Application-Service Related to AS support A-Performance Performance, both client-facing and admin-facing A-Spec-Compliance places where synapse does not conform to the spec T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some Synapse instances have been hammering their database after v1.66.0 -> v1.68.0 update
5 participants