Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

speed up /members and add at= and membership params #3568

Merged
merged 31 commits into from
Aug 15, 2018
Merged

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Jul 20, 2018

builds on #3567 (in turn #3331 (in turn #2970)). (all now merged)

ara4n added 11 commits June 4, 2018 01:04
as per the proposal; we can deduplicate redundant lazy-loaded members
which are sent in the same sync sequence. we do this heuristically
rather than requiring the client to somehow tell us which members it
has chosen to cache, by instead caching the last N members sent to
a client, and not sending them again.  For now we hardcode N to 100.
Each cache for a given (user,device) tuple is in turn cached for up to
X minutes (to avoid the caches building up).  For now we hardcode X to 30.
ara4n added a commit to matrix-org/sytest that referenced this pull request Jul 20, 2018
@ara4n ara4n requested a review from richvdh July 20, 2018 23:57
@@ -89,6 +89,69 @@ def _get_current_state_ids_txn(txn):
_get_current_state_ids_txn,
)

# FIXME: how should this be cached?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understanding the caching stuff to know how best to handle this - or whether this method should be combined with get_current_state_ids above, and somehow share the same caching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you've got three choices:

  1. treat each call to get_filtered_current_state_ids as a separate cacheable thing - so for example if you do a lookup of get_filtered_current_state_ids(room, None, None), a lookup of get_filtered_current_state_ids(room, [(t, k)], None) would not hit the cache. To do this, you just need an @cached decorator, plus invalidation calls.

  2. Implement this in terms of a call to get_current_state_ids followed by a filter. This might increase cache hits, but cache misses will of course be very expensive.

  3. use a DictionaryCache, like _state_group_cache uses, since DictionaryCache has support for partial caches. The keys would be room ids rather than state group ids. again you'd need to consider invalidation.

  4. Not bother with caching and see how much it hurts?

@richvdh richvdh requested a review from a team July 23, 2018 10:50
@ara4n ara4n mentioned this pull request Jul 23, 2018
@richvdh richvdh assigned ara4n and unassigned richvdh Jul 27, 2018
@ara4n ara4n changed the base branch from matthew/lazy_load_apis to develop August 12, 2018 11:12
this feels much clunkier and more complicated for both clients
and senders than just querying based on event_id, and i'm failing
to see any way it's more correct than querying based on event_id.

also fixes a thinko to check whether the user is allowed to view
membership as of the given token
@ara4n
Copy link
Member Author

ara4n commented Aug 12, 2018

The conclusions of the API shape were roughly:

  • /members is the right API, as /context provides other unnecessary stuff, /state is needlessly generic (and the boat has sailed on fixing that) and /joined_members is either obsolete or very specifically optimised for a single use case (depending on who you ask).
  • We need to sync /members to /sync responses to avoid races when doing a bg sync on /members in keeping with responses from the /sync API.

However, as requested out-of-band, I've switched /members?at to take a stream token rather than an event ID. I am failing to justify this to myself as I can't see any bug introduced by using event IDs, and meanwhile the implementation gets more complicated both serverside & clientside by thinking in terms of stream tokens instead. I've done it anyway to try to get this through review before getting blocked any further. matrix-org/sytest#469 has also been updated.

@richvdh PTAL.

@ara4n ara4n assigned richvdh and unassigned ara4n Aug 12, 2018
@richvdh
Copy link
Member

richvdh commented Aug 13, 2018

However, as requested out-of-band, I've switched /members?at to take a stream token rather than an event ID. I am failing to justify this to myself as I can't see any bug introduced by using event IDs

one question which using event ids opens is: are you talking about the state before or after that event? I think you want the state after the event, but nothing else that takes an event_id does that. Indeed, I don't think "after event X" is powerful enough on its own to define a particular point in the room state.

Consider a room with a diamond in the DAG:

   A
 /   \
B     C
 \   /
   D

B and C are both state events. Does the state after B change once C arrives?

You can solve it by knowing that B's stream ordering is lower than C's, so that a client referring to the "state after B" will not have seen the state at C - but then you are just using the event id as a token in the sync stream, for which we use sync tokens. (A server could just use event ids as its stream tokens. But we don't mandate that, because it leaves more options for server implementations).

@defer.inlineCallbacks
def get_state_events(
self, user_id, room_id, types=None, filtered_types=None,
at_token=None, is_guest=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: trailing comma would be nice :/

at_token.room_key
).stream

# XXX: is this the right method to be using? What id we don't yet have an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's not; as you note, there is a problem if we don't yet have an event after the token, which is moderately likely given my understanding of the proposed usecase (cf https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit?disco=AAAACCLJnz0: please can we have an answer there?)

I'd suggest doing the same as SyncHandler.get_state_at (which is basically to call get_recent_events_for_room and then apply the change implied by the event in question - though as the comments there say, that is buggy too :/)

room_state = yield self.store.get_state_for_events(
[membership_event_id], None
# check we are even allowed to be reading the room at this point
event = yield self.store.get_event(event_id, allow_none=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why allow_none? it's only going to make it explode in a more subtle way

event = yield self.store.get_event(event_id, allow_none=True)
visible_events = yield filter_events_for_client(self.store, user_id, [event])

if len(visible_events) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just if visible_events is more pythonic here

)
room_state = room_state[event.event_id]
else:
room_state = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the logic for returning an empty result rather than a 403?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only that get_state_events() felt like a pretty generic method, and raising a 403 might break stuff upstream. in practice it's only used by /state and /members so i think it's fine - have switched to throw 404 if the token isn't recognised and 403 if the event at that token is filtered out wrt our user.

@richvdh richvdh assigned ara4n and unassigned richvdh Aug 13, 2018
@ara4n
Copy link
Member Author

ara4n commented Aug 14, 2018

@richvdh thanks for review - ptal.

@ara4n ara4n assigned richvdh and unassigned ara4n Aug 14, 2018
room_state = yield self.store.get_state_for_events(
[membership_event_id], None
if not last_events:
raise SynapseError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use NotFoundError here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done in 0d5770d)

@@ -82,14 +82,49 @@ def get_room_data(self, user_id=None, room_id=None,
defer.returnValue(data)

@defer.inlineCallbacks
def get_state_events(self, user_id, room_id, is_guest=False):
def _check_in_room_or_world_readable(self, room_id, user_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in f5189b9

@richvdh
Copy link
Member

richvdh commented Aug 15, 2018

I'm going to squash-merge this

@richvdh richvdh merged commit 2f78f43 into develop Aug 15, 2018
krombel added a commit to krombel/synapse that referenced this pull request Aug 15, 2018
richvdh added a commit that referenced this pull request Aug 22, 2018
Features
--------

- Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439))
- Add /_media/r0/config ([\#3184](#3184))
- speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568))
- implement `summary` block in /sync response as per MSC688 ([\#3574](#3574))
- Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589))
- Add ability to limit number of monthly active users on the server ([\#3633](#3633))
- Support more federation endpoints on workers ([\#3653](#3653))
- Basic support for room versioning ([\#3654](#3654))
- Ability to disable client/server Synapse via conf toggle ([\#3655](#3655))
- Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662))
- Add some metrics for the appservice and federation event sending loops ([\#3664](#3664))
- Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670))
- set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687))
- Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694))
- For resource limit blocked users, prevent writing into rooms ([\#3708](#3708))

Bugfixes
--------

- Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658))
- Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661))
- Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676))
- Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677))
- Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681))
- Fix mau blocking calulation bug on login ([\#3689](#3689))
- Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692))
- Improve HTTP request logging to include all requests ([\#3700](#3700))
- Avoid timing out requests while we are streaming back the response ([\#3701](#3701))
- Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713))
- Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710))
- Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719))
- Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723))
- Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732))

Deprecations and Removals
-------------------------

- The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703))

Internal Changes
----------------

- The test suite now can run under PostgreSQL. ([\#3423](#3423))
- Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632))
- Tests now correctly execute on Python 3. ([\#3647](#3647))
- Sytests can now be run inside a Docker container. ([\#3660](#3660))
- Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668))
- Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669))
- Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678))
- Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679))
- Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684))
- Rename MAU prometheus metrics ([\#3690](#3690))
- add new error type ResourceLimit ([\#3707](#3707))
- Logcontexts for replication command handlers ([\#3709](#3709))
- Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
@hawkowl hawkowl deleted the matthew/members_at branch September 20, 2018 14:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants