Fix bug in calculating state for non-gappy syncs #16942

Merged: richvdh merged 2 commits into develop from rav/state_bug_16941 on Apr 4, 2024

Conversation

richvdh (Member) commented Feb 19, 2024

Unfortunately, the optimisation we applied here for non-gappy syncs is not actually valid.

Fixes #16941.

Based on #16930.
Requires matrix-org/sytest#1374.

richvdh (Member, Author) commented Feb 23, 2024

This doesn't help that much if the client has enabled lazy-loading. S2 is still not sent to the client if it is filtered out via the lazy-loading filter.

Lazy-loading is disabled for a related reason on gappy syncs (see element-hq/element-web#7211). Maybe we also need to disable it for non-gappy syncs?

erikjohnston (Member) left a comment

Ok, I think this is correct.

It's really quite confusing that get_state_at always returns the state at an event, rather than at the token you give it.

Maybe for sliding sync we can actually use the state at a stream token (as I believe we store a mapping from stream ordering to forward extremities), which should make some of this stuff more intuitive??!?!?!?!

richvdh added 2 commits April 4, 2024 16:52
Unfortunately, the optimisation we applied here for non-gappy syncs is not
actually valid.
@richvdh richvdh force-pushed the rav/state_bug_16941 branch from b1b8247 to 295c9db Compare April 4, 2024 15:52
richvdh (Member, Author) commented Apr 4, 2024

> It's really quite confusing that get_state_at always returns the state at an event, rather than at the token you give it.
>
> Maybe for sliding sync we can actually use the state at a stream token (as I believe we store a mapping from stream ordering to forward extremities), which should make some of this stuff more intuitive??!?!?!?!

Yes, though in order to avoid repeatedly doing state res across a given set of extremities, we'd have to store a mapping from stream ordering to state group.
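
For illustration only, the kind of lookup being discussed could be sketched as below. This is a toy in-memory model, not Synapse code: `StreamStateIndex`, `record`, and `state_group_at` are hypothetical names, and the sketch assumes a state group is recorded each time the room's state changes at a given stream ordering.

```python
import bisect
from typing import List, Optional


class StreamStateIndex:
    """Hypothetical index from stream ordering to state group."""

    def __init__(self) -> None:
        # Parallel lists, kept sorted by stream ordering.
        self._orderings: List[int] = []
        self._state_groups: List[int] = []

    def record(self, stream_ordering: int, state_group: int) -> None:
        """Record the state group in effect from `stream_ordering` onwards."""
        if self._orderings and stream_ordering <= self._orderings[-1]:
            raise ValueError("stream orderings must be strictly increasing")
        self._orderings.append(stream_ordering)
        self._state_groups.append(state_group)

    def state_group_at(self, token: int) -> Optional[int]:
        """Return the state group at `token`, or None if nothing is recorded yet."""
        # Find the last recorded ordering that is <= token.
        idx = bisect.bisect_right(self._orderings, token)
        if idx == 0:
            return None
        return self._state_groups[idx - 1]
```

With such an index, asking for the state "at a token" becomes a single lookup and needs no state resolution at query time; whether that is practical to maintain in the real database is a separate question.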

@richvdh richvdh enabled auto-merge (squash) April 4, 2024 15:58
@richvdh richvdh merged commit 0e68e9b into develop Apr 4, 2024
38 checks passed
@richvdh richvdh deleted the rav/state_bug_16941 branch April 4, 2024 16:15
richvdh (Member, Author) commented Apr 4, 2024

> This doesn't help that much if the client has enabled lazy-loading. S2 is still not sent to the client if it is filtered out via the lazy-loading filter.
>
> Lazy-loading is disabled for a related reason on gappy syncs (see element-hq/element-web#7211). Maybe we also need to disable it for non-gappy syncs?

I had to remember what this was talking about. For the record, I believe it is the scenario I have now spun out to #17050.

richvdh (Member, Author) commented Apr 4, 2024

> It's really quite confusing that get_state_at always returns the state at an event, rather than at the token you give it.
>
> Maybe for sliding sync we can actually use the state at a stream token (as I believe we store a mapping from stream ordering to forward extremities), which should make some of this stuff more intuitive??!?!?!?!

Indeed, though to avoid having to do the same state resolution repeatedly, we'd probably have to also map from a set of forward extremities to a state group?
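
A minimal sketch of that second mapping, assuming state resolution can be treated as an opaque function from a set of extremity event IDs to a state group. `ExtremityStateCache` and the `resolve` callable are hypothetical and stand in for whatever the real resolution path would be.

```python
from typing import Callable, Dict, FrozenSet, Iterable


class ExtremityStateCache:
    """Hypothetical cache from a set of forward extremities to a state group."""

    def __init__(self, resolve: Callable[[FrozenSet[str]], int]) -> None:
        # `resolve` is assumed to run (expensive) state resolution.
        self._resolve = resolve
        self._cache: Dict[FrozenSet[str], int] = {}

    def state_group_for(self, extremities: Iterable[str]) -> int:
        # A frozenset makes the key independent of the order of the extremities.
        key = frozenset(extremities)
        if key not in self._cache:
            self._cache[key] = self._resolve(key)
        return self._cache[key]
```

The point of the sketch is only that the same set of extremities is resolved once, however many stream tokens map to it.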

erikjohnston added a commit that referenced this pull request Apr 8, 2024
PR #16942 removed an invalid optimisation that avoided pulling out state
for non-gappy syncs. This causes a large increase in DB usage. c.f. #16941
for why that optimisation was wrong.

However, we can still optimise in the simple case where the events in
the timeline are a linear chain without any branching/merging of the
DAG.
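
As a rough sketch of what "a linear chain without any branching/merging" could mean for a timeline: every event after the first has exactly one prev event, and that prev event is the preceding timeline event. This is an assumption about the condition, not the check the commit actually implements; `TimelineEvent` and `is_linear_chain` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical, stripped-down event: just an ID and its prev event IDs."""

    event_id: str
    prev_event_ids: Sequence[str]


def is_linear_chain(timeline: List[TimelineEvent]) -> bool:
    """Return True if each event's only prev event is the one before it."""
    for previous, current in zip(timeline, timeline[1:]):
        # More than one prev event means a merge; a single prev event that is
        # not the preceding timeline event means a branch or a gap.
        if list(current.prev_event_ids) != [previous.event_id]:
            return False
    return True
```

An empty or single-event timeline passes trivially here; a real check would presumably also need to tie the first timeline event back to the point the previous sync ended at.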
yingziwu added a commit to yingziwu/synapse that referenced this pull request Apr 19, 2024
No significant changes since 1.105.0rc1.

- Stabilize support for [MSC4010](matrix-org/matrix-spec-proposals#4010) which clarifies the interaction of push rules and account data. Contributed by @clokep. ([\#17022](element-hq/synapse#17022))
- Stabilize support for [MSC3981](matrix-org/matrix-spec-proposals#3981): `/relations` recursion. Contributed by @clokep. ([\#17023](element-hq/synapse#17023))
- Add support for moving `/pushrules` off of main process. ([\#17037](element-hq/synapse#17037), [\#17038](element-hq/synapse#17038))

- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16930](element-hq/synapse#16930), [\#16932](element-hq/synapse#16932), [\#16942](element-hq/synapse#16942), [\#17064](element-hq/synapse#17064), [\#17065](element-hq/synapse#17065), [\#17066](element-hq/synapse#17066))
- Fix server notice rooms not always being created as unencrypted rooms, even when `encryption_enabled_by_default_for_room_type` is in use (server notices are always unencrypted). ([\#17033](element-hq/synapse#17033))
- Fix the `.m.rule.encrypted_room_one_to_one` and `.m.rule.room_one_to_one` default underride push rules being in the wrong order. Contributed by @Sumpy1. ([\#17043](element-hq/synapse#17043))

- Refactor auth chain fetching to reduce duplication. ([\#17044](element-hq/synapse#17044))
- Improve database performance by adding a missing index to `access_tokens.refresh_token_id`. ([\#17045](element-hq/synapse#17045), [\#17054](element-hq/synapse#17054))
- Improve database performance by reducing number of receipts fetched when sending push notifications. ([\#17049](element-hq/synapse#17049))

* Bump packaging from 23.2 to 24.0. ([\#17027](element-hq/synapse#17027))
* Bump regex from 1.10.3 to 1.10.4. ([\#17028](element-hq/synapse#17028))
* Bump ruff from 0.3.2 to 0.3.5. ([\#17060](element-hq/synapse#17060))
* Bump serde_json from 1.0.114 to 1.0.115. ([\#17041](element-hq/synapse#17041))
* Bump types-pillow from 10.2.0.20240125 to 10.2.0.20240406. ([\#17061](element-hq/synapse#17061))
* Bump types-requests from 2.31.0.20240125 to 2.31.0.20240406. ([\#17063](element-hq/synapse#17063))
* Bump typing-extensions from 4.9.0 to 4.11.0. ([\#17062](element-hq/synapse#17062))
Successfully merging this pull request may close these issues.

/sync incorrectly calculates state changes for non-gappy syncs