Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

We keep trying to fetch events whose keys have expired or have been invalidated #13676

Open
Tracked by #15182
richvdh opened this issue Aug 30, 2022 · 4 comments
Open
Tracked by #15182
Labels
A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Aug 30, 2022

Sometimes we try to fetch an event over federation (notably, if it forms part of the state or auth chain), and we try to validate its signature,

However, if we are unable to perform that validation, then we discard the event; but we might do the same again almost immediately. Possibly we need some sort of backoff.

@richvdh richvdh added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Major Major functionality / product severely impaired, no satisfactory workaround. O-Occasional Affects or can be seen by some users regularly or most users rarely A-Federation A-Performance Performance, both client-facing and admin-facing labels Aug 30, 2022
@MadLittleMods MadLittleMods added the A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) label Aug 30, 2022
@MadLittleMods
Copy link
Contributor

Possibly we need some sort of backoff.

Tracked by #13622 and #13623

@richvdh
Copy link
Member Author

richvdh commented Aug 31, 2022

Tracked by #13622...

I think #13622 is more about trying to /backfill from a given backwards extremity, than requesting individual /events? But if your point is they both need a backoff mechanism, then yes!

@MadLittleMods
Copy link
Contributor

@richvdh I thought this was mainly talking about /backfill since it was mentioned in #13677. This situation does happen for backfill because we either request /state or /event when doing _get_state_ids_after_missing_prev_event() and #13622 will stop us getting into that situation over and over.

But I see how we could call /event for other things 👍

@MadLittleMods
Copy link
Contributor

👍 I've added this as part of #13700 to track signature failures.

By having some backoff and processing in the background, we can save 128s that _get_state_ids_after_missing_prev_event takes for #matrix:matrix.org when backfilling messages.

To explain an exact scenario around /messages -> backfill, we call /backfill and first check the signatures of the 100 events. We see bad signature for $luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M and $zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references $luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M as a prev_event. So we have to do the whole _get_state_ids_after_missing_prev_event rigmarole which pulls in those same events which fail again because the signatures are still invalid.

  • backfill
    • outgoing-federation-request /backfill
    • _check_sigs_and_hash_and_fetch
      • _check_sigs_and_hash_and_fetch_one for each event received over backfill
        • $luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M fails with Signature on retrieved event was invalid.: unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)
        • $zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw fails with Signature on retrieved event was invalid.: unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)
    • _process_pulled_events
      • _process_pulled_event for each validated event
        • ❗ Event $Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0 references $luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M as a prev_event which is missing so we try to get it
          • _get_state_ids_after_missing_prev_event
            • outgoing-federation-request /state_ids
            • get_pdu for $luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M which fails the signature check again
            • get_pdu for $zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw which fails the signature check

MadLittleMods added a commit that referenced this issue Sep 15, 2022
…e is invalid

Related to
 - #13622
    - #13635
 - #13676

Follow-up to #13815
which tracks event signature failures.

This PR aims to stop us from trying to
`_get_state_ids_after_missing_prev_event` after
we already know that the prev_event will fail
from a previous attempt

To explain an exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.

 - `backfill`
    - `outgoing-federation-request` `/backfill`
    - `_check_sigs_and_hash_and_fetch`
       - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
          - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
          - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
   - `_process_pulled_events`
      - `_process_pulled_event` for each validated event
         - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
            - `_get_state_ids_after_missing_prev_event`
               - `outgoing-federation-request` `/state_ids`
               - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
               - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check

With this PR, we no longer call the burdensome `_get_state_ids_after_missing_prev_event`
because the signature failure will count as an attempt before we try to run this.
MadLittleMods added a commit that referenced this issue Oct 3, 2022
Because we're doing the recording in `_check_sigs_and_hash_for_pulled_events_and_fetch` (previously named `_check_sigs_and_hash_and_fetch`), this means we will track signature failures for `backfill`, `get_room_state`, `get_event_auth`, and `get_missing_events` (all pulled event scenarios). And we also record signature failures from `get_pdu`.

Part of #13700

Part of #13676 and #13356

This PR will be especially important for #13816 so we can avoid the costly `_get_state_ids_after_missing_prev_event` down the line when `/messages` calls backfill.
MadLittleMods added a commit that referenced this issue Oct 15, 2022
…ure is invalid (#13816)

While #13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.

Related to
 - #13622
    - #13635
 - #13676

Part of #13356

Follow-up to #13815 which tracks event signature failures.

With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.

For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761

To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.

 - `backfill`
    - `outgoing-federation-request` `/backfill`
    - `_check_sigs_and_hash_and_fetch`
       - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
          - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
          - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
   - `_process_pulled_events`
      - `_process_pulled_event` for each validated event
         - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
            - `_get_state_ids_after_missing_prev_event`
               - `outgoing-federation-request` `/state_ids`
               - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
               - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

2 participants