
Track when the pulled event signature fails #13815

Merged
1 change: 1 addition & 0 deletions changelog.d/13815.feature
@@ -0,0 +1 @@
Keep track of when an event pulled over federation fails its signature check so that we can intelligently back off in the future.
25 changes: 22 additions & 3 deletions synapse/federation/federation_base.py
@@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Awaitable, Callable, Optional

from synapse.api.constants import MAX_DEPTH, EventContentFields, EventTypes, Membership
from synapse.api.errors import Codes, SynapseError
@@ -58,7 +58,12 @@ def __init__(self, hs: "HomeServer"):

@trace
async def _check_sigs_and_hash(
self, room_version: RoomVersion, pdu: EventBase
self,
room_version: RoomVersion,
pdu: EventBase,
record_failure_callback: Optional[
Callable[[EventBase, str], Awaitable[None]]
] = None,
Comment on lines +64 to +66
Contributor Author
@MadLittleMods, Sep 24, 2022

For context, see #13815 (comment) for why we're using a callback pattern here.
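
For readers skimming the diff, here is a minimal, self-contained sketch of that callback pattern; the `FakeEvent` class, `SignatureError`, and function names below are invented for illustration and are not Synapse's API. The checking layer stays storage-agnostic and simply awaits whatever coroutine its caller handed it when a check fails.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class FakeEvent:
    """Stand-in for Synapse's EventBase, just enough for the sketch."""

    def __init__(self, room_id: str, event_id: str) -> None:
        self.room_id = room_id
        self.event_id = event_id


class SignatureError(Exception):
    pass


async def check_sigs(
    event: FakeEvent,
    record_failure_callback: Optional[
        Callable[[FakeEvent, str], Awaitable[None]]
    ] = None,
) -> FakeEvent:
    # Pretend the signature check always fails so the callback path runs.
    exc = SignatureError("unable to verify signature")
    if record_failure_callback:
        await record_failure_callback(event, str(exc))
    raise exc


async def main() -> None:
    failures = []

    async def record(event: FakeEvent, cause: str) -> None:
        # The real caller persists this via the store's
        # record_event_failed_pull_attempt(room_id, event_id, cause).
        failures.append((event.room_id, event.event_id, cause))

    try:
        await check_sigs(FakeEvent("!room:example.org", "$abc"), record)
    except SignatureError as e:
        print(f"check failed ({e}); recorded: {failures}")


asyncio.run(main())
```

One apparent benefit of the pattern is that the checking code does not need to know how failures are persisted: pulled-event callers pass a callback that writes to the store, while other callers simply omit the optional argument.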

) -> EventBase:
"""Checks that event is correctly signed by the sending server.
Contributor

It would be good to see a description of at least record_failure_callback — I think the other 2 are fairly self-evident so not fussed about those particularly.

Contributor

(done in a later commit; in future please consider collapsing those into the same commit so that commit by commit review is possible for mere mortals like me who can't track the full thing ;-))

Contributor Author
@MadLittleMods, Sep 30, 2022

My PRs are definitely not commit-by-commit review friendly. I find it impossible to work that way in the first place.

And rebasing and splitting multiple changes in the same file is a super chore. I'd rather split stuff by PR, so feel free to call out anything you see that should be split out.

Contributor

If you can make them a bit more CbC friendly, I would really appreciate it; it makes reviews less daunting. (Maybe I'm the only one, though? Definitely not a requirement, just a preference from me, especially when I'm not familiar with the area of code.)

Since you seem to have the idea that it's difficult to do: My technique for things like this is to use lazygit (a TUI git frontend but with good mouse support too), which lets you stage changes line-by-line and easily amend (shift+A) them into earlier commits. It also makes it easy to reorder commits etc. I think some other tools support this kind of thing as well, but lazygit is the best I've found for easily crafting the commits into a meaningful arrangement.
I think you may be referring to git's interactive rebasing and interactive staging; you can definitely do this a lot more efficiently than that :).

Contributor

P.S. I find splitting by commits especially useful to avoid conflating 'style' changes (e.g. moving methods, renaming methods) and semantic changes. I think asking you to put these in separate PRs would be too much, though, and it would certainly hurt the latency of review.

Contributor Author
@MadLittleMods, Sep 30, 2022

What are the team's thoughts around rebasing? Anyone annoyed/worried about references changing after they have already pulled the branch?

If you're just looking for some low-hanging fruit, like pulling out style changes or merging subsequent changes back into the original chunk, that seems a lot more doable. If we're talking about building things up iteratively commit by commit, I'm a lot more apprehensive about putting effort into that.

I assume you don't want me to rebase things together from different review cycles so you can easily review the things that have changed.

What about merge commits? I find it a lot harder to rebase around merge commits and would prefer to just rebase my branch rather than merge develop if I started doing this. But then the rebase would rewrite all of the history, including stuff you already reviewed. Do you have a good way to see what's new even though I rebased on the latest develop? (Does that even matter to you?)


A note around rebased history: I really like GitLab's diff versioning, so even if you rebase, you can still reference the old diff. I used to use this all the time over there.

Contributor

> A note around rebased history: I really like GitLab's diff versioning, so even if you rebase, you can still reference the old diff. I used to use this all the time over there.

I have similar experiences with GL and reviewboard.

Contributor

> What are the team's thoughts around rebasing? Anyone annoyed/worried about references changing after they have already pulled the branch?

I believe the rough consensus is: "don't rebase after someone has started to review the changes", because doing so resets the "I have viewed this file" state in GitHub's review UI.


@@ -70,6 +75,11 @@ async def _check_sigs_and_hash(
Args:
room_version: The room version of the PDU
pdu: the event to be checked
record_failure_callback: A callback to run whenever the given event
fails signature or hash checks. This includes exceptions
that would normally be thrown/raised but also things like
checking for event tampering where we just return the redacted
event.

Returns:
* the original event if the checks pass
Expand All @@ -80,7 +90,12 @@ async def _check_sigs_and_hash(
InvalidEventSignatureError if the signature check failed. Nothing
will be logged in this case.
"""
await _check_sigs_on_pdu(self.keyring, room_version, pdu)
try:
await _check_sigs_on_pdu(self.keyring, room_version, pdu)
except InvalidEventSignatureError as exc:
if record_failure_callback:
await record_failure_callback(pdu, str(exc))
raise exc

if not check_event_content_hash(pdu):
# let's try to distinguish between failures because the event was
@@ -116,6 +131,10 @@ async def _check_sigs_and_hash(
"event_id": pdu.event_id,
}
)
if record_failure_callback:
await record_failure_callback(
pdu, "Event content has been tampered with"
)
return redacted_event

spam_check = await self.spam_checker.check_event_for_spam(pdu)
Contributor Author
@MadLittleMods, Sep 14, 2022

In the future, we will potentially record a failed attempt when the spam checker soft-fails an event. Part of #13700
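
Purely as a hypothetical sketch of that future change (it is not part of this PR; the helper name and cause string below are invented), the same callback could be reused on the spam-check path:

```python
from typing import Awaitable, Callable, Optional


async def maybe_record_spam_soft_failure(
    pdu: object,  # an EventBase in the real code
    soft_failed_as_spam: bool,
    record_failure_callback: Optional[
        Callable[[object, str], Awaitable[None]]
    ] = None,
) -> None:
    # Mirror the signature/hash failure handling above: if the spam checker
    # soft-fails a pulled event, note it as a failed pull attempt too.
    if soft_failed_as_spam and record_failure_callback:
        await record_failure_callback(pdu, "Event was soft-failed as spam")
```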

50 changes: 40 additions & 10 deletions synapse/federation/federation_client.py
@@ -278,7 +278,7 @@ async def backfill(
pdus = [event_from_pdu_json(p, room_version) for p in transaction_data_pdus]

# Check signatures and hash of pdus, removing any from the list that fail checks
pdus[:] = await self._check_sigs_and_hash_and_fetch(
pdus[:] = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
dest, pdus, room_version=room_version
)

@@ -328,7 +328,17 @@ async def get_pdu_from_destination_raw(

# Check signatures are correct.
try:
signed_pdu = await self._check_sigs_and_hash(room_version, pdu)

async def _record_failure_callback(
event: EventBase, cause: str
) -> None:
await self.store.record_event_failed_pull_attempt(
event.room_id, event.event_id, cause
)

signed_pdu = await self._check_sigs_and_hash(
room_version, pdu, _record_failure_callback
)
except InvalidEventSignatureError as e:
errmsg = f"event id {pdu.event_id}: {e}"
logger.warning("%s", errmsg)
@@ -547,24 +557,28 @@ async def get_room_state(
len(auth_event_map),
)

valid_auth_events = await self._check_sigs_and_hash_and_fetch(
valid_auth_events = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
destination, auth_event_map.values(), room_version
)

valid_state_events = await self._check_sigs_and_hash_and_fetch(
destination, state_event_map.values(), room_version
valid_state_events = (
await self._check_sigs_and_hash_for_pulled_events_and_fetch(
destination, state_event_map.values(), room_version
)
)

return valid_state_events, valid_auth_events

@trace
async def _check_sigs_and_hash_and_fetch(
async def _check_sigs_and_hash_for_pulled_events_and_fetch(
Contributor Author

Renamed this so we stick to using this only for pulled event scenarios

self,
origin: str,
pdus: Collection[EventBase],
room_version: RoomVersion,
) -> List[EventBase]:
"""Checks the signatures and hashes of a list of events.
"""
Checks the signatures and hashes of a list of pulled events we got from
federation and records any signature failures as failed pull attempts.

If a PDU fails its signature check then we check if we have it in
the database, and if not then request it from the sender's server (if that
@@ -597,11 +611,17 @@ async def _check_sigs_and_hash_and_fetch(

valid_pdus: List[EventBase] = []

async def _record_failure_callback(event: EventBase, cause: str) -> None:
await self.store.record_event_failed_pull_attempt(
event.room_id, event.event_id, cause
)

async def _execute(pdu: EventBase) -> None:
valid_pdu = await self._check_sigs_and_hash_and_fetch_one(
pdu=pdu,
origin=origin,
room_version=room_version,
record_failure_callback=_record_failure_callback,
)

if valid_pdu:
@@ -618,6 +638,9 @@ async def _check_sigs_and_hash_and_fetch_one(
pdu: EventBase,
origin: str,
room_version: RoomVersion,
record_failure_callback: Optional[
Callable[[EventBase, str], Awaitable[None]]
] = None,
) -> Optional[EventBase]:
"""Takes a PDU and checks its signatures and hashes.

@@ -634,14 +657,21 @@ async def _check_sigs_and_hash_and_fetch_one(
origin
pdu
room_version
record_failure_callback: A callback to run whenever the given event
fails signature or hash checks. This includes exceptions
that would normally be thrown/raised but also things like
checking for event tampering where we just return the redacted
event.

Returns:
The PDU (possibly redacted) if it has valid signatures and hashes.
None if no valid copy could be found.
"""

try:
return await self._check_sigs_and_hash(room_version, pdu)
return await self._check_sigs_and_hash(
room_version, pdu, record_failure_callback
)
except InvalidEventSignatureError as e:
logger.warning(
"Signature on retrieved event %s was invalid (%s). "
@@ -694,7 +724,7 @@ async def get_event_auth(

auth_chain = [event_from_pdu_json(p, room_version) for p in res["auth_chain"]]

signed_auth = await self._check_sigs_and_hash_and_fetch(
signed_auth = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
destination, auth_chain, room_version=room_version
)

@@ -1401,7 +1431,7 @@ async def get_missing_events(
event_from_pdu_json(e, room_version) for e in content.get("events", [])
]

signed_events = await self._check_sigs_and_hash_and_fetch(
signed_events = await self._check_sigs_and_hash_for_pulled_events_and_fetch(
destination, events, room_version=room_version
)
except HttpResponseException as e:
75 changes: 75 additions & 0 deletions tests/federation/test_federation_client.py
@@ -23,14 +23,23 @@

from synapse.api.room_versions import RoomVersions
from synapse.events import EventBase
from synapse.rest import admin
from synapse.rest.client import login, room
from synapse.server import HomeServer
from synapse.types import JsonDict
from synapse.util import Clock

from tests.test_utils import event_injection
from tests.unittest import FederatingHomeserverTestCase


class FederationClientTest(FederatingHomeserverTestCase):
servlets = [
admin.register_servlets,
room.register_servlets,
login.register_servlets,
]

def prepare(self, reactor: MemoryReactor, clock: Clock, homeserver: HomeServer):
super().prepare(reactor, clock, homeserver)

@@ -231,6 +240,72 @@ def _get_pdu_once(self) -> EventBase:

return remote_pdu

def test_backfill_invalid_signature_records_failed_pull_attempts(
self,
) -> None:
"""
Test to make sure that events from /backfill with invalid signatures get
recorded as failed pull attempts.
"""
OTHER_USER = f"@user:{self.OTHER_SERVER_NAME}"
main_store = self.hs.get_datastores().main

# Create the room
user_id = self.register_user("kermit", "test")
tok = self.login("kermit", "test")
room_id = self.helper.create_room_as(room_creator=user_id, tok=tok)

# We purposely don't run `add_hashes_and_signatures_from_other_server`
# over this because we want the signature check to fail.
pulled_event, _ = self.get_success(
event_injection.create_event(
self.hs,
room_id=room_id,
sender=OTHER_USER,
type="test_event_type",
content={"body": "garply"},
)
)

# We expect an outbound request to /backfill, so stub that out
self._mock_agent.request.side_effect = lambda *args, **kwargs: defer.succeed(
_mock_response(
{
"origin": "yet.another.server",
"origin_server_ts": 900,
# Mimic the other server returning our new `pulled_event`
"pdus": [pulled_event.get_pdu_json()],
}
)
)

self.get_success(
self.hs.get_federation_client().backfill(
# We use "yet.another.server" instead of
# `self.OTHER_SERVER_NAME` because we want to see the behavior
# from `_check_sigs_and_hash_and_fetch_one` where it tries to
# fetch the PDU again from the origin server if the signature
# fails. Just want to make sure that the failure is counted from
# both code paths.
dest="yet.another.server",
room_id=room_id,
limit=1,
extremities=[pulled_event.event_id],
),
)

# Make sure our failed pull attempt was recorded
backfill_num_attempts = self.get_success(
main_store.db_pool.simple_select_one_onecol(
table="event_failed_pull_attempts",
keyvalues={"event_id": pulled_event.event_id},
retcol="num_attempts",
)
)
# This is 2 because it failed once from `self.OTHER_SERVER_NAME` and the
# other from "yet.another.server"
self.assertEqual(backfill_num_attempts, 2)
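
As an aside on where this is heading: the changelog entry says the point of recording these failures is to back off intelligently in the future. That back-off logic is not part of this PR; the sketch below is only an assumption of how a counter like `num_attempts` could drive an exponential delay, and the function names, base delay, cap, and `last_attempt_ms` input are all invented for illustration.

```python
def backoff_duration_ms(num_attempts: int) -> int:
    """Hypothetical exponential back-off derived from the failed pull attempt counter."""
    base_delay_ms = 60 * 1000  # start at one minute
    max_delay_ms = 24 * 60 * 60 * 1000  # cap at one day
    return min(base_delay_ms * (2 ** max(num_attempts - 1, 0)), max_delay_ms)


def should_retry_pull(num_attempts: int, last_attempt_ms: int, now_ms: int) -> bool:
    # Only retry pulling the event once the back-off window since the last
    # failed attempt has elapsed.
    return now_ms - last_attempt_ms >= backoff_duration_ms(num_attempts)


# 2 failed attempts -> 2 minute window, so a retry 5 minutes later is allowed.
assert backoff_duration_ms(2) == 2 * 60 * 1000
assert should_retry_pull(2, last_attempt_ms=0, now_ms=5 * 60 * 1000)
```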


def _mock_response(resp: JsonDict):
body = json.dumps(resp).encode("utf-8")
4 changes: 2 additions & 2 deletions tests/test_federation.py
@@ -86,8 +86,8 @@ async def _check_event_auth(origin, event, context):

federation_event_handler._check_event_auth = _check_event_auth
self.client = self.homeserver.get_federation_client()
self.client._check_sigs_and_hash_and_fetch = lambda dest, pdus, **k: succeed(
pdus
self.client._check_sigs_and_hash_for_pulled_events_and_fetch = (
lambda dest, pdus, **k: succeed(pdus)
)

# Send the join, it should return None (which is not an error)